Is there anywhere I can read more about Tarsnap's features/architecture? You're ...

cperciva · on April 18, 2009

The tarsnap website links to several of my blog posts about tarsnap, but this one probably has the most information of interest to you: http://www.daemonology.net/blog/2008-11-10-tarsnap-public-be...

For example, what advantages does tarsnap have over some bash scripts I write in a few hours that give me off-site encrypted backups with the help of GPG and rsync?

It's hard to say without knowing exactly how your scripts work, but I'd guess that one big advantage tarsnap has is that it works with a snapshotted model of backups.

smanek · on April 18, 2009

Thanks, that answers my questions. Your snapshot system reminds me of a reference-counting GC - neat trick.

cperciva · on April 18, 2009

As it happens, tarsnap's snapshots work via reference counting -- fortunately, it works better for tarsnap than it does for garbage collection. (Reference counting breaks if you have circular references; this is a problem for garbage collection, but not for tarsnap.)
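A toy sketch of the circular-reference point, purely illustrative and not tarsnap's code: the `Node` class and `release` helper are hypothetical names. It hand-counts references, then shows that a two-node cycle is never freed, while a one-way archive-to-block reference is freed as expected.

```python
class Node:
    """A hand-counted object; `count` is the number of references to it."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.refs = []  # outgoing references held by this node

    def point_to(self, other):
        self.refs.append(other)
        other.count += 1


def release(node, freed):
    """Drop one reference to `node`; at zero, free it and release its children."""
    node.count -= 1
    if node.count == 0:
        freed.append(node.name)
        for child in node.refs:
            release(child, freed)


# Cycle: a -> b -> a, plus one external reference to a.
a, b = Node("a"), Node("b")
a.count = 1  # the external reference
a.point_to(b)
b.point_to(a)
cycle_freed = []
release(a, cycle_freed)  # a.count only drops from 2 to 1, so nothing is freed

# Archive -> block: references point one way only, so no cycle can form.
archive, block = Node("archive"), Node("block")
archive.count = 1  # the external reference
archive.point_to(block)
acyclic_freed = []
release(archive, acyclic_freed)
```

After this runs, `cycle_freed` is empty even though nothing outside the cycle refers to `a` or `b` any more, whereas `acyclic_freed` contains both the archive and its block.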

eru · on April 18, 2009

Why didn't you go for a 'real' GC?

cperciva · on April 18, 2009

I have no clue what your question is trying to ask. Can you clarify?

eru · on April 18, 2009

Why did you choose to do reference counting instead of more sophisticated techniques?

I can imagine that ref. counting is much easier to implement and the drawbacks are unimportant in your domain. However I'd still like to read about the reasons for your decision, since you will have thought about that issue for much longer and more clearly than I have.

But I guess I should take a deeper look at http://www.daemonology.net/blog/2008-11-10-tarsnap-public-be... before writing anything.

cperciva · on April 18, 2009

Oh, now I understand. Methods such as reachability analysis require reading lots of memory locations; but for tarsnap, "memory locations" are blocks of data stored remotely, so this gets expensive (and slow) very quickly. With reference counting, the counts can be stored locally and no extraneous data needs to be transferred to or from the server.
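A minimal sketch of that idea, under loud assumptions: `FakeServer`, `Client`, and the block ids are all hypothetical, and this is not how tarsnap is actually implemented. It only illustrates that when counts live on the client, deleting an archive needs no reads from the server; the only requests are uploads of new blocks and deletes of blocks whose count reached zero.

```python
from collections import Counter


class FakeServer:
    """Stand-in for remote block storage; counts every request it receives."""

    def __init__(self):
        self.blocks = {}
        self.requests = 0

    def put(self, block_id, data):
        self.requests += 1
        self.blocks[block_id] = data

    def delete(self, block_id):
        self.requests += 1
        del self.blocks[block_id]


class Client:
    def __init__(self, server):
        self.server = server
        self.refcounts = Counter()  # kept locally, never fetched from the server
        self.archives = {}          # archive name -> list of block ids

    def create_archive(self, name, blocks):
        """`blocks` maps block id -> data; only blocks not yet stored are uploaded."""
        for block_id, data in blocks.items():
            if self.refcounts[block_id] == 0:
                self.server.put(block_id, data)
            self.refcounts[block_id] += 1
        self.archives[name] = list(blocks)

    def delete_archive(self, name):
        """Decrement local counts; delete a block only when its count hits zero."""
        for block_id in self.archives.pop(name):
            self.refcounts[block_id] -= 1
            if self.refcounts[block_id] == 0:
                del self.refcounts[block_id]
                self.server.delete(block_id)


server = FakeServer()
client = Client(server)
client.create_archive("monday", {"b1": b"x", "b2": b"y"})   # uploads b1, b2
client.create_archive("tuesday", {"b2": b"y", "b3": b"z"})  # uploads only b3
client.delete_archive("monday")                             # deletes only b1
```

Block `b2` survives the deletion because "tuesday" still references it. A reachability-based approach would instead have to consult every remaining archive's block list to learn that, which is the expensive part when that data is remote.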

pmjordan · on April 18, 2009

I guess there's a precedent there in the fact that unix file systems use reference counting for file links, presumably for the same reasons. Considering how well-architected tarsnap seems to be, I suspect you've taken steps to avoid losing data via inconsistencies in the reference counter?

cperciva · on April 18, 2009

... you've taken steps to avoid losing data via inconsistencies in the reference counter?

There are some sanity checks built in, but in the extreme case what you're suggesting is impossible. Reference counts are managed on the client side, and the client has the keys necessary to delete blocks from the server; if the client is functioning correctly, it won't get the reference counts wrong, but if the client is malfunctioning then it could go berserk and delete blocks without even looking at the reference counts.

I have taken care of the obvious issues, though -- as long as the OS implements fsync() properly, there's no way that tarsnap or the client system crashing will result in corruption.
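For readers unfamiliar with why fsync() matters here, a common crash-safe update pattern looks roughly like the sketch below. This is a generic illustration, not a description of what tarsnap does internally; `durable_write` is a hypothetical helper name. The idea is to write a temporary file, fsync it, then atomically rename it over the old file, so a crash leaves either the old contents or the new ones.

```python
import os
import tempfile


def durable_write(path, data):
    """Write `data` to `path` so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to stable storage
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The guarantee is only as good as the OS's fsync(): if the OS or the disk reports success before the data is actually on stable storage, no amount of care in the application can make the update crash-safe.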