
Is there anywhere I can read more about Tarsnap's features/architecture? You're obviously a bright guy, and you say you've been working on it for over 2 years, so I'm just curious about what exactly you built ...

For example, what advantages does tarsnap have over some bash scripts I write in a few hours that give me off-site encrypted backups with the help of GPG and rsync? (That's pretty close to the system I use now). I'm sure there are some, but I just don't see them enumerated on tarsnap.com ...



The tarsnap website links to several of my blog posts about tarsnap, but this one probably has the most information of interest to you: http://www.daemonology.net/blog/2008-11-10-tarsnap-public-be...

For example, what advantages does tarsnap have over some bash scripts I write in a few hours that give me off-site encrypted backups with the help of GPG and rsync?

It's hard to say without knowing exactly how your scripts work, but I'd guess that one big advantage tarsnap has is that it works with a snapshotted model of backups.


Thanks, that answers my questions. Your snapshot system reminds me of a reference counting GC - neat trick.


As it happens, tarsnap's snapshots work via reference counting -- fortunately, it works better for tarsnap than it does for garbage collection. (Reference counting breaks if you have circular references; this is a problem for garbage collection, but not for tarsnap.)
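To make the idea concrete, here is a minimal sketch of reference-counted snapshots. It is a toy model and not tarsnap's actual code or data format: the `SnapshotStore` class, its method names, and the single-letter block ids are all made up for illustration. Snapshots refer to blocks, and blocks refer to nothing, so a cycle cannot form and a count of zero always means the block is unused.

```python
from collections import Counter


class SnapshotStore:
    """Toy model (hypothetical): snapshots point at blocks, blocks point at nothing."""

    def __init__(self):
        self.refcount = Counter()   # block id -> number of snapshots using it
        self.snapshots = {}         # snapshot name -> set of block ids

    def create(self, name, block_ids):
        """Record a snapshot and bump the count of every block it uses."""
        blocks = set(block_ids)
        self.snapshots[name] = blocks
        for b in blocks:
            self.refcount[b] += 1

    def delete(self, name):
        """Drop a snapshot; return the blocks that no snapshot uses any more."""
        freed = []
        for b in self.snapshots.pop(name):
            self.refcount[b] -= 1
            if self.refcount[b] == 0:
                del self.refcount[b]
                freed.append(b)
        return sorted(freed)


store = SnapshotStore()
store.create("monday", ["a", "b", "c"])
store.create("tuesday", ["b", "c", "d"])        # shares b and c with monday
freed_first = store.delete("monday")            # only "a" was unique to monday
freed_second = store.delete("tuesday")          # b, c and d are now unreferenced
```

Deleting any one snapshot leaves the others intact, because a shared block is only freed once the last snapshot that uses it is gone.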


Why didn't you go for a 'real' GC?


I have no clue what your question is trying to ask. Can you clarify?


Why did you choose to do reference counting instead of more sophisticated techniques?

I can imagine that ref. counting is much easier to implement and the drawbacks are unimportant in your domain. However I'd still like to read about the reasons for your decision, since you will have thought about that issue much longer and more clearly.

But I guess I should take a deeper look at http://www.daemonology.net/blog/2008-11-10-tarsnap-public-be... before writing anything..


Oh, now I understand. Methods such as reachability analysis require reading lots of memory locations; but for tarsnap, "memory locations" are blocks of data stored remotely, so this gets expensive (and slow) very quickly. With reference counting, the counts can be stored locally and no extraneous data needs to be transferred to or from the server.
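A rough sketch of that cost difference, under loud assumptions: `RemoteServer`, the snapshot keys, and both functions are hypothetical stand-ins, and the sketch assumes the client keeps the deleted snapshot's block list locally alongside the counts. It only counts round trips; it does not model tarsnap's real protocol.

```python
class RemoteServer:
    """Hypothetical stand-in for remote storage that counts fetched objects."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return self.objects[key]


def garbage_by_reachability(server, snapshot_keys, all_blocks):
    """Mark phase: every surviving snapshot index has to be fetched and read."""
    live = set()
    for key in snapshot_keys:
        live.update(server.fetch(key))
    return set(all_blocks) - live


def garbage_by_refcount(refcount, deleted_snapshot_blocks):
    """Decrement locally stored counts; the server is never contacted."""
    dead = set()
    for b in deleted_snapshot_blocks:
        refcount[b] -= 1
        if refcount[b] == 0:
            dead.add(b)
    return dead


# Snapshot s1 = [a, b] has just been deleted; s2 and s3 survive on the server.
server = RemoteServer({"s2": ["b", "c"], "s3": ["c", "d"]})
all_blocks = ["a", "b", "c", "d"]
local_counts = {"a": 1, "b": 2, "c": 2, "d": 1}

dead_rc = garbage_by_refcount(local_counts, ["a", "b"])
fetches_rc = server.fetches          # still zero: nothing was read remotely
dead_scan = garbage_by_reachability(server, ["s2", "s3"], all_blocks)
fetches_scan = server.fetches        # one fetch per surviving snapshot
```

Both approaches find the same dead block, but the scan's cost grows with the number of surviving snapshots while the refcount update touches only the blocks of the snapshot being deleted.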


I guess there's a precedent there in the fact that unix file systems use reference counting for file links, presumably for the same reasons. Considering how well-architected tarsnap seems to be, I suspect you've taken steps to avoid losing data via inconsistencies in the reference counter?


... you've taken steps to avoid losing data via inconsistencies in the reference counter?

There are some sanity checks built in, but in the extreme case, fully guarding against that is impossible. Reference counts are managed on the client side, and the client has the keys necessary to delete blocks from the server; if the client is functioning correctly, it won't get the reference counts wrong, but if the client is malfunctioning then it could go berserk and delete blocks without even looking at the reference counts.

I have taken care of the obvious issues, though -- as long as the OS implements fsync() properly, there's no way that tarsnap or the client system crashing will result in corruption.
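For readers wondering what relying on fsync() looks like in practice, here is one common pattern for crash-safe updates of local state: write a temporary file, fsync it, rename it over the old file, then fsync the directory. This is a generic sketch for POSIX systems, not a description of how tarsnap stores its state; the function name `save_state_atomically` and the JSON format are assumptions made for the example.

```python
import json
import os
import tempfile


def save_state_atomically(path, state):
    """Replace `path` with `state` so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())        # data is on disk before the rename
        os.replace(tmp_path, path)      # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

As the comment above says, all of this is only as good as the operating system's fsync(): if the OS reports success before the data is durable, no amount of care in the application can help.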



