"The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% - 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives."
I'm really impressed they even acknowledge the possibility of failures instead of just touting availability and reliability stats.
Also, 10 times more reliable than a hard drive is pretty good. For me, the takeaway is that I should treat this as a hard drive and NOT as a magical solution that allows me to ignore everything I have learned about systems administration.
Babysitting servers at 2 AM really drives those lessons home for me. The urge to simply declare those problems moot is strong, but giving in to it would be a mistake.
I don't see how this is different from the risks associated with any other storage service. Just be sure to have a proper backup strategy/maintenance plan.
It's unacceptable to the poster because they don't want to have to deal with the realities of computing infrastructure. They want a turn-key, no-hassle, no-think solution.
Which is fine to want.
The EBS failure rate is 0.1-0.5% annually. That's awesome. At 0.1%, the mean time to failure is 1,000 years (1 / 0.001). At 0.5%, it is 200 years. At either point, I'm more than comfortable. I can snapshot the drive daily/weekly/monthly, and when you combine the snapshots as a backup with the likelihood of failure in any given year being so low, I can rest easy.
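For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes a constant failure rate with independent years, which is a simplification and not anything Amazon states; the function names are made up for illustration.

```python
def mttf_years(afr):
    """Mean time to failure in years, assuming a constant annual failure rate."""
    return 1.0 / afr


def loss_probability(afr, years):
    """Chance of at least one volume loss within `years`, assuming independent years."""
    return 1.0 - (1.0 - afr) ** years


# AFR figures from the quoted EBS description: 0.1%-0.5% for EBS, about 4% for commodity disks.
for afr in (0.001, 0.005, 0.04):
    print(f"AFR {afr:.1%}: MTTF ~{mttf_years(afr):.0f} years, "
          f"5-year loss chance {loss_probability(afr, 5):.2%}")
```

Even at the pessimistic end of the EBS range, the chance of losing a volume over five years stays in the low single digits, which is why the snapshots matter more than the raw failure rate.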
Compare that with commodity drives, which at an AFR of around 4% are likely to fail within 25 years, give or take. . . Well, I know where I'd rather have my data.
Getting back to the original poster, (s)he wants a system that takes care of the snapshots without intervention. So, package up EBS along with auto-snapshotting, add an SLA saying that you won't lose more than a day of data, and you have a business. You charge a premium because, unlike EBS, you're reliable. All the while, you are just EBS with S3 - something that the individual could do themselves. EBS is unlikely to ever fail for your clients, since I'm guessing no one is going to have a client for 100 years, and even if that happens, you just restore one of the daily snapshots. Nice! You've lived up to your SLA while getting paid!
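The "no more than a day of data" promise boils down to checking the gaps between snapshots. A rough sketch of that check, assuming you already have the snapshot timestamps from somewhere; the function names and the one-day default are hypothetical, not part of any AWS API:

```python
from datetime import datetime, timedelta


def worst_case_data_loss(snapshot_times, now):
    """Longest stretch of changes not covered by any snapshot, up to `now`."""
    if not snapshot_times:
        raise ValueError("no snapshots: everything on the volume is at risk")
    times = sorted(snapshot_times) + [now]
    return max(later - earlier for earlier, later in zip(times, times[1:]))


def meets_sla(snapshot_times, now, max_loss=timedelta(days=1)):
    """True if a volume failure at any point so far would cost at most `max_loss`."""
    return worst_case_data_loss(snapshot_times, now) <= max_loss


# Daily snapshots at 02:00, checked at 20:00 on the third day.
daily = [datetime(2024, 1, 1, 2), datetime(2024, 1, 2, 2), datetime(2024, 1, 3, 2)]
print(meets_sla(daily, datetime(2024, 1, 3, 20)))

# One missed snapshot leaves a 48-hour hole.
missed = [datetime(2024, 1, 1, 2), datetime(2024, 1, 3, 2)]
print(meets_sla(missed, datetime(2024, 1, 3, 20)))
```

A real service would alert on a failing check rather than just print it, but the SLA itself is nothing more than this comparison.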
Think about it: the backups aren't where the cost is. 1000 PUT requests with 4 MB chunks means 4 GB is backed up for a penny in request charges. Data transfer from EC2 to S3 is free. So, those aren't the costs. As long as you can consolidate the delta backups so you aren't storing a version for every day since inception, I don't see why this service couldn't be offered for 2x the cost of EBS.
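The request-cost arithmetic, as a small sketch. The price of one cent per 1000 PUTs and the 4 MB chunk size are taken from the comment above and are assumptions, not current S3 pricing; storage charges are deliberately left out.

```python
import math

PUT_PRICE_PER_1000 = 0.01  # USD per 1000 PUT requests (assumed, from the comment above)
CHUNK_MB = 4               # upload chunk size in MB (assumed)


def put_request_cost(backup_gb):
    """Request charges in USD for uploading `backup_gb` of changed data (1 GB = 1000 MB)."""
    chunks = math.ceil(backup_gb * 1000 / CHUNK_MB)
    return chunks / 1000 * PUT_PRICE_PER_1000


for gb in (4, 20, 100):
    print(f"{gb} GB of changes: ${put_request_cost(gb):.2f} in PUT requests")
```

Under these assumptions, even a 100 GB delta costs a fraction of a dollar in requests, so the recurring cost is dominated by how many snapshot versions you keep around.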
I realize there is no 0% failure, 100% guaranteed storage.
However, saying "yep, your storage space is now dead and you have lost all your files stored there" is not something I feel comfortable with. Why not offer something more resilient?
What magic storage device have you been using that has a failure rate better than the industry average for disk drives? If you are really concerned about this issue you can just run a software RAID over these raw block devices.
Stacking would be the wrong way to go about it. With ZFS, you could still get 99.99999...% but with only a slight percentage increase in necessary storage space.
I don't think you read closely enough: they define the failure rate as describing the likelihood of complete loss of the volume, not minor data corruption.
Unless you're mirroring your data across multiple drives, there is no way ZFS can magically recreate a volume that simply ceases to exist. Think of it this way: how would ZFS help you in your own server room if a non-mirrored disk caught fire and melted? Answer: it wouldn't. You'd go to backups, like you would with any other filesystem.
It is completely unacceptable for a normal FS. Something a bit more advanced like ZFS may be able to cope. What I am wondering about is why Amazon didn't implement such a solution on a massive scale themselves and then run EBS on top of it.
Because some users don't need reliability beyond what a normal hard drive offers, and they really shouldn't be compelled to pay for it. If you need reliability on these devices you can run software RAID or ZFS over the raw block devices being offered and tune the cost/reliability equation in whatever manner makes the most sense for your application.
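To put rough numbers on that trade-off, here is a back-of-the-envelope sketch for mirroring across several volumes. It assumes the volumes fail independently and that a failed mirror is never rebuilt during the year; neither assumption is guaranteed for EBS (volumes in the same zone may share failure causes), so treat the output as an illustration, not a durability figure.

```python
def mirror_annual_loss(afr, copies=2):
    """Chance that every mirrored copy fails in the same year, assuming independent failures."""
    return afr ** copies


# Worst-case EBS AFR from the quoted description (0.5%); cost scales with the number of volumes.
for copies in (1, 2, 3):
    loss = mirror_annual_loss(0.005, copies)
    print(f"{copies} volume(s): annual loss chance {loss:.7%}, storage cost x{copies}")
```

Each extra mirror multiplies the storage bill by a whole volume while shrinking the modelled loss chance by a factor of 200, which is the kind of cost/reliability knob the raw block devices leave in your hands.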
Sorry, this is unacceptable.