> There's not enough redundancy. You could raid1 those NVME's when before they get attached to a VM and that helps with hardware failures, but you get less of them to attach. Even if you RAID them, there's not a good way to move that VM to another host if there's a RAM or CPU or other hardware issue on that host.

The trick is building a block storage system that treats the local disk as write-back cache with async replication to networked storage. Like the blog post says they'll be doing.

The async replication has some integrity/recovery concerns for sure, but it's the trick that enables local speeds. And people have been happy with async replication for their databases for a very long time. You just need good observability for the durability delay, i.e. how far the replicated copy lags behind acknowledged writes.
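To make the idea concrete, here's a toy sketch of the write path, not anything from the blog post or the linked paper. Plain dicts stand in for the local NVMe and the networked store, and `WriteBackDisk`, `replicate_one`, and `durability_lag` are names I made up for illustration. Writes are acknowledged once they hit the local disk, a background step pushes dirty blocks out, and the lag metric is the age of the oldest write that only exists locally.

```python
import collections
import time


class WriteBackDisk:
    """Toy block device: local disk as write-back cache, async replication to a remote store."""

    def __init__(self, remote, clock=time.monotonic):
        self.remote = remote   # dict standing in for networked storage (assumption)
        self.local = {}        # dict standing in for the local NVMe (assumption)
        # block -> time of its oldest unreplicated write, in arrival order
        self.dirty = collections.OrderedDict()
        self.clock = clock

    def write(self, block, data):
        # Acknowledge as soon as the local disk has the data.
        self.local[block] = data
        # Keep the oldest timestamp if the block is already dirty.
        self.dirty.setdefault(block, self.clock())

    def read(self, block):
        # Serve from the cache; fetch from the remote store on a miss.
        # Raises KeyError if the block was never written anywhere.
        if block not in self.local:
            self.local[block] = self.remote[block]
        return self.local[block]

    def replicate_one(self):
        # One step of the background replicator: push the oldest dirty block.
        if not self.dirty:
            return False
        block, _ = self.dirty.popitem(last=False)
        self.remote[block] = self.local[block]
        return True

    def durability_lag(self):
        # Age of the oldest write that exists only on the local disk.
        if not self.dirty:
            return 0.0
        oldest = next(iter(self.dirty.values()))
        return self.clock() - oldest
```

A real implementation would also need write ordering guarantees, crash recovery for the dirty map, and backpressure when the lag grows too large. The point is just that `durability_lag()` is the number you'd want on a dashboard.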

Once you have that, you can do live VM migration if you're careful enough about dirty data. The new node just starts out with an empty cache.
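Roughly, the migration handoff could look like the sketch below. It's a deliberately simplified model with hypothetical names (`migrate`, `read_block`) and dicts in place of real disks, and it assumes the source host has already stopped accepting writes before the final drain. A real system would more likely drain iteratively while the VM keeps running.

```python
def migrate(local, dirty, remote):
    """Drain dirty blocks to the remote store, then hand off with an empty cache.

    local:  dict of block -> data on the source host's disk
    dirty:  set of blocks not yet replicated
    remote: dict standing in for networked storage
    Assumes writes on the source host are already paused.
    """
    # Flush everything that only exists on the source host.
    for block in list(dirty):
        remote[block] = local[block]
        dirty.discard(block)
    # The new host starts cold: empty cache, nothing dirty.
    return {}, set()


def read_block(local, remote, block):
    # On the new host, every first read is a miss that refills the cache.
    if block not in local:
        local[block] = remote[block]
    return local[block]
```

After `migrate` returns, the remote store is the single source of truth, so the new host can serve reads correctly from its first request. It's just slower until the cache warms up.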

It's not exactly trivial, but it's also probably not the biggest challenge if you're genuinely building a brand new cloud and going to compete against the hyperscalers. (Hell, hire me and I can write it for you. It'll take time and CPU hours to get stable, but the magic required is only mildly arcane.)

For example: https://dl.acm.org/doi/10.1145/3492321.3524271