Here’s an oddball question: why did K8s devs go with etcd and not SQLite?

vkazanov · on Jan 19, 2023

Etcd is a key-value store built on a distributed consensus algorithm. This (sort of) means that you can keep a source of cluster truth in the db and make sure it survives certain distributed-system failure scenarios.

SQLite is a (relatively) simple in-process SQL RDBMS. You get proper, reliable SQL on a single computer, which is practically useful in many more cases.
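To make "in-process" concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `kv` table, the `put`/`get` helpers, and the `/registry/nodes/node-1` key are made up for illustration; this is not how any Kubernetes distribution actually lays out its data.

```python
import sqlite3

# The whole database engine runs inside this process; ":memory:" keeps it in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)")

def put(key, value):
    # "with conn" wraps the statement in a transaction: commit on success,
    # roll back on exception.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value),
        )

def get(key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

# Hypothetical key, loosely modeled on a cluster-state path.
put("/registry/nodes/node-1", "Ready")
put("/registry/nodes/node-1", "NotReady")
print(get("/registry/nodes/node-1"))
```

Everything here lives and dies with the one process on the one machine, which is exactly the property the rest of the thread is about.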

It is possible to build distributed dbs on top of a bunch of SQLite instances, but one would need to solve the very same distributed-system problems that etcd solves.

In fact, some of the popular distributed dbs use etcd, ZooKeeper, or custom implementations of the same algorithms on top of traditional RDBMSs.

K8s is a distributed system that needs to have a reliable view of its nodes. Clearly, a consensus-based db is necessary.

gigatexal · on Jan 19, 2023

So that would make etcd consistent and available (CA of the CAP)?

edit: almost -- available and partition tolerant at least in the default config:

https://github.com/cloudfoundry-attic/etcd-release/blob/mast...

kevincox · on Jan 19, 2023

CA doesn't exist.

CAP is about what happens in case of a partition (P). You can either remain available (AP) or you can remain consistent (CP).

If you don't have partitions then CAP doesn't apply. But I wouldn't recommend depending on that. For example Google's Spanner is CP, but they work really hard to make partitions rare, giving them 99.999% availability.

remram · on Jan 19, 2023

etcd is strongly consistent: it will only be available if a majority of nodes are up and can reach each other (CP over AP). If you don't have a majority, it will not be available (but it won't become inconsistent).
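A toy sketch of that majority rule, in Python. This only models the "which side of a partition can still serve" decision, not etcd's actual Raft implementation; the member names and the `can_serve` helper are made up.

```python
def quorum(cluster_size: int) -> int:
    # Strict majority of the configured members.
    return cluster_size // 2 + 1

def can_serve(reachable_members: set, cluster_size: int) -> bool:
    # A group of nodes stays available only if it contains a majority.
    return len(reachable_members) >= quorum(cluster_size)

members = {"a", "b", "c", "d", "e"}      # hypothetical 5-node cluster
majority_side = {"a", "b", "c"}          # one side of a network partition
minority_side = members - majority_side  # the other side: {"d", "e"}

print(can_serve(majority_side, len(members)))  # True: keeps accepting writes
print(can_serve(minority_side, len(members)))  # False: unavailable, but not inconsistent
```

Since two disjoint groups can never both hold a majority, at most one side keeps accepting writes, which is why consistency holds through the partition.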

This is more resilient than SQLite of course, which runs on a single node, and therefore can't remain available if any node (= the only node) fails.

remram · on Jan 19, 2023

There are distributions of Kubernetes that use SQLite instead of etcd, for example k3s: https://docs.k3s.io/architecture

The drawback is that it's not replicated. With etcd, you set up multiple control-plane nodes (e.g. 5) and you can tolerate a minority of them being down without any effect (e.g. 2/5 down). With SQLite, you can only have one control-plane node, and if it's down your control plane is unavailable. This is fine for small clusters where you don't want to run multiple control-plane nodes, or you don't think it will go down, or you don't mind fixing it.
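The arithmetic behind "tolerate a minority being down", as a quick Python sketch. It assumes a plain majority quorum; `tolerated_failures` is just an illustrative helper name.

```python
def tolerated_failures(members: int) -> int:
    # A majority must stay up, so whatever is left over may fail.
    quorum = members // 2 + 1
    return members - quorum

for n in (1, 3, 4, 5, 7):
    print(f"{n} members: quorum {n // 2 + 1}, tolerates {tolerated_failures(n)} down")
```

A single node tolerates zero failures, which is the SQLite case. Going from 3 to 4 members buys nothing, which is why odd sizes are the usual recommendation.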

ddorian43 · on Jan 19, 2023

Because they are different things.