
I think it's crazy that database updates are still acceptable in the design of systems and databases. Updates are deletes in practice: you lose what was there before.

Getting to know how Git, and blockchains in general, work will help on your journey. I used these concepts in the design of an immutable ledger system, and the result was nice: https://decimals.substack.com/p/things-i-wish-i-knew-before-...
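
To make that concrete, here is a minimal sketch of insert-only versioning (hypothetical SQLite schema and names, purely to illustrate): instead of updating a row in place, you append a new version and read the latest one.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # A mutable table: the UPDATE destroys the previous balance.
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.execute("UPDATE accounts SET balance = 70 WHERE id = 1")  # the 100 is gone

    # An append-only table: every change is a new row, nothing is lost.
    conn.execute("""
        CREATE TABLE account_versions (
            account_id INTEGER,
            balance    INTEGER,
            valid_from TEXT DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("INSERT INTO account_versions (account_id, balance) VALUES (1, 100)")
    conn.execute("INSERT INTO account_versions (account_id, balance) VALUES (1, 70)")

    # Current state is just the latest version; history stays a plain query away.
    current = conn.execute(
        "SELECT balance FROM account_versions "
        "WHERE account_id = 1 ORDER BY rowid DESC LIMIT 1"
    ).fetchone()
    print(current)  # (70,)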



Updates are a cost compromise.

Each record is stored in 2 * [# of nodes] * 3 + [backups] places. In our case that's 15 places before we talk actual backups (online, offline, and cold storage).

I'm actually a huge fan of immutable records conceptually, but I can't make a business case for it outside the most vital tables, the ones that need airtight audit histories. The costs are just too high, and a lot of the data isn't valuable enough.

I'll acknowledge that if the engineering cost were lower it would be an easier sell, but in large databases you're still talking about at least a quadrupling of space through every layer. What's the dollars-and-cents argument for that?
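
As a back-of-the-envelope sketch of that cost multiple (all numbers made up, purely to show the shape of the argument):

    # Made-up numbers: mutable rows vs. keeping every version,
    # multiplied through replica and backup copies.
    row_bytes        = 200          # average row size
    rows             = 500_000_000  # rows in the table
    updates_per_row  = 4            # average times a row gets rewritten
    copies           = 15           # replicas + backup copies
    usd_per_tb_month = 25.0         # assumed cost of fast storage per TB

    TB = 1024 ** 4
    mutable_tb   = row_bytes * rows * copies / TB
    immutable_tb = row_bytes * rows * (1 + updates_per_row) * copies / TB

    print(f"mutable:   {mutable_tb:6.1f} TB, ${mutable_tb * usd_per_tb_month:,.0f}/month")
    print(f"immutable: {immutable_tb:6.1f} TB, ${immutable_tb * usd_per_tb_month:,.0f}/month")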


Most software isn’t FAANG. In the long tail of database size, I’d guess 95%+ of databases are smaller than a few hundred megs. And at that size, storage is cheap and storing a full historical record of changes should absolutely be the default.

Rich Hickey gave a great talk on this a few years ago about the difference between Place and Value. He points out that accountants learned this lesson hundreds of years ago with bookkeeping. Users just don't generate much data relative to the size of our disks. Well worth a watch, especially if you disagree:

https://youtu.be/-6BsiVyC1kM


> Most software isn’t FAANG. In the long tail of database size, I’d guess 95%+ of databases are smaller than a few hundred megs.

I think that statement is plainly untrue for databases that generate enough value to pay for a developer working on them. I would bet the reverse: at least 95% of business-relevant databases are larger than a few hundred MB.

I've managed a small 200-player old-school browser game for a few years, and our database grows by a few hundred megabytes per month. A major source of this data is a user trace log ('what action they performed when', stored compactly as a timestamp and two integers), which we clear on every new round (about two months). Keeping every update (instead of the small trace log) would easily scale to gigabytes per month. And we're talking 200 users. I also worked for a small 150-employee web service company where the development database snapshot was about 5 GB (a heavily trimmed-down and anonymised version of the production DB).
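
Rough numbers (all assumed, just to show the scaling):

    # Assumed figures for a 200-player game, to show how fast "keep everything" grows.
    players         = 200
    actions_per_day = 2_000     # actions per player per day (assumption)
    days_per_round  = 60

    trace_row_bytes = 16        # timestamp + two integers
    full_row_bytes  = 400       # full copy of the affected row plus indexes (assumption)

    actions = players * actions_per_day * days_per_round
    MB = 1024 ** 2
    print(f"trace log per round:     {actions * trace_row_bytes / MB:6.0f} MB")
    print(f"full versions per round: {actions * full_row_bytes / MB:6.0f} MB")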

Now, cold storage is cheap. But fast-access database storage isn't the same as a disk. That's why we keep hot data in memory, write to disk only when necessary, and back up to cheaper storage.

Lastly, a relational database's WAL is already a complete record of every write, not just the current state. It's on by default and in fact necessary for the ACID and consistency guarantees. Archiving this log gives you a full history that lets you restore the database to any point in time, without polluting the actual tables with historical records. Granted, getting at this data is much harder than running a simple query.
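
A toy illustration of that point-in-time idea (not how an actual WAL is encoded, just the principle): replaying the log of writes up to a timestamp reconstructs the state at that moment.

    # Toy point-in-time recovery: the "WAL" is just an ordered list of writes.
    wal = [
        (1, "balance", 100),   # (timestamp, key, new value)
        (2, "balance", 70),
        (3, "owner", "alice"),
        (4, "balance", 40),
    ]

    def state_at(log, ts):
        """Replay all writes up to and including ts to rebuild the state."""
        state = {}
        for t, key, value in log:
            if t > ts:
                break
            state[key] = value
        return state

    print(state_at(wal, 2))  # {'balance': 70}
    print(state_at(wal, 4))  # {'balance': 40, 'owner': 'alice'}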

There are also plenty of other places to keep historical data besides the OLTP database, such as OLAP stores and data lakes.


So immutable records only make business sense in databases too small to benefit from immutable records (where you can just version the entire database, for a "few hundred megs")?

I don't understand what FAANG has to do with this. Medium and large non-tech companies commonly have business-critical databases in the hundreds-of-gigabytes range.

Acting like medium to large databases aren't common in business is just outright odd. The argument also doesn't address what I asked (business case for this vis-a-vis cost).


> So immutable records only make business sense in databases too small to benefit from immutable records

My claim is that discarding historical data is an optimization. It's an optimization that should be off by default and turned on when needed. Archiving and compacting history should be something you only do when your database size gets out of control.

For small databases there's no reason to throw away historical records at all - and having an immutable log of records and updates should be the default.
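
One cheap way to get that default (a sketch with a hypothetical SQLite table, not a recommendation for any particular engine): a trigger copies the old row into a history table on every update, so the application keeps its normal UPDATE statements and history accumulates on the side.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);

        CREATE TABLE users_history (
            id         INTEGER,
            email      TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        -- Snapshot the old row before every update overwrites it.
        CREATE TRIGGER users_audit BEFORE UPDATE ON users
        BEGIN
            INSERT INTO users_history (id, email) VALUES (OLD.id, OLD.email);
        END;
    """)

    conn.execute("INSERT INTO users VALUES (1, 'old@example.com')")
    conn.execute("UPDATE users SET email = 'new@example.com' WHERE id = 1")

    print(conn.execute("SELECT * FROM users").fetchall())                  # current state
    print(conn.execute("SELECT id, email FROM users_history").fetchall())  # old value kept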


GDPR requires you to delete the information you have about people under some common circumstances, if they ask you to.


Sure. Immutable doesn't mean you can't delete things:

> True. And what makes it easy is its immutability. When you change history in Git, you rewrite it (rebase). But rewriting is only possible if you know every bit of the journey up to now. When you have actual updates, history is lost.

https://news.ycombinator.com/reply?id=24340659&goto=threads%...
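
A sketch of that rewrite idea in plain Python (hypothetical event log, nothing Git-specific): because the full history is known, you can produce a new history with one subject's entries removed and keep everything else intact.

    # Append-only event log; deletion = rewriting the log without that subject's events.
    events = [
        {"seq": 1, "user": "alice", "action": "signup"},
        {"seq": 2, "user": "bob",   "action": "signup"},
        {"seq": 3, "user": "alice", "action": "purchase"},
        {"seq": 4, "user": "bob",   "action": "purchase"},
    ]

    def forget_user(log, user):
        """Rewrite history without the given user's events (GDPR-style erasure)."""
        return [e for e in log if e["user"] != user]

    events = forget_user(events, "alice")
    print(events)  # only bob's events remain; the rest of the history is untouched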


> GDPR requires you to delete the information you have about people...

And so does common sense.



