A lot of frameworks that use variants of "mark and sweep" garbage collection ins...

pjc50 · 2026-03-27T13:30:30 1774618230

Reference counting in multithreaded systems is much more expensive than it sounds because of the synchronization overhead. I don't see it coming back. I don't think it saves massive amounts of memory, either, especially given my observation with vmmap upthread that in many cases the code itself is a dominant part of the (virtual) memory usage.

zozbot234 · 2026-03-27T13:45:48 1774619148

If you use an ownership/lifetime system under the hood you only pay that synchronization overhead when ownership truly changes, i.e. when a reference is added or removed that might actually impact the object's lifecycle. That's a rare case with most uses of reference counting; most of the time you're creating a "sub"-reference and its lifetime is strictly bounded by some existing owning reference.
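
As a rough illustration of that point, here is a minimal Rust sketch (Rust's `Arc` is used only as a stand-in for an ownership-aware refcount; the function names `total_len` and `count_traffic` are made up for this example). Borrowing through an existing owner leaves the count untouched, and only a real `Arc::clone` pays for the atomic update:

```rust
use std::sync::Arc;

// Takes a plain borrow: no reference count is read or written here.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Returns (count before, count during the borrow, count while a clone is alive).
fn count_traffic() -> (usize, usize, usize) {
    let owner: Arc<Vec<String>> = Arc::new(vec!["mark".to_string(), "sweep".to_string()]);
    let before = Arc::strong_count(&owner);

    // "Sub"-reference: its lifetime is bounded by `owner`, so the count is untouched.
    let borrowed: &Vec<String> = &owner;
    let _len = total_len(borrowed);
    let during_borrow = Arc::strong_count(&owner);

    // Real ownership change: this is where the atomic increment happens.
    let second_owner = Arc::clone(&owner);
    let with_clone = Arc::strong_count(&owner);
    drop(second_owner); // and this is the matching atomic decrement

    (before, during_borrow, with_clone)
}

fn main() {
    let (before, during_borrow, with_clone) = count_traffic();
    println!("{} {} {}", before, during_borrow, with_clone);
}
```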

cogman10 · 2026-03-27T15:47:27 1774626447

There are two unavoidable atomic updates for RC: the allocation and the free event. That alone will significantly increase the amount of traffic per thread back to main memory.

A lifetime system could possibly eliminate those, but it'd be hard to add to the JVM at this point. The JVM sort of has it in terms of escape analysis, but that's notoriously easy to defeat with pretty typical Java code.

ridiculous_fish · 2026-03-27T20:27:00 1774643220

Why would an allocation require an atomic write for a reference count?

Swift routinely optimizes out reference count traffic.

cogman10 · 2026-03-27T22:38:34 1774651114

> Why would an allocation require an atomic write for a reference count?

It won't always require it, but it usually will because you have to ensure the memory containing the reference count is correctly set before handing off a pointer to the item. This has to be done almost first thing in the construction of the item.

It's not impossible that a smart compiler could see and remove that initialization and destruction if it can determine that the item never escapes the current scope. But if it does escape it by, for example, being added to a list or returned from a function, then those two atomic writes are required.

adrian_b · 2026-03-27T14:23:55 1774621435

Incrementing or decrementing a shared counter is done with an atomic instruction, not with a locked critical section.
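
To make the distinction concrete, a small Rust sketch (the helper name `bump_from_threads` is hypothetical): several threads bump one shared counter with a single atomic read-modify-write each, and no mutex or critical section is involved. It says nothing about how fast this is on any given CPU, only what the operation looks like.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Spawns `threads` workers that each increment one shared counter
// `per_thread` times, then returns the final value.
fn bump_from_threads(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // One atomic read-modify-write; no lock is taken.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // `join` synchronizes with each worker, so every increment is visible here.
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", bump_from_threads(4, 1000));
}
```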

This has negligible overhead in most cases. For instance, if the shared counter is already in some cache memory the overhead is smaller than a normal non-atomic access to the main memory. The intrinsic overhead of an atomic instruction is typically about the same as that of a simple memory access to data that is stored in the L3 cache memory, e.g. of the order of 10 nanoseconds at most.

Moreover, many memory allocators use separate per-core memory heaps, so they avoid any accesses to shared memory that need atomic instructions or locking, except in the rare occasions when they interact with the operating system.

usrnm · 2026-03-27T14:36:14 1774622174

Atomic operations, especially RMW operations, are very expensive, though. Not as expensive as a syscall, of course, but still a lot more expensive than non-atomic ones, exactly because they break things like caches.

cogman10 · 2026-03-27T15:27:03 1774625223

Not only that, they write back to main memory. There's limited bandwidth between the CPU and main memory and with multithreading you are looking at pretty significantly increasing the amount of data transferred between the CPU and memory.

This is such a problem that the JVM gives threads their own allocation pools to write to before flushing back to the main heap. All to reduce the number of atomic writes to the pointer tracking memory in the heap.
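
A toy sketch of the per-thread allocation pool idea, in Rust. This is not how HotSpot implements it; `SharedHeap`, `LocalBuffer`, and the 1024-byte `CHUNK` are invented for illustration, and "addresses" are just offsets. The point is only that the shared cursor is touched atomically once per chunk, while ordinary allocations bump a thread-private cursor:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical chunk size for this sketch.
const CHUNK: usize = 1024;

// Shared state: a cursor into one big region, advanced atomically.
struct SharedHeap {
    next: AtomicUsize,
}

// Per-thread buffer: bump-allocates out of the chunk it currently owns.
struct LocalBuffer<'a> {
    heap: &'a SharedHeap,
    cursor: usize,
    end: usize,
    refills: usize, // how many times the shared cursor was touched
}

impl<'a> LocalBuffer<'a> {
    fn new(heap: &'a SharedHeap) -> Self {
        LocalBuffer { heap, cursor: 0, end: 0, refills: 0 }
    }

    // Returns the offset of the newly allocated block.
    fn alloc(&mut self, size: usize) -> usize {
        assert!(size <= CHUNK, "this sketch does not handle oversized objects");
        if self.cursor + size > self.end {
            // Slow path: one atomic operation claims a whole new chunk.
            self.cursor = self.heap.next.fetch_add(CHUNK, Ordering::Relaxed);
            self.end = self.cursor + CHUNK;
            self.refills += 1;
        }
        // Fast path: plain, non-atomic pointer bump.
        let addr = self.cursor;
        self.cursor += size;
        addr
    }
}

// Performs `count` allocations of `size` bytes and reports the refill count.
fn refills_for(count: usize, size: usize) -> usize {
    let heap = SharedHeap { next: AtomicUsize::new(0) };
    let mut local = LocalBuffer::new(&heap);
    for _ in 0..count {
        local.alloc(size);
    }
    local.refills
}

fn main() {
    println!("atomic refills for 100 allocations: {}", refills_for(100, 16));
}
```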

gwbas1c · 2026-03-27T16:46:07 1774629967

That's why Rust has Rc<> for single-threaded structs, and Arc<> for thread-safe structs.
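
A quick sketch of the difference (the function names are made up for this example): `Rc` uses a plain counter and cannot be sent across threads, while `Arc` uses an atomic counter and can.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Single-threaded sharing: the count is a plain integer, no atomics.
fn single_threaded() -> usize {
    let a = Rc::new(String::from("local"));
    let b = Rc::clone(&a); // non-atomic increment
    let owners = Rc::strong_count(&a);
    drop(b); // non-atomic decrement
    owners
}

// Cross-thread sharing: the count is updated atomically.
fn multi_threaded() -> usize {
    let a = Arc::new(String::from("shared"));
    let b = Arc::clone(&a); // atomic increment
    // Moving `b` into another thread compiles because Arc<String> is Send;
    // the same code with Rc would be rejected by the compiler.
    thread::spawn(move || b.len()).join().unwrap()
}

fn main() {
    println!("{} {}", single_threaded(), multi_threaded());
}
```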

vaylian · 2026-03-27T13:24:18 1774617858

Unlikely. Maybe I'm overly optimistic, but I think it's fairly likely that the RAM situation will have sorted itself out in a few years. Adding reference counting to the JVM and .NET would also take considerable time.

It makes more sense for application developers to think about the unnecessary complexity that they add to software.

xyzzy_plugh · 2026-03-27T13:25:23 1774617923

That's not strictly true. Mark and sweep is tunable in ways ARC is not. You can increase frequency, reducing memory at the cost of increased compute, for example.

cogman10 · 2026-03-27T15:38:59 1774625939

M&S also doesn't necessitate having a moving and compacting GC. That's the thing that actually makes the JVM's heap greedy.

Go also does M&S and yet uses less memory. Why? Because Go isn't compacting; instead, it calls malloc and free based on the results of each GC. This means that Go has slower allocation and a bigger risk of memory fragmentation, but it also keeps Go's memory usage lower than the JVM's.

astrange · 2026-03-28T20:04:21 1774728261

Compacting reduces memory usage - that's why it's called compacting.

The JVM uses a lot of memory a) because it's tuned for servers and not for low memory usage and b) because Java is a poorly designed language without value types.

cogman10 · 2026-03-29T15:48:33 1774799313

> Compacting reduces memory usage

No, it reduces memory fragmentation, which is why it's called compacting and not compression.

I do agree that the lack of value types is a big contributor to why Java uses so much memory. But it's not server tuning that makes the JVM lean memory-heavy.

The JVM uses moving collectors and that is the big reason why it prefers having so much memory available. Requesting and freeing memory blocks from the OS is an expensive operation which the JVM avoids by grabbing very large blocks of memory all at once. If you have a JVM with 75% old gen and 25% new gen, half that new gen will always be empty because the JVM during collection moves live data from one side of the new gen to the next. And while it does that, it slowly fills up old gen with data.

Even more modern collectors like G1 prefer a large set of empty space because it's moving portions of old gen to empty regions while it does young collection.

As I mentioned, the difference here between the JVM and Python or Go is that Python and Go do no moving. They rely heavily on the malloc implementation to handle grabbing right-sized blocks from the OS and combating memory fragmentation. But, because they aren't doing any sort of moving, they can get away with having more "right-sized" heaps.

astrange · 2026-03-29T16:20:48 1774801248

> No, it reduces memory fragmentation, which is why it's called compacting and not compression.

…which reduces memory usage because you don't have to waste it on free holes in the allocated pages.

cogman10 · 2026-03-29T16:40:03 1774802403

But then it increases memory usage because you need more pages allocated to move memory to.

It's an allocation and cache optimization more than a memory saving optimization.