Don't most people use C++11 atomics now? You have SeqCst, Release, Acquire, and Relaxed (with Consume deprecated due to the difficulty of implementing it). You can do loads, stores, and exchanges with each ordering type. Zig, Rust, and C all use the same orderings. I guess Java has its own memory model since it's been around a lot longer, but most people have standardized around C++'s design.
Which is a slight shame since Load-Linked/Store-Conditional is pretty cool, but I guess that's limited to ARM anyways, and now they've added extensions for CAS due to speed.
I've taken an interest in lock-free queues for ultra-low power embedded... think Cortex-m0, or even avr/pic.
Things get interesting when you're working with a cpu that lacks the ldrex/strem assembly instructions that makes this all work. I think youre only options at that point are disable/enable interrupts. IF anyone has any insights into this constraint I'd love to hear it.
My impression was LL/SC had forward progress issues due to the difficulties of preventing false sharing of the locked memory reservation region. Updates into that region would keep invalidating the lock.
I had a version of atomic* reference counting that used LL/SC on a ppc mac mini along side x86 versions using cmpxchg16b. Code used to be sourceforge before it went to the dark side.
* Std::shared_ptr and Rust ARC aren't actually atomic. You have to own a reference to do a copy. The are what POSIX calls thread-safe. With atomic reference counting, if you copy a reference, you either get a valid reference or null. Like Java references.
Really? Pretty much all atomics i’ve used have load,
store of various integer sizes. I wrote a ring buffer in Go that’s very similar to the final design here using similar atomics.
They generally map directly to concepts in the CPU architecture. On many architectures, load/store instructions are already guaranteed to be atomic as long as the address is properly aligned, so atomic load/store is just a load/store. Non-relaxed ordering may emit a variant load/store instruction or a separate barrier instruction. Compare-exchange will usually emit a compare and swap, or load-linked/store-conditional sequence. Things like atomic add/subtract often map to single instructions, or might be implemented as a compare-exchange in a loop.
The exact syntax and naming will of course differ, but any language that exposes low-level atomics at all is going to provide a pretty similar set of operations.
Right, that's what the various memory ordering constants are for in C++ atomics, and other languages will likely have an equivalent. On such architectures, those will emit special instructions or barriers.