From what I've skimmed so far this isn't a very good or accurate resource. In particular it doesn't mention RAII in its discussion of C++, doesn't mention Rust at all, neglects the role of stack allocation, glosses over the common use of object pooling in GC languages as a workaround for performance issues, has little on the extra issues introduced by multithreaded code and understates the performance issues of GC.
It's actually a great resource. The glossary covers all these issues, except for Rust, which the site content predates. It tends to use more academic terminology instead of the current industry terminology, but the academic terminology is often more precise anyway. Yes, it's opinionated in favor of GC, but that's a perfectly defensible position (which may sound odd coming from me as a Rust developer, but I actually like GC in many—just not all—circumstances).
I disagree. A few short more or less related entries in the glossary is not really "covering". A paragraph on smart pointers that conflates the concept with reference counting is certainly not covering RAII. None of these adequately address the deficiencies I mentioned.
That paragraph on smart pointers doesn't have anything to do with RAII - it's not even accurate concerning C++ smart pointers, where the standard single-owner smart pointer (std::unique_ptr) doesn't use ref counting; it uses RAII.
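To make that concrete, here's a minimal sketch of the distinction, assuming C++14 or later (Widget is just a placeholder type):

    #include <cstdio>
    #include <memory>

    struct Widget {
        ~Widget() { std::puts("Widget destroyed"); }
    };

    int main() {
        {
            // std::unique_ptr: single owner, no reference count at all.
            // RAII does the work: the destructor frees the Widget when
            // the owning scope ends. Zero runtime bookkeeping.
            auto owner = std::make_unique<Widget>();
        } // destroyed here, deterministically

        {
            // std::shared_ptr: shared ownership via a reference count.
            // RAII is still the mechanism (destructors decrement the
            // count), but the ownership policy is ref counting.
            auto a = std::make_shared<Widget>();
            auto b = a; // count is now 2
        } // count drops to 0 here; Widget destroyed
    }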
I don't know of one central resource that covers all of these topics. There are good articles on jemalloc and tcmalloc that discuss modern allocation strategies for general-purpose allocators. There are articles from games development that discuss object pooling, both in the context of C++ and in the context of optimizing GC languages like C# and Java, as well as other specialized strategies like bump allocators, frame allocators, etc.
Resources on learning modern C++ cover RAII, and resources on learning Rust cover its approach. There are good talks on the fairly new polymorphic allocator model for C++ that cover the motivations for it and its use cases. Experience with modern C++ is valuable for understanding how much you can do with mostly stack allocation, and how you can write non-library code without ever writing a raw new or delete.
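As one concrete taste of that allocator model, here's a sketch using C++17's std::pmr with a stack-backed bump arena (buffer size and container contents are arbitrary):

    #include <cstddef>
    #include <memory_resource>
    #include <vector>

    int main() {
        // A monotonic (bump) resource backed by a stack buffer: each
        // allocation just advances a pointer, and everything is
        // released at once when the arena dies -- essentially the
        // frame-allocator pattern from games.
        std::byte buffer[4096];
        std::pmr::monotonic_buffer_resource arena{buffer, sizeof buffer};

        // The container allocates from the arena polymorphically;
        // no raw new or delete appears anywhere in user code.
        std::pmr::vector<int> frame_data{&arena};
        for (int i = 0; i < 100; ++i)
            frame_data.push_back(i);
    } // arena (and the whole "frame") released in one shot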
I don't think this is from 2018; the contents are mostly unchanged since 2014 [1].
Even though it's a bit dated, memorymanagement.org is one of my favorite resources. It's opinionated, in a good way. The techniques that were developed in the 2000s, which this site thoroughly covers, are still the state of the art for garbage collection in my opinion.
Even though many of the technical points are highly debatable, my biggest issue is with the sentence "the programmer is freed to work on the actual problem" listed among the "advantages" of automatic memory management, and with how ubiquitous that mindset has become.
Insofar as computer programming is concerned, the "actual problem" is always the transformation of specific data on a specific set of computing hardware. Automating memory management doesn't imply working on the actual problem, it's just outsourcing a part of the actual problem in the pursuit of other objectives - decreasing development time, reducing the likelihood of specific bugs, etc. I'm not arguing that people shouldn't use automatic memory management, it often makes perfectly good technical and business sense to do so - but pretending that memory management is not a crucial part of solving any real problem in computer programming (even if you're not doing it explicitly) does no good to anyone.
I think maybe your definition of "actual problem" is not the same as their definition. Let me try to defend their definition with an analogy.
If I write in C, I don't have to worry about register allocation (which variable goes in which register, when to spill to the stack, etc.). I can just write things like (a+b+c)/3 and it "just works". Perhaps the compiler will do a bad job of allocating the registers, and a good assembly programmer would have done far better, even to the extent that the program goes from "way too slow" to "plenty fast". (Unlikely these days, but in the past, knowing when to use the "register" keyword was genuinely worthwhile.) But even then, register allocation is incidental complexity, in that it is an artifact of the solution, not of the problem itself.
But even though C programmers don't have to worry about which registers are callee-save vs caller-save every time they call a function, they do have to worry about who is supposed to free every allocated data structure. Java programmers don't; they can just write
("1+2="+3).getBytes()
and it "just works". The garbage collector may be so inefficient that you wish it weren't there, but even then, deciding exactly when to deallocate every temporary string is incidental complexity, in that they are an artifact of the solution, not of the problem itself.
Of course, just because the problem is incidental doesn't mean it need not be solved! You really do need to free those strings or else you'll run out of memory and crash. (But not too soon, or else you have a use-after-free problem.) And likewise, at the end of the day, the values had better end up in the right registers. But whether you're writing a game or a web app or an FPGA simulator, memory management is not the "actual problem" you care about solving. The proof is that you don't think about it when you're writing the user guide to your software.
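In code, the difference in who carries that incidental complexity looks roughly like this (C++ standing in for both sides; make_greeting_c is a made-up example, not from any real codebase):

    #include <cstdlib>
    #include <cstring>
    #include <string>

    // Manual style: the ownership contract lives in a comment, and
    // every caller must remember it. Forget the free() and you leak;
    // free too early and you have a use-after-free.
    char* make_greeting_c(const char* name) {
        char* s = static_cast<char*>(std::malloc(std::strlen(name) + 8));
        std::strcpy(s, "hello, ");
        std::strcat(s, name);
        return s; // caller must free()
    }

    // Automatic style (RAII here, a GC in Java): the temporary's
    // lifetime is an implementation detail the caller never sees.
    std::string make_greeting(const std::string& name) {
        return "hello, " + name;
    }

    int main() {
        char* g = make_greeting_c("world");
        std::free(g); // the incidental-but-mandatory step

        std::string h = make_greeting("world"); // nothing to remember
    }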
Well, that's exactly the point - I was arguing against the (by now ubiquitous) definition of the "actual problem". Your analogy further illustrates exactly that.
The actual problem (as in, the thing you are actually solving whether you realize it or not) of computer programming is not creating a product (that's business) or even data transformation (that's applied math). All computer programming is ultimately the transformation of data on real hardware (however abstracted by everything from firmware to virtual machines). Obviously, computer programmers don't exist in a vacuum and have to take into account technical (including mathematical) and social (including business) concerns, make tradeoffs, etc., but that doesn't change the basic facts. I'm just arguing we should be more explicit about them, not write them off.
And to address your analogy directly: register allocation is as much a part of the actual problem as memory management. Even though most of us rarely resort to doing it by hand, it's still there, and examining it in the resulting assembly is still often an important part of optimizing code. My argument is that regardless of whether you directly encounter something like register allocation every day, it is both conceptually important and practically useful to recognize it as part of what computer programming actually is, along with the technical and social factors that determine how far it gets automated and the tradeoffs involved. Pretending that the actual problem of getting computers to do what we want is some annoying series of "incidental" problems is a recipe for bad software.
So, I guess what you're saying (correct me if I'm wrong) is that even though memory management (and register allocation, and other things) can be classified as "incidental", this is a dangerous label, because people are prone to then think "and therefore I can outsource these problems to a GC/compiler/whatever and not need to think or worry about them anymore" and then wonder why performance is so bad.
If that's your point then I agree, but I still think it's worth at least making a vague mental note that the problem is still incidental (and then carefully avoid the above conclusion).
Not the OP, but I think you're still missing his point. The argument is that it is fundamentally wrongheaded to claim that the process of realizing software on actual hardware is incidental or not part of the actual problem, in the same way that it would be wrong to claim that the laws of physics are not part of the actual problem for engineering.
If you're building a road bridge you could say that the actual problem is getting vehicles from one side of the river to the other and not the engineering details required to realize a physical structure capable of doing that. Things like the tensile strength of steel would be no more important to the bridge builders than mere memory allocation is to the average programmer in a GC'd language. You could also argue that taking that approach would lead to bridges that fail as badly at their primary function as much modern software does.
That's exactly right, thank you. It's a fundamentally simple point often made by embedded and game programmers (perhaps most famously in recent years in Mike Acton's keynote at CppCon [1]), but most programmers seem to have completely written off most of what computer programming actually is as "incidental".
Hmm... maybe I better watch this talk at some point and try to get this through my head better... When I saw the bridge example my reaction was basically "why would that result in bad bridges?"
I feel slightly attacked when you say "written off". I don't mean to "write off" memory management as something undeserving of attention. I think that if performance matters at all, then even if you program in a GC'd language you better know roughly how your GC works and what it's good and bad at, and preferably have a plan for switching to a different GC if need be.
More generally, "incidental" stuff deserves attention. It may very well deserve the MAJORITY of your attention. I don't want to "write it off". I just want to classify it properly. Am I increasing my own risk of creating bad software by thinking this way?
Well, there are two at least partly orthogonal points here - the teleological discussion over what computer programming actually is and the practical point regarding which mindset is more likely to lead to better software. I don't know that I can express my position on the former issue better than I already have in the form of a short comment suitable for this format (though I do recommend the talk I linked), so I'll leave that aside for now.
On the latter point, however - and to answer your question regarding bad software - all other factors being equal, I think the answer is "probably, yes". The fundamental reason is that someone who is used to regarding memory management (which is really just one example among many) as incidental complexity to be outsourced away by default is less likely to reason carefully and intelligently about the actual implications of that decision. To pick a trivial example, someone who has internalized the RAII pattern in C++ without really understanding its implications may find it difficult to solve (or even find) memory fragmentation issues that lead to bad performance - not something easily spotted in current profilers if you don't already know what you're looking for. There are a lot of issues like that, which may seem either obvious or subtle depending on whether your mindset is to properly understand how the solution you've chosen actually works, or to regard it as incidental until proven otherwise.
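For instance, here's a sketch of the kind of mitigation you'd reach for once you do understand the allocation pattern (C++17's pool resource; the workload is invented for illustration):

    #include <list>
    #include <memory_resource>

    int main() {
        // Churning many small, mixed-lifetime nodes through the
        // global heap is a classic way to fragment it over a long
        // run. A pool resource groups allocations into size-class
        // pools, so freed nodes get reused instead of leaving
        // scattered holes behind.
        std::pmr::unsynchronized_pool_resource pool;
        std::pmr::list<int> nodes{&pool};
        for (int i = 0; i < 100000; ++i) {
            nodes.push_back(i);
            if (i % 3 == 0)
                nodes.pop_front(); // mixed allocate/free traffic
        }
    } // the pool hands its slabs back wholesale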
In the modern computer programming industry you can go a long way without really dealing with any of the issues I've mentioned. However, the fact that much of modern software is unbelievably bloated and slow for no good technical reason is indicative that perhaps it would be better if more programmers took direct responsibility not just for the code they actually write, but for the code they outsource to others, often without truly grappling with the implications of those decisions.
> perhaps it would be better if more programmers took direct responsibility not just for the code they actually write, but for the code they outsource to others, often without truly grappling with the implications of those decisions.
Now THERE I agree completely! (Will watch the talk next weekend I hope.)
I don't think register allocation is a great analogy, since it is largely a "solved problem" - there are well-known techniques that allow compilers to do as good a job of it as a programmer would, or better, in most situations. This is not yet true for memory management.
You give "writing a game" as an example where memory management is not the actual problem. Games however often care a lot about performance and achieving good performance is part of the actual problem you're solving. For many games projects on current hardware you won't be able to solve the performance problem without thinking about memory allocation and blindly relying on the garbage collector will result in a game that doesn't solve the actual problem of performing adequately on the target hardware.
Sure, register allocation is mostly solved and memory management is far from solved; thus memory management gets far more of the game programmer's attention (or else your game will play at two frames per second, plus occasional longer pauses).
But the analogy is otherwise fine: reviews of your game will mention the performance (because it matters), but they will not mention your memory management strategy (because they don't care and may not even know about it). They will mention the rules, the mechanics, the story, the graphics and music and sound effects. Those are all part of the actual problem.
Memory management is an implementation detail. Don't get me wrong: this implementation detail is so vital that it may very well make sense to structure the entire code base around it, but that doesn't mean it's part of the "actual" problem, at least if you use my definition.
I get that it's well-intentioned, but I wish folks stopped talking about the buddy allocator. It's so obsolete. It's not the fastest, and it sure has some of the worst fragmentation due to its very imprecise sizing.
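To make the sizing complaint concrete, here's a toy calculation (my own illustration, not any real allocator's code) of what a power-of-two buddy scheme hands back:

    #include <cstddef>
    #include <cstdio>

    // A buddy allocator rounds each request up to the next power of
    // two, splitting bigger blocks into halves ("buddies") as needed.
    // The rounding is where the waste comes from: a request just past
    // a power of two burns nearly half its block.
    std::size_t buddy_block_size(std::size_t request,
                                 std::size_t min_block = 16) {
        std::size_t size = min_block;
        while (size < request)
            size *= 2;
        return size;
    }

    int main() {
        for (std::size_t req : {24, 65, 130, 1025}) {
            std::size_t blk = buddy_block_size(req);
            std::printf("request %4zu -> block %4zu (%zu wasted)\n",
                        req, blk, blk - req);
        }
    }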
I wish my professors had talked more about their opinions on these textbook techniques - especially when, as you say, some of them are not that great and belong in a history lesson rather than being something you should get hands-on experience with.
Can something be obsolete if it is still extremely widely used?
IIRC, Linux and FreeBSD both use a buddy allocator for physical memory, and I'm pretty sure it is used (in addition to other allocators) in jemalloc as well.
When reading the language section [0] I was surprised to see that COBOL had no heap allocation until 2002. Searching around, the only comments I could find amounted to "We just didn't need it.". Can anyone fill me in?
Historically, COBOL had no recursive procedures and no concept of new(). Instead, all the data was statically initialized at application start; the compiler did the storage allocation.
Expected to see Rust's approach mentioned since it occupies an interesting point in the design space, i.e., automatic memory management where lifetimes are (mostly) determined at compile-time vs. at run-time (RC and/or GC).