
Anonymous mappings are backed by swap and may be overcommitted, it's still possible to catch signals in a wide variety of circumstances

There is probably enough evidence in this thread to use it as a reference for why typical apps should avoid mmap whenever possible -- it's clear almost nobody fully understands it



> Anonymous mappings are backed by swap and may be overcommitted, it's still possible to catch signals in a wide variety of circumstances

Anonymous mappings won't cause signals, they'll trigger the OOM killer. Remember that malloc() is just a fancy wrapper for mmap() (and sbrk()).


If a process was swapped out and a fault fails to bring a page back due to an IO error, you can at least catch (I think) SIGBUS. But this just reinforces the point: nobody really understands virtual memory, even people like us that think they do


So should we extend your conclusion above to the following?

"There is probably enough evidence in this thread to use it as a reference for why typical apps should avoid virtual memory whenever possible -- it's clear almost nobody fully understands it"

I'd suggest that is ludicrous, and for the same reason your original conclusion is also excessive.


It is not constructive to form a sweeping generalization from a statement and then claim the sweeping generalization is ludicrous, implying the original statement is ludicrous. :)

My first comment was in reply to one claiming anonymous memory did not have the same problems as file-backed memory, indicating the parent did not understand they are the same thing. The subsequent reply was to another comment continuing to claim anonymous memory was somehow safer, both instances supporting the notion that most people in this thread don't seem to understand mmap at all.

What we're examining is a powerful (and consequently hazardous) OS feature that often provides only marginal performance improvement, yet introduces many exotic error paths into a program that have their own exotic problems (memory access in thread A can raise SEGV in thread B, async-signal safety), that 7 hours' commenting has not been sufficient to fully capture. This thread is full of upvoted miscomprehension, bad advice (spawn a child to deal with SEGV!?), obviously incorrect solutions (signalfd), and yet still manages to completely omit some critical characteristics of mmap, for example, that faults take a VM-global semaphore -- mmap can easily destroy multithreaded app performance in a way read() is immune to, because nobody expects file IO in one thread to cause malloc() latency in another.

If this isn't evidence for "avoid this feature wherever possible", I really don't know what is.


As if "avoid virtual memory" is substantially more of a "sweeping generalization" than "avoid mmap."

If you apply the same reasoning that you've used to conclude that everyone should avoid mmap, then you are led directly to the conclusion that everyone should avoid virtual memory.

The "same problems" that you point out as possible with anonymous memory aren't unique to memory you get directly from mmap; they apply to all memory, period. mmap'ed anonymous memory might have the same problems as file-backed memory, but those are the same problems that .text and .bss have.

The "powerful (and consequently hazardous) OS feature" here isn't mmap, it's literally virtual memory. At the moment you concede that memory may be backed by something besides physical memory at any point in time, you get the possibility of all those "exotic error paths."


This is spot on.

The exotic error paths are always there, but you don't always need to handle them. You can push some error handling to other systems, such as clients and supervisors / orchestrators. The reason why mmap with files is tractable is because we have a good error handling strategy (remap with zeroes, mark error) and we have a few understandable reasons why we might expect the error (IO errors, lost media/network failures, or even truncate). In general the problem that IO is done outside of direct syscalls like write() can be difficult even when you're not using mmap, like when Postgres was losing errors when calling fsync.

But when you have an IO error in your swap file, go ahead, eat the SIGBUS and die. This is fine.


Let's try and simplify things here: the post we're both currently commenting on relates to doing file IO via mmap. Of course 'avoiding all use of virtual memory' is ridiculous, but nobody except you is suggesting that, and you continue to suggest it even after a long reply.

The "powerful (and consequently hazardous) OS feature" here is using mmap for general file IO, it:

- introduces resource leaks many developers can't profile

- introduces VM bottlenecks 99% of developers can't profile

- introduces random segfaults delivered to arbitrary threads in the running process, leading to crashes many developers can't diagnose

- the mmap() interface itself is fundamentally unsafe in that it allows partially overwriting random bits of VM (MAP_FIXED) with file views, and worse still, allows those mappings to be read-write

Once again, nobody has ever suggested avoiding virtual memory except you -- once again, that is impossible in a modern environment, but it is more than possible, and ultimately incredibly sensible, not to mention entirely on topic with regards to this thread and the article it is attached to, to suggest avoiding use of mmap for general file IO


- What resource leaks are introduced by mmap?

- Nobody ever said mmap was always faster than the alternative. If you care about performance then you should do whole-application performance testing with and without features enabled (like mmap IO). This is not unusual; there are plenty of aspects of performance that are counter-intuitive, where speeding up one part of your program causes a seemingly unrelated part of your program to slow down.

- The signals are SIGBUS, and they can be intercepted, mapped with zeroes, and the errors can be propagated back to your app code later. This is not trivial but neither is it outrageous.

- You can overwrite arbitrary memory with read(), too, you just have to pass it a pointer to something you want to overwrite. mmap() is not any less safe. Recall that typical use of MAP_FIXED is so you can overwrite an existing mmap() region with something else, not so you can nuke random parts of your address space.

Keep in mind that you are, if nothing else, an indirect user of mmap(). The question is whether using mmap() directly is advantageous for your application. "Yes" is neither an unreasonable nor an outrageous answer for some applications.
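
To make the "test with and without mmap IO" point concrete, here is a minimal sketch in Python of the same job done through read() and through a mapping, so the two paths can be timed inside a real workload. The function names, the CRC32 workload, and the sample-file helper are all made up for illustration; nothing here says which path wins, and timings will vary by system and workload.

```python
import mmap
import os
import tempfile
import time
import zlib


def checksum_with_read(path, chunk_size=1 << 20):
    """Checksum a file using ordinary read() calls into a buffer."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def checksum_with_mmap(path):
    """Checksum the same file through a read-only mapping."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return zlib.crc32(m)


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def make_sample_file(size=4 << 20):
    """Create a temporary file of random bytes; the caller removes it."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(size))
    return path
```

Calling time_call(checksum_with_read, path) and time_call(checksum_with_mmap, path) on the same file gives a first comparison, but a single-threaded microbenchmark like this will not show cross-thread effects, which is why the whole-application measurement matters.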


> here is using mmap for general file IO

> nobody has ever suggested avoiding virtual memory except you

Do you know what the "anonymous" in "anonymous mapping" means? You are the one who started asserting that anonymous mappings from mmap have the same difficulties as mappings of normal files and are therefore too dangerous to use.

https://news.ycombinator.com/item?id=19807322


I hate to do this, but:

> paragraph, n.: a distinct section of a piece of writing, usually dealing with a single theme and indicated by a new line, indentation, or numbering

In the original comment you will find two of these, the former correcting an error in the parent comment, the latter making an observation based on the obvious brainwrong riddled throughout this thread

I'm done replying, you're of course free to continue checking in hazardous and suspect file IO code, as the rest of us are free to giggle at such things before ripping them out


> nobody except you is suggesting [avoiding all use of virtual memory]

https://news.ycombinator.com/item?id=19807322

> Anonymous mappings

> typical apps should avoid mmap whenever possible

All virtual memory is either a user-mode wrapper around mmap, or sbrk (which is functionally a kernel-mode wrapper around mmap).


This simply won't die, will it? I mean, while we're at it, let's advocate abandonment of all higher level languages because essentially they all boil down to machine code, and nobody could recommend working directly with machine code any more, could they.

(But that would be a sweeping generalization)


> This simply won't die, will it?

Please clarify how you think that statement is incorrect. As far as I'm concerned, it won't die because that's how virtual memory works, and you have something going on in your head that is either wrong or massive hairsplitting.

I would have expected a better understanding from someone bloviating about how the rest of the commenters are too thick to understand mmap and virtual memory.


Okay, but I'd say the correct thing to do is let SIGBUS kill the process. You can at least expect to handle SIGBUS for a file you've mapped.
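
For anyone who wants to see the "let SIGBUS kill the process" case, here is a minimal sketch in Python, assuming a Unix-like system. A child interpreter maps one page of a file, truncates the file to zero, and touches the mapping again; the parent only looks at the exit status. The names CHILD_SOURCE and run_truncated_mapping_child are made up for this example. On Linux I would expect the child to be killed by SIGBUS, but treat that as an expectation to check on your own system.

```python
import mmap
import os
import signal
import subprocess
import sys
import tempfile

# Runs in a child interpreter: map one page of a file, shrink the file to
# zero bytes, then touch the mapping again.
CHILD_SOURCE = """
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
m = mmap.mmap(fd, mmap.PAGESIZE)   # shared, read-write mapping of one page
first = m[0]                       # backed by the file: this load succeeds
os.ftruncate(fd, 0)                # the page now lies beyond end-of-file
second = m[0]                      # this fault has no backing store
"""


def run_truncated_mapping_child():
    """Return the child's exit status; negative means killed by a signal."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * mmap.PAGESIZE)
        os.close(fd)
        proc = subprocess.run([sys.executable, "-c", CHILD_SOURCE, path])
        return proc.returncode
    finally:
        os.remove(path)
```

Comparing the return value with -signal.SIGBUS tells a supervisor which signal ended the child. Note that the failing line is a plain memory load, not a syscall that could return an error code.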


> Anonymous mappings are backed by swap and may be overcommitted,

So is normal memory. Many allocators today even use mmap internally.


All* allocation is mmap, really. All mmap does is dedicate a region of address space to some kind of backing storage. The particular kind of backing storage makes all the difference. The problem is that people colloquially use "mmap" to mean "mmap of a conventional disk file" and don't mean all the other kinds of mmap out there, so discussions can become confusing.

* There's sbrk too, but it's just a fancy legacy path that amounts to the same thing as anonymous mmap
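
A minimal sketch of the "same call, different backing storage" point, using Python's mmap module on a Unix-like system. The helper names are made up for illustration, and Python's wrapper hides details of the raw syscall, so read it as a picture of the idea and not of what any allocator literally does.

```python
import mmap
import os
import tempfile


def anonymous_region(size=mmap.PAGESIZE):
    """Map private address space with no file behind it (fd -1)."""
    return mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


def file_backed_region(path, size=mmap.PAGESIZE):
    """Map the first `size` bytes of an existing file, shared and writable."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping object keeps its own duplicate descriptor


def write_through_both():
    """Write the same bytes through each kind of mapping and read them back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * mmap.PAGESIZE)
        os.close(fd)

        anon = anonymous_region()
        anon[:5] = b"hello"           # stored in private, zero-filled pages
        anon_bytes = bytes(anon[:5])
        anon.close()

        mapped = file_backed_region(path)
        mapped[:5] = b"hello"         # stored in pages that belong to the file
        mapped.flush()
        mapped.close()
        with open(path, "rb") as f:
            file_bytes = f.read(5)    # the write is visible through read()
        return anon_bytes, file_bytes
    finally:
        os.remove(path)
```

The two stores look identical in the code. The only difference is what sits behind the pages, and that is where the failure modes argued about in this thread diverge.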



