The vast majority of UB usually considered problematic has been in the standards...

gavinhoward · on Aug 4, 2024

Yes, but that does not change the fact that compilers writers have control of the standard, have had that control since probably C99, and have introduced new UB along with pushing the 00UB worldview.

dzaima · on Aug 4, 2024

What introduced UB are you thinking of? I'll admit I don't know how much has changed, but the usually-complained-about things (signed overflow, null pointer dereferencing, strict aliasing) are clearly listed as UB in some C89 draft I found.

C23's introduced stdc_trailing_zeros & co don't even UB on 0, even though baseline x86-64's equivalent instructions are literally specified to leave their destination undefined on such!

00UB is something one can argue about, but I can't think of a meaningful way to define UB that doesn't impose significant restrictions on even basic compilers, without precisely defining how UB-result values are allowed to propagate.

e.g. one might expect that 'someFloat == (float)(int8_t)someFloat' give false on an input of 1000, but guaranteeing that takes intentional effort - namely, on hardware whose int↔float conversions only operate on ≥32-bit integers (i.e. everything - x86, ARM, RISC-V), there'd need to be an explicit 8-to-32-bit sign-extend, and the most basic compiler just emitting the two f32→i32 & i32→f32 instructions would fail (but is imo pretty clearly within "ignoring the situation completely with unpredictable results" that the C89 draft contains). Sure it doesn't summon cthulhu, but it'll quite likely break things very badly anyway. (whether it'd be useful to not have UB here in the first place is a separate question)

Even for 'x+100 < x' one can imagine a similar case where the native addition & comparison instructions operate on inputs wider than int; using such for assuming-no-signed-wrap addition always works, but would mean that the comparison wouldn't detect overflow. Though here x86-64, aarch64, and RISC-V all do provide instructions for 32-bit arith, matching their int. This would be a bigger thing if it were possible to have sub-int-sized arith.

pertymcpert · on Aug 4, 2024

Which UB upsets you? Can you be specific so we can revert it?

gavinhoward · on Aug 4, 2024

All of it. But especially anything added after C89 that was not already there implicitly.

Edit: okay, not all of it. I was hyperbolic. Race conditions and data races should be UB. But anything that can be implementation-defined should be.

dzaima · on Aug 4, 2024

So your issue is not at all any specific thing or action anyone took, but just in general having UB in places not strictly necessary. And "Especially anything [different from The Golden Days]", besides being extremely cliche, is a completely arbitrary cutoff point.

A given compiler is free to define specific behavior for UB (and indeed you can add compiler flags to do that for many things); the standard explicitly acknowledges that with "Possible undefined behavior ranges from […], to behaving during translation or program execution in a documented manner characteristic of the environment".

gavinhoward · on Aug 4, 2024

Sigh...yes, I don't want any UB where it's not necessary.

But if you must have a concrete example, how about realloc?

In C89 [1] (page 155), realloc with a 0 size and a non-NULL pointer was defined as free:

> If size is zero and ptr is not a null pointer, the object it points to is freed.

In C99 [2] (page 314), that sentence was removed, making it undefined behavior when it wasn't before. This is a pure example of behavior becoming undefined when it was not before.

In C11 [3] (page 349), that sentence remains gone.

In C17 [4] (page 254), we get an interesting addition:

> If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated. If the old object is not deallocated, its value shall be unchanged.

So the behavior switches from undefined to implementation-defined.

In C23 [5] (page 357), the wording completely changes to:

> ...or if the size is zero, the behavior is undefined.

So WG14 made it UB again after making implementation-defined.

SQLite targets C89, but people compile it with modern compilers all the time, and those modern compilers generally default to at least C99, where the behavior is UB. I don't know if SQLite uses realloc that way, but if it does, are you going to call it buggy just because the authors stick to C89 and their users use later standards?

[1]: https://web.archive.org/web/20200909074736if_/https://www.pd...

[2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

[4]: https://web.archive.org/web/20181230041359if_/http://www.ope...

[5]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf

dzaima · on Aug 4, 2024

If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame. This is just standard backwards incompatibility, nothing about UB (in other languages requiring specific compiler/language versions is routine). Problems would arise even if it was changed from being a defined 'free(x)' to being a defined 'printf("here's the thing you realloc(x,0)'d: %p",x)'. (whether the C standard should always be backwards compatible is a more interesting question, but is orthogonal to UB)

I do remember reading somewhere that a real platform in fact not handling size 0 properly (or having explicitly-defined behavior going against what the standard allowed?) being an argument for changing the standard requirement. It's certainly not because compiler developers had big plans for optimizing around it, given that both gcc and clang don't: https://godbolt.org/z/jjcGYsE7W. and I'm pretty sure there's no way this could amount to any optimization on non-extremely-contrived examples anyway.

I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.

gavinhoward · on Aug 4, 2024

> If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame.

Backwards compatibility? I thought that was a target for WG14.

> This is just standard backwards incompatibility, nothing about UB

But UB is insidious and can bite you with implicit compiler settings, like the default to C99 or C11.

> whether the C standard should always be backwards compatible is more interesting, but is a question orthogonal to UB

If it's a target, then it should be.

And on the contrary, UB is not orthogonal to backwards compatibility.

Any UB could have been made implementation-defined and still be backwards compatible. But it's backwards-incompatible to make anything UB that wasn't UB. These count as examples of WG14 screwing over its users.

> I do remember some mention somewhere of a real platform in fact not handling size 0 properly being an argument for reducing the standard requirement.

So WG14 just decides to screw over users from other platforms? Just keep it implementation-defined! It already was! And that's still a concession from the pure defined behavior of C89!

> I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.

I beg to differ. Any case where UB was implicit just because it wasn't defined in the standard could have easily been made implementation-defined instead.

Anytime WG14 adds UB that doesn't need to be UB, it is screwing over users.

dzaima · on Aug 4, 2024

> Backwards compatibility? I thought that was a target for WG14.

C23 removed K&R function declarations. Indeed backwards-compatibility is important for them, but it's not the be-all end-all.

Having a standard state exact possible behavior is meaningless if in practice it isn't followed. And it wasn't just implementation-defined, it had a specific set of options for what it could do.

> Any case where UB was implicit just because it wasn't defined in the standard could have easily been made implementation-defined instead. Any UB could have been made implementation-defined and still be backwards compatible. But anything that wasn't UB that now is counts as an example of WG14 screwing over its users.

If this is such a big issue for you, you could just name another example. It'd take, like, 5 words to say another feature in question unnecessarily changed. I'll happily do the research on how it changed over time.

It's clear that you don't like UB, but I don't think you've said anything more than that. I quite like that my compiler will optimize out dead null comparisons or some check that collapses to a 'a + C1 < a' after inlining/constant propagation. I think it's quite neat that not being able to assume signed wrapping means that one can run sanitizers that warn on such, without heaps of false-positives from people doing wrapping arith with it. If anything, I'd want some unsigned types with no unsigned wrapping (though I'd of course still want some way to do wrapping arith where needed)

gavinhoward · on Aug 4, 2024

> Having a standard state exact possible behavior is meaningless if in practice it isn't followed.

No, it means that the bug is documented to be in the platform, not the program.

> If this is such a big issue for you, you could just name another example. It'd take, like, 5 words to say another feature in question unnecessarily changed.

Okay, how about `signal()` being called in a multi-threaded program? Why couldn't they define it in C11 such that it could be called? Obviously, such a thing didn't really exist in C99, but it did in POSIX, and in POSIX, it wasn't, and still isn't, undefined. Why couldn't WG14 have simply made it implementation-defined?

> I quite like that my compiler will optimize out dead null comparisons or some check that collapses to a 'a + C1 < a' after inlining/constant propagation.

I'd rather not be forced to be a superhuman programmer.

dzaima · on Aug 5, 2024

> No, it means that the bug is documented to be in the platform, not the program.

Yes, it means that the platform is buggy, but that doesn't help anyone wanting to write portable-in-practice code. The standard specifying specific behavior is just giving a false sense of security.

> Okay, how about `signal()` being called in a multi-threaded program? Why couldn't they define it in C11 such that it could be called?

This is even more definitely not a case of compiler developer conflict of interest. And it's not a case of previously-defined behavior changing, so that set remains still at just realloc. (I wouldn't be surprised if there are more, but if it's not a thing easily listed off I find it hard to believe it's a real significant worry)

But POSIX defines it anyway; and as signals are rather pointless without platform-specific assumptions, it's not like it matters for portability. Honestly, having signals as-is in the C standard feels rather useless to me in general. And 'man 2 signal' warns to not use 'signal()', recommending the non-standard sigaction instead.

And, as far as I can tell, implementation-defined vs undefined barely matters, given that a platform may choose to define the implementation-defined thing as doing arbitrary things anyway, or, conversely, indeed document specific behavior for undefined things. The most significant thing I can tell from the wording is that implementation-defined requires the behavior to be documented, but I am fairly sure there are many C compilers that don't document everything implementation-defined.

> I'd rather not be forced to be a superhuman programmer.

All you have to do is not use signed integers for doing modular/bitwise arithmetic just as much as you don't use integers for doing floating-point arithmetic. It's not much to ask. And the null pointer thing isn't even an issue for userspace code (i.e. what 99.99% of programmers write).

I do think think configuring behavior of various things should be more prevalent & nicer to do; even in cases where a language/platform does define specific behavior, it may nevertheless be undesired (e.g. a+1<a might not work for overflow checking if signed addition was implementation-defined (and, say, a platform defines it as saturating), and so portable projects still couldn't use it for such).