From 012c4051ecc64d759d9cfb821841fe20dbba3aef Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 10 Dec 2020 20:07:41 +0100 Subject: [PATCH 1/1] minor tweaks --- personal/_drafts/provenance-matters.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/personal/_drafts/provenance-matters.md b/personal/_drafts/provenance-matters.md index 959aadd..7ad9f1a 100644 --- a/personal/_drafts/provenance-matters.md +++ b/personal/_drafts/provenance-matters.md @@ -236,9 +236,10 @@ But in a language such as Rust, C, or C++ that supports pointer-integer casts, t We also learned that LLVM has a bug, but that was *not* the point of this blog post. The GCC developers [made exactly the same mistake](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82282), and I got word that MSVC and ICC have the same issue (though I do not know how to verify this). And I cannot blame them; the way compiler development typically works, I think bugs like this are inevitable: when exactly UB arises in an IR is often only loosely specified, so evaluating whether some optimization is *correct* in the sense defined above is basically impossible. -Pointer provenance is just a particularly good example of where the current approach failed. +Pointer provenance is just a particularly good (and subtle) example. +The warm-up above is another trivial case of this (albeit one where the existing specification is sufficient): loop-invariant code motion of arithmetic operations and UB on arithmetic overflow can both be *correct*, but not in the same IR. For another example, see [§2.3 of this paper](https://plv.mpi-sws.org/validc/paper.pdf) (Figure 3 contains the code) which shows how a sequence of two optimizations can lead to a miscompilation, where the first optimization is *correct* under the LLVM concurrency model, and the second optimization is *correct* under the C++11 concurrency model---but there is no concurrency model under which *both* optimizations are correct, so each compiler (or rather, each compiler IR) needs to pick one or the other. -The warm-up example is another trivial case of this: loop-invariant code motion of arithmetic operations and UB on arithmetic overflow can both be *correct*, but not in the same IR. +Finally, this [paper on `undef` and `poison`](https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf) gives examples for optimizations that are broken by the presence of `undef` in LLVM, and describes some of the trade-offs that arise when defining the semantics of `poison`. Which brings me to my main conclusion for this post: to avoid the problem of incompatible optimizations, I think we need to take compiler IRs more serious as programming languages in their own right, and give them a precise specification---including all the UB. Now, you may object by saying that LLVM has an [extensive LangRef](https://llvm.org/docs/LangRef.html), and still, by reading the LLVM specification one could convince oneself that each of the three optimizations above is correct, which as we have seen is contradictory. -- 2.30.2