+What did we learn?
+First of all, pointers are complicated.
+Precisely describing their semantics in a way that is consistent with common alias analyses requires adding a notion of "provenance".
+In a language such as Java or ML where pointers are opaque types whose representation cannot be observed, this is actually fairly easy to do.
+But in a language such as Rust, C or C++ that supports pointer-integer casts, the introduction of provenance poses some really tricky questions, and at least one of the commonly performed optimizations in this space has to give.
+
+We also learned that LLVM has a bug, but that was *not* the point of this blog post.
+The GCC developers [made exactly the same mistake](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82282), and I got word that MSVC and ICC have the same issue (though I do not know how to verify this).
+And I cannot blame them; the way compiler development typically works nowadays, I think bugs like this are inevitable.
+Pointer provenance is just a particularly good example of where the current approach failed, but it is not the only case.
+For example, [ยง2.3 of this paper](https://plv.mpi-sws.org/validc/paper.pdf) (see Figure 3 for the code) shows how a sequence of two optimizations can lead to a miscompilation, where the first optimization is correct under the LLVM concurrency model, and the second optimization is correct under the C++11 concurrency model---but there is no concurrency model under which *both* optimizations are correct, so each compiler (or rather, each compiler IR) needs to pick one or the other.
+
+Which brings me to my main conclusion for this post: I think the way optimizing compilers for these low-level languages are built is fundamentally flawed.
+The current approach, which one might call "optimizations-first", is to largely think of the semantics of a compiler IR in terms of the set of optimizations that it enables.[^weak-mem]
+However, each additional optimization puts another constraint on the IR semantics---how can we be sure that there even is a way to satisfy all constraints in a single language?
+The only way I know to ensure that the constraint set remains satisfiable, i.e., to ensure that performing all these optimizations in any order does not introduce bugs, is to pick a consistent semantics for the IR and then show each optimization to be *correct* under those semantics, in the sense defined above.
+
+[^weak-mem]: We can also clearly see this approach in most discussions around weak memory concurrency effects, which typically are all about which reorderings the compiler is allowed to perform and how barriers can be used to prevent the reorderings. This is the optimizations-first approach in action.
+
+To avoid the problem of incompatible optimizations, I think we need to take compiler IRs more serious as programming languages in their own right, and properly define the Abstract Machine that describes their semantics---including all the UB!
+I call this "language-first".
+This language definition does not need to be a formal artifact in a proof assistant, but the description needs to be precise enough such that there are no ambiguities, and such that for each desired optimization we can evaluate whether it is *correct* for this IR or not.
+One great way to achieve this is to implement a *reference interpreter* for the IR, an interpreter that not only evaluates IR programs but also says if there was any UB triggered by this execution.
+Writing such an interpreter is a great exercise because it requires explicitly writing out what one could call the "state space" of the Abstract Machine:
+just what *is* a value, which pieces of data are needed to fully describe it, what kind of information is stored in memory, and so on.
+Doing so makes it *immediately obvious* that a pointer has provenance, since otherwise it is impossible to correctly check for out-of-bounds accesses.
+
+This is really my main motivation for working on the [Miri interpreter](https://github.com/rust-lang/miri/).
+Of course, practically speaking, its main purpose is to help unsafe code authors avoid UB, but for me personally, I find it equally important that it helps us think about the semantics of Rust and MIR in an *operational* way.
+This helps shift the discussion around MIR from "optimizations-first" more towards "language-first".
+Of course, optimizations are still the key motivating factor for defining the language semantics this way or that way, but ultimately it is the language semantics that serves as "ground truth" and keeps it all together, ensuring everything is mutually compatible.
+
+I hope this was educational, and thanks for reading. :)
+As usual, this post can be discussed in the Rust forums.
+I am curious what your thoughts are on how we can build compilers that do not suffer from these issues.
+