+What did we learn?
+First of all, pointers are complicated.
+Precisely describing their semantics in a way that is consistent with common alias analyses requires adding a notion of "provenance".
+In a language such as Java or ML where pointers are opaque types whose representation cannot be observed, this is actually fairly easy to do.
+But in a language such as Rust, C or C++ that supports pointer-integer casts, the introduction of provenance poses some really tricky questions, and at least one of the commonly performed optimizations in this space has to give.
+
+We also learned that LLVM has a bug, but that was *not* the point of this blog post.
+The GCC developers [made exactly the same mistake](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82282), and I got word that MSVC and ICC have the same issue (though I do not know how to verify this).
+And I cannot blame them; the way compiler development typically works, I think bugs like this are inevitable: when exactly UB arises in an IR is often only loosely specified, so evaluating whether some optimization is *correct* in the sense defined above is basically impossible.
+Pointer provenance is just a particularly good example of where the current approach failed, but it is not the only case.
+For example, [ยง2.3 of this paper](https://plv.mpi-sws.org/validc/paper.pdf) (see Figure 3 for the code) shows how a sequence of two optimizations can lead to a miscompilation, where the first optimization is correct under the LLVM concurrency model, and the second optimization is correct under the C++11 concurrency model---but there is no concurrency model under which *both* optimizations are correct, so each compiler (or rather, each compiler IR) needs to pick one or the other.
+
+Which brings me to my main conclusion for this post: to avoid the problem of incompatible optimizations, I think we need to take compiler IRs more serious as programming languages in their own right, and give them a precise specification---including all the UB.
+Now, you may object by saying that LLVM has an extensive LangRef.
+And still, by reading the LLVM specification one could convince oneself that each of the three optimizations above is correct, which as we have seen is contradictory.
+What is missing? Do we need a formal mathematical definition to avoid such ambiguities?
+I do not think so; I think there is something simpler that will already help a lot.
+The source of the contradiction is that some parts of the specification *implicitly* assume that pointers have provenance, which is easy to forget when considering other operations.
+That's why I think it is very important to make such assumptions more explicit: the specification should *explicitly* describe the information that "makes up" a value, which will include things like provenance and whether the value is (wholly or partially) initialized.[^C]
+This information needs to be extensive enough such that a hypothetical interpreter can use it to detect all the UB.
+Doing so makes it obvious that a pointer has provenance, since otherwise it is impossible to correctly check for out-of-bounds pointer arithmetic while also permitting one-past-the-end pointers.
+
+[^C]: The astute reader will notice that this information is also absent in the C and C++ specifications. That is indeed one my main criticisms of these specifications. It leads to many open questions not only when discussing provenance but also when discussing indeterminate and unspecified values.
+
+This is my bar for what I consider a sufficiently precise language specification: it needs to contain all the information needed such that writing a UB-checking interpreter is just a matter of "putting in the work", but does not require introducing anything new that is not described in the specification.
+
+Ideally, the interpreter is not just hypothetical.
+This is really my main motivation for working on the [Miri interpreter](https://github.com/rust-lang/miri/).
+Of course, practically speaking, its main purpose is to help unsafe code authors avoid UB, but for me personally, I find it equally important that it helps us think about the semantics of Rust and MIR in a different way.
+
+I hope this was educational, and thanks for reading. :)
+As usual, this post can be discussed in the Rust forums.
+I am curious what your thoughts are on how we can build compilers that do not suffer from the issues I have discussed here.
+