X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/663f0b41cbd81bf1e419ae911bd21fc88d2e3046..9624b33175bf0e44d6d39da90f91c19473d8353a:/ralf/_posts/2020-12-14-provenance.md?ds=inline diff --git a/ralf/_posts/2020-12-14-provenance.md b/ralf/_posts/2020-12-14-provenance.md index 9d4ef2c..4e95665 100644 --- a/ralf/_posts/2020-12-14-provenance.md +++ b/ralf/_posts/2020-12-14-provenance.md @@ -1,7 +1,9 @@ --- title: "Pointers Are Complicated II, or: We need better language specs" -categories: rust +categories: rust research forum: https://internals.rust-lang.org/t/pointers-are-complicated-ii-or-we-need-better-language-specs/13562 +license: CC BY-SA 4.0 +license-url: https://creativecommons.org/licenses/by-sa/4.0/ --- Some time ago, I wrote a blog post about how [there's more to a pointer than meets the eye]({% post_url 2018-07-24-pointers-and-bytes %}). @@ -187,7 +189,7 @@ The second optimization gives us a clue into this aspect of LLVM IR semantics: c To see why, consider the two expressions `(char*)(uintptr_t)(p+1)` and `(char*)(uintptr_t)q`: if the optimization of removing pointer-integer-pointer roundtrips is correct, the first operation will output `p+1` and the second will output `q`, which we just established are two different pointers (they differ in their provenance). The only way to explain this is to say that the input to the `(char*)` cast is different, since the program state is otherwise identical in both cases. -But we know that the integer values computed by `(uintptr_t)(p+1)` and `(uintptr_t)q` (i.e., the bit pattern of length 32 that serve as input to the `(char*)` casts) are the same, and hence a difference can only arise if these integers consist of more than just this bit pattern---just like pointers, integers have provenance. +But we know that the integer values computed by `(uintptr_t)(p+1)` and `(uintptr_t)q` (i.e., the bit pattern as stored in some CPU register) are the same, and hence a difference can only arise if these integers consist of more than just this bit pattern---just like pointers, integers have provenance. Finally, let us consider the first optimization. Here, a successful equality test `iq == ip` prompts the optimizer to replace one value by the other. @@ -262,7 +264,7 @@ Nowadays, its main purpose is to help unsafe code authors avoid UB, but for me p It also feeds back into the design of the UB rules by discovering patterns that people want or need to use but that are not currently accepted by Miri. On the LLVM side, the main development in this area is [Alive](https://blog.regehr.org/archives/1722), a tool that can automatically validate[^validate] optimizations performed by LLVM. -Alive has found [many bugs in LLVM optimizations](https://github.com/AliveToolkit/alive2/blob/master/BugList.md), and indeed much of the recent dialog with the LLVM community aimed at a more precise IR semantics is pushed by the people building Alive, lead by Nuno P. Lopes and John Regehr. +Alive has found [many bugs in LLVM optimizations](https://github.com/AliveToolkit/alive2/blob/master/BugList.md), and indeed much of the recent dialog with the LLVM community aimed at a more precise IR semantics is pushed by the people building Alive, led by Nuno P. Lopes and John Regehr. [^validate]: A note on terminology: "validating" an optimization means that given the program text before and after the optimization, the tool will try to prove that this particular transformation is correct. This is in contrast to "verifying" an optimization where a once-and-forall proof is carried out showing that the optimization will always perform a correct transformation. Verification gives a much stronger result, but is also extremely difficult to carry out, so validation is a great middle-ground that is still able to find plenty of bugs. @@ -270,7 +272,7 @@ Progress on these specification efforts is slow, though, in particular when it t I hope this post can raise awareness for the subtle problems optimizing compilers are facing, and convince some people that figuring out the specification of compiler IRs is an important and interesting problem to work on. :) That's all I have for today, thanks for sticking with me! -As usual, this post can be [discussed in the Rust forums](https://internals.rust-lang.org/t/pointers-are-complicated-ii-or-we-need-better-language-specs/13562). +As usual, this post can be [discussed in the Rust forums](https://internals.rust-lang.org/t/pointers-are-complicated-ii-or-we-need-better-language-specs/13562) and [on Reddit](https://www.reddit.com/r/rust/comments/kd157i/pointers_are_complicated_ii_or_we_need_better/). I am curious what your thoughts are on how we can build compilers that do not suffer from the issues I have discussed here. #### Footnotes