From: Ralf Jung Date: Tue, 12 Apr 2022 12:32:27 +0000 (-0400) Subject: credit where credit is due X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/b6cc5bfc0c026d395f232a3cb5a6fdc429c295e0?ds=inline credit where credit is due --- diff --git a/personal/_posts/2022-04-11-provenance-exposed.md b/personal/_posts/2022-04-11-provenance-exposed.md index 44ccd86..b37344c 100644 --- a/personal/_posts/2022-04-11-provenance-exposed.md +++ b/personal/_posts/2022-04-11-provenance-exposed.md @@ -196,10 +196,12 @@ This may sound like bad news for low-level coding tricks like pointer tagging (s Do we have to optimize this code less just because of corner cases like the above? As it turns out, no we don't -- there are some situations where it is perfectly fine to do a pointer-integer cast *without* having the "exposure" side-effect. Specifically, this is the case if we never intend to cast the integer back to a pointer! -That might seem like a niche case, but it turns out that most of the time, we can avoid 'bare' integer-pointer casts, and instead use an operation like [`with_addr`](https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.with_addr) that explicitly specifies which provenance to use for the newly created pointer. +That might seem like a niche case, but it turns out that most of the time, we can avoid 'bare' integer-pointer casts, and instead use an operation like [`with_addr`](https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.with_addr) that explicitly specifies which provenance to use for the newly created pointer.[^with_addr] This is more than enough for low-level pointer shenanigans like pointer tagging, as [Gankra demonstrated](https://gankra.github.io/blah/tower-of-weakenings/#strict-provenance-no-more-getting-lucky). Rust's [Strict Provenance experiment](https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance) aims to determine whether we can use operations like `with_addr` to replace basically all integer-pointer casts. +[^with_addr]: `with_addr` has been unstably added to the Rust standard library very recently. Such an operation has been floating around in various discussions in the Rust community for quite a while, and it has even made it into [an academic paper](https://iris-project.org/pdfs/2022-popl-vip.pdf) under the name of `copy_alloc_id`. Who knows, maybe one day it will find its way into the C standard as well. :) + As part of Strict Provenance, Rust now has a second way of casting pointers to integers, `ptr.addr()`, which does *not* "expose" the permission of the underlying pointer, and hence can be treated like a pure operation![^experiment] We can do shenanigans on the integer representation of a pointer *and* have all these juicy optimizations, as long as we don't expect bare integer-pointer casts to work. As a bonus, this also makes Rust work nicely on CHERI *without* a 128bit wide `usize`, and it helps Miri, too. @@ -343,7 +345,8 @@ Here, too, my vision for Rust aligns very well with the direction C is taking. (The set of valid guesses in C is just a lot more restricted since they do not have `wrapping_offset`, and the model does not cover `restrict`. That means they can actually feasibly give an algorithm for how to do the guessing. They don't have to invoke scary terms like "angelic non-determinism", but the end result is the same -- and to me, the fact that it is equivalent to angelic non-determinism is what justifies this as a reasonable semantics. -Presenting this as a concrete algorithm to pick a suitable provenance is then just a stylistic choice.) +Presenting this as a concrete algorithm to pick a suitable provenance is then just a stylistic choice. +Kudos go to Michael Sammler for opening my eyes to this interpretation of "user disambiguation".) What is left is the question of how to handle pointer-integer transmutation, and this is where the roads are forking. PNVI-ae-udi explicitly says loading from a union field at integer type exposes the provenance of the pointer being loaded, if any. @@ -431,6 +434,9 @@ Here's the list of open problems I am aware of: So far, this all applies to LLVM as a Rust and C backend equally, so I don't think there are any good alternatives. On the plus side, adapting this strategy for `inttoptr` and `ptrtoint` means that the recent LLVM ["Full Restrict Support"](https://lists.llvm.org/pipermail/llvm-dev/2019-March/131127.html) can also handle pointer-integer round-trips "for free"! +Adding `with_addr`/`copy_alloc_id` to LLVM is not strictly necessary, since it can be implemented with `getelementptr` (without `inbounds`). +However, optimizations don't seem to always deal well with that pattern, so it might still be a good idea to add this as a primitive operation to LLVM. + Where things become more subtle is around pointer-integer transmutation. If LLVM wants to keep doing replacement of `==`-equal integers (which I strongly assume to be the case), *something* needs to give: my first example, with casts replaced by transmutation, shows a miscompilation. If we focus on doing an `i64` load of a pointer value (e.g. as in the LLVM IR produced by `transmute_union`, or pointer-based transmutation in Rust), what are the options?