This is comparable to the promise that a `&mut T` in Rust is unique.
However, just like last time, we want to consider the limits that `restrict` combined with integer-pointer casts put on an optimizing compiler -- so the actual programming language that we have to be concerned with is the IR of that compiler.
Nevertheless I will use the more familiar C syntax to write down this example; you should think of this just being notation for the "obvious" equivalent function in LLVM IR, where `restrict` is expressed via `noalias`.
Of course, if we learn that the IR has to put some limitations on what code may do, this also applies to the surface language -- so we will be talking about all three (Rust, C, LLVM) quite a bit.
This is comparable to the promise that a `&mut T` in Rust is unique.
However, just like last time, we want to consider the limits that `restrict` combined with integer-pointer casts put on an optimizing compiler -- so the actual programming language that we have to be concerned with is the IR of that compiler.
Nevertheless I will use the more familiar C syntax to write down this example; you should think of this just being notation for the "obvious" equivalent function in LLVM IR, where `restrict` is expressed via `noalias`.
Of course, if we learn that the IR has to put some limitations on what code may do, this also applies to the surface language -- so we will be talking about all three (Rust, C, LLVM) quite a bit.
We started out with a program that always prints `1`, and ended up with a program that always prints `0`.
This is bad news. Our optimizations changed program behavior. That must not happen! What went wrong?
We started out with a program that always prints `1`, and ended up with a program that always prints `0`.
This is bad news. Our optimizations changed program behavior. That must not happen! What went wrong?
-Fundamentally, this is the same situation as in the previous blog post: this
-example demonstrates that either the original program already had Undefined
-Behavior, or (at least) one of the optimizations is wrong. However, the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip -- and if casting integers to pointers is allowed, *surely* that must work.
+Fundamentally, this is the same situation as in the previous blog post: this example demonstrates that either the original program already had Undefined Behavior, or (at least) one of the optimizations is wrong.
+However, the only possibly suspicious part of the original program is a pointer-integer-pointer round-trip -- and if casting integers to pointers is allowed, *surely* that must work.
+I will, for the rest of this post, assume that replacing `x` by `(int*)(uintptr_t)x` is always allowed.
Specifically, `x` has permission to access `i[0]` (declared in `main`), and `y` has permission to access `i[1]`.[^dyn]
`y2` just inherits the permission from `y`.
Specifically, `x` has permission to access `i[0]` (declared in `main`), and `y` has permission to access `i[1]`.[^dyn]
`y2` just inherits the permission from `y`.
-[^dyn]: Actually, this is not quite how `restrict` works. The exact set of locations these pointers can access is determined *dynamically*, and the only constraint is that they cannot be used to access *the same location* (except if both are just doing a load). However, I carefully picked this example so that these subtleties do not change anything.
+[^dyn]: As mentioned in a previous footnote, this is not actually how `restrict` works. The exact set of locations these pointers can access is determined *dynamically*, and the only constraint is that they cannot be used to access *the same location* (except if both are just doing a load). However, I carefully picked this example so that these subtleties should not change anything.
But which permission does `ptr` get?
Since integers do not carry provenance, the details of this permission information are lost during a pointer-integer cast, and have to somehow be 'restored' at the integer-pointer cast.
But which permission does `ptr` get?
Since integers do not carry provenance, the details of this permission information are lost during a pointer-integer cast, and have to somehow be 'restored' at the integer-pointer cast.
This may sound like bad news for low-level coding tricks like pointer tagging (storing a flag in the lowest bit of a pointer).
Do we have to optimize this code less just because of corner cases like the above?
As it turns out, no we don't -- there are some situations where it is perfectly fine to do a pointer-integer cast *without* having the "exposure" side-effect.
This may sound like bad news for low-level coding tricks like pointer tagging (storing a flag in the lowest bit of a pointer).
Do we have to optimize this code less just because of corner cases like the above?
As it turns out, no we don't -- there are some situations where it is perfectly fine to do a pointer-integer cast *without* having the "exposure" side-effect.
That might seem like a niche case, but it turns out that most of the time, we can avoid 'bare' integer-pointer casts, and instead use an operation like [`with_addr`](https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.with_addr) that explicitly specifies which provenance to use for the newly created pointer.
This is more than enough for low-level pointer shenanigans like pointer tagging, as [Gankra demonstrated](https://gankra.github.io/blah/tower-of-weakenings/#strict-provenance-no-more-getting-lucky).
Rust's [Strict Provenance experiment](https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance) aims to determine whether we can use operations like `with_addr` to replace basically all integer-pointer casts.
That might seem like a niche case, but it turns out that most of the time, we can avoid 'bare' integer-pointer casts, and instead use an operation like [`with_addr`](https://doc.rust-lang.org/nightly/std/primitive.pointer.html#method.with_addr) that explicitly specifies which provenance to use for the newly created pointer.
This is more than enough for low-level pointer shenanigans like pointer tagging, as [Gankra demonstrated](https://gankra.github.io/blah/tower-of-weakenings/#strict-provenance-no-more-getting-lucky).
Rust's [Strict Provenance experiment](https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance) aims to determine whether we can use operations like `with_addr` to replace basically all integer-pointer casts.
Either the original program already has Undefined Behavior, or one of the optimizations is incorrect.
Previously, we resolved this conundrum by saying that removing the "dead cast" `(uintptr_t)x` whose result is unused was incorrect, because that cast had the side-effect of "exposing" the permission of `x` to be picked up by future integer-pointer casts.
Either the original program already has Undefined Behavior, or one of the optimizations is incorrect.
Previously, we resolved this conundrum by saying that removing the "dead cast" `(uintptr_t)x` whose result is unused was incorrect, because that cast had the side-effect of "exposing" the permission of `x` to be picked up by future integer-pointer casts.
-We could apply the same solution again, but this time, we would have to say that a `union` access or a `memcpy` has an "expose" side-effect and hence cannot be entirely removed even if its result is unused.
+We could apply the same solution again, but this time, we would have to say that a `union` access (at integer type) or a `memcpy` (to an integer) can have an "expose" side-effect and hence cannot be entirely removed even if its result is unused.
And that sounds quite bad!
`(uintptr_t)x` only happens in code that does tricky things with pointers, so urging the compiler to be careful and optimize a bit less seems like a good idea (and at least in Rust, `x.addr()` even provides a way to opt-out of this side-effect).
However, `union` and `memcpy` are all over the place.
Do we now have to treat *all* of them as having side-effects?
And that sounds quite bad!
`(uintptr_t)x` only happens in code that does tricky things with pointers, so urging the compiler to be careful and optimize a bit less seems like a good idea (and at least in Rust, `x.addr()` even provides a way to opt-out of this side-effect).
However, `union` and `memcpy` are all over the place.
Do we now have to treat *all* of them as having side-effects?
-In Rust, due to the lack of a strict aliasing restriction, things get even worse, since literally *any* load of an integer from a raw pointer might be doing a pointer-integer transmutation and thus have the "expose" side-effect!
+In Rust, due to the lack of a strict aliasing restriction (or in C with `-fno-strict-aliasing`), things get even worse, since literally *any* load of an integer from a raw pointer might be doing a pointer-integer transmutation and thus have the "expose" side-effect!
To me, and speaking from a Rust perspective, that sounds like bad idea.
Sure, we want to make it as easy as possible to write low-level code in Rust, and that code sometimes has to do unspeakable things with pointers.
To me, and speaking from a Rust perspective, that sounds like bad idea.
Sure, we want to make it as easy as possible to write low-level code in Rust, and that code sometimes has to do unspeakable things with pointers.
Because of that, I think we should move towards discouraging, deprecating, or even entirely disallowing pointer-integer transmutation in Rust.
That means a cast is the only legal way to turn a pointer into an integer, and after the discussion above we got our casts covered.
Because of that, I think we should move towards discouraging, deprecating, or even entirely disallowing pointer-integer transmutation in Rust.
That means a cast is the only legal way to turn a pointer into an integer, and after the discussion above we got our casts covered.
What is left is the question of how to handle pointer-integer transmutation, and this is where the roads are forking.
PNVI-ae-udi explicitly says loading from a union field at integer type exposes the provenance of the pointer being loaded, if any.
What is left is the question of how to handle pointer-integer transmutation, and this is where the roads are forking.
PNVI-ae-udi explicitly says loading from a union field at integer type exposes the provenance of the pointer being loaded, if any.
-So, the example with `transmute_union` would be allowed, meaning the optimization of removing the "dead" load from the `union` would *not* be allowed.
-Same for `transmute_memcpy`, where the proposal says that `memcpy` should *preserve* provenance, but later when we access the contents of `ret` at type `uintptr_t`, that will again implicitly expose the provenance of the pointer.
+So, the example with `transmute_union` would be allowed, meaning the optimization of removing the "dead" load from the `union` would *not* (in general) be allowed.
+Same for `transmute_memcpy`, where the proposal says that when we access the contents of `ret` at type `uintptr_t`, that will again implicitly expose the provenance of the pointer.
I think there are several reasons why this choice makes sense for C, that do not apply to Rust:
- There is a *lot* of legacy code. A *LOT*.
- There is no alternative like `MaybeUninit` that could be used to hold data without losing provenance.
- Strict aliasing means that not *all* loads at integer type have to worry about provenance; only loads at character type are affected.
I think there are several reasons why this choice makes sense for C, that do not apply to Rust:
- There is a *lot* of legacy code. A *LOT*.
- There is no alternative like `MaybeUninit` that could be used to hold data without losing provenance.
- Strict aliasing means that not *all* loads at integer type have to worry about provenance; only loads at character type are affected.
As the example above shows, the compiler has to be very careful when removing any operation that can expose a provenance, since there might be integer-pointer casts later that rely on this.
As the example above shows, the compiler has to be very careful when removing any operation that can expose a provenance, since there might be integer-pointer casts later that rely on this.