---
title: "Pointers Are Complicated, or: What's in a Byte?"
-categories: internship rust
+categories: internship rust programming
forum: https://internals.rust-lang.org/t/pointers-are-complicated-or-whats-in-a-byte/8045
---
}
{% endhighlight %}
It would be beneficial to be able to optimize the final read of `y[0]` to just return `42`.
+C++ compilers regularly perform such optimizations as they are crucial for generating high-quality assembly.[^perf]
The justification for this optimization is that writing to `x_ptr`, which points into `x`, cannot change `y`.
+[^perf]: To be fair, they are *claimed* to be crucial for generating high-quality assembly. The claim sounds plausible to me, but unfortunately, I do not know of a systematic study exploring the performance benefits of such optimizations.
+
However, given how low-level a language C++ is, we can actually break this assumption by setting `i` to `y-x`.
Since `&x[i]` is the same as `x+i`, this means we are actually writing `23` to `&y[0]`.
Of course, that does not stop C++ compilers from doing these optimizations.
-To allow this, the standard declares our code to have [undefined behavior]({% post_url 2017-07-14-undefined-behavior %}).
+To allow this, the standard declares our code to have [undefined behavior]({% post_url 2017-07-14-undefined-behavior %}).[^0]
+
+[^0]: An argument could be made that compilers should just not do such optimizations to make the programming model simpler. This is a discussion worth having, but the point of this post is not to explore this trade-off; it is to explore the consequences of the choices made in C++.
First of all, it is not allowed to perform pointer arithmetic (like `&x[i]` does) that goes [beyond either end of the array it started in](https://timsong-cpp.github.io/cppwp/n4140/expr.add#5).
Our program violates this rule: `x[i]` is outside of `x`, so this is undefined behavior.
(And this is an entirely separate issue from the problem with multiplication that came up in the last section. We just assume some abstract type `Pointer`.)
We cannot represent a byte of a pointer as an element of `0..256`.
-Essentially, if we use a naive model of memory, the extra "hidden" part of a pointer (the one that makes it more than just an integer) would be lost whne a pointer is stored to memory and loaded again.
+Essentially, if we use a naive model of memory, the extra "hidden" part of a pointer (the one that makes it more than just an integer) would be lost when a pointer is stored to memory and loaded again.
We have to fix this, so we have to extend our notion of a "byte" to accommodate that extra state.
So, a byte is now *either* an element of `0..256` ("raw bits"), *or* the n-th byte of some abstract pointer.
If we were to implement our memory model in Rust, this might look as follows:
Such interpreters have a hard time dealing with operations of the form "just choose any of these values" (i.e., non-deterministic operations), because if they want to fully explore all possible program executions, that means they have to try every possible value.
Using `Uninit` instead of an arbitrary bit pattern means miri can, in a single execution, reliably tell you if your program uses uninitialized values incorrectly.
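
To make the last two points concrete, here is a minimal sketch of a byte-level memory in which a pointer survives a store/load round trip and a read of uninitialized memory is caught deterministically. This is only an illustration under assumptions: the field `alloc_id`, the constant `PTR_SIZE`, and the helpers `store_ptr`, `load_ptr` and `load_u8` are hypothetical names made up for this example, and none of this is meant to mirror how miri is actually implemented.

```rust
// Hypothetical sketch; all names here are assumptions made for illustration.

/// An abstract pointer: more than just an address.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pointer {
    alloc_id: u32, // which allocation this pointer belongs to (the "hidden" part)
    offset: u32,   // position inside that allocation
}

/// One byte of abstract memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Byte {
    /// Raw bits, an element of `0..256`.
    Bits(u8),
    /// The n-th byte of some abstract pointer.
    PtrFragment(Pointer, u8),
    /// Memory that has never been written.
    Uninit,
}

/// Assumed pointer size for this sketch.
const PTR_SIZE: usize = 8;

/// Storing a pointer writes one fragment per byte, keeping the hidden state.
fn store_ptr(mem: &mut [Byte], at: usize, ptr: Pointer) {
    for i in 0..PTR_SIZE {
        mem[at + i] = Byte::PtrFragment(ptr, i as u8);
    }
}

/// Loading a pointer succeeds only if all fragments of one pointer are
/// present, in order. Anything else is reported as an error.
fn load_ptr(mem: &[Byte], at: usize) -> Result<Pointer, String> {
    let ptr = match mem[at] {
        Byte::PtrFragment(p, 0) => p,
        Byte::Uninit => return Err(format!("byte {} is uninitialized", at)),
        other => return Err(format!("byte {} does not start a pointer: {:?}", at, other)),
    };
    for i in 1..PTR_SIZE {
        if mem[at + i] != Byte::PtrFragment(ptr, i as u8) {
            return Err(format!("byte {} does not continue the pointer", at + i));
        }
    }
    Ok(ptr)
}

/// Loading an integer byte: an uninitialized byte is an error, not "some value".
/// Rejecting pointer fragments here is a simplification of this sketch.
fn load_u8(mem: &[Byte], at: usize) -> Result<u8, String> {
    match mem[at] {
        Byte::Bits(b) => Ok(b),
        Byte::Uninit => Err(format!("read of uninitialized byte {}", at)),
        Byte::PtrFragment(..) => Err(format!("byte {} is part of a pointer", at)),
    }
}
```

With a memory initialized as `[Byte::Uninit; 16]`, calling `store_ptr` followed by `load_ptr` at the same position returns the original `Pointer`, including its `alloc_id`, while `load_u8` on a byte that was never written returns an error on the first and only execution. No guessing of bit patterns is needed.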
+**Update:** Since writing this section, I have written an entire [post dedicated to uninitialized memory and "real hardware"]({% post_url 2019-07-14-uninit %}) with more details, examples and references. **/Update**
+
## Conclusion
We have seen that in languages like C++ and Rust (unlike on real hardware), pointers can be different even when they point to the same address, and that a byte is more than just a number in `0..256`.