## Pointers Are Complicated
What is the problem with "pointers are just integers"? Let us consider the following example:<br>
-(I am using C++ code here mostly because writing unsafe code is easier in C++, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.)
+(I am using C++ code here mostly because writing unsafe code is easier in C++ than in Rust, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.)
{% highlight c++ %}
int test() {
auto x = new int[8];
auto x = new int[8];
auto y = new int[8];
y[0] = 42;
- auto x_ptr = &x[8]; // one past the end
+ auto x_ptr = x+8; // one past the end
if (x_ptr == &y[0])
*x_ptr = 23;
return y[0];
When formally describing what the programmer may and may not do in these languages, as we have seen, the model of pointers as integers falls apart, so we have to look for something else.
This is another example of using a "virtual machine" that's different from the real machine for specification purposes, which is an idea [I have blogged about before]({% post_url 2017-06-06-MIR-semantics %}).
-Here's a simple proposal (in fact, this is the model of pointers used in [CompCert](https://hal.inria.fr/hal-00703441/document) and my [RustBelt work]({% post_url 2017-07-08-rustbelt %}), and it is also how [miri](https://github.com/solson/miri/) implements pointers):
+Here's a simple proposal (in fact, this is the model of pointers used in [CompCert](https://hal.inria.fr/hal-00703441/document) and my [RustBelt work]({% post_url 2017-07-08-rustbelt %}), and it is also how [miri](https://github.com/solson/miri/) implements [pointers](https://github.com/rust-lang/rust/blob/fefe81605d6111faa8dbb3635ab2c51d59de740a/src/librustc/mir/interpret/mod.rs#L121-L124)):
A pointer is a pair of some kind of ID uniquely identifying the *allocation*, and an *offset* into the allocation.
Adding/subtracting an integer to/from a pointer just acts on the offset, and can thus never leave the allocation.
Subtracting a pointer from another is only allowed when both point to the same allocation (matching [C++](https://timsong-cpp.github.io/cppwp/n4140/expr.add#6)).[^2]
A full definition of a language like C++ or Rust of course cannot take this shortcut, it has to explain what really happens here.
To my knowledge, no satisfying solution exists, but academic research is [getting closer](http://sf.snu.ac.kr/publications/llvmtwin.pdf).
+**Update:** This was by no means meant to be an exhaustive list of academic research on C in general. I do not know of other work that focuses directly on the interplay of integer-pointer casts and optimizations, but other noteworthy work on formalizing C includes [KCC](https://github.com/kframework/c-semantics), [Robbert Krebber's PhD thesis](https://robbertkrebbers.nl/thesis.html) and [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/). **/Update**
+
This is why pointers are not simple, either.
## From Pointers to Bytes