## Pointers Are Complicated
What is the problem with "pointers are just integers"? Let us consider the following example:<br>
-(I am using C++ code here mostly because writing unsafe code is easier in C++, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.)
+(I am using C++ code here mostly because writing unsafe code is easier in C++ than in Rust, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.)
{% highlight c++ %}
int test() {
auto x = new int[8];
auto x = new int[8];
auto y = new int[8];
y[0] = 42;
- auto x_ptr = &x[8]; // one past the end
+ auto x_ptr = x+8; // one past the end
if (x_ptr == &y[0])
*x_ptr = 23;
return y[0];
When formally describing what the programmer may and may not do in these languages, as we have seen, the model of pointers as integers falls apart, so we have to look for something else.
This is another example of using a "virtual machine" that's different from the real machine for specification purposes, which is an idea [I have blogged about before]({% post_url 2017-06-06-MIR-semantics %}).
-Here's a simple proposal (in fact, this is the model of pointers used in [CompCert](https://hal.inria.fr/hal-00703441/document) and my [RustBelt work]({% post_url 2017-07-08-rustbelt %}), and it is also how [miri](https://github.com/solson/miri/) implements pointers):
+Here's a simple proposal (in fact, this is the model of pointers used in [CompCert](https://hal.inria.fr/hal-00703441/document) and my [RustBelt work]({% post_url 2017-07-08-rustbelt %}), and it is also how [miri](https://github.com/solson/miri/) implements [pointers](https://github.com/rust-lang/rust/blob/fefe81605d6111faa8dbb3635ab2c51d59de740a/src/librustc/mir/interpret/mod.rs#L121-L124)):
A pointer is a pair of some kind of ID uniquely identifying the *allocation*, and an *offset* into the allocation.
+If we defined this in Rust, we might write
+{% highlight rust %}
+struct Pointer {
+ alloc_id: usize,
+ offset: isize,
+}
+{% endhighlight %}
Adding/subtracting an integer to/from a pointer just acts on the offset, and can thus never leave the allocation.
Subtracting a pointer from another is only allowed when both point to the same allocation (matching [C++](https://timsong-cpp.github.io/cppwp/n4140/expr.add#6)).[^2]
In miri, casting a pointer to an integer does not actually do anything, we now just have an integer variable (i.e., its *type* says it is an integer) whose *value* is a pointer (i.e., an allocation-offset pair).
However, multiplying that "integer" by 2 leads to an error, because it is entirely unclear what it means to multiply such an abstract pointer by 2.
-This is the most lazy thing to do, and we do it because it is not clear what else to do (other than not supporting these casts at all -- but this way, miri can run more programs).
-In our abstract machine, there is no single coherent "address space" that all allocations live in, that we could use to map every pointer to a distinct integer.
+I should clarify that this is *not* a good solution when defining language semantics.
+It works fine for an interpreter though.
+It is the most lazy thing to do, and we do it because it is not clear what else to do (other than not supporting these casts at all -- but this way, miri can run more programs):
+In our abstract machine, there just is no single coherent "address space" that all allocations live in, that we could use to map every pointer to a distinct integer.
Every allocation is just identified by an (unobservable) ID.
We could now start to enrich this model with extra data like a base address for each allocation, and somehow use that when casting an integer back to a pointer... but that's where it gets really complicated, and anyway discussing such a model is not the point of this post.
The point it to discuss the *need* for such a model.
A full definition of a language like C++ or Rust of course cannot take this shortcut, it has to explain what really happens here.
To my knowledge, no satisfying solution exists, but academic research is [getting closer](http://sf.snu.ac.kr/publications/llvmtwin.pdf).
+**Update:** This was by no means meant to be an exhaustive list of academic research on C in general. I do not know of other work that focuses directly on the interplay of integer-pointer casts and optimizations, but other noteworthy work on formalizing C includes [KCC](https://github.com/kframework/c-semantics), [Robbert Krebber's PhD thesis](https://robbertkrebbers.nl/thesis.html) and [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/). **/Update**
+
This is why pointers are not simple, either.
## From Pointers to Bytes