X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/b98a2b58c57e48dbf601033e0e8e0ba6fe06720d..4c2e82a3f7cc25046ca2984c1d93f19c212236f2:/personal/_posts/2018-07-24-pointers-and-bytes.md?ds=inline diff --git a/personal/_posts/2018-07-24-pointers-and-bytes.md b/personal/_posts/2018-07-24-pointers-and-bytes.md index 53daa57..68a2cb0 100644 --- a/personal/_posts/2018-07-24-pointers-and-bytes.md +++ b/personal/_posts/2018-07-24-pointers-and-bytes.md @@ -56,7 +56,7 @@ int test() { auto x = new int[8]; auto y = new int[8]; y[0] = 42; - auto x_ptr = &x[8]; // one past the end + auto x_ptr = x+8; // one past the end if (x_ptr == &y[0]) *x_ptr = 23; return y[0]; @@ -119,8 +119,15 @@ C++ and Rust employ a more "high-level" view of memory and pointers, restricting When formally describing what the programmer may and may not do in these languages, as we have seen, the model of pointers as integers falls apart, so we have to look for something else. This is another example of using a "virtual machine" that's different from the real machine for specification purposes, which is an idea [I have blogged about before]({% post_url 2017-06-06-MIR-semantics %}). -Here's a simple proposal (in fact, this is the model of pointers used in [CompCert](https://hal.inria.fr/hal-00703441/document) and my [RustBelt work]({% post_url 2017-07-08-rustbelt %}), and it is also how [miri](https://github.com/solson/miri/) implements pointers): +Here's a simple proposal (in fact, this is the model of pointers used in [CompCert](https://hal.inria.fr/hal-00703441/document) and my [RustBelt work]({% post_url 2017-07-08-rustbelt %}), and it is also how [miri](https://github.com/solson/miri/) implements [pointers](https://github.com/rust-lang/rust/blob/fefe81605d6111faa8dbb3635ab2c51d59de740a/src/librustc/mir/interpret/mod.rs#L121-L124)): A pointer is a pair of some kind of ID uniquely identifying the *allocation*, and an *offset* into the allocation. +If we defined this in Rust, we might write +{% highlight rust %} +struct Pointer { + alloc_id: usize, + offset: isize, +} +{% endhighlight %} Adding/subtracting an integer to/from a pointer just acts on the offset, and can thus never leave the allocation. Subtracting a pointer from another is only allowed when both point to the same allocation (matching [C++](https://timsong-cpp.github.io/cppwp/n4140/expr.add#6)).[^2] @@ -166,7 +173,7 @@ We have to say what the value of `v` is, so we have to find some way to answer t (And this is an entirely separate issue from the problem with multiplication that came up in the last section. We just assume some abstract type `Pointer`.) We cannot represent a byte of a pointer as an element of `0..256`. -Essentially, if we use a naive model of memory, the extra "hidden" part of a pointer (the one that makes it more than just an integer) would be lost whne a pointer is stored to memory and loaded again. +Essentially, if we use a naive model of memory, the extra "hidden" part of a pointer (the one that makes it more than just an integer) would be lost when a pointer is stored to memory and loaded again. We have to fix this, so we have to extend our notion of a "byte" to accomodate that extra state. So, a byte is now *either* an element of `0..256` ("raw bits"), *or* the n-th byte of some abstract pointer. If we were to implement our memory model in Rust, this might look as follows: @@ -230,6 +237,8 @@ Finally, `Uninit` is also a better choice for interpreters like miri. Such interpreters have a hard time dealing with operations of the form "just choose any of these values" (i.e., non-deterministic operations), because if they want to fully explore all possible program executions, that means they have to try every possible value. Using `Uninit` instead of an arbitrary bit pattern means miri can, in a single execution, reliably tell you if your programs uses uninitialized values incorrectly. +**Update:** Since writing this section, I have written an entire [post dedicated to uninitialized memory and "real hardware"]({% post_url 2019-07-14-uninit %}) with more details, examples and references. **/Update** + ## Conclusion We have seen that in languages like C++ and Rust (unlike on real hardware), pointers can be different even when they point to the same address, and that a byte is more than just a number in `0..256`.