X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/9a3fbb8fa4bb0533e1995f8539714f26371044a3..ff66af132f9391aeb1ef74920e2cd07f654949f0:/ralf/_posts/2018-07-24-pointers-and-bytes.md diff --git a/ralf/_posts/2018-07-24-pointers-and-bytes.md b/ralf/_posts/2018-07-24-pointers-and-bytes.md index 4a9406f..05690b3 100644 --- a/ralf/_posts/2018-07-24-pointers-and-bytes.md +++ b/ralf/_posts/2018-07-24-pointers-and-bytes.md @@ -19,7 +19,7 @@ I hope that by the end of this post, you will agree with me on both of these sta ## Pointers Are Complicated What is the problem with "pointers are just integers"? Let us consider the following example:
-(I am using C++ code here mostly because writing unsafe code is easier in C++, and unsafe code is where these problems really show up.) +(I am using C++ code here mostly because writing unsafe code is easier in C++, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.) {% highlight c++ %} int test() { auto x = new int[8]; @@ -137,8 +137,9 @@ However, this simple model starts to fall apart once you consider pointer-intege In miri, casting a pointer to an integer does not actually do anything, we now just have an integer variable (i.e., its *type* says it is an integer) whose *value* is a pointer (i.e., an allocation-offset pair). However, multiplying that "integer" by 2 leads to an error, because it is entirely unclear what it means to multiply such an abstract pointer by 2. -This is the most lazy thing to do, and we do it because it is not clear what else to do -- in our abstract machine, there is no single coherent "address space" that all allocations live in, that we could use to map every pointer to a distinct integer. -Every allocation is just identified by a (unobservable) ID. +This is the most lazy thing to do, and we do it because it is not clear what else to do (other than not supporting these casts at all -- but this way, miri can run more programs). +In our abstract machine, there is no single coherent "address space" that all allocations live in, that we could use to map every pointer to a distinct integer. +Every allocation is just identified by an (unobservable) ID. We could now start to enrich this model with extra data like a base address for each allocation, and somehow use that when casting an integer back to a pointer... but that's where it gets really complicated, and anyway discussing such a model is not the point of this post. The point it to discuss the *need* for such a model. If you are interested, I suggest you read [this paper](http://www.cis.upenn.edu/%7Estevez/papers/KHM+15.pdf) that explores the above idea of adding a base address. @@ -227,6 +228,7 @@ Using `Uninit` instead of an arbitrary bit pattern means miri can, in a single e ## Conclusion We have seen that in languages like C++ and Rust (unlike on real hardware), pointers can be different even when they point to the same address, and that a byte is more than just a number in `0..256`. +This is also why calling C "portable assembly" may have been appropriate in 1978, but is a dangerously misleading statement nowadays. With this, I think we are ready to look at a first draft of my "2018 memory model" (working title ;) -- in the next post. :) Thanks to @rkruppe and @nagisa for help in finding arguments for why `Uninit` is needed.