X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/15ae719d84c34d6d85332db62e1d512559e66926..2ab939be0875b44d4acbe6d812d5ec19155d51ac:/personal/_posts/2018-07-24-pointers-and-bytes.md?ds=sidebyside diff --git a/personal/_posts/2018-07-24-pointers-and-bytes.md b/personal/_posts/2018-07-24-pointers-and-bytes.md index 5d6b60d..85ba4aa 100644 --- a/personal/_posts/2018-07-24-pointers-and-bytes.md +++ b/personal/_posts/2018-07-24-pointers-and-bytes.md @@ -19,7 +19,7 @@ I hope that by the end of this post, you will agree with me on both of these sta ## Pointers Are Complicated What is the problem with "pointers are just integers"? Let us consider the following example:
-(I am using C++ code here mostly because writing unsafe code is easier in C++, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.) +(I am using C++ code here mostly because writing unsafe code is easier in C++ than in Rust, and unsafe code is where these problems really show up. C has all the same issues, as does unsafe Rust.) {% highlight c++ %} int test() { auto x = new int[8]; @@ -56,7 +56,7 @@ int test() { auto x = new int[8]; auto y = new int[8]; y[0] = 42; - auto x_ptr = &x[8]; // one past the end + auto x_ptr = x+8; // one past the end if (x_ptr == &y[0]) *x_ptr = 23; return y[0]; @@ -137,8 +137,10 @@ However, this simple model starts to fall apart once you consider pointer-intege In miri, casting a pointer to an integer does not actually do anything, we now just have an integer variable (i.e., its *type* says it is an integer) whose *value* is a pointer (i.e., an allocation-offset pair). However, multiplying that "integer" by 2 leads to an error, because it is entirely unclear what it means to multiply such an abstract pointer by 2. -This is the most lazy thing to do, and we do it because it is not clear what else to do (other than not supporting these casts at all -- but this way, miri can run more programs). -In our abstract machine, there is no single coherent "address space" that all allocations live in, that we could use to map every pointer to a distinct integer. +I should clarify that this is *not* a good solution when defining language semantics. +It works fine for an interpreter though. +It is the most lazy thing to do, and we do it because it is not clear what else to do (other than not supporting these casts at all -- but this way, miri can run more programs): +In our abstract machine, there just is no single coherent "address space" that all allocations live in, that we could use to map every pointer to a distinct integer. Every allocation is just identified by an (unobservable) ID. We could now start to enrich this model with extra data like a base address for each allocation, and somehow use that when casting an integer back to a pointer... but that's where it gets really complicated, and anyway discussing such a model is not the point of this post. The point it to discuss the *need* for such a model. @@ -150,6 +152,8 @@ We mostly just ignore the problem in miri and opportunistically do as much as we A full definition of a language like C++ or Rust of course cannot take this shortcut, it has to explain what really happens here. To my knowledge, no satisfying solution exists, but academic research is [getting closer](http://sf.snu.ac.kr/publications/llvmtwin.pdf). +**Update:** This was by no means meant to be an exhaustive list of academic research on C in general. I do not know of other work that focuses directly on the interplay of integer-pointer casts and optimizations, but other noteworthy work on formalizing C includes [KCC](https://github.com/kframework/c-semantics), [Robbert Krebber's PhD thesis](https://robbertkrebbers.nl/thesis.html) and [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/). **/Update** + This is why pointers are not simple, either. ## From Pointers to Bytes