From 663f0b41cbd81bf1e419ae911bd21fc88d2e3046 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 14 Dec 2020 18:49:22 +0100 Subject: [PATCH 1/1] replace int by uintptr_t --- ralf/_posts/2020-12-14-provenance.md | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/ralf/_posts/2020-12-14-provenance.md b/ralf/_posts/2020-12-14-provenance.md index 2f8c23c..9d4ef2c 100644 --- a/ralf/_posts/2020-12-14-provenance.md +++ b/ralf/_posts/2020-12-14-provenance.md @@ -106,15 +106,14 @@ The sequence of examples is taken from [this talk](https://sf.snu.ac.kr/llvmtwin Here is the source program: {% highlight c %} char p[1], q[1] = {0}; -int ip = (int)(p+1); -int iq = (int)q; +uintptr_t ip = (uintptr_t)(p+1); +uintptr_t iq = (uintptr_t)q; if (iq == ip) { *(char*)iq = 10; print(q[0]); } {% endhighlight %} I am using C syntax here just as a convenient way to write programs in LLVM IR. -For simplicity, we assume that `int` has the right size to hold a pointer value; just imagine we used `uintptr_t` if you want to be more general. This program has two possible behaviors: either `ip` (the address one-past-the-end of `p`) and `iq` (the address of `q`) are different, and nothing is printed. Or the two are equal, in which case the program will print "10" (`iq` is the result of casting `q` to an integer, so casting it back will yield the original pointer, or at least a pointer pointing to the same object / location in memory). @@ -123,10 +122,10 @@ The first "optimization" we will perform is to exploit that if we enter the `if` Subsequently the definition of `ip` is inlined: {% highlight c %} char p[1], q[1] = {0}; -int ip = (int)(p+1); -int iq = (int)q; +uintptr_t ip = (uintptr_t)(p+1); +uintptr_t iq = (uintptr_t)q; if (iq == ip) { - *(char*)(int)(p+1) = 10; // <- This line changed + *(char*)(uintptr_t)(p+1) = 10; // <- This line changed print(q[0]); } {% endhighlight %} @@ -134,8 +133,8 @@ if (iq == ip) { The second optimization notices that we are taking a pointer `p+1`, casting it to an integer, and casting it back, so we can remove the cast roundtrip: {% highlight c %} char p[1], q[1] = {0}; -int ip = (int)(p+1); -int iq = (int)q; +uintptr_t ip = (uintptr_t)(p+1); +uintptr_t iq = (uintptr_t)q; if (iq == ip) { *(p+1) = 10; // <- This line changed print(q[0]); @@ -145,8 +144,8 @@ if (iq == ip) { The final optimization notices that `q` is never written to, so we can replace `q[0]` by its initial value `0`: {% highlight c %} char p[1], q[1] = {0}; -int ip = (int)(p+1); -int iq = (int)q; +uintptr_t ip = (uintptr_t)(p+1); +uintptr_t iq = (uintptr_t)q; if (iq == ip) { *(p+1) = 10; print(0); // <- This line changed @@ -185,10 +184,10 @@ In a flat memory model where pointers are just integers (such as most assembly l Now that we know that provenance exists in pointers, we have to also consider what happens to provenance when a pointer gets cast to an integer and back. The second optimization gives us a clue into this aspect of LLVM IR semantics: casting a pointer to an integer and back is optimized away, which means that *integers have provenance*. -To see why, consider the two expressions `(char*)(int)(p+1)` and `(char*)(int)q`: +To see why, consider the two expressions `(char*)(uintptr_t)(p+1)` and `(char*)(uintptr_t)q`: if the optimization of removing pointer-integer-pointer roundtrips is correct, the first operation will output `p+1` and the second will output `q`, which we just established are two different pointers (they differ in their provenance). The only way to explain this is to say that the input to the `(char*)` cast is different, since the program state is otherwise identical in both cases. -But we know that the integer values computed by `(int)(p+1)` and `(int)q` (i.e., the bit pattern of length 32 that serve as input to the `(char*)` casts) are the same, and hence a difference can only arise if these integers consist of more than just this bit pattern---just like pointers, integers have provenance. +But we know that the integer values computed by `(uintptr_t)(p+1)` and `(uintptr_t)q` (i.e., the bit pattern of length 32 that serve as input to the `(char*)` casts) are the same, and hence a difference can only arise if these integers consist of more than just this bit pattern---just like pointers, integers have provenance. Finally, let us consider the first optimization. Here, a successful equality test `iq == ip` prompts the optimizer to replace one value by the other. -- 2.30.2