@@ -15,10+15,10 @@ There is also a larger message here about how we could prevent such issues from
<!-- MORE -->
I will show a series of three compiler transformations that each seem "intuitively justified", but when taken together they lead to a clearly incorrect result.
<!-- MORE -->
I will show a series of three compiler transformations that each seem "intuitively justified", but when taken together they lead to a clearly incorrect result.
-We will use LLVM for these examples, but the goal is not to pick on LLVM---other compilers suffer from similar issues.
+I will use LLVM for these examples, but the goal is not to pick on LLVM---other compilers suffer from similar issues.
The goal is to convince you that to build a correct compiler for languages permitting unsafe pointer manipulation such as C, C++, or Rust,
we need to take IR semantics (and specifically provenance) more seriously.
The goal is to convince you that to build a correct compiler for languages permitting unsafe pointer manipulation such as C, C++, or Rust,
we need to take IR semantics (and specifically provenance) more seriously.
-We use LLVM for the examples because it is particularly easy to study with its single, (comparatively) well-documented IR that a lot of infrastructure evolved around.
+I use LLVM for the examples because it is particularly easy to study with its single, (comparatively) well-documented IR that a lot of infrastructure evolved around.
-We are using C syntax here just as a convenient way to write programs in LLVM IR.
+I am using C syntax here just as a convenient way to write programs in LLVM IR.
For simplicity, we assume that `int` has the right size to hold a pointer value; just imagine we used `uintptr_t` if you want to be more general.
This program has two possible behaviors: either `ip` (the address one-past-the-end of `p`) and `iq` (the address of `q`) are different, and nothing is printed.
For simplicity, we assume that `int` has the right size to hold a pointer value; just imagine we used `uintptr_t` if you want to be more general.
This program has two possible behaviors: either `ip` (the address one-past-the-end of `p`) and `iq` (the address of `q`) are different, and nothing is printed.
@@ -167,7+167,7 @@ That's why the LLVM LangRef will not give us a clear answer here, and indeed obt
To proceed, we will use the three optimizations that we considered above as cues: assuming that the optimization is correct for LLVM IR, what does that tell us about the semantics?
To proceed, we will use the three optimizations that we considered above as cues: assuming that the optimization is correct for LLVM IR, what does that tell us about the semantics?
-We start with the last optimization, where the `print` argument is changed from `q[0]` to `0`.
+Let us start with the last optimization, where the `print` argument is changed from `q[0]` to `0`.
This optimization is based on alias analysis:
`q[0]` gets initialized to `0` at the beginning of the program, and the only write between that initialization and the `print` is to the pointer `p+1`.
Since `q` and `p` point to different local variables, a pointer derived from `p` cannot alias `q[0]`, and hence we know that this write cannot affect the value stored at `q[0]`.
This optimization is based on alias analysis:
`q[0]` gets initialized to `0` at the beginning of the program, and the only write between that initialization and the `print` is to the pointer `p+1`.
Since `q` and `p` point to different local variables, a pointer derived from `p` cannot alias `q[0]`, and hence we know that this write cannot affect the value stored at `q[0]`.