X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/2ccd2fcebe3e1773b600e80fc512e304fd3ac500..f57e2656060c843b93ccbc124ee0ee839569befc:/ralf/_posts/2017-07-14-undefined-behavior.md diff --git a/ralf/_posts/2017-07-14-undefined-behavior.md b/ralf/_posts/2017-07-14-undefined-behavior.md index ae5463e..851f737 100644 --- a/ralf/_posts/2017-07-14-undefined-behavior.md +++ b/ralf/_posts/2017-07-14-undefined-behavior.md @@ -8,7 +8,7 @@ So, finally, just one year later, this post is my take at what the purpose of th Warning: This post may contain opinions. You have been warned. -## When are optimizations legal? +## When are Optimizations Legal? Currently, we have a pretty good understanding of what the intended behavior of *safe* Rust is. That is, there is general agreement (modulo some [bugs](https://github.com/rust-lang/rust/issues/27868)) about the order in which operations are to be performed, and about what each individual operation does. @@ -37,9 +37,9 @@ After all, if there is any execution for which the assumption does *not* hold, t Now, it turns out that it is often really hard to obtain precise aliasing information. This could be the end of the game: No alias information, no way to verify our assumptions, no optimizations. -## Shifting responsibility +## Shifting Responsibility -However, it turns out that compilers writers consider these optimizations important enough that they came up with an alternative solution: +However, it turns out that compiler writers consider these optimizations important enough that they came up with an alternative solution: Instead of having the compiler verify such assumptions, they declared the programmer responsible. For example, the C standard says that memory accesses have to happen with the right "effective type": If data was stored with a `float` pointer, it must not be read with an `int` pointer. @@ -52,10 +52,15 @@ However, if they *do* alias, that would violate the effective type restriction, As we have seen, in both of the possible cases, the reordering is correct; the compiler is thus free to perform the transformation. Undefined behavior moves the burden of proving the correctness of this optimization from the compiler to the programmer. +In the example above, what the "effective type" rule really means is that every single memory read of a `float` comes with a *proof obligation*: +The programmer has to show that that the last write to this memory actually happened through a `float` pointer (baring some exceptions around union and character pointers). +Similarly, the (in)famous rule that [signed integer overflow is undefined behavior](https://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c) means that every single arithmetic operation on signed integers comes with the proof obligation that this operation will never, ever, overflow. +The compiler performs its optimization under the assumption that the programmer actually went through the effort and convinced itself that this is the case. + Considering that the compiler can only be so smart, this is a great way to justify optimizations that would otherwise be difficult or impossible to perform. Unfortunately, it is often not easy to say whether a program has undefined behavior or not -- after all, such an analysis being difficult is the entire reason compilers have to rely on UB to perform their optimizations. Furthermore, while C compilers are happy to exploit the fact that a particular program *has* UB, they do not provide a way to test that executing a program *does not* trigger UB. -It also turns out that programmers' intuition often [does not match what the compiler does](https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html), which leads to miscompilations (in the eye of the programmer) and sometimes to security [vulerabilities](https://lwn.net/Articles/342330/). +It also turns out that programmers' intuition often [does not match what the compiler does](https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html), which leads to miscompilations (in the eye of the programmer) and sometimes to security [vulnerabilities](https://lwn.net/Articles/342330/). As a consequence, UB has a pretty bad reputation. There are various sanitizers that watch a program while it is being executed and try to detect UB, but they are not able to catch all possible sources of UB. @@ -87,4 +92,8 @@ Such a specification would describe the additional state that is needed at run-t It is with such considerations in my mind that I have previously written about [miri as an executable specification]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}). Coming up next on this channel: During my [internship]({{ site.baseurl }}{% post_url 2017-05-23-internship-starting %}), I am working on such a specification. -I have a draft ready now, and I want to share it with the world to see what the world thinks about it. +My ideas are concrete enough now that I can write down a draft, which I will share with the world to see what the world thinks about it. + +**Update:** [Writing down has happened]({{ site.baseurl }}{% post_url 2017-07-17-types-as-contracts %}). + +**Update:** Clarified "Shifting Responsibility".