From: Ralf Jung Date: Sat, 15 Jul 2017 04:04:09 +0000 (-0700) Subject: tone it down a little X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/2ccd2fcebe3e1773b600e80fc512e304fd3ac500?hp=d6bd0e0a4b8ab6091b42f6d5ea45f4582d1f4f0a tone it down a little --- diff --git a/ralf/_posts/2017-07-14-undefined-behavior.md b/ralf/_posts/2017-07-14-undefined-behavior.md index 6ff8e5f..ae5463e 100644 --- a/ralf/_posts/2017-07-14-undefined-behavior.md +++ b/ralf/_posts/2017-07-14-undefined-behavior.md @@ -37,12 +37,12 @@ After all, if there is any execution for which the assumption does *not* hold, t Now, it turns out that it is often really hard to obtain precise aliasing information. This could be the end of the game: No alias information, no way to verify our assumptions, no optimizations. -## But we want to optimize anyway! +## Shifting responsibility -However, it turns out that compilers writers want these optimizations *so badly* that they came up with an alternative solution: -Instead of putting the burden for verifying such assumptions on the compiler, they put it on the programmer. +However, it turns out that compilers writers consider these optimizations important enough that they came up with an alternative solution: +Instead of having the compiler verify such assumptions, they declared the programmer responsible. -To this end, the C standard says that memory accesses have to happen with the right "effective type": If data was stored with a `float` pointer, it must not be read with an `int` pointer. +For example, the C standard says that memory accesses have to happen with the right "effective type": If data was stored with a `float` pointer, it must not be read with an `int` pointer. If you violate this rule, your program has *undefined behavior* (UB) -- which is to say, the program may do *anything* when executed. Now, if the compiler wants to make a transformation like reordering the two stores in our example, it can argue as follows: In any particular execution of the given function, either `x` and `y` alias or they do not. @@ -52,13 +52,17 @@ However, if they *do* alias, that would violate the effective type restriction, As we have seen, in both of the possible cases, the reordering is correct; the compiler is thus free to perform the transformation. Undefined behavior moves the burden of proving the correctness of this optimization from the compiler to the programmer. -Unfortunately, it is often not easy to say whether a program has undefined behavior or not -- after all, such an analysis being difficult is the entire reason compilers rather rely on UB to perform their optimizations. +Considering that the compiler can only be so smart, this is a great way to justify optimizations that would otherwise be difficult or impossible to perform. +Unfortunately, it is often not easy to say whether a program has undefined behavior or not -- after all, such an analysis being difficult is the entire reason compilers have to rely on UB to perform their optimizations. Furthermore, while C compilers are happy to exploit the fact that a particular program *has* UB, they do not provide a way to test that executing a program *does not* trigger UB. -In fact, it is pretty hard to perform such a test; the standard has not been written with such considerations in mind. -This has given UB a very bad reputation, and mostly rightly so, I would say. -[A lengthy blog post](https://blog.regehr.org/archives/1520) summarizes the situation quite well: -There is progress being made with various sanitizers, but e.g. for the effective type restriction we discussed above, the mitigation (i.e., the way to check or otherwise make sure your programs are not affected) is "turn off optimizations that rely on this". -That's not very satisfying. +It also turns out that programmers' intuition often [does not match what the compiler does](https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html), which leads to miscompilations (in the eye of the programmer) and sometimes to security [vulerabilities](https://lwn.net/Articles/342330/). +As a consequence, UB has a pretty bad reputation. + +There are various sanitizers that watch a program while it is being executed and try to detect UB, but they are not able to catch all possible sources of UB. +Part of the reason this is so hard is that the standard has not been written with such sanitizers in mind. +This [recent blog post](https://blog.regehr.org/archives/1520) discusses the situation in much more detail. +For example, for the effective type restriction (also sometimes called "strict aliasing" or "type-based alias analysis") we discussed above, the mitigation -- the way to check or otherwise make sure your programs are not affected -- is to turn off optimizations that rely on this. +That is not very satisfying. ## Undefined Behavior in Rust @@ -74,9 +78,10 @@ This is what the unsafe code guidelines strike team set out to do. We could of course just copy what C does, but I hope I convinced you that this is not a great solution. When defining UB for Rust, I hope we can do better than C. I think we should strive for programmers' intuition agreeing with the standard and the compiler on what the rules are. -It turns out that is [not the case for C](https://www.cl.cam.ac.uk/~pes20/cerberus/notes50-survey-discussion.html), which leads to miscompilations and sometimes to security [vulerabilities](https://lwn.net/Articles/342330/). +If that means we have to be a little more conservative around our optimizations, that seems to be a prize worth paying for more confidence in the compiled program. -I also think that tooling to *detect* UB is of paramount importance, and that the specification should be written in a way that such tooling is feasible. +I also think that tooling to *detect* UB is of paramount importance, and can help shape intuition and maybe even permit us to be less conservative. +To this end, the specification should be written in a way that such tooling is feasible. In fact, specifying a dynamic UB checker is a very good way to specify UB! Such a specification would describe the additional state that is needed at run-time to then *check* at every operation whether we are running into UB. It is with such considerations in my mind that I have previously written about [miri as an executable specification]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}).