From 7f79b2443f56a844eb41561dbed9fa3991a5b5b7 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Wed, 15 Jul 2020 12:17:12 +0200
Subject: [PATCH 1/1] add rust validity post draft

---
 ralf/_drafts/unused-data.md | 61 +++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 ralf/_drafts/unused-data.md

diff --git a/ralf/_drafts/unused-data.md b/ralf/_drafts/unused-data.md
new file mode 100644
index 0000000..2c06b68
--- /dev/null
+++ b/ralf/_drafts/unused-data.md
@@ -0,0 +1,61 @@
+---
+title: "Why even unused data needs to be valid"
+categories: rust
+---
+
+`unsafe` code in Rust has a few ways that it can trigger [Undefined Behavior][ub]: the compiler makes a few assumptions about all code, and in `unsafe` code it is the programmer's responsibility to uphold them.
+Those assumptions are [listed in the Rust reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html).
+The one that seems to surprise people the most is the clause saying that unsafe code may not produce "[...] an invalid value, even in private fields and locals".
+The reference goes on to explain that "*producing* a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation".
+In other words, even just *constructing* an invalid value, for example an invalid `bool`, is Undefined Behavior, no matter whether that `bool` is ever actually "used" by the program.
+The purpose of this post is to explain why.
+
+[ub]: https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#undefined-behavior
+
+
+
+First of all, let me clarify what is meant by "used" here, as that term can mean very different things.
+The following code "uses" `b`:
+
+{% highlight rust %}
+fn example(b: bool) -> i32 {
+    if b { 42 } else { 23 }
+}
+{% endhighlight %}
+
+I hope it is not very surprising that calling `example` on, e.g., `3` transmuted to `bool` is Undefined Behavior (UB).
+When compiling `if`, the compiler assumes that `0` and `1` are the only possible values of a `bool`; there is no telling what could go wrong when that assumption is violated.
+
+What is less obvious is why calling `example` on `3` is UB even when no such `if` is executed.
+To see why that matters, consider the following example:
+
+{% highlight rust %}
+fn example(b: bool, num: u32) -> i32 {
+    let mut acc = 0;
+    for _i in 0..num {
+        acc += if b { 42 } else { 23 };
+    }
+    acc
+}
+{% endhighlight %}
+
+Now assume we were working in a slightly different version of Rust, where transmuting `3` to a `bool` is fine as long as you do not "use" the `bool`.
+That would mean that calling `example(transmute(3u8), 0)` is actually allowed: the loop never gets executed, so we never "use" `b`.
+
+However, this is a problem for a very important transformation called [loop-invariant code motion](https://en.wikipedia.org/wiki/Loop-invariant_code_motion).
+That transformation can be used to turn our `example` function into the following:
+
+{% highlight rust %}
+fn example(b: bool, num: u32) -> i32 {
+    let mut acc = 0;
+    let incr = if b { 42 } else { 23 };
+    for _i in 0..num {
+        acc += incr;
+    }
+    acc
+}
+{% endhighlight %}
+
+The increment `if b { 42 } else { 23 }` is "invariant" during the execution of the loop: it evaluates to the same value in every iteration, so its computation can be moved out of the loop.
+Why is this a good transformation?
+Instead of determining the increment each time around the loop, we do that just once, thus saving one conditional jump per iteration; conditional jumps are something the CPU is unhappy about.
-- 
2.39.5
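The effect of loop-invariant code motion on *when* `b` is inspected can be sketched in plain, safe Rust. The helper `inspect` and the `reads` counter are hypothetical instrumentation added only for this illustration. A real compiler would not hoist a call with a side effect, so treat the counter as a tally of the reads of `b` in each hand-written version. Only valid `bool`s are used; no `transmute` is involved.

```rust
use std::cell::Cell;

// Hypothetical instrumentation: returns `b` unchanged and counts how often it was read.
fn inspect(b: bool, reads: &Cell<u32>) -> bool {
    reads.set(reads.get() + 1);
    b
}

// The loop as written: `b` is read once per iteration, so never when `num == 0`.
fn example(b: bool, num: u32, reads: &Cell<u32>) -> i32 {
    let mut acc = 0;
    for _i in 0..num {
        acc += if inspect(b, reads) { 42 } else { 23 };
    }
    acc
}

// The same loop after (manual) loop-invariant code motion:
// the increment is computed once, before the loop, even when `num == 0`.
fn example_hoisted(b: bool, num: u32, reads: &Cell<u32>) -> i32 {
    let mut acc = 0;
    let incr = if inspect(b, reads) { 42 } else { 23 };
    for _i in 0..num {
        acc += incr;
    }
    acc
}

fn main() {
    for &b in &[true, false] {
        for num in 0..4 {
            let (r1, r2) = (Cell::new(0), Cell::new(0));
            let a1 = example(b, num, &r1);
            let a2 = example_hoisted(b, num, &r2);
            // For every valid `bool`, both versions compute the same result.
            assert_eq!(a1, a2);
            println!(
                "b = {}, num = {}: result {}, reads of b: {} (original) vs {} (hoisted)",
                b, num, a1, r1.get(), r2.get()
            );
        }
    }
}
```

With `num == 0`, `example` never reads `b`, while `example_hoisted` reads it once. So the transformation is only correct if `b` is a valid `bool` even on executions where the original loop body never runs.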