X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/df1b9dd71b82cc1a61310c41eb84fecc8f8f3d64..ee2006a2b0f250c65c4de9b4f10a5a78e2bd22e0:/ralf/_posts/2019-07-14-uninit.md

diff --git a/ralf/_posts/2019-07-14-uninit.md b/ralf/_posts/2019-07-14-uninit.md
index c5043ff..1847e5e 100644
--- a/ralf/_posts/2019-07-14-uninit.md
+++ b/ralf/_posts/2019-07-14-uninit.md
@@ -54,7 +54,7 @@ However, if you [run the example](https://play.rust-lang.org/?version=stable&mod
 ## What *is* uninitialized memory?

 How is this possible?
-The answer is that every byte in memory cannot just have a value in `0..256` (this is Rust/Ruby syntax for a left-inclusive right-exclusive range), it can also be "uninitialized".
+The answer is that, in the "abstract machine" that is used to specify the behavior of our program, every byte in memory cannot just have a value in `0..256` (this is Rust/Ruby syntax for a left-inclusive right-exclusive range), it can also be "uninitialized".
 Memory *remembers* if you initialized it.
 The `x` that is passed to `always_return_true` is *not* the 8-bit representation of some number, it is an uninitialized byte.
 Performing operations such as comparison on uninitialized bytes is undefined behavior.
@@ -65,11 +65,13 @@ Compilers don't just want to annoy programmers.
 Ruling out operations such as comparison on uninitialized data is useful, because it means the compiler does not have to "remember" which exact bit pattern an uninitialized variable has!
 A well-behaved (UB-free) program cannot observe that bit pattern anyway.
 So each time an uninitialized variable gets used, we can just use *any* machine register---and for different uses, those can be different registers!
+[This LLVM document](http://nondot.org/sabre/LLVMNotes/UndefinedValue.txt) gives some more motivation for "unstable" uninitialized memory.
 So, one time we "look" at `x` it can be at least 150, and then when we look at it again it is at most 120, even though `x` did not change.
 `x` was just uninitialized all the time.
 That explains why our compiled example program behaves the way it does.

-When thinking about Rust (or C, or C++), you have to imagine that every byte in memory is either initialized to some value in `0..256`, or *uninitialized*.
+When thinking about Rust (or C, or C++), you have to think in terms of an "abstract machine", not the real hardware you are using.
+Imagine that every byte in memory is either initialized to some value in `0..256`, or *uninitialized*.
 You can think of memory as storing an `Option<u8>` at every location.[^pointers]
 When new memory gets allocated for a local variable (on the stack) or on the heap, there is actually nothing random happening, everything is completely deterministic: every single byte of this memory is marked as *uninitialized*.
 Every location stores a `None`.
@@ -77,7 +79,7 @@ Every location stores a `None`.

 When writing safe Rust, you do not have to worry about this, but this is the model you should have in your head when dealing with uninitialized memory in unsafe code.
 Alexis wrote a [great post](https://gankro.github.io/blah/initialize-me-maybe/) on which APIs to use for that in Rust; there is no need for me to repeat all that here.
-(In that post, Alexis says that every *bit* can be either 0, 1 or uninitialized, as opposed to every *byte* being initialized or not. Given that memory accesses happen at byte granularity, these two models are actually equivalent.)
+(In that post, Alexis says that every *bit* can be either 0, 1 or uninitialized, as opposed to every *byte* being initialized or not. Given that memory accesses happen at byte granularity, these two models are actually equivalent, at least in Rust which does not have C-style bitfields.)

 [^pointers]: In fact, [bytes are even more complicated than that]({% post_url 2018-07-24-pointers-and-bytes %}), but that is another topic.
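
To make the "memory stores an `Option` at every location" picture from the patched text a bit more concrete, here is a minimal sketch of such an abstract-machine memory in safe Rust. Everything in it is an assumption made for illustration and not part of the post or of any real API: the names `AbstractMemory`, `UninitializedRead` and `checked_always_return_true` are hypothetical, and the comparison is only modelled on the "at least 150" / "at most 120" observation described above. Where the real abstract machine has undefined behavior, this toy model simply returns an error.

```rust
/// Hypothetical model of abstract-machine memory: every byte is either
/// `Some(value)` (initialized) or `None` (uninitialized).
#[derive(Debug)]
pub struct AbstractMemory {
    bytes: Vec<Option<u8>>,
}

/// Stand-in for undefined behavior in this toy model (an assumption of the
/// sketch: real UB gives no such error, the program is simply meaningless).
#[derive(Debug, PartialEq, Eq)]
pub struct UninitializedRead {
    pub address: usize,
}

impl AbstractMemory {
    /// Allocation is deterministic: every byte starts out as `None`.
    pub fn allocate(len: usize) -> Self {
        AbstractMemory { bytes: vec![None; len] }
    }

    /// Writing a value marks that byte as initialized.
    pub fn write(&mut self, address: usize, value: u8) {
        self.bytes[address] = Some(value);
    }

    /// Reads a byte in order to operate on it (for example, to compare it).
    /// An uninitialized byte yields an error instead of some bit pattern.
    pub fn read(&self, address: usize) -> Result<u8, UninitializedRead> {
        self.bytes[address].ok_or(UninitializedRead { address })
    }
}

/// Hypothetical check: true for every initialized byte, but an error
/// (modelling UB) when the byte was never written.
pub fn checked_always_return_true(
    mem: &AbstractMemory,
    address: usize,
) -> Result<bool, UninitializedRead> {
    let x = mem.read(address)?;
    Ok(x < 150 || x > 120)
}
```

In this model there is no bit pattern to observe for a byte that was never written, which is exactly why a compiler is free to pick a different register for each use of an uninitialized variable.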