X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/25cdafc51c73f06f32f5ef5c84fa01fe98c63de4..b1fb7e6b7275d6c6341ecec5aa372f246e863f97:/personal/_posts/2024-08-14-places.md?ds=inline
diff --git a/personal/_posts/2024-08-14-places.md b/personal/_posts/2024-08-14-places.md
index 107a169..4b93ee5 100644
--- a/personal/_posts/2024-08-14-places.md
+++ b/personal/_posts/2024-08-14-places.md
@@ -11,6 +11,7 @@ However, when it comes to unsafe code, a proper understanding of this dichotomy
 Consider the following [example](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=9a8802d20da16d6569510124c5827794):
 ```rust
+// As a "packed" struct, this type has alignment 1.
 #[repr(packed)]
 struct MyStruct {
   field: i32
@@ -21,10 +22,12 @@ let ptr = &raw const x.field;
 // This line is fine.
 let ptr_copy = &raw const *ptr;
 // But this line has UB!
+// `ptr` is a pointer to `i32` and thus requires 4-byte alignment on
+// memory accesses, but `x` is just 1-aligned.
 let val = *ptr;
 ```
-Here I am using the unstable but soon-to-be-stabilized "raw borrow" operator, `&raw const`.
+Here I am using the unstable but [soon-to-be-stabilized](https://github.com/rust-lang/rust/pull/127679) "raw borrow" operator, `&raw const`.
 You may know it in its stable form as a macro, `ptr::addr_of!`, but the `&` syntax makes the interplay of places and values more explicit so we will use it here.
 The last line has Undefined Behavior (UB) because `ptr` points to a field of a packed struct, which is not sufficiently aligned.
@@ -35,6 +38,9 @@ That is the topic of this post.
+(You might have already encountered the distinction of place expressions and value expressions in C and C++, where they are called lvalue expressions and rvalue expressions, respectively.
+While the basic syntactic concept is the same as in Rust, the exact cases that are UB are different, so we will focus entirely on Rust here.)
+
 ### Making the implicit explicit
 The main reason why this dichotomy of place expressions and value expressions is so elusive is that it is entirely implicit.
@@ -68,7 +74,7 @@ However, the expression `my_var` (referencing a local variable), according to th
 This is because `my_var` actually denotes a place in memory, and there's multiple things one can do with a place: one can load the contents of the place from memory (which produces a value), one can create a pointer to the place (which also produces a value, but does not access memory at all), or one can store a value into this place (which in Rust produces the `()` value, but the side-effect of changing the contents of memory is more relevant).
-Besides local variable, the other main example of a place expression is the result of the `*` operator, which takes a *value* (of pointer type) and turns it into a place.
+Besides local variables, the other main example of a place expression is the result of the `*` operator, which takes a *value* (of pointer type) and turns it into a place.
 Furthermore, given a place of struct type, we can use a field projection to obtain a place just for that field.
 This may sound odd, because it means that `let new_var = my_var;` is not actually a valid statement in our grammar!
@@ -120,10 +126,16 @@ let _ = *ptr; // This is fine!
 let _val = *ptr; // This is UB.
 ```
-The reason for this is that the `_` pattern does *not* incur a place-to-value coercion.
+Note that the grammar above cannot represent this program: in the full grammar of Rust, the `let` syntax is something like "`let` _Pattern_ `=` _PlaceExpr_ `;`",
+and then pattern desugaring decides what to do with that place expression.
+If the pattern is a binder (the common case), a `load` gets inserted to compute the initial value for the local variable that this binder refers to.
+However, if the pattern is `_`, then the place expression still gets evaluated---but the result of that evaluation is simply discarded.
+MIR uses a `PlaceMention` statement to indicate these semantics.
+
+In particular, this means that the `_` pattern does *not* incur a place-to-value coercion!
 The desugared form of the relevant part of this code is:
 ```rust
-let _ = *(load ptr); // This is fine!
+PlaceMention(*(load ptr)); // This is fine!
 let _val = load *(load ptr); // This is UB.
 ```
 As you can see, the first line does not actually load from the pointer (the only `load` is there to load the pointer itself from the local variable that stores it).
@@ -139,6 +151,17 @@ match *ptr { _val => "not happy" } // This is UB.
 The scrutinee of a `match` expression is a place expression, and if the pattern is `_` then a value is never constructed.
 However, when an actual binder is present, this introduces a local variable and a place-to-value coercion is inserted to compute the value that will be stored in that local variable.
+**Note on `unsafe` blocks.**
+Note that wrapping an expression in a block forces it to be a value expression.
+This means that `unsafe { *ptr }` always loads from the pointer!
+In other words:
+```rust
+let ptr = std::ptr::null::<i32>();
+let _ = *ptr; // This is fine!
+let _ = unsafe { *ptr }; // This is UB.
+```
+The fact that braces force a value expression can occasionally be useful, but the fact that `unsafe` blocks do that is definitely quite unfortunate.
+
 ### Are there also value-to-place coercions?
 So far, we have discussed what happens when a place expression is encountered in a spot where a value expression was expected.