X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/c79cd1578b237a9347c413fa53a39c5ae4372a03..777cc00cc42cae2a2974e8ad34fc8d61d447d757:/personal/_posts/2018-04-05-a-formal-look-at-pinning.md diff --git a/personal/_posts/2018-04-05-a-formal-look-at-pinning.md b/personal/_posts/2018-04-05-a-formal-look-at-pinning.md index 989dae5..0093a21 100644 --- a/personal/_posts/2018-04-05-a-formal-look-at-pinning.md +++ b/personal/_posts/2018-04-05-a-formal-look-at-pinning.md @@ -55,27 +55,27 @@ use std::boxed::PinBox; struct SelfReferential { data: i32, - self_ref: Option>, + self_ref: *const i32, } impl SelfReferential { - fn new() -> Self { - SelfReferential { data: 42, self_ref: None } + fn new() -> SelfReferential { + SelfReferential { data: 42, self_ref: ptr::null() } } - fn init(mut self: Pin) { - unsafe { - let this = Pin::get_mut(&mut self); - // Set up self_ref to point to this.data - this.self_ref = ptr::NonNull::new(&mut this.data as *mut _); - } + fn init(mut self: Pin) { + let this : &mut SelfReferential = unsafe { Pin::get_mut(&mut self) }; + // Set up self_ref to point to this.data. + this.self_ref = &mut this.data as *const i32; } - - fn read_ref(mut self: Pin) -> Option { - unsafe { - let this = Pin::get_mut(&mut self); - // Dereference self_ref if it is set - this.self_ref.map(|self_ref| *self_ref.as_ptr()) + + fn read_ref(mut self: Pin) -> Option { + let this : &mut SelfReferential = unsafe { Pin::get_mut(&mut self) }; + // Dereference self_ref if it is non-NULL. + if this.self_ref == ptr::null() { + None + } else { + Some(unsafe { *this.self_ref }) } } } @@ -86,8 +86,10 @@ fn main() { println!("{:?}", s.as_pin().read_ref()); // prints Some(42) } {% endhighlight %} -The most intersting piece of code here is `read_ref`, which dereferences a raw pointer. -The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to `Some`). +**Update:** Previously, the example code used `Option>`. I think using raw pointers directly makes the code easier to follow. **/Update** + +The most intersting piece of code here is `read_ref`, which dereferences a raw pointer, `this.self_ref`. +The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to something non-NULL). In particular, if we changed the signature to `fn init(&mut self)`, we could easily cause UB by writing the following code: {% highlight rust %} @@ -177,9 +179,9 @@ exists |data: U| bytes.try_into() == Ok(data) && T.own(data) ## Extending Types to Verify `SelfReferential` -What would it take to *prove* that `SelfReferential` can be used by arbitrary safe code? +Coming back to our example, what would it take to *prove* that `SelfReferential` can be used by arbitrary safe code? We have to start by writing down the private invariants (for all typestates) of the type. -We want to say that if `self.read_ref` it set to `Some(data_ptr)`, then `data_ptr` is the address of `self.data`. +We want to say that if `self.read_ref` is not NULL, then it is the address of `self.data`. However, if we go back to our notion of Rust types that I laid out at the beginning of this post, we notice that it is *impossible* to refer to the "address of `self.data`" in `T.own`! And that's not even surprising; this just reflects the fact that in Rust, if we own a type, we can always move it to a different location---and hence the invariant must not depend on the location. @@ -193,23 +195,22 @@ We will add a new, *third* typestate on top of the existing owned and shared typ Notice that this state talks about a *pointer* being valid, in contrast to `T.own` which talks about a *sequence of bytes*. This gives us the expressivity we need to talk about immovable data: -`SelfReferential.pin(ptr)` says that `ptr` points to some memory we own, and that memory stores some pair `(data, self_ref)`. -(In terms of memory layout, `SelfReferential` is the same as a pair of `i32` and `Option>`.) -Moreover, if `self_ref` is set to `Some(data_ptr)`, then `data_ptr` is the address of the first field of the pair: +`SelfReferential.pin(ptr)` says that `ptr` points to some memory we own, and that memory stores some pair `(data, self_ref)`, and `self_ref` is either NULL or the address of the first field, `data`, at offset `0`: ``` -SelfReferential.pin(ptr) := exists |data: i32, self_ref: Option>| +SelfReferential.pin(ptr) := exists |data: i32, self_ref: *const i32| ptr.points_to_owned((data, self_ref)) && - match self_ref { Some(data_ptr) => data_ptr.as_ptr() == ptr.offset(0), None => True } + (self_ref == ptr::null() || self_ref == ptr.offset(0)) ``` -The most important part of this is the last line, saying that if `data_ptr` is a `Some`, it actually points to the first field (at offset `0`). -(I am of course glossing over plenty of details here, like handling of padding, but those details are not relevant right now. +(In terms of memory layout, `SelfReferential` is the same as a pair of `i32` and `*const i32`. +I am of course glossing over plenty of details here, but those details are not relevant right now. Moreover, `SelfReferential` also has an owned and a shared typestate, but nothing interesting happens there.) With this choice, it is easy to justify that `read_ref` is safe to execute: When the function begins, we can rely on `SelfReferential.pin(self)`. -If the closure in `self_ref.map` runs, we are in the `Some` case of the `match` so the deref of the pointer obtained from `self_ref` is fine. +If we enter the `else` branch, we know `self_ref` is not NULL, hence it must be `ptr.offset(0)`. +As a consequence, the deref of `self_ref` is fine. Before we go on, I have to explain what I did with `points_to_owned` above. -Before I said that this predicate operates on `List`, but now I am using it on a pair of an `i32` and an `Option`. +Before I said that this predicate operates on `List`, but now I am using it on a pair of an `i32` and a raw pointer. Again this is an instance of using a higher-level view of memory than a raw list of bytes. For example, we might want to say that `ptr` points to `42` of type `i32` by saying `ptr.points_to_owned(42i32)`, without worrying about how to turn that value into a sequence of bytes. It turns out that we can define `points_to_owned` in terms of a lower-level `points_to_owned_bytes` that operates on `List` as follows: