forum: https://internals.rust-lang.org/t/a-formal-look-at-pinning/7236
---
-Recently, a new API for "pinned references" has [landed as a new unstable feature](https://github.com/rust-lang/rust/pull/49058) in the standard library.
+Recently, a new [API for "pinned references"](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md) has [landed as a new unstable feature](https://github.com/rust-lang/rust/pull/49058) in the standard library.
The purpose of these references is to express that the data at the memory it points to will not, ever, be moved elsewhere.
-Others have [written](https://boats.gitlab.io/blog/post/2018-03-20-async-vi/) about why this is important in the context of async IO.
-The purpose of this post is to take a closer, more formal look at that API: We are going to take a stab at extending the RustBelt model of types with support for pinning.
+@withoutboats has written about how this [solves](https://boats.gitlab.io/blog/post/2018-03-20-async-vi/) a [problem in the context of async IO](https://boats.gitlab.io/blog/post/2018-01-25-async-i-self-referential-structs/).
+In this post, we take a closer, more formal look at that API: We are going to take a stab at extending the RustBelt model of types with support for pinning.
<!-- MORE -->
struct SelfReferential {
data: i32,
- self_ref: Option<ptr::NonNull<i32>>,
+ self_ref: *const i32,
}
impl SelfReferential {
- fn new() -> Self {
- SelfReferential { data: 42, self_ref: None }
+ fn new() -> SelfReferential {
+ SelfReferential { data: 42, self_ref: ptr::null() }
}
- fn init(mut self: Pin<Self>) {
- unsafe {
- let this = Pin::get_mut(&mut self);
- // Set up self_ref to point to this.data
- this.self_ref = ptr::NonNull::new(&mut this.data as *mut _);
- }
+ fn init(mut self: Pin<SelfReferential>) {
+ let this : &mut SelfReferential = unsafe { Pin::get_mut(&mut self) };
+ // Set up self_ref to point to this.data.
+ this.self_ref = &mut this.data as *const i32;
}
-
- fn read_ref(mut self: Pin<Self>) -> Option<i32> {
- unsafe {
- let this = Pin::get_mut(&mut self);
- // Dereference self_ref if it is set
- this.self_ref.map(|self_ref| *self_ref.as_ptr())
+
+ fn read_ref(mut self: Pin<SelfReferential>) -> Option<i32> {
+ let this : &mut SelfReferential = unsafe { Pin::get_mut(&mut self) };
+ // Dereference self_ref if it is non-NULL.
+ if this.self_ref == ptr::null() {
+ None
+ } else {
+ Some(unsafe { *this.self_ref })
}
}
}
println!("{:?}", s.as_pin().read_ref()); // prints Some(42)
}
{% endhighlight %}
-The most intersting piece of code here is `read_ref`, which dereferences a raw pointer.
-The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to `Some`).
+**Update:** Previously, the example code used `Option<ptr::NonNull<i32>>`. I think using raw pointers directly makes the code easier to follow. **/Update**
-In particular, if we changed the signature of `init()` to take `&mut`, we could easily cause UB by writing the following code:
+The most intersting piece of code here is `read_ref`, which dereferences a raw pointer, `this.self_ref`.
+The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to something non-NULL).
+
+In particular, if we changed the signature to `fn init(&mut self)`, we could easily cause UB by writing the following code:
{% highlight rust %}
fn main() {
// Create an initialize a SelfReferential in a Box, move it out, and drop the Box
For example, the axiom (a) stated above would look as follows:
```
forall |'a, ptr|
- borrowed('a, exists |bytes| mem_own(ptr, bytes) && T.own(bytes))
+ borrowed('a, exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes))
-> T.shr('a, ptr)
```
-I am using the usual mathematical quantifiers, with a Rust-y syntax (`forall |var| P` and `exists |var| P`), and `->` for implication.
+This is a formal statement that we have to prove whenever we define `T.own` and `T.shr` for our own type `T`.
+It says that for all lifetimes `'a` and pointers `ptr`, if `borrowed('a, ...)` holds, then `T.shr('a, ptr)` holds.
+I am using the usual mathematical quantifiers, with a Rust-y syntax (`forall |var| ...` and `exists |var| ...`), and `->` for implication.
For disjunction (`||`) and conjunction (`&&`), I follow Rust.
-We also need some way to talk about ownership and contents of memory:
-I will write `mem_own_bytes(ptr, bytes)` (where `ptr: Pointer` and `bytes: List<Byte>`) to express that we own `bytes.len()` many bytes of memory pointed to by `ptr`.
-Ownership here means that we are free to read, write and deallocate that memory.
+Because Rust types are a lot about pointers and what they point to, we also need some way to talk about memory:
+`ptr.points_to_owned(bytes)` (where `ptr: Pointer` and `bytes: List<Byte>`) means that `ptr` points to `bytes.len()` many bytes of memory containing the given `bytes` of data, and that moreover, we *own* that memory.
+*Ownership* here means that we are free to read, write and deallocate that memory, and that we are the only party that can do so.
-Frequently, it is more convenient to not talk about raw lists of bytes but about some higher-level representation of the bytes, and not care about how exactly that data is laid out in memory.
-For example, we might want to say that `ptr` points to `42` of type `i32` by saying `mem_own(ptr, 42i32)`.
-We can define a suitable `mem_own` as
-```
-mem_own<U>(ptr: Pointer, data: U) where List<Bytes>: TryInto<U> :=
- exists |bytes| bytes.try_into() == Ok(data) && mem_own(ptr, bytes)
-```
-Here, we (ab)use the `TryInto` trait to convert a properly laid out list of bytes into something of type `U`.
-The conversion fails, in particular, if the list does not have exactly the right length.
-If `U == List<Bytes>`, this is just the same as `mem_own_bytes`.
+**Update:** In a previous version I called this predicate `mem_own`. I decided that `points_to_owned` was actually easier to understand, and it also matches the standard terminology in the literature, so I switched to using that term instead. **/Update**
Finally, `borrowed('a, P)` says that `P` is only temporarily owned, i.e., borrowed, for lifetime `'a`.
-`P` here is a *proposition* or *assertion*, i.e., a statement about what we expect to own. The axiom above is a proposition, as is `mem_own(ptr, 42i32)`.
-You can think of propositions as a fancy version of `bool` where we can use things like quantifiers or borrowing, and talk about memory and ownership.
+`P` here is a *proposition* or *assertion*, i.e., a statement about what we expect to own. The axiom above is a proposition, as is `ptr.points_to_owned([0, 0])`.
+You can think of propositions as a fancy version of `bool` where we can use things like quantifiers and borrowing, and talk about memory and ownership.
Let us see how we can define the owned typestate of `Box` and mutable reference using this notation:
`&'a mut T` is almost the same, except that memory and `T.own` are only borrowed for lifetime `'a`.
Formally:
```
-Box<T>.own(bytes) := exists |ptr, inner_bytes|
- bytes.try_into() == Ok(ptr) && mem_own(ptr, inner_bytes) && T.own(inner_bytes)
+Box<T>.own(bytes) := exists |ptr, inner_bytes| bytes.try_into() == Ok(ptr) &&
+ ptr.points_to_owned(inner_bytes) && T.own(inner_bytes)
+```
+```
(&'a mut T).own(bytes) := exists |ptr| bytes.try_into() == Ok(ptr) &&
- borrowed('a, exists |inner_bytes| mem_own(ptr, inner_bytes) && T.own(inner_bytes))
+ borrowed('a,
+ exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes))
```
-It turns out that using `try_into` like we do above is actually a common pattern.
-We usually do not want to talk directly about the `List<Byte>` but convert them to some higher-level representation first.
+The `:=` means "is defined as"; this is a lot like a function definition in Rust where the part after the `:=` is the function body.
+Notice how we use `try_into` to try to convert a sequence of bytes into a higher-level representation, namely a pointer.
+This relies in `TryInto<U>` being implemented for `List<Byte>`.
+The conversion fails, in particular, if the list of bytes does not have exactly the right length.
+
+It turns out that using `try_into` like we do above is actually a common pattern:
+Often, when we define a predicate on bytes, we do not want to talk directly about the `List<Byte>` but convert them to some higher-level representation first.
To facilitate this, we will typically define `T.own(data: U)` for some `U` such that `List<Byte>: TryInto<U>`, the idea being that the raw list of bytes is first converted to a `U` and the predicate can then directly access the higher-level data in `U`.
This could look as follows:
```
Box<T>.own(ptr: Pointer) :=
- exists |inner_bytes| mem_own(ptr, inner_bytes) && T.own(inner_bytes)
+ exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes)
+```
+```
(&'a mut T).own(ptr: Pointer) :=
- borrowed('a, exists |inner_bytes| mem_own(ptr, inner_bytes) && T.own(inner_bytes))
+ borrowed('a, exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes))
```
The actual ownership predicate is then defined as
```
## Extending Types to Verify `SelfReferential`
-What would it take to *prove* that `SelfReferential` can be used by arbitrary safe code?
+Coming back to our example, what would it take to *prove* that `SelfReferential` can be used by arbitrary safe code?
We have to start by writing down the private invariants (for all typestates) of the type.
-We want to say that if `self.read_ref` it set to `Some(data_ptr)`, then `data_ptr` is the address of `self.data`.
+We want to say that if `self.read_ref` is not NULL, then it is the address of `self.data`.
However, if we go back to our notion of Rust types that I laid out at the beginning of this post, we notice that it is *impossible* to refer to the "address of `self.data`" in `T.own`!
And that's not even surprising; this just reflects the fact that in Rust, if we own a type, we can always move it to a different location---and hence the invariant must not depend on the location.
Notice that this state talks about a *pointer* being valid, in contrast to `T.own` which talks about a *sequence of bytes*.
This gives us the expressivity we need to talk about immovable data:
-`SelfReferential.pin(ptr)` says that `ptr` points to some memory we own, and that memory stores some pair `(data, self_ref)`.
-(In terms of memory layout, `SelfReferential` is the same as a pair of `i32` and `Option<ptr::NonNull<i32>>`.)
-Moreover, if `self_ref` is set to `Some(data_ptr)`, then `data_ptr` is the address of the first field of the pair:
+`SelfReferential.pin(ptr)` says that `ptr` points to some memory we own, and that memory stores some pair `(data, self_ref)`, and `self_ref` is either NULL or the address of the first field, `data`, at offset `0`:
```
-SelfReferential.pin(ptr) := exists |data: i32, self_ref: Option<ptr::NonNull<i32>>|
- mem_own(ptr, (data, self_ref)) &&
- match self_ref { Some(data_ptr) => data_ptr.as_ptr() == ptr.offset(0), None => True }
+SelfReferential.pin(ptr) := exists |data: i32, self_ref: *const i32|
+ ptr.points_to_owned((data, self_ref)) &&
+ (self_ref == ptr::null() || self_ref == ptr.offset(0))
```
-The most important part of this is the last line, saying that if `data_ptr` is a `Some`, it actually points to the first field (at offset `0`).
-(I am of course glossing over plenty of details here, like handling of padding, but those details are not relevant right now.
+(In terms of memory layout, `SelfReferential` is the same as a pair of `i32` and `*const i32`.
+I am of course glossing over plenty of details here, but those details are not relevant right now.
Moreover, `SelfReferential` also has an owned and a shared typestate, but nothing interesting happens there.)
With this choice, it is easy to justify that `read_ref` is safe to execute: When the function begins, we can rely on `SelfReferential.pin(self)`.
-If the closure in `self_ref.map` runs, we are in the `Some` case of the `match` so the deref of the pointer obtained from `self_ref` is fine.
+If we enter the `else` branch, we know `self_ref` is not NULL, hence it must be `ptr.offset(0)`.
+As a consequence, the deref of `self_ref` is fine.
+
+Before we go on, I have to explain what I did with `points_to_owned` above.
+Before I said that this predicate operates on `List<Byte>`, but now I am using it on a pair of an `i32` and a raw pointer.
+Again this is an instance of using a higher-level view of memory than a raw list of bytes.
+For example, we might want to say that `ptr` points to `42` of type `i32` by saying `ptr.points_to_owned(42i32)`, without worrying about how to turn that value into a sequence of bytes.
+It turns out that we can define `points_to_owned` in terms of a lower-level `points_to_owned_bytes` that operates on `List<Byte>` as follows:
+```
+ptr.points_to_owned(data: U) where List<Bytes>: TryInto<U> :=
+ exists |bytes| bytes.try_into() == Ok(data) && ptr.points_to_owned_bytes(bytes)
+```
+Just like before, we (ab)use the `TryInto` trait to convert a properly laid out list of bytes into something of type `U`.
## Verifying the Pin API
For `Pin<'a, T>`, `ptr` is valid if we have borrowed `T.pin(ptr)` for lifetime `'a`.
```
PinBox<T>.own(ptr) := T.pin(ptr)
+```
+```
Pin<'a, T>.own(ptr) := borrowed('a, T.pin(ptr))
```
> {:type="a" start="2"}
2. If we own a pointer `ptr` pointing to a list of bytes `bytes` such that `T.own(bytes)` holds, then we can move to the pinned typestate and obtain `T.pin(ptr)`. Formally:
```
-forall |ptr, bytes| (mem_own(ptr, bytes) && T.own(bytes)) -> T.pin(ptr)
+forall |ptr, bytes| (ptr.points_to_owned(bytes) && T.own(bytes)) -> T.pin(ptr)
```
`PinBox::new` can now be easily implemented using `Box::new` and `PinBox<T>::from`.
For `PinBox::as_pin`, we start with a `&'a mut PinBox<T>`, which is a pointer-to-pointer, and return the inner pointer as `Pin<'a, T>`.
This is justified because we start with a borrow for lifetime `'a` of a pointer to some `bytes` that satisfy (for the given lifetime) `PinBox<T>.own(bytes)`:
```
-borrowed('a, exists |bytes| mem_own(ptr, bytes) && PinBox<T>.own(bytes))
+borrowed('a, exists |bytes| ptr.points_to_owned(bytes) && PinBox<T>.own(bytes))
```
Unfolding the `PinBox<T>.own`, we can deduce that `bytes` actually form a pointer `inner_ptr` such that `T.pin(inner_ptr)` (for the given lifetime).
This exactly matches our return type, so we're good!
In other words, a shared `PinBox<T>` is just a read-only pointer to a shared `T`:
```
PinBox<T>.shr('a, ptr) := exists |inner_ptr|
- mem_read_only('a, ptr, inner_ptr) && T.shr('a, inner_ptr)
+ ptr.points_to_read_only('a, inner_ptr) && T.shr('a, inner_ptr)
```
This requires a way to talk about memory that is read-only for the duration of some lifetime.
-We assume we have a predicate `mem_read_only_bytes('a: Lifetime, ptr: Pointer, bytes: List<Byte>)` for this purpose, and then define a typed variant as usual:
+We assume we have a predicate `ptr.points_to_read_only_bytes('a: Lifetime, bytes: List<Byte>)` saying that `ptr` points to `bytes.len()` many bytes of valid memory containing `bytes`, and that for lifetime `'a`, we are free to read that memory and it will not change.
+We then define a convenient variant based on higher-level memory representations as usual:
```
-mem_read_only<U>('a: Lifetime, ptr: Pointer, data: U) where List<Bytes>: TryInto<U> :=
- exists |bytes| bytes.try_into() == Ok(data) && mem_read_only_bytes('a, ptr, bytes)
+ptr.points_to_read_only('a: Lifetime, data: U) where List<Bytes>: TryInto<U> :=
+ exists |bytes| bytes.try_into() == Ok(data) && ptr.points_to_read_only_bytes('a, bytes)
```
Remember that there is an axiom (a) connecting the owned and shared typestate; we have to make sure that this axiom is satisfied for `PinBox`---it turns out that is the case, and the proof relies on the new axiom (c) we just added.
> **Definition 6: `Unpin`.** A type `T` is `Unpin` if from `T.pin(ptr)` we can deduce that we own the pointer `ptr` and it points to a list of bytes `bytes` such that `T.own(bytes)` holds.
Formally:
```
-forall |ptr| T.pin(ptr) -> (exists |bytes| mem_own(ptr, bytes) && T.own(bytes))
+forall |ptr| T.pin(ptr) -> (exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes))
```
Note that this is exactly the inverse direction of axiom (b) added in definition 2b: For `Unpin` types, we can freely move between the owned and pinned typestate.
Existing types (and new types) that are designed without considering `Pin` remain sound in the presence of this new typestate, even if we automatically derive the `Unpin` trait for these types.
The reason this works is that we can always define `T.pin(ptr)` as saying that we fully own `ptr` pointing to `bytes` such that we have `T.own(bytes)`:
```
-T.pin(ptr) := exists |bytes| mem_own(ptr, bytes) && T.own(bytes)
+T.pin(ptr) := exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes)
```
This satisfies the additional axioms (b) and (c) connecting the pinned typestate to the others, and it also satisfies `Unpin`.
The latter is crucial, because it means we can automatically derive `Unpin` instances through the auto trait mechanism and do not have to review the entire ecosystem for `Pin`-compatibility.
## Conclusion
-We have seen how the new `Pin` type can be used to give safe APIs to types like `SelfReferential`, and how we can (semi-)formally argue for the correctness of `SelfReferential` and the methods on `Pin` and `PinBox`.
+We have seen how the new `Pin` type can be used to give safe APIs to types like `SelfReferential` (which, previously, was not possible), and how we can (semi-)formally argue for the correctness of `SelfReferential` and the methods on `Pin` and `PinBox`.
I hope I was able to shed some light both on how pinning is useful, and how we can reason about safety of a typed API in general.
Next time, we are going to look at an extension to the pinning API proposed by @cramertj which guarantees that `drop` will be called under some circumstances, and how that is useful for intrusive collections.