X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/d469ac5b4c5e04a4e05f62bd958f3268b9e4c19d..a18cd587de26cd1e796f04270becd6009a60b287:/ralf/_posts/2018-04-05-a-formal-look-at-pinning.md?ds=sidebyside diff --git a/ralf/_posts/2018-04-05-a-formal-look-at-pinning.md b/ralf/_posts/2018-04-05-a-formal-look-at-pinning.md index e3c7412..989dae5 100644 --- a/ralf/_posts/2018-04-05-a-formal-look-at-pinning.md +++ b/ralf/_posts/2018-04-05-a-formal-look-at-pinning.md @@ -4,10 +4,10 @@ categories: research rust forum: https://internals.rust-lang.org/t/a-formal-look-at-pinning/7236 --- -Recently, a new API for "pinned references" has [landed as a new unstable feature](https://github.com/rust-lang/rust/pull/49058) in the standard library. +Recently, a new [API for "pinned references"](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md) has [landed as a new unstable feature](https://github.com/rust-lang/rust/pull/49058) in the standard library. The purpose of these references is to express that the data at the memory it points to will not, ever, be moved elsewhere. -Others have [written](https://boats.gitlab.io/blog/post/2018-03-20-async-vi/) about why this is important in the context of async IO. -The purpose of this post is to take a closer, more formal look at that API: We are going to take a stab at extending the RustBelt model of types with support for pinning. +@withoutboats has written about how this [solves](https://boats.gitlab.io/blog/post/2018-03-20-async-vi/) a [problem in the context of async IO](https://boats.gitlab.io/blog/post/2018-01-25-async-i-self-referential-structs/). +In this post, we take a closer, more formal look at that API: We are going to take a stab at extending the RustBelt model of types with support for pinning. @@ -89,7 +89,7 @@ fn main() { The most intersting piece of code here is `read_ref`, which dereferences a raw pointer. The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to `Some`). -In particular, if we changed the signature of `init()` to take `&mut`, we could easily cause UB by writing the following code: +In particular, if we changed the signature to `fn init(&mut self)`, we could easily cause UB by writing the following code: {% highlight rust %} fn main() { // Create an initialize a SelfReferential in a Box, move it out, and drop the Box @@ -118,32 +118,23 @@ The full formalism we use in the paper is probably overkill, so I will go with a For example, the axiom (a) stated above would look as follows: ``` forall |'a, ptr| - borrowed('a, exists |bytes| ptr.points_to(bytes) && T.own(bytes)) + borrowed('a, exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes)) -> T.shr('a, ptr) ``` -I am using the usual mathematical quantifiers, with a Rust-y syntax (`forall |var| P` and `exists |var| P`), and `->` for implication. +This is a formal statement that we have to prove whenever we define `T.own` and `T.shr` for our own type `T`. +It says that for all lifetimes `'a` and pointers `ptr`, if `borrowed('a, ...)` holds, then `T.shr('a, ptr)` holds. +I am using the usual mathematical quantifiers, with a Rust-y syntax (`forall |var| ...` and `exists |var| ...`), and `->` for implication. For disjunction (`||`) and conjunction (`&&`), I follow Rust. -We also need some way to talk about ownership and contents of memory: -I will write `ptr.points_to_bytes(bytes)` (where `ptr: Pointer` and `bytes: List`) to express that `ptr` points to `bytes.len()` many bytes of memory containing the given `bytes` of data, and that moreover, we *own* that memory. -Ownership here means that we are free to read, write and deallocate that memory. +Because Rust types are a lot about pointers and what they point to, we also need some way to talk about memory: +`ptr.points_to_owned(bytes)` (where `ptr: Pointer` and `bytes: List`) means that `ptr` points to `bytes.len()` many bytes of memory containing the given `bytes` of data, and that moreover, we *own* that memory. +*Ownership* here means that we are free to read, write and deallocate that memory, and that we are the only party that can do so. -Frequently, it is more convenient to not talk about raw lists of bytes but about some higher-level representation of the bytes, and not care about how exactly that data is laid out in memory. -For example, we might want to say that `ptr` points to `42` of type `i32` by saying `ptr.points_to(42i32)`. -We can define a suitable `points_to` as -``` -ptr.points_to(data: U) where List: TryInto := - exists |bytes| bytes.try_into() == Ok(data) && ptr.points_to_bytes(bytes) -``` -Here, we (ab)use the `TryInto` trait to convert a properly laid out list of bytes into something of type `U`. -The conversion fails, in particular, if the list does not have exactly the right length. -If `U == List`, this is just the same as `ptr.points_to_bytes`. - -**Update:** In a previous version I called these predicates `mem_own_bytes` and `mem_own`. I decided that `points_to` was actually easier to understand, and it also matches the standard terminology in the literature, so I switched to using that term instead. **/Update** +**Update:** In a previous version I called this predicate `mem_own`. I decided that `points_to_owned` was actually easier to understand, and it also matches the standard terminology in the literature, so I switched to using that term instead. **/Update** Finally, `borrowed('a, P)` says that `P` is only temporarily owned, i.e., borrowed, for lifetime `'a`. -`P` here is a *proposition* or *assertion*, i.e., a statement about what we expect to own. The axiom above is a proposition, as is `ptr.points_to(42i32)`. -You can think of propositions as a fancy version of `bool` where we can use things like quantifiers or borrowing, and talk about memory and ownership. +`P` here is a *proposition* or *assertion*, i.e., a statement about what we expect to own. The axiom above is a proposition, as is `ptr.points_to_owned([0, 0])`. +You can think of propositions as a fancy version of `bool` where we can use things like quantifiers and borrowing, and talk about memory and ownership. Let us see how we can define the owned typestate of `Box` and mutable reference using this notation: @@ -152,21 +143,31 @@ A `Box` is a list of bytes that make up a pointer, such that we own the memor `&'a mut T` is almost the same, except that memory and `T.own` are only borrowed for lifetime `'a`. Formally: ``` -Box.own(bytes) := exists |ptr, inner_bytes| - bytes.try_into() == Ok(ptr) && ptr.points_to(inner_bytes) && T.own(inner_bytes) +Box.own(bytes) := exists |ptr, inner_bytes| bytes.try_into() == Ok(ptr) && + ptr.points_to_owned(inner_bytes) && T.own(inner_bytes) +``` +``` (&'a mut T).own(bytes) := exists |ptr| bytes.try_into() == Ok(ptr) && - borrowed('a, exists |inner_bytes| ptr.points_to(inner_bytes) && T.own(inner_bytes)) + borrowed('a, + exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes)) ``` -It turns out that using `try_into` like we do above is actually a common pattern. -We usually do not want to talk directly about the `List` but convert them to some higher-level representation first. +The `:=` means "is defined as"; this is a lot like a function definition in Rust where the part after the `:=` is the function body. +Notice how we use `try_into` to try to convert a sequence of bytes into a higher-level representation, namely a pointer. +This relies in `TryInto` being implemented for `List`. +The conversion fails, in particular, if the list of bytes does not have exactly the right length. + +It turns out that using `try_into` like we do above is actually a common pattern: +Often, when we define a predicate on bytes, we do not want to talk directly about the `List` but convert them to some higher-level representation first. To facilitate this, we will typically define `T.own(data: U)` for some `U` such that `List: TryInto`, the idea being that the raw list of bytes is first converted to a `U` and the predicate can then directly access the higher-level data in `U`. This could look as follows: ``` Box.own(ptr: Pointer) := - exists |inner_bytes| ptr.points_to(inner_bytes) && T.own(inner_bytes) + exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes) +``` +``` (&'a mut T).own(ptr: Pointer) := - borrowed('a, exists |inner_bytes| ptr.points_to(inner_bytes) && T.own(inner_bytes)) + borrowed('a, exists |inner_bytes| ptr.points_to_owned(inner_bytes) && T.own(inner_bytes)) ``` The actual ownership predicate is then defined as ``` @@ -197,7 +198,7 @@ This gives us the expressivity we need to talk about immovable data: Moreover, if `self_ref` is set to `Some(data_ptr)`, then `data_ptr` is the address of the first field of the pair: ``` SelfReferential.pin(ptr) := exists |data: i32, self_ref: Option>| - ptr.points_to((data, self_ref)) && + ptr.points_to_owned((data, self_ref)) && match self_ref { Some(data_ptr) => data_ptr.as_ptr() == ptr.offset(0), None => True } ``` The most important part of this is the last line, saying that if `data_ptr` is a `Some`, it actually points to the first field (at offset `0`). @@ -207,6 +208,17 @@ Moreover, `SelfReferential` also has an owned and a shared typestate, but nothin With this choice, it is easy to justify that `read_ref` is safe to execute: When the function begins, we can rely on `SelfReferential.pin(self)`. If the closure in `self_ref.map` runs, we are in the `Some` case of the `match` so the deref of the pointer obtained from `self_ref` is fine. +Before we go on, I have to explain what I did with `points_to_owned` above. +Before I said that this predicate operates on `List`, but now I am using it on a pair of an `i32` and an `Option`. +Again this is an instance of using a higher-level view of memory than a raw list of bytes. +For example, we might want to say that `ptr` points to `42` of type `i32` by saying `ptr.points_to_owned(42i32)`, without worrying about how to turn that value into a sequence of bytes. +It turns out that we can define `points_to_owned` in terms of a lower-level `points_to_owned_bytes` that operates on `List` as follows: +``` +ptr.points_to_owned(data: U) where List: TryInto := + exists |bytes| bytes.try_into() == Ok(data) && ptr.points_to_owned_bytes(bytes) +``` +Just like before, we (ab)use the `TryInto` trait to convert a properly laid out list of bytes into something of type `U`. + ## Verifying the Pin API With our notion of types extended with a pinned typestate, we can now justify the `Pin` and `PinBox` API. @@ -222,6 +234,8 @@ A list of bytes makes a valid `PinBox` if those bytes form a pointer `ptr` su For `Pin<'a, T>`, `ptr` is valid if we have borrowed `T.pin(ptr)` for lifetime `'a`. ``` PinBox.own(ptr) := T.pin(ptr) +``` +``` Pin<'a, T>.own(ptr) := borrowed('a, T.pin(ptr)) ``` @@ -235,7 +249,7 @@ This seems like a reasonable principle to require in general, so we add it to ou > {:type="a" start="2"} 2. If we own a pointer `ptr` pointing to a list of bytes `bytes` such that `T.own(bytes)` holds, then we can move to the pinned typestate and obtain `T.pin(ptr)`. Formally: ``` -forall |ptr, bytes| (ptr.points_to(bytes) && T.own(bytes)) -> T.pin(ptr) +forall |ptr, bytes| (ptr.points_to_owned(bytes) && T.own(bytes)) -> T.pin(ptr) ``` `PinBox::new` can now be easily implemented using `Box::new` and `PinBox::from`. @@ -243,7 +257,7 @@ forall |ptr, bytes| (ptr.points_to(bytes) && T.own(bytes)) -> T.pin(ptr) For `PinBox::as_pin`, we start with a `&'a mut PinBox`, which is a pointer-to-pointer, and return the inner pointer as `Pin<'a, T>`. This is justified because we start with a borrow for lifetime `'a` of a pointer to some `bytes` that satisfy (for the given lifetime) `PinBox.own(bytes)`: ``` -borrowed('a, exists |bytes| ptr.points_to(bytes) && PinBox.own(bytes)) +borrowed('a, exists |bytes| ptr.points_to_owned(bytes) && PinBox.own(bytes)) ``` Unfolding the `PinBox.own`, we can deduce that `bytes` actually form a pointer `inner_ptr` such that `T.pin(inner_ptr)` (for the given lifetime). This exactly matches our return type, so we're good! @@ -275,7 +289,7 @@ PinBox.shr('a, ptr) := exists |inner_ptr| ptr.points_to_read_only('a, inner_ptr) && T.shr('a, inner_ptr) ``` This requires a way to talk about memory that is read-only for the duration of some lifetime. -We assume we have a predicate `ptr.points_to_only_bytes('a: Lifetime, bytes: List)` saying that `ptr` points to `bytes.len()` many bytes of valid memory containing `bytes`, and that for lifetime `'a`, we are free to read that memory and it will not change. +We assume we have a predicate `ptr.points_to_read_only_bytes('a: Lifetime, bytes: List)` saying that `ptr` points to `bytes.len()` many bytes of valid memory containing `bytes`, and that for lifetime `'a`, we are free to read that memory and it will not change. We then define a convenient variant based on higher-level memory representations as usual: ``` ptr.points_to_read_only('a: Lifetime, data: U) where List: TryInto := @@ -296,7 +310,7 @@ Formally speaking: > **Definition 6: `Unpin`.** A type `T` is `Unpin` if from `T.pin(ptr)` we can deduce that we own the pointer `ptr` and it points to a list of bytes `bytes` such that `T.own(bytes)` holds. Formally: ``` -forall |ptr| T.pin(ptr) -> (exists |bytes| ptr.points_to(bytes) && T.own(bytes)) +forall |ptr| T.pin(ptr) -> (exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes)) ``` Note that this is exactly the inverse direction of axiom (b) added in definition 2b: For `Unpin` types, we can freely move between the owned and pinned typestate. @@ -313,7 +327,7 @@ One advantage of this proposal is that it is *local*: Existing types (and new types) that are designed without considering `Pin` remain sound in the presence of this new typestate, even if we automatically derive the `Unpin` trait for these types. The reason this works is that we can always define `T.pin(ptr)` as saying that we fully own `ptr` pointing to `bytes` such that we have `T.own(bytes)`: ``` -T.pin(ptr) := exists |bytes| ptr.points_to(bytes) && T.own(bytes) +T.pin(ptr) := exists |bytes| ptr.points_to_owned(bytes) && T.own(bytes) ``` This satisfies the additional axioms (b) and (c) connecting the pinned typestate to the others, and it also satisfies `Unpin`. The latter is crucial, because it means we can automatically derive `Unpin` instances through the auto trait mechanism and do not have to review the entire ecosystem for `Pin`-compatibility.