X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/aacb1fee402c329bb4a85d5f02527a72d86ac9c3..e389a292a554454db8f6cd8b9f4070609645dabc:/ralf/_posts/2018-01-31-sharing-for-a-lifetime.md
diff --git a/ralf/_posts/2018-01-31-sharing-for-a-lifetime.md b/ralf/_posts/2018-01-31-sharing-for-a-lifetime.md
index 25b0bec..0b7a454 100644
--- a/ralf/_posts/2018-01-31-sharing-for-a-lifetime.md
+++ b/ralf/_posts/2018-01-31-sharing-for-a-lifetime.md
@@ -1,13 +1,14 @@
 ---
 title: "Sharing for a Lifetime"
 categories: research rust
+forum: https://internals.rust-lang.org/t/sharing-for-a-lifetime/6675
 ---

 This post is about an aspect of the [RustBelt paper]({{ site.baseurl }}{% post_url 2017-07-08-rustbelt %}).
 Concretely, I'd like to share some of our thoughts on the nature of types and shared references.
 Let's see how well this goes. :)

-Shared references are an extremely powerful mechanism in the Rust type system, and we've had quite some trouble fining a good way for handling them in our [formal model]({{ site.baseurl }}{% post_url 2015-10-12-formalizing-rust %}).
+Shared references are an extremely powerful mechanism in the Rust type system, and we've had quite some trouble finding a good way of handling them in our [formal model]({{ site.baseurl }}{% post_url 2015-10-12-formalizing-rust %}).
 In this post, I will present the model that we came up with.
 Let's get started by discussing what a "model" of a type looks like, and what's so hard about handling shared references.

@@ -32,9 +33,9 @@ The (syntactic) data layout imposes some basic structure on the permitted sequen
 For example, the sequence of four `0` bytes is valid at `i32` but not at `NonZero<i32>`.
 As another example, a pointer is valid at `Box<i32>` if it points to four bytes of *owned* memory[^2], and these four bytes are a valid `i32`.
-Finally, a pointer is valid at `&'a mut T` if it points to `size_of::<T>()` many bytes of memory[^3] that are *owned for lifetime `'a`*, and those bytes are valid at `T`.
+Finally, a pointer is valid at `&'a mut T` if it points to `size_of::<T>()` many bytes of memory[^3] that are *borrowed for lifetime `'a`*, and those bytes are valid at `T`.

-[^2]: I will also mostly ignore how the concept of *ownership* can be made more precise. If you want to dig deeper, the keyword is [separation logic](https://en.wikipedia.org/wiki/Separation_logic). We had to extend separation logic to be able to handle "ownership for some lifetime", but you don't have to understand any of that to follow this post. That's the plan, anyway.
+[^2]: I will also mostly ignore how the concept of *ownership* can be made more precise. If you want to dig deeper, the keyword is [separation logic](https://en.wikipedia.org/wiki/Separation_logic). We had to extend separation logic to be able to handle borrowing as well, but the details of all of this don't matter for this post. That's the plan, anyway.

 [^3]: Yes, we do ignore unsized types here.

@@ -66,7 +67,7 @@ This already excludes the possibility of data races.
 Secondly, `Cell` does not permit obtaining a *deep pointer* into the data.
 So, changing a `Cell<Result<bool, i32>>` from `Ok(true)` to `Err(42)` cannot invalidate any pointers that still expect a `bool` in the `Result`.

-Coming back to our model, we could say that `&'a Cell<T>` is a pointer valid for lifetime `'a` that points to `size_of::<T>()` many bytes of memory that (for this lifetime) may *be accessed by any code running in the current thread, but not by any other code*; and moreover, those bytes are valid at `T`.
+Coming back to our model, we could say that `&'a Cell<T>` is a pointer that points to `size_of::<T>()` many bytes of memory that, for the duration of `'a`, may *be accessed by any code running in the current thread, but not by any other code*; and moreover, those bytes are valid at `T`.
 This definition forces `&Cell` to *not* be `Send` (because the model refers to the "current thread", so changing that thread could violate the invariant) and hence `Cell` must not be `Sync`.
 It also rules out pointers into a `Cell<Result<bool, i32>>`: Say we would want to have a pointer to the `Ok` data, which would have type `&Cell<bool>`.
 We could not show that this pointer is actually valid at the given type, because anyone in the current thread could modify the (outer) `Cell<Result<bool, i32>>` at any time, and if they write an `Err(42)` in it they will violate the requirements that an `&Cell<bool>` only ever points to a valid `bool`!
@@ -78,7 +79,7 @@ Coming to `Mutex`, the justification for why `Mutex::lock` and `MutexGuard::dere
 A `Mutex` implements *mutual exclusion* (hence the name), so only one thread can ever hold the lock.
 As a result, giving that thread mutable access to the inner data cannot result in any data races or other conflicts with other threads.

-In the model (and I'm going to gloss over more and more details here), we will say that `&'a Mutex` is valid if everyone accessing the memory it points to follows the locking discipline.
+In the model (and I'm going to gloss over more and more details here), we will say that `&'a Mutex` is valid if everyone accessing the memory it points to during the lifetime `'a` follows the locking discipline.
 A `MutexGuard` is valid if we actually own the lock of the corresponding `Mutex`.
 Together, this is sufficient to justify safety of all the `Mutex` and `MutexGuard` operations.

@@ -134,7 +135,7 @@ This is not so say that the last word on interior mutability has been spoken; th
 Either way, this model has certainly helped me not only in the formal work on RustBelt, but also in thinking about the Rust type system itself.
 For example, I briefly wondered why `Cell` is not covariant.
-Indeed, if `U <: T`, then it is perfectly safe to turn a `Cell<T>` into a `Cell<U>` -- that is to say, whenever some sequence of bytes is valid at `Cell<T>`, it is also valid at `Cell<U>`.
+Indeed, if `T <: U`, then it is perfectly safe to turn a `Cell<T>` into a `Cell<U>` -- that is to say, whenever some sequence of bytes is valid at `Cell<T>`, it is also valid at `Cell<U>`.
 (In fact, the owned invariant for `Cell<T>` is *exactly the same* as the one for `T`!)
 That's precisely what we need for covariance, right?
 Well, not exactly -- because our types have two invariants, we also have to require that whenever a pointer is a shared `Cell<T>`, it is a shared `Cell<U>`, and that is decisively *not* true!
@@ -153,6 +154,5 @@ So, maybe this is some extremely ingenious idea that the Rust devs had without e
 And then there is the question: Why exactly two modes? Would it be worth having more?
 I honestly don't know, but I had to think of this a while back when I read [this post by glaebhoerl](https://internals.rust-lang.org/t/impl-t-clone-for-cell-rc-t/5397), the first part of which I would translate as "`Cell` could be considered a mode".

-If you have any comments or thoughts on this, please join the discussion!
-I will create a thread in the Rust internal forums shortly.
+If you have any comments or thoughts on this, please join the [discussion in the Rust forums](https://internals.rust-lang.org/t/sharing-for-a-lifetime/6675)!
 I'd also be interested in feedback on how understandable this post is; this is my first attempt at translating research results into a blog post.