clarify abstract nature of pointers

diff --git a/personal/_posts/2018-04-05-a-formal-look-at-pinning.md b/personal/_posts/2018-04-05-a-formal-look-at-pinning.md

index 989dae55f92f4dcf024385db17daaa6b63209a65..2e72c2a6a73f3a7ef004bf939872c642257a4fb5 100644
--- a/personal/_posts/2018-04-05-a-formal-look-at-pinning.md
+++ b/personal/_posts/2018-04-05-a-formal-look-at-pinning.md
@@ -47,35 +47,37 @@ Data is only pinned after a `Pin<T>` pointing to it has been created; it can be
  The [corresponding RFC](https://github.com/rust-lang/rfcs/blob/master/text/2349-pin.md) explains the entirey new API surface in quite some detail: [`Pin`](https://doc.rust-lang.org/nightly/std/mem/struct.Pin.html), [`PinBox`](https://doc.rust-lang.org/nightly/std/boxed/struct.PinBox.html) and the [`Unpin`](https://doc.rust-lang.org/nightly/std/marker/trait.Unpin.html) marker trait.
  I will not repeat that here but only show one example of how to use `Pin` references and exploit their guarantees:
  {% highlight rust %}
-#![feature(pin, arbitrary_self_types)]
+#![feature(pin, arbitrary_self_types, optin_builtin_traits)]
  
  use std::ptr;
  use std::mem::Pin;
  use std::boxed::PinBox;
+use std::marker::Unpin;
  
  struct SelfReferential {
      data: i32,
-    self_ref: Option<ptr::NonNull<i32>>,
+    self_ref: *const i32,
  }
+impl !Unpin for SelfReferential {}
  
  impl SelfReferential {
-    fn new() -> Self {
-        SelfReferential { data: 42, self_ref: None }
+    fn new() -> SelfReferential {
+        SelfReferential { data: 42, self_ref: ptr::null()  }
      }
  
-    fn init(mut self: Pin<Self>) {
-        unsafe {
-            let this = Pin::get_mut(&mut self);
-            // Set up self_ref to point to this.data
-            this.self_ref = ptr::NonNull::new(&mut this.data as *mut _);
-        }
+    fn init(mut self: Pin<SelfReferential>) {
+        let this : &mut SelfReferential = unsafe { Pin::get_mut(&mut self) };
+        // Set up self_ref to point to this.data.
+        this.self_ref = &mut this.data as *const i32;
      }
-    
-    fn read_ref(mut self: Pin<Self>) -> Option<i32> {
-        unsafe {
-            let this = Pin::get_mut(&mut self);
-            // Dereference self_ref if it is set
-            this.self_ref.map(|self_ref| *self_ref.as_ptr())
+
+    fn read_ref(mut self: Pin<SelfReferential>) -> Option<i32> {
+        let this : &mut SelfReferential = unsafe { Pin::get_mut(&mut self) };
+        // Dereference self_ref if it is non-NULL.
+        if this.self_ref == ptr::null() {
+            None
+        } else {
+            Some(unsafe { *this.self_ref })
          }
      }
  }
@@ -86,8 +88,10 @@ fn main() {
      println!("{:?}", s.as_pin().read_ref()); // prints Some(42)
  }
  {% endhighlight %}
-The most intersting piece of code here is `read_ref`, which dereferences a raw pointer.
-The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to `Some`).
+**Update:** Previously, the example code used `Option<ptr::NonNull<i32>>`. I think using raw pointers directly makes the code easier to follow. **/Update**
+
+The most intersting piece of code here is `read_ref`, which dereferences a raw pointer, `this.self_ref`.
+The reason this is legal is that we can rely on the entire `SelfReferential` not having been moved since `init()` was called (which is the only place that would set the pointer to something non-NULL).
  
  In particular, if we changed the signature to `fn init(&mut self)`, we could easily cause UB by writing the following code:
  {% highlight rust %}
@@ -177,9 +181,9 @@ exists |data: U| bytes.try_into() == Ok(data) && T.own(data)
  
  ## Extending Types to Verify `SelfReferential`
  
-What would it take to *prove* that `SelfReferential` can be used by arbitrary safe code?
+Coming back to our example, what would it take to *prove* that `SelfReferential` can be used by arbitrary safe code?
  We have to start by writing down the private invariants (for all typestates) of the type.
-We want to say that if `self.read_ref` it set to `Some(data_ptr)`, then `data_ptr` is the address of `self.data`.
+We want to say that if `self.read_ref` is not NULL, then it is the address of `self.data`.
  However, if we go back to our notion of Rust types that I laid out at the beginning of this post, we notice that it is *impossible* to refer to the "address of `self.data`" in `T.own`!
  And that's not even surprising; this just reflects the fact that in Rust, if we own a type, we can always move it to a different location---and hence the invariant must not depend on the location.
  
@@ -193,23 +197,22 @@ We will add a new, *third* typestate on top of the existing owned and shared typ
  
  Notice that this state talks about a *pointer* being valid, in contrast to `T.own` which talks about a *sequence of bytes*.
  This gives us the expressivity we need to talk about immovable data:
-`SelfReferential.pin(ptr)` says that `ptr` points to some memory we own, and that memory stores some pair `(data, self_ref)`.
-(In terms of memory layout, `SelfReferential` is the same as a pair of `i32` and `Option<ptr::NonNull<i32>>`.)
-Moreover, if `self_ref` is set to `Some(data_ptr)`, then `data_ptr` is the address of the first field of the pair:
+`SelfReferential.pin(ptr)` says that `ptr` points to some memory we own, and that memory stores some pair `(data, self_ref)`, and `self_ref` is either NULL or the address of the first field, `data`, at offset `0`:
  ```
-SelfReferential.pin(ptr) := exists |data: i32, self_ref: Option<ptr::NonNull<i32>>|
+SelfReferential.pin(ptr) := exists |data: i32, self_ref: *const i32|
    ptr.points_to_owned((data, self_ref)) &&
-  match self_ref { Some(data_ptr) => data_ptr.as_ptr() == ptr.offset(0), None => True }
+  (self_ref == ptr::null() || self_ref == ptr.offset(0))
  ```
-The most important part of this is the last line, saying that if `data_ptr` is a `Some`, it actually points to the first field (at offset `0`).
-(I am of course glossing over plenty of details here, like handling of padding, but those details are not relevant right now.
+(In terms of memory layout, `SelfReferential` is the same as a pair of `i32` and `*const i32`.
+I am of course glossing over plenty of details here, but those details are not relevant right now.
  Moreover, `SelfReferential` also has an owned and a shared typestate, but nothing interesting happens there.)
  
  With this choice, it is easy to justify that `read_ref` is safe to execute: When the function begins, we can rely on `SelfReferential.pin(self)`.
-If the closure in `self_ref.map` runs, we are in the `Some` case of the `match` so the deref of the pointer obtained from `self_ref` is fine.
+If we enter the `else` branch, we know `self_ref` is not NULL, hence it must be `ptr.offset(0)`.
+As a consequence, the deref of `self_ref` is fine.
  
  Before we go on, I have to explain what I did with `points_to_owned` above.
-Before I said that this predicate operates on `List<Byte>`, but now I am using it on a pair of an `i32` and an `Option`.
+Before I said that this predicate operates on `List<Byte>`, but now I am using it on a pair of an `i32` and a raw pointer.
  Again this is an instance of using a higher-level view of memory than a raw list of bytes.
  For example, we might want to say that `ptr` points to `42` of type `i32` by saying `ptr.points_to_owned(42i32)`, without worrying about how to turn that value into a sequence of bytes.
  It turns out that we can define `points_to_owned` in terms of a lower-level `points_to_owned_bytes` that operates on `List<Byte>` as follows:
@@ -315,7 +318,7 @@ forall |ptr| T.pin(ptr) -> (exists |bytes| ptr.points_to_owned(bytes) && T.own(b
  
  Note that this is exactly the inverse direction of axiom (b) added in definition 2b: For `Unpin` types, we can freely move between the owned and pinned typestate.
  
-Clearly, `SelfReferential` is *not* `Unpin`.
+Clearly, `SelfReferential` is *not* `Unpin`, and the example code above makes that explicit with an `impl !Unpin`.
  On the other hand, for types like `i32`, their pinned typestate invariant `i32.pin(ptr)` will only care about the memory that `ptr` points to and not about the actual value of `ptr`, so they satisfy the `Unpin` axiom.
  
  With this definition at hand, it should be clear that if we assume `T: Unpin`, then `&'a mut T` and `Pin<'a, T>` are equivalent types, and so are `Box<T>` and `PinBox<T>`.
@@ -334,7 +337,7 @@ The latter is crucial, because it means we can automatically derive `Unpin` inst
  
  ## Conclusion
  
-We have seen how the new `Pin` type can be used to give safe APIs to types like `SelfReferential`, and how we can (semi-)formally argue for the correctness of `SelfReferential` and the methods on `Pin` and `PinBox`.
+We have seen how the new `Pin` type can be used to give safe APIs to types like `SelfReferential` (which, previously, was not possible), and how we can (semi-)formally argue for the correctness of `SelfReferential` and the methods on `Pin` and `PinBox`.
  I hope I was able to shed some light both on how pinning is useful, and how we can reason about safety of a typed API in general.
  Next time, we are going to look at an extension to the pinning API proposed by @cramertj which guarantees that `drop` will be called under some circumstances, and how that is useful for intrusive collections.
  