From: Ralf Jung <post@ralfj.de>
Date: Fri, 11 Aug 2017 23:02:09 +0000 (-0700)
Subject: new post on evaluating types-as-contracts
X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/81d020bc303e970d1eaebf302bb829dafeb6e2ce?ds=inline;hp=1ce84a5da4500d18c029c8d38fbe1d9cd5f98ee8

new post on evaluating types-as-contracts
---

diff --git a/ralf/_posts/2017-07-14-undefined-behavior.md b/ralf/_posts/2017-07-14-undefined-behavior.md
index 77e7315..4635668 100644
--- a/ralf/_posts/2017-07-14-undefined-behavior.md
+++ b/ralf/_posts/2017-07-14-undefined-behavior.md
@@ -8,7 +8,7 @@ So, finally, just one year later, this post is my take at what the purpose of th
 <!-- MORE -->
 Warning:  This post may contain opinions.  You have been warned.
 
-## When are optimizations legal?
+## When are Optimizations Legal?
 
 Currently, we have a pretty good understanding of what the intended behavior of *safe* Rust is.
 That is, there is general agreement (modulo some [bugs](https://github.com/rust-lang/rust/issues/27868)) about the order in which operations are to be performed, and about what each individual operation does.
@@ -37,7 +37,7 @@ After all, if there is any execution for which the assumption does *not* hold, t
 Now, it turns out that it is often really hard to obtain precise aliasing information.
 This could be the end of the game:  No alias information, no way to verify our assumptions, no optimizations.
 
-## Shifting responsibility
+## Shifting Responsibility
 
 However, it turns out that compiler writers consider these optimizations important enough that they came up with an alternative solution:
 Instead of having the compiler verify such assumptions, they declared the programmer responsible.
diff --git a/ralf/_posts/2017-08-11-types-as-contracts-evaluation.md b/ralf/_posts/2017-08-11-types-as-contracts-evaluation.md
new file mode 100644
index 0000000..e06ccc0
--- /dev/null
+++ b/ralf/_posts/2017-08-11-types-as-contracts-evaluation.md
@@ -0,0 +1,246 @@
+---
+title: "Types as Contracts: Implementation and Evaluation"
+categories: internship rust
+---
+
+Some weeks ago, I described [Types as Contracts]({{ site.baseurl }}{% post_url 2017-07-17-types-as-contracts %}) as an approach for how to go about defining Rust's aliasing-related [undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}).
+One key property of this approach is that it is *executable*, which means we can actually have a program (in this case, [miri](https://github.com/solson/miri/)) tell us whether some particular Rust test case exhibits undefined behavior or not.
+I have since then spent most of my time completing and refining this implementation, and running it on miri's test suite to exercise various bits of the standard library and see whether they are actually following the rules I have been suggesting.
+
+This post is my report on what I found, and how the approach itself changed in response to these findings.
+It also explains how you can run miri yourself and have it check your code for undefined behavior according to this model.
+Finally, I discuss some of the things that I would like to look at going forward.
+
+<!-- MORE -->
+
+## 1 Rule Violations in libstd and miri's Test Suite
+
+Unsurprisingly, some of miri's test cases started to fail once I properly implemented the rules I wanted to have checked.
+Some of these turned out to be [compiler](https://github.com/rust-lang/rust/issues/43457) [bugs](https://github.com/rust-lang/rust/issues/43481), but others are actually examples of code violating the proposed rules.
+In the following, I describe the patterns that I found.
+I assume that you are familiar with Types as Contracts as introduced in my [previous post]({{ site.baseurl }}{% post_url 2017-07-17-types-as-contracts %}).
+
+## 1.1 Ownership Passing Via Raw Pointers
+
+I already mentioned this in the original post:  `mem::swap` triggers a lock violation in the strictest interpretation of the model.
+{% highlight rust %}
+// Defined elsewhere: unsafe fn ptr::swap_nonoverlapping<T>(x: *mut T, y: *mut T) { ... }
+
+pub fn swap<T>(x: &mut T, y: &mut T) {
+    Validate(Acquire, [x, y]);
+    let x = x as *mut T;
+    let y = y as *mut T;
+    unsafe {
+        Validate(Release, [x, y]);
+        ptr::swap_nonoverlapping(x, y, 1);
+    }
+}
+{% endhighlight %}
+The problem is that `ptr::swap_nonoverlapping` takes a raw pointer, so `swap` sees no reason to release its write locks on the memory covered by `x` and `y`.
+That makes it UB for `ptr::swap_nonoverlapping` to access that memory.
+
+Raw pointers sneakily transferring ownership cannot just lead to functions accessing memory that the model does not allow them to access, it can also lead to overlapping mutable references.
+We can see this in `[T]::split_at_mut`:
+{% highlight rust %}
+fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T]) {
+    let len = self.len();
+    let ptr = self.as_mut_ptr();
+
+    unsafe {
+        assert!(mid <= len);
+
+        (from_raw_parts_mut(ptr, mid),
+         from_raw_parts_mut(ptr.offset(mid as isize), len - mid))
+    }
+}
+{% endhighlight %}
+The problem is that when `from_raw_parts_mut(ptr, mid)` returns a `&mut [T]`, that overlaps with `self`, so we have two aliasing mutable references in scope.
+This makes acquire-validation of the return value of `from_raw_parts_mut(ptr, mid)` fail.
+
+Another example of this pattern is `AtomicBool::get_mut`:
+{% highlight rust %}
+pub fn get_mut(&mut self) -> &mut bool {
+    unsafe { &mut *(self.v.get() as *mut bool) }
+}
+{% endhighlight %}
+`self.v.get()` returns a `*mut u8`, which then gets turned into a `&mut bool`.
+However, this again overlaps with `self`.
+
+This list is not exhaustive by any means; there are problems of this sort pretty much whenever raw pointers are used in non-trivial ways.
+
+## 1.2 Uninitialized memory
+
+Any place that calls `mem::uninitialized` instantly has validation blow up because the returned `T` is, obviously, not actually a valid `T` -- it is, after all, completely uninitialized.
+However, it turns out that `mem::uninitialized` is causing other problems as well (e.g. around uninhabited types), and [better APIs for handling uninitialized data have already been proposed](https://internals.rust-lang.org/t/mem-uninitialized-and-trap-representations/4167/18?u=ralfjung).
+
+## 1.3 `Arc::drop`
+
+This is the most interesting case I found.  The `Arc` destructor ends up calling `Arc::drop_slow`, which is implemented as follows:
+{% highlight rust %}
+ unsafe fn drop_slow(&mut self) {
+    let ptr = self.ptr.as_ptr();
+
+    // Destroy the data at this time, even though we may not free the box
+    // allocation itself (there may still be weak pointers lying around).
+    ptr::drop_in_place(&mut self.ptr.as_mut().data);
+
+    if self.inner().weak.fetch_sub(1, Release) == 1 {
+        atomic::fence(Acquire);
+        Heap.dealloc(ptr as *mut u8, Layout::for_value(&*ptr))
+    }
+}
+
+fn inner(&self) -> &ArcInner<T> {
+    // This unsafety is ok because while this arc is alive we're guaranteed
+    // that the inner pointer is valid. Furthermore, we know that the
+    // `ArcInner` structure itself is `Sync` because the inner data is
+    // `Sync` as well, so we're ok loaning out an immutable pointer to these
+    // contents.
+    unsafe { self.ptr.as_ref() }
+}
+{% endhighlight %}
+
+When `drop_slow` is executed, it first uses `drop_in_place` to drop the user data in the `Arc`.
+Next, it calls `inner`, and this is where things go wrong.
+The reason validation fails is that when `inner` calls `self.ptr.as_ref()`, that returns an `&ArcInner<T>`.
+This return value is validated, like all return values are.
+However, remember that we already dropped the `data` field of the `ArcInner`!
+We don't actually have a fully valid `&ArcInner<T>`; only the `weak` and `strong` fields may still be used.
+Nevertheless, the type of `inner` says that there is a valid shared reference to `ArcInner` being passed around, making this code UB.
+
+Notice that even if `inner` did not care about the types, validation would still fail when `drop_slow` goes on calling the safe function `Layout::for_value` and passes it the invalid `&ArcInner<T>`.
+
+## 2 Changes to the Model
+
+There were a whole bunch of other test failures, but they all turned out to be bugs in the implementations or even the model, rather than legit violations of the "Types as Contracts" idea.
+In particular, it turned out that suspensions were actually not modeling borrowing of mutable data properly.
+
+There were also some changes affecting the rules of where validation calls are emitted in some corner cases, but there weren't really any interesting things to learn there.
+At some point I will write down the exact rules in full detail; until then it is really the general approach that matters much more than such corner cases.
+
+## 2.1 Suspension of Write Locks
+
+In the original proposal, on every borrow `y = &[mut] x;`, the write locks on `x` would be *suspended*.
+This means that the locks get released, but will be re-acquired (or *recovered*) automatically when the lifetime of the borrow ends.
+First of all, this actually fails to detect some cases that should be considered contract violations, like passing an `&mut bool` to a function and then having that function write something invalid (i.e., something that is not a boolean) into the reference.
+To fix that, I added a new table to the global state of the program that would record suspended *lvalues* and their type (rather than suspended memory regions).
+When the lifetime of the suspension ends, we perform a full re-acquire of the given lvalue at the given type, acquiring all the write locks again and also checking other invariants, like whether `bool` values are valid.
+
+However, it turned out that this model was still too naive.
+I failed to account for the fact that the borrow checker actually accepts programs like the following:
+{% highlight rust %}
+fn foo(s: (i32, i32)) {
+    Validate(Acquire, [s]);
+    let x;
+    {
+        Validate(Suspend('y), [s.1]);
+        let y = &/*'y*/ s.1;
+        Validate(Acquire, [*y]);
+
+        Validate(Suspend('x), [s]);
+        x = &/*'x*/ s;
+        Validate(Acquire, [*x]);
+    EndRegion('y); }
+EndRegion('x); }
+{% endhighlight %}
+Let us step through this code while considering what happens with the locks.
+Initially, we acquire `s`, so the memory occupied by the pair will be write-locked by `foo`.
+When the borrow of `s.1` happens, we release the write lock and suspend `s.1` at type `i32`, making sure we re-acquire the write lock when the lifetime `'y` of the borrow.
+Nothing interesting happens when `*y` gets acquired.
+Next, the entire `s` gets borrowed, so we will try to release all the write locks on it during suspension -- but at this point, we don't even have a write lock of `s.1`!
+Furthermore, due to this second borrow (that has a longer lifetime than the first borrow), we actually do *not* want to re-acquire the write lock on `s.1` when `'y` ends, we will want to wait until `'x` ends and keep the read lock until then.
+
+What we really want to do is *count* how many shared borrows happened of `s.1` and `s.2`, respectively, and only acquire the write lock again once that count reaches 0.
+This is complicated a lot once `RefCell` enters the game: The fact that we take a shared borrow does not mean that there won't also be mutable borrows again for the same region of memory.
+This is demonstrated by the following example (with comments explaining what happens to the locks; adding all the `Validate` calls just makes this code unreadable):
+{% highlight rust %}
+fn test2(r: &mut RefCell<i32>) {
+    let x = &*r; // releasing write lock, first suspension recorded
+    let mut x_ref = x.borrow_mut();
+    let x_inner : &mut i32 = &mut *x_ref;
+    let x_inner_shr = &*x_inner; // releasing inner write lock, recording suspension
+    let y = &*r; // second suspension for the outer write lock
+    let x_inner_shr2 = &*x_inner; // 2nd suspension for inner write lock
+    /* ... */
+}
+{% endhighlight %}
+At the end of this function, we actually have four aliasing shared borrows covering the content of the `RefCell`: `x`, `y`, `x_inner_shr` and `x_inner_shr2`.
+There are two mutable references, both of which currently suspended (`x_inner` and `r`).
+
+Once both `x_inner_shr` and `x_inner_shr2` have expired (i.e, their lifetime has ended), we will want to recover the write lock on `x_inner`.
+Then `x_inner` expires, releasing the write lock.
+Finally, once both `x` and `y` expire, `r` gets recovered and we again acquire the write lock.
+
+To make that happen, we need to be able to track multiple suspended write locks for a location.
+In implementing that, an interesting question came up:  What *identifies* a write lock?
+I ended up using the lifetime of the lock as the identifier, but that doesn't actually work.
+Notice that in the above example, both `x_inner` and `r` have the *same* lifetime, even though they are clearly corresponding to distinct write locks.
+So, I will have to go back to the drawing board and figure out a different way to identify write locks.
+I have some ideas, but I am not very happy with them.
+
+## 2.2 Unsafe Code
+
+As already indicated in the original post, the way I suggest to handle the problems with raw pointers described in [Â§1.1](#11-ownership-passing-via-raw-pointers) is by relaxing the validation performed around unsafe code.
+Specifically, the code that emits MIR validation statements now detects if a function is unsafe, contains an unsafe block, or (in the case of closures) is defined inside an unsafe block/function.
+In that case, we still perform validation of the arguments passed to us, but we immediately release all locks again.
+The same happens for data returned to us from other functions.
+No further validation is happening inside the unsafe function.
+In other words, we enforce the type invariants when data passes between functions, but we do not enforce any invariant *within* such functions.
+
+I am aware that this part of the proposal is particularly controversial.
+I do not know of an alternative that actually works with the raw pointers, and that performs validation purely based on the types of variables as they are passed around (rather than, say, also taking some form of provenance or "derived-from" relation into account).
+In discussions, two possible solutions to this problems became apparent:
+
+* We could try to find some kind of API that's slightly safer than raw pointers, and somehow does not lose all lifetime information.
+  Unsafe code using that API, and not using raw pointers, would still be subject to full validation and thus also enjoy the full set of optimizations.
+* We could try to incorporate some form of provenance tracking into this model.
+
+Both of these options are certainly worth exploring.
+
+# 3 Testing Your Own Code
+
+The implementation of the model as described above (well, with the open problems mentioned in [Â§2.1](21-suspension-of-write-locks)) actually landed upstream in rustc and miri, so you can play around with this yourself!
+To do so, you need to install Rust nightly and miri.
+Lucky enough, rustup makes this really easy.
+(Unfortunately, I have very little Windows experience. If you manage to get this all running on Windows, I'm happy to link to your instructions for how to do that.)
+On Linux and macOS, you can use the following commands to install everything you need, and run `test.rs` in miri:
+```
+rustup install nightly
+git clone https://github.com/solson/miri.git && cd miri
+cargo +nightly install xargo # xargo is needed to build a libstd that miri can work with
+xargo/build.sh # build that libstd
+cargo +nightly build --release
+MIRI_SYSROOT=~/.xargo/HOST cargo +nightly run --release --bin miri -- -Zmir-emit-validate=1 test.rs
+```
+The flag `-Zmir-emit-validate=1` turns on generation of the MIR validation statements.
+If you set it to `2`, full validation will performed even in unsafe functions.
+Notice however that the flag only affects the code generated when miri actually runs; the script in `xargo/build.sh` controls validation for code in libstd.
+
+If you find anything odd and things don't work as expected, please [report a bug](https://github.com/solson/miri/issues).
+Notice however that miri currently supports virtually no interaction with the surrounding operating system, like file access or spawning threads.
+
+
+# 4 Future Work
+
+As usual, there are so many things that could be done next.
+
+miri's test suite is rather small, but still large enough to keep me busy, so I have not yet looked at the much more extensive test suite that comes with rustc -- but that certainly something I want to do eventually.
+I am sure this will uncover bugs in corner-cases of my implementation, and it may also uncover new interesting cases of rules being violated.
+Speaking of the implementation, the problem around write locks described in [Â§2.1](21-suspension-of-write-locks) is not the only issue; another problem I still have to solve is to properly acquire and release locks on padding bytes and enum discriminants.
+
+While [Â§1.1](#11-ownership-passing-via-raw-pointers) is "fixed" by exempting unsafe code from the full range of checks, and the trouble with uninitialized memory ([Â§1.2](#12-uninitialized-memory)) will hopefully be solved as a side-effect of [changing the way uninitialized memory is handled](https://internals.rust-lang.org/t/mem-uninitialized-and-trap-representations/4167/18?u=ralfjung), this still leaves `Arc::drop` ([Â§1.3](13-arcdrop)) as an open problem.
+This comes down to a judgment call: We could relax validation sufficiently (e.g., relaxing validation in entire modules when they do certain unsafe things, and not validating private fields of structs that we cannot access) to make this code legal, or we could declare that code to actually be UB.
+The concrete problem in `Arc` can be fixed fairly easily by acquiring a reference to the `weak` field *before* dropping the user data in the `Arc`; the question is whether this kind of code is something that programmers actually expect to be legal -- and if they do, whether we can do anything about that (by providing documentation, or tools, or whatever else is necessary).
+Of course, it may also turn out that this shows that the entire approach is not actually feasible.
+
+Interestingly, the [discussion](https://internals.rust-lang.org/t/types-as-contracts/5562) of my previous post introducing Types as Contracts was to a large extend not even about this model, but about the underlying model of what memory and pointers are, and how to write a checker that actually faithfully models what LLVM does.
+After all, we want miri to also detect UB related to e.g. illegal pointer arithmetic.
+These issues are far from settled, and I do have some interesting ideas here that I want so share with you.
+
+Finally, there is of course the discussion around how to handle raw pointers; I already talked in [Â§3](#22-unsafe-code) about some of the options that could be explored here.
+
+Unfortunately, today is the last day of my internship, so I will no longer be able to work on all of this full-time like I did the last three months.
+Still, I certainly intend to stay involved. This problem is way too interesting to just let it go :)
+
+As always, please comment with your thoughts on the topic.
+I am particularly curious about what kind of test cases you are throwing at miri, and how it is doing!