From: Ralf Jung Date: Wed, 26 Dec 2018 15:57:03 +0000 (+0100) Subject: add latest stacked borrows updates X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/725435c183189caf8563d2dbbf3f4cccf3503103 add latest stacked borrows updates --- diff --git a/personal/_posts/2018-12-12-google-scholar.md b/personal/_posts/2018-12-12-google-scholar.md index 81c7090..28a4806 100644 --- a/personal/_posts/2018-12-12-google-scholar.md +++ b/personal/_posts/2018-12-12-google-scholar.md @@ -8,6 +8,8 @@ I even found one vacancy that explicitly requested submitting a Google Scholar p Google Scholar provides not just a list of publications (which is of course part of the application, and which is also [available on dblp](http://dblp.org/pers/hd/j/Jung_0002:Ralf)), but also a citation count and computation of several publication-related indices. This post is about why I don't have a Google Scholar profile (yet). + + First of all, having a Google Scholar profile requires a Google account. There is no technical necessity for this, Google already indexes my papers and other databases (like the aforementioned dblp) manage to create per-author pages just fine without authors having to have an account. But, of course, this is a great way for Google to tie more people into their ecosystem -- few kinds of pressure are as effective as when this directly affects hiring decisions. diff --git a/personal/_posts/2018-12-26-stacked-borrows-barriers.md b/personal/_posts/2018-12-26-stacked-borrows-barriers.md new file mode 100644 index 0000000..085a97a --- /dev/null +++ b/personal/_posts/2018-12-26-stacked-borrows-barriers.md @@ -0,0 +1,157 @@ +--- +title: "Barriers and Two-phase Borrows in Stacked Borrows" +categories: internship rust +--- + +My internship ("research assistantship") with Mozilla has ended several weeks ago, and this post is a report of the most recent tweaks I made to Miri and Stacked Borrows. +Neither project is by any means "done", of course. +However, both have reached a fairly reasonable state, so I felt some kind of closing report made sense. +Also, if I ever want to finish my PhD, I'll have to seriously scale down the amount of time I work on Rust -- so at least from my side, things will move more slowly from now on. + + + +## Stacked Borrows Tweaks + +I made some small tweaks to Stacked Borrows which mostly serve to make things overall more consistent. +All of these have been incorporated into my [previous post on Stacked Borrows]({% post_url 2018-11-16-stacked-borrows-implementation %}). + +The only noticeable functional change affects creating a shared reference: previously, that would push a `Shr` item to the stack before freezing the location. +That would, however, permit mutation like in the following example: + +{% highlight rust %} +fn demo_shared_mutation(ptr: &mut u8) { + let shared = &*ptr; // pushes a `Shr` item + let raw = shared as *const u8 as *mut u8; + unsafe { *raw = 5; } // allowed: unfreezing, then matching the `Shr` item +} +{% endhighlight %} + +Since "no mutation through (pointers derived from) shared references" has been a rule in Rust pretty much from the beginning, I changed creating a shared reference to no longer unconditionally push a `Shr` item. +That now only happens when the pointee is mutable because of an `UnsafeCell`. +As a consequence, the code above is now forbidden. + +## Barriers + +Beyond these small tweaks, I implemented the *barriers* that I already described in my [original proposal]({% post_url 2018-08-07-stacked-borrows %}). +The purpose of barriers is to handle code like the following: + +{% highlight rust %} +fn aliasing(ptr: &mut u8, raw: *mut u8) -> u8 { + let val = *ptr; + unsafe { *raw = 4; } + val // we would like to move the `ptr` load down here +} + +fn demo_barrier() -> u8 { + let x = &mut 8u8; + let raw = x as *mut u8; + aliasing(unsafe { &mut *raw }, raw) +} +{% endhighlight %} + +`aliasing` here gets two aliasing pointers as argument. +This means writing to `raw` changes the value in `ptr`, so the desired optimization would change the behavior of `demo_barrier` (currently it returns 8, after moving the load of `ptr` down it returns 4). +Stacked Borrows without barriers would allow this code because `ptr` does not get used again after `raw`. + +To disallow this code and enable the desired optimization, we introduce a new kind of item that can be pushed to the per-location stack: a *function barrier* referring to a particular function call. +While the function is running, the barrier may not be popped. +The type of items on the borrow stack therefore now looks as follows: + +{% highlight rust %} +pub enum BorStackItem { + Uniq(Timestamp), + Shr, + FnBarrier(CallId), +} +{% endhighlight %} + +These barriers get pushed when doing the initial retagging of a function (remember that we insert `Retag` statements for all arguments at the beginning of each function). +We do this only for references (not for `Box`), and we do *not* push a barrier to locations inside a shared `UnsafeCell`. +Concretely, between steps 4 and 5 of retagging (after we performed the action of an access and before we push the new item), we push a `FnBarrier` referring to the current function call. +Moreover, when retagging with barriers, we do not perform the usual redundancy check -- even if the reborrow is redundant, pushing the barrier might be necessary. + +Barriers have no effect when a pointer gets dereferenced: looking for the item in the stack just skips over barriers. +However, when performing a memory access and popping items from the stack, we stop when we encounter an active barrier, i.e., a barrier for a function call that is still ongoing. +If that ever happens, we have undefined behavior. + +Together, these rules make the above example invalid code, and hence enable our optimization: when doing the memory access in `*raw = 4`, we pop until we find a `Shr` item. +That will hit the barrier that was added when `aliasing` did the initial retagging of `val`, triggering UB. + +An interesting (and positive!) consequence of barriers is that they make the following code UB: + +{% highlight rust %} +fn aliasing(ptr1: &mut u8, ptr2: &mut u8) {} + +fn demo_barrier() -> u8 { + let x = &mut 8u8; + let raw = x as *mut u8; + aliasing(unsafe { &mut *raw }, x) +} +{% endhighlight %} + +Without barriers, there is no problem with this code because `aliasing` does not actually *use* the two aliasing pointers in conflicting ways. +With barriers, the initial retag of `ptr2` goes wrong because it tries to pop off the barrier that just got pushed for `ptr1`. +That's great, because we don't actually want to allow passing two aliasing pointers to `aliasing`. + +Barriers have one more effect: when memory gets deallocated, it must not have any active barriers left in its stacks. +Barriers so far ensure that a reference passed to a function does not get *invalidated* during that function call; this extra check ensures that it also does not get *deallocated*. +This is important, because we tell LLVM via the `dereferencable` attribute that the reference is, well, dereferencable for the duration of the function call. +To justify this attribute, we have to argue that the program really cannot deallocate this memory + + +### Intuition and Design Decisions + +One high-level intuition for barriers is that they encode the rule that a reference passed to a function outlives the function call. +The item matching that reference can never get popped while the function is still being executed: if it would get popped, we would hit the barrier and trigger UB. +This is also why we don't have barriers for `Box`: they don't have any such relationship to function calls. + +The other exception, about not having barriers for shared `UnsafeCell`, is more subtle. +In my [previous post]({% post_url 2018-11-16-stacked-borrows-implementation %}), I had an example of a function where a shared and a mutable reference alias (the shared reference being `&RefCell` and the mutable reference pointing into it). +This example crucially relies on the redundancy check during retagging, where the shared reference does *not* get reborrowed (which would invalidate the mutable reference). +If we wanted to add a barrier for that shared reference, we could not push it in top of the stack; we would have to add it somewhere in the middle next to the `Shr` item matching this reference. +For now, I am trying to avoid adding items to the middle of the stack. + +Another problem with shared `UnsafeCell` and barriers is summarized in [this issue](https://github.com/rust-lang/rust/issues/55005): in `Arc`, it can actually be the case that the memory referenced by a shared reference (but inside an `UnsafeCell`) gets deallocated while the function (that got this shared reference as argument) is still running. +This violated LLVM's expectations for `dereferencable` pointers, and it would also violate the rule that memory with active barriers cannot be deallocated. +We will have to either declare the current implementation of `Arc` incorrect or change the attributes that we use with LLVM. +The model currently takes the position that `Arc` is correct, deallocating data behind a shared `UnsafeCell` is allowed, and the attribute is emitted incorrectly. + +## Two-phase Borrows + +During the RustFest Rome, I was reminded that Stacked Borrows does not currently support [two-phase borrows](http://smallcultfollowing.com/babysteps/blog/2017/03/01/nested-method-calls-via-two-phase-borrowing/). +This means that the following code (which is safe code in the 2018 edition) would get rejected by Miri: +{% highlight rust %} +fn two_phase() { + let mut v = vec![]; + v.push(v.len()); +} +{% endhighlight %} + +To fix that, I just had to make a very tiny change. +Retagging now has a special mode for two-phase borrows (it would be used in the above example when retagging the temporary containing the first argument of `push`), and in that mode, after completing all the usual steps, we create a shared reborrow of the newly created mutable borrow, with all the usual effects (freezing or pushing a `Shr` onto the stack. +Thanks to the rule that reading through a mutable reference does not unfreeze or pop a `Shr` off the stack, this means that the original `v` can be used for reading until the time that the newly created (two-phase) borrow is written to for the first time. +Writing will unfreeze and pop the `Shr` off the stack, disallowing any further reads through other references. +This corresponds to the "activation" of a two-phase borrow. + +Unfortunately, this model is not quite enough to handle all the code that rustc currently accepts. +See [this issue](https://github.com/rust-lang/rust/issues/56254) for further discussion on this subject. + +## Stacked Borrows Specification + +Between the various changes described here, there is now a noticeable discrepancy between the only self-contained description of Stacked Borrows in my previous blog post(s) and what is actually implemented. +I will try to find a good spot to put a "living description" of whatever version of Stacked Borrows is currently implemented in Miri, so that there is more than just the code to look at. +I'll let you know when I found one! + +## Making Miri Easier to Use + +Finally, I decided to spend some time making Miri easier to install and use and fixing the problems with running test suites. +Now all you have to do is run + +``` +cargo +nightly install --git https://github.com/solson/miri/ miri +``` + +and then go to your project directory and run `cargo +nightly miri test` (you may have to `cargo clean` first to make sure all your dependencies are built the right way for Miri). +On first launch, Miri will (with your consent) install xargo and prepare a libstd suited for executing Rust programs in the interpreter, and then it will run your test suite. +@oli-obk is currently working on making this even easier by making Miri a component that you can install via `rustup`. +Give it a try and let us know what you think!