From: Ralf Jung Date: Fri, 13 Jul 2018 15:19:49 +0000 (+0200) Subject: new post on Arc bug X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/44d57d645bd780a2cf354fa7c848bbbf9bcfbc20 new post on Arc bug --- diff --git a/ralf/_posts/2017-06-09-mutexguard-sync.md b/ralf/_posts/2017-06-09-mutexguard-sync.md index 62ac7de..0d12ee8 100644 --- a/ralf/_posts/2017-06-09-mutexguard-sync.md +++ b/ralf/_posts/2017-06-09-mutexguard-sync.md @@ -1,6 +1,6 @@ --- title: How MutexGuard was Sync When It Should Not Have Been -categories: rust +categories: rust research reddit: /rust/comments/6gavfe/how_mutexguard_was_sync_when_it_should_not_have/ --- diff --git a/ralf/_posts/2018-07-13-arc-synchronization.md b/ralf/_posts/2018-07-13-arc-synchronization.md new file mode 100644 index 0000000..dd1e490 --- /dev/null +++ b/ralf/_posts/2018-07-13-arc-synchronization.md @@ -0,0 +1,188 @@ +--- +title: "The Tale of a Bug in Arc: Sychronization and Data Races" +categories: rust research +--- + +While I was busy doing Rust-unelated research, [RustBelt](https://plv.mpi-sws.org/rustbelt/) continues to move and recently found another bug (after [a missing `impl !Sync` that we found previously]({{ site.baseurl }}{% post_url 2017-06-09-mutexguard-sync %})): It turns out that `Arc::get_mut` did [not perform sufficient synchronization](https://github.com/rust-lang/rust/issues/51780), leading to a data race. + +Notice that I am just the messenger here, the bug was actually found by [Hai](https://people.mpi-sws.org/~haidang/) and [Jacques-Henri](https://jhjourdan.mketjh.fr/). +Still, I'd like to use this opportunity to talk a bit about weak memory, synchronization and data races. +This is just a primer, there are tons of resources on the web that go into more detail (for example [here](http://preshing.com/20120913/acquire-and-release-semantics/)). + + + +## Synchronization and Data Races + +Consider the following example (full of unsafe code because I am demonstrating behavior that Rust does its best to rule out): +{% highlight rust %} +extern crate rayon; + +static mut DATA : Option = None; +static mut FLAG : usize = 0; + +fn main() { unsafe { + rayon::join( + || { + // Initialize data + DATA = Some("Hello World!".to_string()); + // Signal that we are done + FLAG = 1; + }, + || { + // Wait until the other thread is done + while FLAG == 0 {} + // Print data + println!("{}", DATA.as_ref().unwrap()); + } + ); +} } +{% endhighlight %} +The question is: What will this program print? +If you are thinking "Hello World!", you are not wrong, but you are not fully right either. +"Hello World!" is one possibility, but this program can also panic at the `unwrap` and in fact it can do anything it wants, because it has undefined behavior. + +### Weak Memory + +But let us first talk about the possibility of a panic. +How can that happen? +The reason is that [accessing memory is slow](https://gist.github.com/jboner/2841832), at least compared to other operations like arithmetic or accessing a register. +Both compilers and CPUs do everything they can to make sure the program does not wait for memory. +For example, your compiler might decide that it is advantageous to reorder the two instructions in the first thread, putting the `FLAG = 1` *before* the `DATA = Some(...)`. +These are writes to two distinct memory locations, so reordering is fine as the overall result is the same -- right? + +Well, that was true when all programs were single-threaded, but with a concurrent program, another thread can actually tell that the writes are now happening in a different order! +That's how the second thread might see `FLAG = 1` but still read `DATA = NONE`. +Moreover, even if the compiler didn't do anything like that, we could *still* see a result of "0" if you are running this program on an ARM chip. +These chips are very aggressive in how they cache memory accesses, with the result that even if you write a program like the above in assembly, you may still be up for a surprise. + +We call memory that can exhibit such strange reordering effects "weak memory" because it is weaker in the sense that it provides fewer guarantees than one might naively expect. + +### Data Races + +But why does the program have undefined behavior? +The reason is that Rust has tons of primitives whose behavior is essentially impossible to specify once concurrency gets involves. +Consider the assignment `DATA = Some(...)` above, where an entire `Option` is being copied. +What would happen if some other thread reads `DATA` while the assignment is still in progress? +The other thread would see some kind of mixture between the old and the new `DATA`! +That's just complete garbage. +So systems programming languages essentially give up at this point and declare *data races* to be undefined behavior. + +A data race is defined as follows: + +> Two accesses to the same location form a data race if: +> 1. at least one of them is a write, and +> 2. at least one of them is not a special *atomic* access, and +> 3. the accesses are not *ordered*. + +We will talk about the "order" mentioned in condition (3) later. +For now, let us look at "atomic accesses". + +### Atomic Memory Accesses + +In our program above, the reads and writes to `FLAG` satisfy all three conditions. +To fix that, we will negate condition (2): We will use *atomic accesses*. + +All usual accesses (like all the ones in our example code above) are so called *non-atomic* accesses. +They can be compiled efficiently and are heavily optimized, but as we have seen they quickly cause trouble when writing low-level concurrent code. +To make it possible to write such code, but still keep most of the existing optimizations, the C++11 standard introduced special functions that can be used to perform *atomic* read and write accesses to a location. +We also have these atomic accesses in Rust, and we can use them in our program as follows: +{% highlight rust %} +extern crate rayon; + +use std::sync::atomic::{AtomicUsize, Ordering}; + +static mut DATA : Option = None; +static FLAG : AtomicUsize = AtomicUsize::new(0); + +fn main() { + rayon::join( + || { + // Initialize data + unsafe { DATA = Some("Hello World!".to_string()); } + // Signal that we are done + FLAG.store(1, Ordering::Relaxed); }, + || { + // Wait until the other thread is done + while FLAG.load(Ordering::Relaxed) == 0 {} + // Print data + println!("{}", unsafe { DATA.as_ref().unwrap() }); + } + ); +} +{% endhighlight %} +Notice how we are using `AtomicUsize` as type for `FLAG`, and `AtomicUsize::load` and `AtomicUsize::store` instead of non-atomic memory accesses. +These are atomic accesses, and hence we no longer have a data race on `FLAG`. +(I will come back to this `Ordering::Relaxed` parameter soon.) +Notice also that we significantly reduced the amount of unsafe operations, because `AtomucUsize` is actually completely safe to use from multiple threads. + +### Synchronization + +However, we are not done yet: We still have a race on `DATA`! +We also cannot use atomic accesses for `DATA` because of its type, `Option`, which is just too big to be read/written atomically. +This brings us to the third condition in our definition of data races: We have to somehow "order" the two accesses to `DATA`. + +So, what is that "order" about? +The idea is that while in general, operations on a concurrent program can happen in parallel, with strange effects like our first program panicking, there are still *some* operations that we can rely on to be properly "ordered". +For example, all operations within a single thread are ordered the way they are written, so e.g. `DATA = Some(...)` is ordered before `FLAG.store`. +However, what is missing in the program above is some way to obtain an order between operations on *different threads*. +This is where the `Ordering::Relaxed` parameter comes in: All atomic accesses come with an order mode that indicates under which circumstances this access will induce an order between operations. +Our accesses above are `Relaxed`, which means no order is induced. + +So, what we will have to do is to strengthen the mode of our `FLAG` accesses to induce an order between the write of `1` in the first thread, and the operation that reads `1` in the second thread. +To this end, we use the `Release` and `Acquire` pair of modes: +{% highlight rust %} +extern crate rayon; + +use std::sync::atomic::{AtomicUsize, Ordering}; + +static mut DATA : Option = None; +static FLAG : AtomicUsize = AtomicUsize::new(0); + +fn main() { + rayon::join( + || { + // Initialize data + unsafe { DATA = Some("Hello World!".to_string()); } + // Signal that we are done + FLAG.store(1, Ordering::Release); }, + || { + // Wait until the other thread is done + while FLAG.load(Ordering::Acquire) == 0 {} + // Print data + println!("{}", unsafe { DATA.as_ref().unwrap() }); + } + ); +} +{% endhighlight %} +This program, finally, is free of data races and hence has no undefined behavior and is guaranteed to print "Hello World!". + +They key point is that *when an `Acquire` read reads from a `Release` write, an order is induced between these two accesses*. +We also say that the two accesses *synchronize*. +Together with the per-thread order, this means we have an order between the `DATA = Some(...)` in the first thread and the load of `DATA` in the second thread, through the accesses to `FLAG`. +Intuitively, the `Release` write in the first thread "releases" everything that has been changed by this thread so far, and the `Acquire` read in the second thread then "acquires" all that data and makes it accessible for access later in this thread. + +Now, most of the time a `Mutex` or `RwLock` is good enough to protect your data against concurrent accesses, so you do not have to worry about such details. +(And, thanks to Rust thread safety guarantees, you never have to worry about such details in safe code!) +But based on what you learned so far, it should make perfect sense that when a lock is released by thread A, that will happen through a `Release` access, and when a lock is acquired by thread B, that happens through an `Acquire` access. +This way, the lock makes sure that there is an order between the accesses thread A performed when it held the lock (before the `Release`), and the accesses thread B will perform while it has the lock (after the `Acquire`). + +## Coming Back to `Arc` + +I said that `Mutex`/`RwLock` are good enough *most of the time*. +However, `Arc` is one of those cases where the overhead induced by an exclusive lock is just way too big, so it is worth using a more optimized, unsafe implementation. +As such, you are going to find plenty of atomic accesses in [the source code of `Arc`](https://github.com/rust-lang/rust/blob/c0955a34bcb17f0b31d7b86522a520ebe7fa93ac/src/liballoc/sync.rs#L201). + +And it turns out, as Hai and Jacques-Henri noticed when attempting to prove correctness of [`Arc::get_mut`](https://doc.rust-lang.org/beta/std/sync/struct.Arc.html#method.get_mut), that there is one place where `Relaxed` as used as an ordering, [but it really should have been `Acquire`](https://github.com/rust-lang/rust/pull/52031). +Discussing the exact details of the bug would probably fill another blog post (`Arc` is *really* subtle), but the high-level story is exactly like in our example above: Thanks to `Acquire`, an ordering is induced between the code that follows the `get_mut` and the code in another thread that dropped the last [`Weak`](https://doc.rust-lang.org/beta/std/sync/struct.Weak.html) reference. +With `Relaxed`, no such ordering is induced, so we have a data race. +To be fair, it is very unlikely that this race could lead to real misbehavior, but I am still happy to know that we now have a proof that `Arc` is mostly[^1] correctly synchronized. + +[^1]: "Mostly", you may wonder? Unfortunately it turns out that there is [one `Relaxed` access in `make_mut`](https://github.com/rust-lang/rust/blob/c0955a34bcb17f0b31d7b86522a520ebe7fa93ac/src/liballoc/sync.rs#L793) that Hai and Jacques-Henri have not yet been able to prove correct. However, this is likely fixable by making the entire proof more complicated. The version where that `Relaxed` is also replaced by `Acquire` *has* been proven correct. + +One last thing: +I have previously claimed that our [first RustBelt paper]({{ site.baseurl }}{% post_url 2017-07-08-rustbelt %}) already verified correctness of `Arc`, how can there still be bugs? +In that paper, we did not (yet) have the tools to reason realistically about these ordering issues we have been discussing here, so instead we worked with a "sequentially consistent" logic which essentially assumes the strongest possible mode, `SeqCst`, for all atomic accesses. +(We did not discuss `SeqCst` above, and in fact there are not many cases where it is really needed. [`Release` and `Acquire` are enough most of the time](https://internals.rust-lang.org/t/sync-once-per-instance/7918/8?u=ralfjung).) +This is one of the simplifications we made compared to real Rust to make the verification feasible. +We were realistic enough to find [another bug]({{ site.baseurl }}{% post_url 2017-06-09-mutexguard-sync %}), but not realistic enough for this one. +Hai and Jacques-Henri are currently working on remedying this particular simplification by extending the first RustBelt paper to also cover weak memory, and that's when they ran into this problem.