X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/44d57d645bd780a2cf354fa7c848bbbf9bcfbc20..6c733eb18ad15cf79e6048b63d548d693bb0f794:/ralf/_posts/2018-07-13-arc-synchronization.md diff --git a/ralf/_posts/2018-07-13-arc-synchronization.md b/ralf/_posts/2018-07-13-arc-synchronization.md index dd1e490..50a518c 100644 --- a/ralf/_posts/2018-07-13-arc-synchronization.md +++ b/ralf/_posts/2018-07-13-arc-synchronization.md @@ -1,9 +1,10 @@ --- -title: "The Tale of a Bug in Arc: Sychronization and Data Races" +title: "The Tale of a Bug in Arc: Synchronization and Data Races" categories: rust research +reddit: /rust/comments/8ykuhv/the_tale_of_a_bug_in_arc_synchronization_and_data/ --- -While I was busy doing Rust-unelated research, [RustBelt](https://plv.mpi-sws.org/rustbelt/) continues to move and recently found another bug (after [a missing `impl !Sync` that we found previously]({{ site.baseurl }}{% post_url 2017-06-09-mutexguard-sync %})): It turns out that `Arc::get_mut` did [not perform sufficient synchronization](https://github.com/rust-lang/rust/issues/51780), leading to a data race. +While I was busy doing Rust-unrelated research, [RustBelt](https://plv.mpi-sws.org/rustbelt/) continues to move and recently found another bug (after [a missing `impl !Sync` that we found previously]({{ site.baseurl }}{% post_url 2017-06-09-mutexguard-sync %})): It turns out that `Arc::get_mut` did [not perform sufficient synchronization](https://github.com/rust-lang/rust/issues/51780), leading to a data race. Notice that I am just the messenger here, the bug was actually found by [Hai](https://people.mpi-sws.org/~haidang/) and [Jacques-Henri](https://jhjourdan.mketjh.fr/). Still, I'd like to use this opportunity to talk a bit about weak memory, synchronization and data races. @@ -173,11 +174,12 @@ However, `Arc` is one of those cases where the overhead induced by an exclusive As such, you are going to find plenty of atomic accesses in [the source code of `Arc`](https://github.com/rust-lang/rust/blob/c0955a34bcb17f0b31d7b86522a520ebe7fa93ac/src/liballoc/sync.rs#L201). And it turns out, as Hai and Jacques-Henri noticed when attempting to prove correctness of [`Arc::get_mut`](https://doc.rust-lang.org/beta/std/sync/struct.Arc.html#method.get_mut), that there is one place where `Relaxed` as used as an ordering, [but it really should have been `Acquire`](https://github.com/rust-lang/rust/pull/52031). -Discussing the exact details of the bug would probably fill another blog post (`Arc` is *really* subtle), but the high-level story is exactly like in our example above: Thanks to `Acquire`, an ordering is induced between the code that follows the `get_mut` and the code in another thread that dropped the last [`Weak`](https://doc.rust-lang.org/beta/std/sync/struct.Weak.html) reference. +Discussing the exact details of the bug would probably fill another blog post (`Arc` is *really* subtle), but the high-level story is exactly like in our example above: Thanks to `Acquire`, an ordering is induced between the code that follows the `get_mut` and the code in another thread that dropped the last other `Arc`, decrementing the reference count to 1. +The PR that fixed the problem contains [some more details in the comments](https://github.com/rust-lang/rust/pull/52031/files). With `Relaxed`, no such ordering is induced, so we have a data race. To be fair, it is very unlikely that this race could lead to real misbehavior, but I am still happy to know that we now have a proof that `Arc` is mostly[^1] correctly synchronized. -[^1]: "Mostly", you may wonder? Unfortunately it turns out that there is [one `Relaxed` access in `make_mut`](https://github.com/rust-lang/rust/blob/c0955a34bcb17f0b31d7b86522a520ebe7fa93ac/src/liballoc/sync.rs#L793) that Hai and Jacques-Henri have not yet been able to prove correct. However, this is likely fixable by making the entire proof more complicated. The version where that `Relaxed` is also replaced by `Acquire` *has* been proven correct. +[^1]: "Mostly", you may wonder? Unfortunately it turns out that there is [one `Relaxed` access in `make_mut`](https://github.com/rust-lang/rust/blob/c0955a34bcb17f0b31d7b86522a520ebe7fa93ac/src/liballoc/sync.rs#L793) that Hai and Jacques-Henri have not yet been able to prove correct. However, this is likely fixable by making the entire proof more complicated. The version where that `Relaxed` is also replaced by `Acquire` *has* been proven correct, albeit still against a model of relaxed memory accesses that is not quite as weak as C11. One last thing: I have previously claimed that our [first RustBelt paper]({{ site.baseurl }}{% post_url 2017-07-08-rustbelt %}) already verified correctness of `Arc`, how can there still be bugs?