X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/79673d79e6824bedd00ca6f68608d4e0c1932105..de05330b63481d7edebb5338bf98760c19b9b339:/personal/_posts/2016-01-09-the-scope-of-unsafe.md?ds=inline diff --git a/personal/_posts/2016-01-09-the-scope-of-unsafe.md b/personal/_posts/2016-01-09-the-scope-of-unsafe.md index 6c69d09..757bd73 100644 --- a/personal/_posts/2016-01-09-the-scope-of-unsafe.md +++ b/personal/_posts/2016-01-09-the-scope-of-unsafe.md @@ -1,6 +1,7 @@ --- title: The Scope of Unsafe categories: research rust +reddit: /rust/comments/4065l2/the_scope_of_unsafe/ --- I'd like to talk about an important aspect of dealing with unsafe code, that still regularly seems to catch people on the wrong foot: @@ -13,6 +14,8 @@ What I am saying is that the scope of `unsafe` is larger than the `unsafe` block It turns out that the underlying reason for this observation is also a nice illustration for the concept of *semantic types* that comes up in my [work on formalizing Rust]({{ site.baseurl }}{% post_url 2015-10-12-formalizing-rust %}) (or rather, its type system). Finally, this discussion will once again lead us to realize that we rely on our type systems to provide much more than just type safety. +**Update (Jan 11th):** Clarified the role of privacy; argued why `evil` is the problem. + ## An Example @@ -35,7 +38,7 @@ Roughly speaking, `ptr` points to the heap-allocated block of memory holding the It is very easy to add a function to `Vec` that contains no `unsafe` code, and still breaks the safety of the data structure: {% highlight rust %} impl Vec { - fn evil(&mut self) { + pub fn evil(&mut self) { self.len += 2; } } @@ -59,7 +62,13 @@ More precisely speaking, `ptr` points to an array of type `T` and size `cap`, of The function `evil` above violates this invariant, while all the functions actually provided by `Vec` (including the ones that are implemented unsafely) preserve the invariant. That's why `evil` is the bad guy. (The name kind of gave it away, didn't it?) -This may seem obvious in hindsight, but I think it is actually fairly subtle. +Some will disagree here and say: "Wait, but there is some `unsafe` code in `Vec`, and without that `unsafe` code `evil` would be all right, so isn't the problem actually that `unsafe` code?" +This observation is correct, however I don't think this position is useful in practice. +`Vec` with `evil` clearly is a faulty data structure, and to fix the bug, we would remove `evil`. +We would never even think about changing the `unsafe` code such that `evil` would be okay, that would defeat the entire purpose of `Vec`. +In that sense, it is `evil` which is the problem, and not the `unsafe` code. + +This may seem obvious in hindsight (and it is also [discussed in the Rustonomicon](https://doc.rust-lang.org/nightly/nomicon/working-with-unsafe.html)), but I think it is actually fairly subtle. There used to be claims on the interwebs that "if a Rust program crashes, the bug must be in some `unsafe` block". (And there probably still are.) Even academic researchers working on Rust got this wrong, arguing that in order to detect bugs in data structures like `Vec` it suffices to check functions involving unsafe code. That's why I think it's worth dedicating an entire blog post to this point. @@ -82,7 +91,7 @@ pub struct MyType { We will define only one function for this type: {% highlight rust %} impl MyType { - fn evil(&mut self) { + pub fn evil(&mut self) { self.len += 2; } } @@ -129,9 +138,10 @@ Or, to put it slightly differently: If the scope of `unsafe` grows beyond the sy Does it sprawl through all our code, silently infecting everything we write -- or is there some limit to its effect? As you probably imagined, of course there *is* a limit. Rust would not be a useful language otherwise. -The scope of `unsafe` ends at the next *abstraction boundary*. -This means that everything outside of the `std::vec` module does not have to worry about `Vec`. -Due to the privacy rules enforced by the compiler, code outside of that module cannot access the private fields of `Vec`, and hence it cannot tell the difference between the syntactic appearance of `Vec` and its actual, semantic meaning. +*If* all your additional invariants are about *private* fields of your data structure, then the scope of `unsafe` ends at the next *abstraction boundary*. +This means that everything outside of the `std::vec` module does not have to worry about `Vec`. +Due to the privacy rules enforced by the compiler, code outside of that module cannot access the private fields of `Vec`. +That code does not have a chance to violate the additional invariants of `Vec` -- it cannot tell the difference between the syntactic appearance of `Vec` and its actual, semantic meaning. Of course, this also means that *everything* inside `std::vec` is potentially dangerous and needs to be proven to respect the semantics of `Vec`. ## Abstraction Safety @@ -142,7 +152,8 @@ This nicely brings us to another important point, which I can only glimpse at he If the type system of Rust lacked a mechanism to establish abstraction (i.e., if there were no private fields), type safety would not be affected. However, it would be very dangerous to write a type like `Vec` that has a semantic meaning beyond its syntactic appearance. -Since users of `Vec` can accidentally perform invalid operations, there would actually be *no bound to the scope of `unsafe`*. +All code could perform invalid operations like `Vec::evil`, operations that rely on the assumption that `Vec` is just like `MyType`. +There would actually be *no bound to the scope of `unsafe`*. To formally establish safety, one would have to literally go over the entire program and prove that it doesn't misuse `Vec`. The safety promise of Rust would be pretty much useless.