announce my ETH position :-))

diff --git a/personal/_posts/2016-01-09-the-scope-of-unsafe.md b/personal/_posts/2016-01-09-the-scope-of-unsafe.md

index 6c69d0970e524d3b7377e9e992f4a06779e4b89c..96a7eb41bb1fe91f41572200b806f4eccda87a2e 100644 (file)
--- a/personal/_posts/2016-01-09-the-scope-of-unsafe.md
+++ b/personal/_posts/2016-01-09-the-scope-of-unsafe.md
@@ -1,6 +1,9 @@
  ---
  title: The Scope of Unsafe
  categories: research rust
+reddit: /rust/comments/4065l2/the_scope_of_unsafe/
+license: CC BY-SA 4.0
+license-url: https://creativecommons.org/licenses/by-sa/4.0/
  ---
  
  I'd like to talk about an important aspect of dealing with unsafe code, that still regularly seems to catch people on the wrong foot:
@@ -10,9 +13,11 @@ I'd like to talk about an important aspect of dealing with unsafe code, that sti
  The "scope" in the title refers to the extent of the code that has to be manually checked for correctness, once `unsafe` is used.
  What I am saying is that the scope of `unsafe` is larger than the `unsafe` block itself.
  
-It turns out that the underlying reason for this observation is also a nice illustration for the concept of *semantic types* that comes up in my [work on formalizing Rust]({{ site.baseurl }}{% post_url 2015-10-12-formalizing-rust %}) (or rather, its type system).
+It turns out that the underlying reason for this observation is also a nice illustration for the concept of *semantic types* that comes up in my [work on formalizing Rust]({% post_url 2015-10-12-formalizing-rust %}) (or rather, its type system).
  Finally, this discussion will once again lead us to realize that we rely on our type systems to provide much more than just type safety.
  
+**Update (Jan 11th):** Clarified the role of privacy; argued why `evil` is the problem.
+
  <!-- MORE -->
  
  ## An Example
@@ -35,7 +40,7 @@ Roughly speaking, `ptr` points to the heap-allocated block of memory holding the
  It is very easy to add a function to `Vec` that contains no `unsafe` code, and still breaks the safety of the data structure:
  {% highlight rust %}
  impl Vec<T> {
-    fn evil(&mut self) {
+    pub fn evil(&mut self) {
          self.len += 2;
      }
  }
@@ -48,7 +53,7 @@ Oops!
  
  So, this example clearly shows that to evaluate the safety of types like `Vec`, we have to look at *every single function* provided by that data structure, even if it does not contain any `unsafe` code.
  
-## The reason why
+## The Reason Why
  
  Why is it the case that a safe function can break `Vec`?
  How can we even say that it is the safe function which is at fault, rather than some piece of `unsafe` code elsewhere in `Vec`?
  
  Why is it the case that a safe function can break `Vec`?
  How can we even say that it is the safe function which is at fault, rather than some piece of `unsafe` code elsewhere in `Vec`?
@@ -59,13 +64,19 @@ More precisely speaking, `ptr` points to an array of type `T` and size `cap`, of
  The function `evil` above violates this invariant, while all the functions actually provided by `Vec` (including the ones that are implemented unsafely) preserve the invariant.
  That's why `evil` is the bad guy. (The name kind of gave it away, didn't it?)
  
-This may seem obvious in hindsight, but I think it is actually fairly subtle.
+Some will disagree here and say: "Wait, but there is some `unsafe` code in `Vec`, and without that `unsafe` code `evil` would be all right, so isn't the problem actually that `unsafe` code?"
+This observation is correct; however, I don't think this position is useful in practice.
+`Vec` with `evil` clearly is a faulty data structure, and to fix the bug, we would remove `evil`.
+We would never even think about changing the `unsafe` code such that `evil` would be okay; that would defeat the entire purpose of `Vec`.
+In that sense, it is `evil` which is the problem, and not the `unsafe` code.
+
+This may seem obvious in hindsight (and it is also [discussed in the Rustonomicon](https://doc.rust-lang.org/nightly/nomicon/working-with-unsafe.html)), but I think it is actually fairly subtle.
  There used to be claims on the interwebs that "if a Rust program crashes, the bug must be in some `unsafe` block". (And there probably still are.)
  Even academic researchers working on Rust got this wrong, arguing that in order to detect bugs in data structures like `Vec` it suffices to check functions involving unsafe code.
  That's why I think it's worth dedicating an entire blog post to this point.
  But we are not done yet, we can actually use this observation to learn more about types and type systems.
  
-## The semantic perspective
+## The Semantic Perspective
  
  There is another way to phrase the intuition of types having additional invariants:
  
@@ -82,7 +93,7 @@ pub struct MyType<T> {
  We will define only one function for this type:
  {% highlight rust %}
  impl MyType<T> {
-    fn evil(&mut self) {
+    pub fn evil(&mut self) {
          self.len += 2;
      }
  }
@@ -92,7 +103,7 @@ The two types are *syntactically* equal, and the same goes for the two `evil` fu
  Still, `MyType::evil` is a perfectly benign function (despite its name).
  How can this be?
  
-Remember that in a [previous blog post]({{ site.baseurl }}{% post_url 2015-10-12-formalizing-rust %}), I argued that types have a *semantic* aspect.
+Remember that in a [previous blog post]({% post_url 2015-10-12-formalizing-rust %}), I argued that types have a *semantic* aspect.
  For example, a function is semantically well-typed if it *behaves* properly on all valid arguments, independently of how, *syntactically*, the function body has been written down.
  
  The reason for `MyType::evil` being fine, while `Vec::evil` is bad, is that semantically speaking, `Vec` is very different from `MyType` -- even though they look so similar.
@@ -120,7 +131,7 @@ This is just as bad as a function assigning `*f = 42u32` to a `f: &mut f32`.
  The difference between `Vec` and `MyType` is no less significant than the difference between `f32` and `u32`.
  It's just that the compiler has been specifically taught about `f32`, while it doesn't know enough about the semantics of `Vec`.
  
-## The actual scope of unsafe
+## The Actual Scope of Unsafe
  
  At this point, you may be slightly worried about the safety of the Rust ecosystem.
  I just spent two sections arguing that the Rust compiler actually doesn't know what it is doing when it comes to checking functions that work on `Vec`.
@@ -129,9 +140,10 @@ Or, to put it slightly differently: If the scope of `unsafe` grows beyond the sy
  Does it sprawl through all our code, silently infecting everything we write -- or is there some limit to its effect?
  
  As you probably imagined, of course there *is* a limit. Rust would not be a useful language otherwise.
-The scope of `unsafe` ends at the next *abstraction boundary*.
-This means that everything outside of the `std::vec` module does not have to worry about `Vec`.
-Due to the privacy rules enforced by the compiler, code outside of that module cannot access the private fields of `Vec`, and hence it cannot tell the difference between the syntactic appearance of `Vec` and its actual, semantic meaning.
+*If* all your additional invariants are about *private* fields of your data structure, then the scope of `unsafe` ends at the next *abstraction boundary*.
+This means that everything outside of the `std::vec` module does not have to worry about `Vec`. 
+Due to the privacy rules enforced by the compiler, code outside of that module cannot access the private fields of `Vec`.
+That code does not have a chance to violate the additional invariants of `Vec` -- it cannot tell the difference between the syntactic appearance of `Vec` and its actual, semantic meaning.
  Of course, this also means that *everything* inside `std::vec` is potentially dangerous and needs to be proven to respect the semantics of `Vec`.
  
  ## Abstraction Safety
@@ -142,11 +154,12 @@ This nicely brings us to another important point, which I can only glimpse at he
  
  If the type system of Rust lacked a mechanism to establish abstraction (i.e., if there were no private fields), type safety would not be affected.
  However, it would be very dangerous to write a type like `Vec` that has a semantic meaning beyond its syntactic appearance.
-Since users of `Vec` can accidentally perform invalid operations, there would actually be *no bound to the scope of `unsafe`*.
+All code could perform invalid operations like `Vec::evil`, operations that rely on the assumption that `Vec` is just like `MyType`.
+There would actually be *no bound to the scope of `unsafe`*.
  To formally establish safety, one would have to literally go over the entire program and prove that it doesn't misuse `Vec`.
  The safety promise of Rust would be pretty much useless.
  
-This should not be entirely surprising if you read the aforementioned [post about formalizing Rust's type system]({{ site.baseurl }}{% post_url 2015-10-12-formalizing-rust %}), where I already argued that a proof of (syntactic) type safety does not help to justify safety of most Rust programs out there.
+This should not be entirely surprising if you read the aforementioned [post about formalizing Rust's type system]({% post_url 2015-10-12-formalizing-rust %}), where I already argued that a proof of (syntactic) type safety does not help to justify safety of most Rust programs out there.
  I am now making a similar point, coming from a different angle.
  
  The fact that Rust programmers *can* use `Vec` and many other types without much care is a property of the type system that is independent of type safety.
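
For readers of this diff, the privacy argument that the post makes can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `tiny_vec`, `TinyVec` and `demo` are hypothetical names, and the type is a fixed-capacity simplification, not how `std::vec::Vec` is implemented. The invariant `len <= buf.len()` is trusted by the single `unsafe` block, every function inside the module has to preserve it, and code outside the module has no way to break it.

```rust
// Hypothetical stand-in for `std::vec`; all names here are illustrative.
mod tiny_vec {
    /// Fixed-capacity stack of `u32`s.
    /// Invariant the compiler cannot see: `len <= buf.len()`.
    pub struct TinyVec {
        buf: Box<[u32]>, // private: backing storage, fully initialized
        len: usize,      // private: number of elements currently in use
    }

    impl TinyVec {
        pub fn with_capacity(cap: usize) -> TinyVec {
            TinyVec { buf: vec![0; cap].into_boxed_slice(), len: 0 }
        }

        /// Entirely safe code, yet it is part of what must be audited:
        /// it has to keep `len <= buf.len()`.
        pub fn push(&mut self, x: u32) -> bool {
            if self.len == self.buf.len() {
                return false; // full; refusing preserves the invariant
            }
            self.buf[self.len] = x;
            self.len += 1;
            true
        }

        pub fn len(&self) -> usize {
            self.len
        }

        /// The only `unsafe` block in the module; it trusts the invariant.
        pub fn last(&self) -> Option<u32> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: relies on `len <= buf.len()`, so `len - 1` is in bounds.
            Some(unsafe { *self.buf.get_unchecked(self.len - 1) })
        }

        // Adding this safe-looking method would make `last` unsound:
        // pub fn evil(&mut self) { self.len += 2; }
    }
}

/// Code outside the module: it cannot touch `len` or `buf`,
/// so it cannot violate the invariant.
pub fn demo() -> (usize, Option<u32>) {
    let mut v = tiny_vec::TinyVec::with_capacity(2);
    v.push(7);
    v.push(9);
    v.push(11); // rejected: the vector is full
    // v.len += 2; // does not compile: field `len` is private
    (v.len(), v.last())
}
```

Under these assumptions, `demo()` returns `(2, Some(9))`. Uncommenting `evil` inside the module compiles without any new `unsafe`, which is exactly the situation the post warns about, while the commented-out field access in `demo` is rejected by the compiler.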