X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/ba97ef876600a735eb3d0aefdc567620a246aeb4..51f3cf55b720feb9172a25f401a5577c22467358:/personal/_posts/2018-08-22-two-kinds-of-invariants.md diff --git a/personal/_posts/2018-08-22-two-kinds-of-invariants.md b/personal/_posts/2018-08-22-two-kinds-of-invariants.md index 16fad75..785a9fd 100644 --- a/personal/_posts/2018-08-22-two-kinds-of-invariants.md +++ b/personal/_posts/2018-08-22-two-kinds-of-invariants.md @@ -1,5 +1,5 @@ --- -title: "Two Kinds of Invariants" +title: "Two Kinds of Invariants: Safety and Validity" categories: internship rust forum: https://internals.rust-lang.org/t/two-kinds-of-invariants/8264 --- @@ -29,7 +29,8 @@ In fact, this is the main point of the safety invariant: Based on that, it should not be hard to fill in the invariant for some more types. For example, `&bool` is an aligned, non-NULL reference to allocated read-only memory (because safe code can dereference) such that the data stored in that memory is safe at `bool` (because safe code can work on that `bool` after dereferencing). This kind of "structural recursion" is happening everywhere for safety invariants: `x` is safe for `(T, U)` if `x.0` is safe for `T` and `x.1` is safe for `U`. -`x` is safe for `Option` if either the discriminant is `None`, or it is `Some` and the data is safe for `T`. +As far as enums is concerned, for example `x` is safe for `Option` if either the discriminant is `None`, or it is `Some` and the data is safe for `T`. +Similarly, `x` is never safe for `!` -- there just is no valid discriminant that `x` could have. When we talk about integer types like `i32`, it is tempting to say that *all* data is safe at `i32`. However, that does not actually work: As [I discussed in a previous post]({% post_url 2018-07-24-pointers-and-bytes %}), the "bytes" that make up data have more possible states than just `0..256`. @@ -38,6 +39,7 @@ Some operations like `if x != 0 { ... }` are undefined behavior if `x` is uninit For reference types, some of the details are [harder to define]({% post_url 2018-04-05-a-formal-look-at-pinning %}), but we can still roughly say that a pointer is safe for `&'a mut T` if it is aligned, non-NULL and points to allocated memory that, for lifetime `'a`, no other pointer accesses, and such that the data stored in memory at that location is safe for `T`. Similarly (and [ignoring interior mutability]({% post_url 2018-01-31-sharing-for-a-lifetime %})), a pointer is safe for `&'a T` if it is aligned, non-NULL and points to allocated memory that is not mutated for lifetime `'a` and such that the data stored in memory is safe for `T`. The safety of `&bool` that we discussed above is just an instance of this general definition. +In particular, it is impossible to have a safe `&!` because it would have to point to a safe `!`, which does not exist. The safety invariant gets particularly interesting for higher-order data like function pointers or trait objects. For example, when is `f` safe at type `fn(bool) -> i32`? @@ -139,6 +141,7 @@ And finally, all safe data must also be valid, so the safety invariant also impo For `bool`, we stick to the same invariant as for safety: It must be `true` or `false`. This invariant is already exploited by enum layout optimizations, so violating it *at any time* can break your code. More generally speaking, validity requires correct enum discriminants. +One consequence of this is that no data is valid at `!`. Similar to safety, validity is structural -- so validity for `(T, U)` requires that the first field is valid at `T`, and the second field is valid at `U`. @@ -151,6 +154,9 @@ It is often useful to be able to call a method that takes `&mut` on a partially Doing all that with raw pointers is much less ergonomic, and making people write less ergonomic code will inadvertently lead to more bugs. Moreover, the *benefits* of making this validity for reference structural (i.e., `&T` is valid only if it points to a valid `T`) seem slim (though @comex [came up with an optimization that would require this](https://internals.rust-lang.org/t/stacked-borrows-an-aliasing-model-for-rust/8153/25?u=ralfjung)). +A consequence of this is would be that while `&!` does not have a safe inhabitant, it *is* possible to have data that is valid for `&!`. +After all, it does not have to point to a valid `!` (which would be impossible). + One point that we *already* assume, however, is that references are dereferencable. We even tell LLVM that this is the case. It seems natural to make that part of validity. @@ -174,7 +180,7 @@ My gut feeling is that it should not be (i.e., validity should require that `i32 I have talked about two kinds of invariants that come with every type, the safety invariant and the validity invariant. For unsafe code authors, the slogan summarizing this post is: -> *You must always be valid, but you must not always be safe.* +> *You must always be valid, but you must only be safe in safe code.* I think we have enough experience writing unsafe code at this point that we can reasonably discuss which validity invariants make sense and which do not -- and I think that it is high time that we do so, because many unsafe code authors are wondering about these exact things all the time.