X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/78aa53554dc5c4be58333e9481e28c5f09c6a9fc..46ffee68f2a482ecbc7cf2d684f43b4884312a9f:/personal/_posts/2017-07-17-types-as-contracts.md diff --git a/personal/_posts/2017-07-17-types-as-contracts.md b/personal/_posts/2017-07-17-types-as-contracts.md index 1227641..3ef565a 100644 --- a/personal/_posts/2017-07-17-types-as-contracts.md +++ b/personal/_posts/2017-07-17-types-as-contracts.md @@ -4,14 +4,14 @@ categories: internship rust forum: https://internals.rust-lang.org/t/types-as-contracts/5562 --- -Over the last couple of weeks of my [internship]({{ site.baseurl }}{% post_url +Over the last couple of weeks of my [internship]({% post_url 2017-05-23-internship-starting %}), I have been working on a proposal for the "Unsafe Code Guidelines". I got some very encouraging feedback at the [Mozilla All-Hands](https://internals.rust-lang.org/t/recapping-this-weeks-mozilla-all-hands/5465), and now I'd like to put this proposal out there and start having a discussion. Warning: Long post is long. You have been warned. -If you need more context on this topic, I suggest you read my post on [unsafe code guidelines and undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}) first. +If you need more context on this topic, I suggest you read my post on [unsafe code guidelines and undefined behavior]({% post_url 2017-07-14-undefined-behavior %}) first. ## 1 It's in the Types @@ -48,7 +48,7 @@ In particular, `simple` would validate `x` and `y` immediately after being calle ## 2 Specification Framework -I have [previously written about my thoughts on undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}): +I have [previously written about my thoughts on undefined behavior]({% post_url 2017-07-14-undefined-behavior %}): > I also think that tooling to *detect* UB is of paramount importance [...]. In fact, specifying a dynamic UB checker is a very good way to specify UB! @@ -56,8 +56,8 @@ Such a specification would describe the additional state that is needed at run-t Following this, I will in the following describe how we could check the contracts expressed by Rust's types for violations. I am considering a MIR program; if a Rust program is supposed to be checked, it has to be translated to MIR first. -(You can [read up on my explanation]({{ site.baseurl }}{% post_url 2017-05-23-internship-starting %}) of why I think this is a good way to proceed.) -The validation of reference types (in particular, their exclusive / read-only nature) will need some extra "instrumented" state -- also see [this previous post on the topic]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}).[^1] +(You can [read up on my explanation]({% post_url 2017-05-23-internship-starting %}) of why I think this is a good way to proceed.) +The validation of reference types (in particular, their exclusive / read-only nature) will need some extra "instrumented" state -- also see [this previous post on the topic]({% post_url 2017-06-06-MIR-semantics %}).[^1] The extra state in my proposal consists of something akin to a reader-writer lock for every memory location. I am first going to describe these locks and how they affect program behavior, before explaining contract validation. @@ -283,11 +283,11 @@ For compound types like tuples, `struct` or `enum`, validation proceeds recursiv In particular, the enum discriminant is checked to be in-range (in particular, nothing can pass validation at an empty enum type). However, because the recursive validation will only lock memory that's actually covered by a field, we also have to acquire the appropriate lock for padding bytes and the enum discriminant here. -Finally, at a reference type, two things happen. -First, the reference itself is stored in memory somewhere; this memory has to be locked just like the validation of `i32` locks the memory used to store the integer. -The reference is also checked to be non-NULL and properly aligned for the type it points to. -Furthermore, validation proceeds recursively after *dereferencing* the reference. -Crucially, the `mutbl` and `lft` for this recursive call are taking the reference type into account: +Finally, when encountering a box or reference type, two things happen. +First, the pointer itself is stored in memory somewhere; this memory has to be locked just like the validation of `i32` locks the memory used to store the integer. +The pointer is also checked to be non-NULL and properly aligned for the type it points to. +Furthermore, validation proceeds recursively after *dereferencing* the pointer. +Crucially, for the case of validating a reference, the `mutbl` and `lft` for this recursive call are taking the type into account: If `lft` was `None` and this reference's lifetime ends within the function (i.e., there is a corresponding `EndRegion` somewhere), it is now set to the reference's lifetime. If `mutbl` was mutable, it becomes immutable when following a shared reference. @@ -305,7 +305,7 @@ This grants the functions write access to memory that we are explicitly passing Suspend validation occurs before taking a reference, using the lifetime of the new reference. This encodes the fact that when the lifetime ends (and hence the lock is being reacquired), we are the exclusive owner of this memory again. -**Update**: It has been pointed out to me that another way for values to flow back into a function is through mutable references. Good catch! I will need to extend the validation machinery to also perform acquire validation when a write lock is reacquired. **/Update** +**Update:** It has been pointed out to me that another way for values to flow back into a function is through mutable references. Good catch! I will need to extend the validation machinery to also perform acquire validation when a write lock is reacquired. **/Update** To summarize, validation statements are emitted during MIR construction in the following places: - At the beginning of the function, we perform acquire validation of the arguments. @@ -320,9 +320,8 @@ One topic that needs special treatment in this context is interior mutability. If a function takes an `x: &Cell`, following the rules above, it will acquire a read lock of `*x` for the duration of the function call, making `*x` immutable. Clearly, we do not want to do that -- calling `x.set` *will* actually mutate `*x`, and mutating through a shared reference is exactly the point of using `Cell`! -Lucky enough, the compiler *already* says that interior mutability is only allowed via [`UnsafeCell`](https://doc.rust-lang.org/beta/core/cell/struct.UnsafeCell.html). -We can use this for our purposes: To adjust validation for interior mutability, we *stop* our recursive descent and do not do anything when reaching an `UnsafeCell` *while `mutbl` indicates we are in immutable mode*. -(`&mut UnsafeCell` is not affected.) +Lucky enough, the compiler *already* says that interior mutability is only allowed via [`UnsafeCell`](https://doc.rust-lang.org/stable/core/cell/struct.UnsafeCell.html). +We can use this for our purposes: To adjust validation for interior mutability, we *stop* our recursive descent and do not do anything when reaching an `UnsafeCell` *while `mutbl` indicates we are in immutable mode* -- `&mut UnsafeCell` is not affected. In particular, no locks are acquired. This justifies calling `set` on a shared reference and having the value changed. Of course, it also means we cannot do some of the optimizations we discussed above -- but that's actually exactly what we want!