From: Ralf Jung Date: Tue, 18 Jul 2017 01:46:03 +0000 (-0700) Subject: write way too long post on the unsafe code model X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/541a2b6f4ce2373ba4ae9c7878d510c9074a03a9 write way too long post on the unsafe code model --- diff --git a/personal/_drafts/types-as-contracts.md b/personal/_drafts/types-as-contracts.md deleted file mode 100644 index 89d99b3..0000000 --- a/personal/_drafts/types-as-contracts.md +++ /dev/null @@ -1,68 +0,0 @@ ---- -title: "Types as Contracts" -categories: internship rust ---- - -Over the last couple of weeks of my [internship]({{ site.baseurl }}{% post_url -2017-05-23-internship-starting %}), I have been working on a proposal for the "Unsafe Code Guidelines". -I got some very encouraging feedback at the [Mozilla All-Hands](https://internals.rust-lang.org/t/recapping-this-weeks-mozilla-all-hands/5465), -and now I'd like to put this proposal out there and start having a discussion. - -Warning: Long post is long. You have been warned. - -If you need more context on unsafe code guidelines, I suggest you read my post on [undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}) first. - -## It's in the Types - -Consider the following simple function: -{% highlight rust %} -fn simple(x: &mut i32, y: &mut f32) -> i32 { - *x = 3; - *y = 4.0; - *x -} -{% endhighlight %} - -As discussed in my previous post, we would like an unsafe Rust program that calls this function with aliasing pointers to be UB. -In my view, the reason this is UB is given by the *type* of the function and, in particular, its arguments: `&mut` expresses uniqueness, so the memory pointed to by an `&mut` must not alias with anything else. -Borrowing some terminology from related fields, we can see the function type as stating a *contract*: We can call `simple` with non-aliasing pointers, and the function will behave. -If we violate the contract, all bets are off. - -To realize this idea, we need to define, for every type, what is the contract associated with that type -- when are elements of this type *valid*? -For some types, that's simple: `i32` is any four-byte value. `bool` is one byte, either `0` or `1`. -Compound types like `(T, U)` are defined in terms of their constituent types: The first component must be a valid `T`, the second component a valid `U`. - -Where it gets interesting is the reference types: `&mut T` says that we have a (non-null, properly aligned) pointer. -Whatever the pointer points to, must be a valid `T`. -*Moreover*, the memory range covered by this pointer (whose extent is given by the size of `T`) is exclusively accessed by this function -- no concurrent thread performs any access, and when we call unknown code and do *not* pass on the reference, we can rely on that unknown code not accessing this memory either. - -Similarly, for `&T`, we can rely on no-one performing any write access to this memory. -At the same time, the contract for `&T` also says that we ourselves will not write to this memory either. -This is a nice example showing that contracts go both ways: The function can rely on the other party (the caller) having done their part by providing valid arguments, but the function also has to make guarantees itself. -Another example of a guarantee provided by functions is their promise to provide a valid return value when they are done. - -My proposal for the unsafe code guidelines then is to validate the contract described by a function's types at the function boundaries, and to make it UB for the contract to be violated. -In particular, `simple` would validate `x` and `y` immediately after being called, and it would be UB for them to alias. - - -## Specification - -I have [previously written]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}): - -> I also think that tooling to *detect* UB is of paramount importance [...]. -In fact, specifying a dynamic UB checker is a very good way to specify UB! -Such a specification would describe the additional state that is needed at run-time to then *check* at every operation whether we are running into UB. - -Following this, I will in the following describe how to check for contract violations. -I am considering a MIR program; if a Rust program is supposed to be checked, it has to be translated to MIR first. -(You can [read up on my explanation]({{ site.baseurl }}{% post_url 2017-05-23-internship-starting %}) of why I think this is a good way to proceed.) -The validation of reference types (in particular, their exclusive / read-only nature) will need some extra "instrumented" state -- also see [this previous post on the topic]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}).[^1] -The extra state in my proposal consists of something akin to a reader-writer lock for every memory location. -I am first going to describe these locks and how they affect program behavior, before explaining contract validation. - -[^1]: If at this point you are under the impression that all these previous posts have been collectively building up towards this proposal, you are not entirely wrong. - -### Memory locks - -### Validation -