From: Ralf Jung Date: Sat, 15 Jul 2017 05:03:36 +0000 (-0700) Subject: Starting working on unsafe guidelines post X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/0577c00521c5c2a8dd0106ee6bdc0e2fbcd36f79 Starting working on unsafe guidelines post --- diff --git a/ralf/_drafts/types-as-contracts.md b/ralf/_drafts/types-as-contracts.md new file mode 100644 index 0000000..5b1974f --- /dev/null +++ b/ralf/_drafts/types-as-contracts.md @@ -0,0 +1,68 @@ +--- +title: "Types as Contracts" +categories: internship rust +--- + +Over the last couple of weeks of my [internship]({{ site.baseurl }}{% post_url +2017-05-23-internship-starting %}), I have been working on a proposal for the "Unsafe Code Guidelines". +I got some very encouraging feedback at the [Mozilla All-Hands](https://internals.rust-lang.org/t/recapping-this-weeks-mozilla-all-hands/5465), +and now I'd like to put this proposal out there and start having a discussion. + +Warning: Long post is long. You have been warned. + +If you need more context on unsafe code guidelines, I suggest you read my post on [undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}) first. + +## It's in the Types + +Consider the following simple function: +{% highlight rust %} +fn simple(x: &mut i32, y: &mut f32) -> i32 { + *x = 3; + *y = 4.0; + *x +} +{% endhighlight %} + +As discussed in my previous post, we would like an unsafe Rust program that calls this function with aliasing pointers to be UB. +In my view, the reason this is UB is given by the *type* of the function and, in particular, its arguments: `&mut` expresses uniqueness, so the memory pointed to by an `&mut` must not alias with anything else. +Borrowing some terminology from related fields, we can see the function type as stating a *contract*: We can call `simple` with non-aliasing pointers, and the function will behave. +If we violate the contract, all bets are off. + +To realize this idea, we need to define, for every type, what is the contract associated with that type -- when are elements of this type *valid*? +For some types, that's simple: `i32` is any four-byte value. `bool` is one byte, either `0` or `1`. +Compound types like `(T, U)` are defined in terms of their constituent types: The first component must be a valid `T`, the second component a valid `U`. + +Where it gets interesting is the reference types: `&mut T` says that we have a (non-null, properly aligned) pointer. +Whatever the pointer points to, must be a valid `T`. +*Moreover*, the memory range covered by this pointer (whose extent is given by the size of `T`) is exclusively accessed by this function -- no concurrent thread performs any access, and when we call unknown code and do *not* pass on the reference, we can rely on that unknown code not accessing this memory either. + +Similarly, for `&T`, we can rely on no-one performing any write access to this memory. +At the same time, the contract for `&T` also says that we ourselves will not write to this memory either. +This is a nice example showing that contracts go both ways: The function can rely on the other party (the caller) having done their part by providing valid arguments, but the function also has to make guarantees itself. +Another example of a guarantee provided by functions is their promise to provide a valid return value when they are done. + +My proposal for the unsafe code guidelines then is to validate the contract described by a function's types at the function boundaries, and to make it UB for the contract to be violated. +In particular, `simple` would validate `x` and `y` immediately after being called, and it would be UB for them to alias. + + +## Specification + +I have [previously written]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}): + +> I also think that tooling to *detect* UB is of paramount importance [...]. +In fact, specifying a dynamic UB checker is a very good way to specify UB! +Such a specification would describe the additional state that is needed at run-time to then *check* at every operation whether we are running into UB. + +Following this, I will in the following describe how to check for contract violations. +I am considering a MIR program; if a Rust program is supposed to be checked, it has to be translated to MIR first. +(You can [read up on my explanation]({{ site.baseurl }}{% post_url 2017-05-23-internship-starting %}) of why I think this is a good way to proceed.) +The validation of reference types (in particular, their exclusive / read-only nature) will need some extra "instrumented" state -- also see [this previous post on the topic]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}).[^1] +The extra state in my proposal consists of something akin to a reader-writer lock for every memory location. +I am first going to describe these locks and how they affect program behavior, before explaining contract validation. + +[^1]: If at this point you are under the impression that all these previous posts have been just building up and are now coming together in this proposal, you are not entirely wrong. + +### Memory locks + +### Validation +