ralf/_posts/2017-07-17-types-as-contracts.md

   1 ---
   2 title: "Types as Contracts"
   3 categories: internship rust
   4 ---
   5
   6 Over the last couple of weeks of my [internship]({{ site.baseurl }}{% post_url
   7 2017-05-23-internship-starting %}), I have been working on a proposal for the "Unsafe Code Guidelines".
   8 I got some very encouraging feedback at the [Mozilla All-Hands](https://internals.rust-lang.org/t/recapping-this-weeks-mozilla-all-hands/5465),
   9 and now I'd like to put this proposal out there and start having a discussion.
  10 <!-- MORE -->
  11 Warning: Long post is long.  You have been warned.
  12
  13 If you need more context on this topic, I suggest you read my post on [unsafe code guidelines and undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}) first.
  14
  15 ## 1 It's in the Types
  16
  17 Consider the following simple function:
  18 {% highlight rust %}
  19 fn simple(x: &mut i32, y: &mut f32) -> i32 {
  20     *x = 3;
  21     *y = 4.0;
  22     *x
  23 }
  24 {% endhighlight %}
  25
  26 As discussed in my previous post, we would like an unsafe Rust program that calls this function with aliasing pointers to be UB.
  27 I'd like to propose that the fundamental reason this is UB is the *type* of the function and, in particular, its arguments: `&mut` expresses uniqueness, so the memory pointed to by an `&mut` must not alias with anything else.
  28 Borrowing some terminology from related fields, we can see the function type as stating a *contract*:  The `&mut` say that `simple` expects to be called with non-aliasing pointers.
  29 If we violate the contract, all bets are off.
  30
  31 To realize this idea, we need to define, for every type, what is the contract associated with that type. When are elements of any given type *valid*?
  32 For some types, that's simple: `i32` is any four-byte value. `bool` is one byte, either `0` or `1`.
  33 Compound types like `(T, U)` are defined in terms of their constituent types:  The first component must be a valid `T`, the second component a valid `U`.
  34
  35 Where it gets interesting is the reference types: `&mut T` says that we have a (non-null, properly aligned) pointer.
  36 Whatever the pointer points to must be a valid `T`.
  37 *Moreover*, the memory range covered by this pointer (whose extent is given by the size of `T`) is exclusively accessed by this function -- no concurrent thread performs any access, and when we call unknown code and do *not* pass on the reference, we can rely on that unknown code not accessing this memory either.
  38
  39 Similarly, for `&T`, we can rely on no-one performing any write access to this memory.
  40 At the same time, the contract for `&T` also says that we ourselves will not write to this memory either.
  41 This is a nice example showing that contracts go both ways:  The function can rely on the other party (the caller) having done their part by providing valid arguments, but the function also has to make guarantees itself.
  42 Another example of a guarantee provided by functions is their promise to provide a valid return value when they are done.
  43
  44 My proposal for the unsafe code guidelines then is to validate the contract described by a function's types at the function boundaries, and to make it UB for the contract to be violated.
  45 In particular, `simple` would validate `x` and `y` immediately after being called, and it would be UB for them to alias.
  46
  47
  48 ## 2 Specification Framework
  49
  50 I have [previously written about my thoughts on undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}):
  51
  52 > I also think that tooling to *detect* UB is of paramount importance [...].
  53 In fact, specifying a dynamic UB checker is a very good way to specify UB!
  54 Such a specification would describe the additional state that is needed at run-time to then *check* at every operation whether we are running into UB.
  55
  56 Following this, I will in the following describe how we could check the contracts expressed by Rust's types for violations.
  57 I am considering a MIR program; if a Rust program is supposed to be checked, it has to be translated to MIR first.
  58 (You can [read up on my explanation]({{ site.baseurl }}{% post_url 2017-05-23-internship-starting %}) of why I think this is a good way to proceed.)
  59 The validation of reference types (in particular, their exclusive / read-only nature) will need some extra "instrumented" state -- also see [this previous post on the topic]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}).[^1]
  60 The extra state in my proposal consists of something akin to a reader-writer lock for every memory location.
  61 I am first going to describe these locks and how they affect program behavior, before explaining contract validation.
  62
  63 [^1]: If at this point you are under the impression that all these previous posts have been collectively building up towards this proposal, you are not entirely wrong.
  64
  65 This is not (yet) a proper specification in that some important details are still left open.
  66 The goal is to lay down a framework as the basis for discussion.
  67 Many things already work as expected, and hopefully the remaining issues can be sorted out by tuning some of the details.
  68
  69 ### 2.1 Memory Locks
  70
  71 Every location comes with an additional reader-writer "lock".
  72 These locks have nothing to do with concurrency, in fact they are more like `RefCell`.
  73 Locks are used to verify the guarantees provided by references:
  74 Shared references point to memory that is not mutated by anyone, and mutable references point to memory that is not *accessed* (read from or written to) by anyone *else*.
  75 Notably, however, unlocked memory is free to be accessed by anyone -- nobody is making any particular assumptions about this memory.
  76
  77 To track these locks, we store a `locks: Vec<LockInfo>` for every location, with `LockInfo` defined as:
  78 {% highlight rust %}
  79 struct LockInfo {
  80     write_lock: bool,
  81     frame: usize,
  82     lft: Option<Lifetime>,
  83 }
  84 {% endhighlight %}
  85 (Notice that an implementation of this specification is free to track locks in a different way, as long as the effective behavior is the same.
  86 The reason I am giving concrete types here is just to make the specification more precise.)
  87
  88 When a lock is acquired, we record which stack frame is the owner of this lock (starting at 0 for `main`, and counting up).
  89 Now, on every read access, we check whether there is a write lock of this location held *by another function*.
  90 If yes, the access is UB.
  91 Furthermore, write accesses and acquiring a write lock are UB if there is a read lock held *or* if there is a write lock held by another function.
  92 Finally, acquiring a read lock is UB if there is a write lock held by anyone, including the function requesting a read lock.
  93
  94 Locks are typically released automatically when they expire.
  95 To this end, the function acquiring a lock can optionally specify the *lifetime* of the lock.
  96 Not specifying a lifetime implies that the lock is held until the function returns.
  97 For the purpose of this specification, it does not matter what exactly a lifetime is; it is only relevant at which point it *ends*.
  98 Lucky enough, MIR recently gained [`EndRegion` statements](https://github.com/rust-lang/rust/pull/39409) that we can use for this purpose:
  99 When `EndRegion(lft)` is executed, all locks with lifetime `lft` are released.
 100 Beyond this, a write lock can also be explicitly released by the function that acquired it, even before the lock's lifetime ends.
 101 (It does not turn out to ever be necessary to do this for read locks in my proposal.)
 102
 103 There is one more mechanism we need to introduce:  Acquired write locks can be *suspended* for a lifetime.
 104 This means the lock is temporarily released, but meant to be reacquired later.
 105 To this end, the `LockInfo` is removed from the location's `locks` and put into a `HashMap<Lifetime, (Location, LockInfo)>` which is stored somewhere separately from the program's memory.
 106 When the `EndRegion` corresponding to the given lifetime is reached, the lock is reacquired and put back into the `locks` of the given location.
 107
 108 ### 2.2 Validation
 109
 110 With the lock mechanism established, we can now look at the core of the proposal:  Validating the contract described by a type.
 111 This is done by a new MIR statement, `Validate`, which takes two arguments:  An *operation* (which we will discuss shortly), and a list of `(Lvalue, Ty)` pairs defining which lvalues is supposed to be validated at which type.
 112 Crucially, this is a non-erased type -- all the lifetime information must be present.
 113
 114 The most important validation operation is *acquire validation*.
 115 As the name indicates, this is about acquiring the appropriate locks.
 116 It also otherwise makes sure that values are valid at the given type, e.g., references have to be non-NULL and aligned.
 117
 118 The core of acquire validation is an operation with a signature roughly like this:
 119 {% highlight rust %}
 120 fn acquire_validate(lval: Lvalue, ty: Ty, mutbl: Mutability, lft: Option<Lifetime>)
 121 {% endhighlight %}
 122 Why are mutability and lifetime showing up here?
 123 The reason is that reference types have a "transitive" effect down the type hierarchy: `&T` makes everything reachable through this reference read-only, even if `T` involves `&mut`.
 124 Similarly, in case of nested references, it is the *outermost* reference that defines the lifetime for which access through this pointer is granted.
 125 (The compiler makes sure that inner references always have longer lifetimes than outer references.)
 126
 127 `acquire_validate` recursively traverses the type hierarchy.
 128 When it encounters a primitive type like `i32`, it acquires a lock of `size_of::<ty>` many locations starting at `lval` for lifetime `lft` -- this will be a write or read lock depending on whether `mutbl` says this data is mutable or not.[^2]
 129 In other words, memory is locked "at the leaves" of the type structure, rather than locking immediately when we see a reference type.
 130 We will come back later to why this is necessary.
 131 Validation also makes sure that the value stored in memory is valid at the given type.
 132 For a `u32`, this just amounts to checking that the memory is properly initialized.
 133
 134 [^2]: For simplicity, we pretend lvalues are just locations in memory.  Actually, lvalues can also refer to variables not in memory (like non-addressable stack locals), but these do not need to be locked.  Furthermore, lvalues for unsized types carry the additional information stored in a fat pointer, i.e., a length (for slices) or a vtable (for trait objects); this is handled appropriately when validating the respective types.
 135
 136 For compound types like tuples, `struct` or `enum`, validation proceeds recursively on the fields.
 137 In particular, the enum discriminant is checked to be in-range (in particular, this empty enums always invalid).
 138 However, because the recursive validation will only lock memory that's actually covered by a field, we also have to acquire the appropriate lock for padding bytes and the enum discriminant here.
 139
 140 Finally, at a reference type, two things happen.
 141 First, the reference itself is stored in memory somewhere; this memory has to be locked just like the validation of `i32` locks the memory used to store the integer.
 142 The reference is also checked to be non-NULL and properly aligned for the type it points to.
 143 Furthermore, validation proceeds recursively after *dereferencing* the reference.
 144 Crucially, the `mutbl` and `lft` for this recursive call are taking the reference type into account:
 145 If `lft` was `None` and this reference's lifetime ends within the function (i.e., there is a corresponding `EndRegion` somewhere), it is now set to the reference's lifetime.
 146 If `mutbl` was mutable, it becomes immutable when following a shared reference.
 147
 148 There are two other validation operations besides acquire validation: release validation, and suspend validation.
 149 These serve to release the appropriate locks before handing references to foreign code, and to handle the operation of taking a reference.
 150 (Bear with me for just a little longer, we will see examples of this soon.)
 151 Suspend validation takes an additional parameter, namely the lifetime for which locks are to be suspended.
 152 Both operations recursively follow the type just like acquire validation, and for all memory they visit, they release/suspend *write* locks.
 153 Read locks are not touched, which corresponds to the fact that shared references are `Copy` and hence never "given up":
 154 Shared references always remain usable until their lifetime expires, at which point the read lock is released automatically.
 155
 156 Validation statements are emitted during MIR construction in the following places:
 157 - At the beginning of the function, we perform acquire validation of the arguments.
 158 - When taking a reference, we suspend the referee for the reference lifetime and acquire the referent.
 159 - When calling another function, we release the arguments and acquire the return value.
 160
 161 ### 2.3 Examples
 162
 163 After all this rather dry theory, let us look at some examples demonstrating how validation works, and how it enables optimizations.
 164
 165 #### Validating function arguments
 166
 167 Here's the sample code from above with validation annotated explicitly:
 168 {% highlight rust %}
 169 fn simple(x: &mut i32, y: &mut f32) -> i32 {
 170     Validate(Acquire, [x, y]);
 171     *x = 3;
 172     *y = 4.0;
 173     *x
 174 }
 175 {% endhighlight %}
 176 For simplicity's sake, I just pretend that validation commands were present not just in MIR, but also in the Rust surface language. Also, while validation takes *pairs* of lvalues and types, the types can be inferred, so I am only listing lvalues here.
 177
 178 Let us now see what happens when `x` and `y` alias.
 179 I mentioned above that we want this case to be UB, because we want to be able to reorder the two stores.
 180 Indeed, it will turn out that the validation describes above triggers UB here.
 181
 182 There are no references taken and no other function is called, so the only validation that happens is acquiring the arguments at the beginning of the function.
 183 This validates our contract with the caller.
 184 `x` is of reference type, so validation will first acquire the write lock for the memory used to store `x` itself (that's somewhere on the stack).
 185 Next, it will follow the reference and recursively validate whatever `*x` is valid at type `i32`.
 186 In the recursive call, the `lft` will still be `None` because the lifetime of the reference outlives the function call.
 187 The recursive call then acquires the write lock for `*x` and checks that memory is initialized.
 188 This completes validating `x`.
 189 Validation of `y` proceeds exactly the same way.
 190 In particular, validation will acquire a write lock for `*y`.
 191 However, since `x` and `y` alias, this will overlap the write lock that was acquired for `*x`!
 192 This is how we get the UB that we were looking for.
 193
 194 In other words, acquiring locks for everything we can reach through references implicitly checks for aliasing, making sure that mutable references do not overlap with anything else.
 195
 196 #### Validating function calls
 197
 198 The second example demonstrates how validation, and memory locks in particular, let us reason about the actions of unknown external functions:
 199 {% highlight rust %}
 200 // Defined elsewhere: fn fun(x: &i32, z: &mut (i32, i32)) { ... }
 201
 202 fn example(x: &i32, y: &mut i32, z: &mut (i32, i32)) {
 203     Validate(Acquire, [x, y, z]);
 204     let x_val = *x;
 205     let y_val = *y;
 206
 207     Validate(Release, [x, z]);
 208     fun(x, {z});
 209
 210     let x_val_2 = *x;
 211     *y = y_val;
 212 }
 213 {% endhighlight %}
 214 We would like to optimize this code and eliminate the write to `y` at the end of the function, arguing that we are just writing back the value that's already stored there.
 215 The reason we think we can do that is that we hold an exclusive reference to `*y`.
 216 We do not pass `y` to `fun`, and there most not be any other reference to this memory, so `fun` cannot touch `y`.
 217 And indeed, the validation rules make it so that it is UB for `fun` to access `y`.
 218
 219 Furthermore, we want to eliminate the load from `x` at the end, and instead re-use `x_val`, arguing that since `x` is a shared reference, it is UB for `nop` to change its value.
 220
 221 In the beginning of `example`, during acquire validation, we acquire the read lock for `*x` and the write lock for `*y` and `*z`.
 222 The release validation before calling `fun` releases all write locks covered by `x` and `z`.
 223 For `x`, this doesn't actually do anything, but the write lock on `z` is released here.
 224 (We assume there is no implicit reborrowing of `z` going on here. The curly braces make it so that `z` is moved, not borrowed, to `fun`.)
 225 So, when `fun` is called, we hold the read lock for `x` and the write lock for `y`.
 226 Consequently, if `fun` were to access `y` or write to `x`, we know that's UB -- so we can rely on this not to happen and perform our desired optimizations.
 227
 228 Notice that this optimization is extremely hard to perform e.g. in C, because to know that `fun` cannot access `y`, the compiler needs to do whole-program alias analysis -- a notoriously hard problem.
 229 In contrast, Rust's type system allows us to perform such optimizations intra-procedurally, without looking at other functions' code at all.
 230
 231 #### Validating taking a reference
 232
 233 In the last example, we deliberately *moved* `z` to `fun` to side-step any discussions about borrowing.
 234 However, typically, you would actually *reborrow* `z`, leading to additional validation calls around the operation that takes a reference.
 235 We will look at that case now.
 236 Let us also say that `fun` actually returns a reference with the same lifetime as `z`.
 237 Overall, our code now looks like this (with all reborrowing being explicit):
 238 {% highlight rust %}
 239 // Defined elsewhere: fn fun<'z>(x: &i32, z: &'z mut (i32, i32)) -> &'z mut i32  { ... }
 240
 241 fn example(x: &i32, y: &mut i32, z: &mut (i32, i32)) {
 242     Validate(Acquire, [x, y]);
 243     let x_val = *x;
 244     let y_val = *y;
 245
 246     {   // 'inner begins here
 247         Validate(Suspend('z), [z]);
 248         let z_for_fun = &'inner mut z;
 249         Validate(Acquire, [z_for_fun]);
 250
 251         Validate(Release, [x, z_for_fun]);
 252         let z_inner : &'inner mut i32 = fun(x, {z_for_fun});
 253         Validate(Acquire, [z_inner]);
 254
 255         *z_inner = 42;
 256     }   // 'inner ends here
 257
 258     let x_val_2 = *x;
 259     *y = y_val;
 260 }
 261 {% endhighlight %}
 262 Again we would like to eliminate the write to `y` and the read from `x`.
 263 To eliminate the write to `y` as being redundant, we have to prove, in particular, that the write to `z_inner` cannot affect `*y`.
 264
 265 As usual, we start by acquiring the read lock for `*x` and the write lock for `*y` and `*z`.
 266 When `z` gets reborrowed to be passed to `fun` with lifetime `'inner`, we *suspend* our lock of `*z` for this lifetime.
 267 This models the fact that, no matter what we do with the newly created reference, we know that when `'inner` ends, we get our exclusive access back.
 268 After the suspension, we don't hold a lock any more for what is now `*z_for_fun`, so we immediately acquire a write lock again.
 269 However, following the type of `z_for_fun`, we will acquire this lock only for lifetime `'inner`.
 270 (Releasing and then acquiring the write lock may sound somewhat redundant.
 271 However, notice that, if we were to take a shared reference, we would now acquire a read lock rather than a write lock.)
 272
 273 Next, we call `fun`.
 274 At this point, we release our write lock on `*z_for_fun` again.
 275 However, notice that the suspended lock of `*z` stays around -- so while we gave up control of `z` (aka `z_for_fun`) for lifetime `'inner`, we still know that we will get control back when `'inner` ends!
 276 `fun` is now free to do whatever it wants, as long as it does not violate our still-existing write lock of `*y` and the read lock of `*x`.
 277
 278 When `fun` returns, we obtain a new mutable reference.
 279 At this point, we again run acquire validation.
 280 In this case, we want to ensure that `z_inner` does not alias `y`.
 281 The locks ensure that `fun` cannot possibly change `y`, but we (`example`) can -- and if writing to `z_inner` were to change `*y`, that access would come from the stack frame holding the lock on `y`, so it would be permitted!
 282 However, we do want to still optimize away the write to `y` at the end of `example` as being redundant.
 283 To this end, we acquire-validate all data returned to us from other functions, making sure that references we obtain this way to not conflict with other references we already hold.
 284 Following the type of `z_inner`, we only acquire this lock for `'inner`.
 285
 286 Another way to look at acquiring the return value is that we are validating the contract we have with `fun`: The contract says that `fun` should return a reference to us that comes with the uniqueness guarantees of `&mut`, so we better make sure that `fun` actually delivered what it said it would.
 287
 288 Finally, `'inner` ends.
 289 Multiple things happen now:
 290 First, our lock of `z_inner` expires and is hence released.
 291 (We already released the lock of `z_for_fun` ourselves, so nothing is to be done there.)
 292 Next, remember that we suspended our lock of `z` for `'inner`; so now is the time that this lock gets reacquired.
 293 In other words, as the lifetime of the reborrow from `z` expires, we release all locks of memory that is tied to that lifetime, and then re-gain full exclusive access to `z`.
 294
 295 When we come to the write to `*y`, we know that this write is redundant because (a) we held the write lock for `*y` all the time, so nobody else could have written to it without triggering UB, and (b) we know `z_inner` does not alias with `y` (or else we would already have UB), so we ourselves have not written to `*y` either.
 296 Hence, we may optimize away the write.
 297
 298 ### 2.4 `UnsafeCell`
 299
 300 This concludes the examples.
 301 I hope you now have an idea of how memory locks and validation affect the execution when "bad" references are passed around, and how they validate the assumptions we need to make for optimizations that reorder memory accesses.
 302
 303 One topic that did not come up yet is interior mutability.
 304 If a function takes an `x: &Cell<i32>`, following the rules above, it will acquire a read lock of `*x` for the duration of the function call, making `*x` immutable.
 305 Clearly, we do not want to do that -- calling `x.set` *will* actually mutate `*x`, and mutating through a shared reference is exactly the point of using `Cell`!
 306
 307 Lucky enough, the compiler *already* says that interior mutability is only allowed via [`UnsafeCell`](https://doc.rust-lang.org/beta/core/cell/struct.UnsafeCell.html).
 308 We can use this for our purposes: To adjust validation for interior mutability, we *stop* our recursive descent and do not do anything when reaching an `UnsafeCell`.
 309 In particular, no locks are acquired.
 310 This justifies calling `set` on a shared reference and having the value changed.
 311 Of course, it also means we cannot do some of the optimizations we discussed above -- but that's actually exactly what we want!
 312 These optimizations are unsound in the presence of interior mutability.
 313
 314 Notice that for this to work, it is crucial that we acquire locks "at the leaves" of the type structure, and not just when we encounter a reference.
 315 For example, if we validate something at type `&(i32, Cell<i32>)`, we *do* want to acquire a read lock for the first component, but not for the second.
 316
 317 ### 2.5 Unsafe Code
 318
 319 There's one last issue that I want to cover before (finally) concluding this post: Unsafe code.
 320 So far, we have only considered entirely safe code, and we have seen how validation and memory locks enforce the guarantees associated with reference type.
 321 The type checker already ensures that safe code will never trigger a lock failure -- actually, you can see the validation statements as some form of "dynamic borrow checking", enforcing the same rules as the borrow checker does statically.
 322 The only way that these checks could ever fail is for unsafe code to feed "bad" values into safe code, and that is exactly the point:
 323 These proposed "unsafe code guidelines" restrict what unsafe code can do when interfacing with safe code.
 324
 325 However, *within* unsafe code, we have to give programmers a little more leeway.
 326 We cannot always make as strong assumptions as we did in safe code:
 327 For example, let us look at `mem::swap` (somewhat desugared):
 328 {% highlight rust %}
 329 // Defined elsewhere: unsafe fn ptr::swap_nonoverlapping<T>(x: *mut T, y: *mut T) { ... }
 330
 331 pub fn swap<T>(x: &mut T, y: &mut T) {
 332     Validate(Acquire, [x, y]);
 333     let x = x as *mut T;
 334     let y = y as *mut T;
 335     unsafe {
 336         Validate(Release, [x, y]);
 337         ptr::swap_nonoverlapping(x, y, 1);
 338     }
 339 }
 340 {% endhighlight %}
 341 In the beginning, we acquire the write lock for `x` and `y`, establishing that the two pointers do not alias -- this is exactly as expected.
 342 However, when we release the arguments to `ptr::swap_nonoverlaping`, nothing actually happens:
 343 We are only passing raw pointers that do not come with any guarantees.
 344 In particular, we keep the write locks on `x` and `y`!
 345 Now, that's a problem -- `ptr::swap_nonoverlaping` will of course write to both of them when it swaps their content, triggering UB.
 346
 347 First of all, let me note that this example actually demonstrates that we are validating the borrow checking gurantees correctly.
 348 As far as the borrow checker is concerned, `x` and `y` have never been borrowed or moved to anyone else, they were fully owned by `swap` all the time.
 349
 350 The trouble is that the ownership transfer to `ptr::swap_nonoverlaping` is not described by its type, and hence cannot be known to the compiler.
 351 This is possible because `ptr::swap_nonoverlapping` is an *unsafe* function that can make more assumptions than its type says, and rely on the caller to uphold these assumptions.
 352 The compiler has no way to know what these assumptions are.
 353
 354 To accommodate such code, I suggest that we emit fewer validation instruction when building MIR for functions containing unsafe blocks.
 355 For example, we could imagine that if a function contains an unsafe block, instead of the usual validation, we only perform an acquire in the beginning of the function, immediately followed by a release.
 356 This would verify that our arguments are well-formed, but would otherwise acknowledge that we just do not know how ownership is flowing inside such a function.
 357 Of course, this also disables some optimizations -- but maybe these are optimizations that we do not want to perform anyway in unsafe code, because they could end up breaking the program.
 358
 359 Anyway, this part of my proposal is certainly not yet final.
 360 This topic needs input from many people so as to safely cover the unsafe code people write while still (hopefully) permitting the optimizations people expect to happen.
 361 One potential middle-ground here is having a function attribute that controls which validation statements are emitted, or even providing some form of intrinsics for programmers to explicitly perform validation.
 362
 363 ## 3 Conclusion
 364
 365 I have described a mechanism to validate the contract expressed by Rust's types, in particular, the uniqueness and read-only guarantees provided by reference types.
 366 This validation describes what (safe) code can assume about its variables at run-time, and it enables some code motion optimizations that would otherwise require whole-program alias analysis.
 367
 368
 369 I said before that I think providing a way to *detect* UB is crucial, and the framework laid out above is designed with this in mind.
 370 In fact, I have been working on a miri-based implementation of this mechanism:
 371 [This rustc branch](https://github.com/RalfJung/rust/tree/mir-validate) adds the new `Validate` statement to MIR and adds a MIR pass to insert validation statements as described above, while [this miri branch](https://github.com/RalfJung/miri/tree/mir-validate) adds memory locks to miri and implements type-based validation.
 372 (Both of these branches are subject to frequent rebasing, so expect force pushes if you track them.)
 373 The implementation is not yet complete, and some of the terminology changed (it turns out writing things down helps you find good names :), so don't be confused if it looks a little different.
 374 Other than that though, feel free to play around and provide feedback.
 375 I hope the implementation can help us check whether unsafe code follows the rules I laid down here, and whether these rules match programmer intuition.
 376 (In fact, this is how I figured out that `mem::swap` violates the most strict interpretation of the rules -- I did expect a violation somewhere in libstd's unsafe code, but had no idea where it would show up.)
 377 We could even have a test suite to aid in fine-tuning the rules.
 378
 379 I also said that "we should strive for programmers' intuition agreeing with the [...] compiler".
 380 This is of course hard to measure, but one property that I think goes a long way here is that in this proposal, pointers can remain "thin".
 381 I probably should write a whole separate blog post on this topic, but the short version is that in my proposal, if a function has permission to access certain region of memory, then it does not matter how it performs the access -- if there are multiple pointers to this region around, they are all equally valid to use.
 382 This is in contrast to e.g. C's `restrict` or type-based alias analysis, which make a distinction based on *which pointer* is used to access certain regions of memory.
 383 Such models are fairly hard to write checkers for, as witness by there being no sanitizer.
 384 Moreover, they typically (from what I have seen so far) have some trouble supporting integer-pointer casts.
 385 If pointers have an "identity" beyond the address they point to, then which "identity" does a pointer that's casted from an integer obtain?
 386 I am not saying it is impossible to solve this issue, but it is a source of complication.
 387 It is also quickly a source of programmer confusion.
 388 In my experience, programmers that use pointers casted from integers typically like to think of pointers as being "just integers" and not having any "identity" beyond the address they point to (and indeed, the fact that such casts work seems to indicate that they are right).
 389 I would be much less worried about unsafe Rust code being broken by optimizations if we can make it so that this mental model is actually right.
 390 Wouldn't it be great if we could add "Has the simple model of pointers that everybody incorrectly *thinks* C has" to the list of Rust selling points?
 391 I think this aligns well with Rust's slogan "Hacking without fear" -- and without compromise; remember we still get all the optimizations!
 392
 393 Congratulations!
 394 You have reached the end of this blog post.
 395 Thanks a lot for reading.
 396 If you have any comments, please raise them -- the entire purpose of this post is to get the proposal out there so that we can see if this approach can suit the needs of Rust programmers and compiler writers alike.
 397 If there are explanations you think are missing or confusing, please bring that up as well; I may then update this post accordingly.
 398 The concrete proposal will sure need some fixing here and there, but my hope is that the general approach of a contract-like, type-driven validation mechanism ends up being useful.
 399 So, keep the comments flowing -- and safe hacking.
 400
 401 #### Footnotes