personal/_posts/2017-07-17-types-as-contracts.md

   1 ---
   2 title: "Types as Contracts"
   3 categories: internship rust
   4 forum: https://internals.rust-lang.org/t/types-as-contracts/5562
   5 ---
   6
   7 Over the last couple of weeks of my [internship]({{ site.baseurl }}{% post_url
   8 2017-05-23-internship-starting %}), I have been working on a proposal for the "Unsafe Code Guidelines".
   9 I got some very encouraging feedback at the [Mozilla All-Hands](https://internals.rust-lang.org/t/recapping-this-weeks-mozilla-all-hands/5465),
  10 and now I'd like to put this proposal out there and start having a discussion.
  11 <!-- MORE -->
  12 Warning: Long post is long.  You have been warned.
  13
  14 If you need more context on this topic, I suggest you read my post on [unsafe code guidelines and undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}) first.
  15
  16 ## 1 It's in the Types
  17
  18 Consider the following simple function:
  19 {% highlight rust %}
  20 fn simple(x: &mut i32, y: &mut f32) -> i32 {
  21     *x = 3;
  22     *y = 4.0;
  23     *x
  24 }
  25 {% endhighlight %}
  26
  27 As discussed in my previous post, we would like an unsafe Rust program that calls this function with aliasing pointers to be UB.
  28 I'd like to propose that the fundamental reason this is UB is the *type* of the function and, in particular, its arguments: `&mut` expresses uniqueness, so the memory pointed to by an `&mut` must not alias with anything else.
  29 Borrowing some terminology from related fields, we can see the function type as stating a *contract*:  The `&mut` say that `simple` expects to be called with non-aliasing pointers.
  30 If we violate the contract, all bets are off.
  31
  32 To realize this idea, we need to define, for every type, what is the contract associated with that type. When are elements of any given type *valid*?
  33 For some types, that's simple: `i32` is any four-byte value. `bool` is one byte, either `0` or `1`.
  34 Compound types like `(T, U)` are defined in terms of their constituent types:  The first component must be a valid `T`, the second component a valid `U`.
  35
  36 Where it gets interesting is the reference types: `&mut T` says that we have a (non-null, properly aligned) pointer.
  37 Whatever the pointer points to must be a valid `T`.
  38 *Moreover*, the memory range covered by this pointer (whose extent is given by the size of `T`) is exclusively accessed by this function -- no concurrent thread performs any access, and when we call unknown code and do *not* pass on the reference, we can rely on that unknown code not accessing this memory either.
  39
  40 Similarly, for `&T`, we can rely on no-one performing any write access to this memory.
  41 At the same time, the contract for `&T` also says that we ourselves will not write to this memory either.
  42 This is a nice example showing that contracts go both ways:  The function can rely on the other party (the caller) having done their part by providing valid arguments, but the function also has to make guarantees itself.
  43 Another example of a guarantee provided by functions is their promise to provide a valid return value when they are done.
  44
  45 My proposal for the unsafe code guidelines then is to validate the contract described by a function's types at the function boundaries, and to make it UB for the contract to be violated.
  46 In particular, `simple` would validate `x` and `y` immediately after being called, and it would be UB for them to alias.
  47
  48
  49 ## 2 Specification Framework
  50
  51 I have [previously written about my thoughts on undefined behavior]({{ site.baseurl }}{% post_url 2017-07-14-undefined-behavior %}):
  52
  53 > I also think that tooling to *detect* UB is of paramount importance [...].
  54 In fact, specifying a dynamic UB checker is a very good way to specify UB!
  55 Such a specification would describe the additional state that is needed at run-time to then *check* at every operation whether we are running into UB.
  56
  57 Following this, I will in the following describe how we could check the contracts expressed by Rust's types for violations.
  58 I am considering a MIR program; if a Rust program is supposed to be checked, it has to be translated to MIR first.
  59 (You can [read up on my explanation]({{ site.baseurl }}{% post_url 2017-05-23-internship-starting %}) of why I think this is a good way to proceed.)
  60 The validation of reference types (in particular, their exclusive / read-only nature) will need some extra "instrumented" state -- also see [this previous post on the topic]({{ site.baseurl }}{% post_url 2017-06-06-MIR-semantics %}).[^1]
  61 The extra state in my proposal consists of something akin to a reader-writer lock for every memory location.
  62 I am first going to describe these locks and how they affect program behavior, before explaining contract validation.
  63
  64 [^1]: If at this point you are under the impression that all these previous posts have been collectively building up towards this proposal, you are not entirely wrong.
  65
  66 This is not (yet) a proper specification in that some important details are still left open.
  67 The goal is to lay down a framework as the basis for discussion.
  68 Many things already work as expected, and hopefully the remaining issues can be sorted out by tuning some of the details.
  69
  70 ### 2.1 Memory Locks
  71
  72 Every location comes with an additional reader-writer "lock".
  73 These locks have nothing to do with concurrency, in fact they are more like `RefCell`.
  74 Locks are used to verify the guarantees provided by references:
  75 Shared references point to memory that is not mutated by anyone, and mutable references point to memory that is not *accessed* (read from or written to) by anyone *else*.
  76 Notably, however, unlocked memory is free to be accessed by anyone -- nobody is making any particular assumptions about this memory.
  77
  78 To track these locks, we store a `locks: Vec<LockInfo>` for every location, with `LockInfo` defined as:
  79 {% highlight rust %}
  80 struct LockInfo {
  81     write_lock: bool,
  82     frame: usize,
  83     lft: Option<Lifetime>,
  84 }
  85 {% endhighlight %}
  86 (Notice that an implementation of this specification is free to track locks in a different way, as long as the effective behavior is the same.
  87 The reason I am giving concrete types here is just to make the specification more precise.)
  88
  89 When a lock is acquired, we record which stack frame is the owner of this lock (starting at 0 for `main`, and counting up).
  90 Now, on every read access, we check whether there is a write lock of this location held *by another function*.
  91 If yes, the access is UB.
  92 Furthermore, write accesses and acquiring a write lock are UB if there is a read lock held *or* if there is a write lock held by another function.
  93 Finally, acquiring a read lock is UB if there is a write lock held by anyone, including the function requesting a read lock.
  94
  95 Locks are typically released automatically when they expire.
  96 To this end, the function acquiring a lock can optionally specify the *lifetime* of the lock.
  97 Not specifying a lifetime implies that the lock is held until the function returns.
  98 For the purpose of this specification, it does not matter what exactly a lifetime is; it is only relevant at which point it *ends*.
  99 Lucky enough, MIR recently gained [`EndRegion` statements](https://github.com/rust-lang/rust/pull/39409) that we can use for this purpose:
 100 When `EndRegion(lft)` is executed, all locks with lifetime `lft` are released.
 101 Beyond this, a write lock can also be explicitly released by the function that acquired it, even before the lock's lifetime ends.
 102 (It does not turn out to ever be necessary to do this for read locks in my proposal.)
 103
 104 There is one more mechanism we need to introduce:  Acquired write locks can be *suspended* for a lifetime.
 105 This means the lock is temporarily released, but meant to be reacquired later.
 106 To this end, the `LockInfo` is removed from the location's `locks` and put into a `HashMap<Lifetime, (Location, LockInfo)>` which is stored somewhere separately from the program's memory.
 107 When the `EndRegion` corresponding to the given lifetime is reached, the lock is reacquired and put back into the `locks` of the given location.
 108
 109 ### 2.2 Validation
 110
 111 With the lock mechanism established, we can now look at the core of the proposal:  Validating the contract described by a type.
 112 This is done by a new MIR statement, `Validate`, which takes two arguments:  An *operation* (which we will discuss shortly), and a list of `(Lvalue, Ty)` pairs defining which lvalues is supposed to be validated at which type.
 113 Crucially, this is a non-erased type -- all the lifetime information must be present.
 114
 115 The most important validation operation is *acquire validation*.
 116 As the name indicates, this is about acquiring the appropriate locks.
 117 It also otherwise makes sure that values are valid at the given type, e.g., references have to be non-NULL and aligned.
 118
 119 The core of acquire validation is an operation with a signature roughly like this:
 120 {% highlight rust %}
 121 fn acquire_validate(lval: Lvalue, ty: Ty, mutbl: Mutability, lft: Option<Lifetime>)
 122 {% endhighlight %}
 123 Why are mutability and lifetime showing up here?
 124 The reason is that reference types have a "transitive" effect down the type hierarchy: `&T` makes everything reachable through this reference read-only, even if `T` involves `&mut`.
 125 Similarly, in case of nested references, it is the *outermost* reference that defines the lifetime for which access through this pointer is granted.
 126 (The compiler makes sure that inner references always have longer lifetimes than outer references.)
 127
 128 `acquire_validate` recursively traverses the type hierarchy.
 129 When it encounters a primitive type like `i32`, it acquires a lock of `size_of::<ty>` many locations starting at `lval` for lifetime `lft` -- this will be a write or read lock depending on whether `mutbl` says this data is mutable or not.[^2]
 130 In other words, memory is locked "at the leaves" of the type structure, rather than locking immediately when we see a reference type.
 131 We will come back later to why this is necessary.
 132 Validation also makes sure that the value stored in memory is valid at the given type.
 133 For a `u32`, this just amounts to checking that the memory is properly initialized.
 134
 135 [^2]: For simplicity, we pretend lvalues are just locations in memory.  Actually, lvalues can also refer to variables not in memory (like non-addressable stack locals), but these do not need to be locked.  Furthermore, lvalues for unsized types carry the additional information stored in a fat pointer, i.e., a length (for slices) or a vtable (for trait objects); this is handled appropriately when validating the respective types.
 136
 137 For compound types like tuples, `struct` or `enum`, validation proceeds recursively on the fields.
 138 In particular, the enum discriminant is checked to be in-range (in particular, this empty enums always invalid).
 139 However, because the recursive validation will only lock memory that's actually covered by a field, we also have to acquire the appropriate lock for padding bytes and the enum discriminant here.
 140
 141 Finally, at a reference type, two things happen.
 142 First, the reference itself is stored in memory somewhere; this memory has to be locked just like the validation of `i32` locks the memory used to store the integer.
 143 The reference is also checked to be non-NULL and properly aligned for the type it points to.
 144 Furthermore, validation proceeds recursively after *dereferencing* the reference.
 145 Crucially, the `mutbl` and `lft` for this recursive call are taking the reference type into account:
 146 If `lft` was `None` and this reference's lifetime ends within the function (i.e., there is a corresponding `EndRegion` somewhere), it is now set to the reference's lifetime.
 147 If `mutbl` was mutable, it becomes immutable when following a shared reference.
 148
 149 There are two other validation operations besides acquire validation: release validation, and suspend validation.
 150 These serve to release the appropriate locks before handing references to foreign code, and to handle the operation of taking a reference.
 151 (Bear with me for just a little longer, we will see examples of this soon.)
 152 Suspend validation takes an additional parameter, namely the lifetime for which locks are to be suspended.
 153 Both operations recursively follow the type just like acquire validation, and for all memory they visit, they release/suspend *write* locks.
 154 Read locks are not touched, which corresponds to the fact that shared references are `Copy` and hence never "given up":
 155 Shared references always remain usable until their lifetime expires, at which point the read lock is released automatically.
 156
 157 Validation statements are emitted during MIR construction in the following places:
 158 - At the beginning of the function, we perform acquire validation of the arguments.
 159 - When taking a reference, we suspend the referee for the reference lifetime and acquire the referent.
 160 - When calling another function, we release the arguments and acquire the return value.
 161
 162 ### 2.3 Examples
 163
 164 After all this rather dry theory, let us look at some examples demonstrating how validation works, and how it enables optimizations.
 165
 166 #### Validating function arguments
 167
 168 Here's the sample code from above with validation annotated explicitly:
 169 {% highlight rust %}
 170 fn simple(x: &mut i32, y: &mut f32) -> i32 {
 171     Validate(Acquire, [x, y]);
 172     *x = 3;
 173     *y = 4.0;
 174     *x
 175 }
 176 {% endhighlight %}
 177 For simplicity's sake, I just pretend that validation commands were present not just in MIR, but also in the Rust surface language. Also, while validation takes *pairs* of lvalues and types, the types can be inferred, so I am only listing lvalues here.
 178
 179 Let us now see what happens when `x` and `y` alias.
 180 I mentioned above that we want this case to be UB, because we want to be able to reorder the two stores.
 181 Indeed, it will turn out that the validation describes above triggers UB here.
 182
 183 There are no references taken and no other function is called, so the only validation that happens is acquiring the arguments at the beginning of the function.
 184 This validates our contract with the caller.
 185 `x` is of reference type, so validation will first acquire the write lock for the memory used to store `x` itself (that's somewhere on the stack).
 186 Next, it will follow the reference and recursively validate whatever `*x` is valid at type `i32`.
 187 In the recursive call, the `lft` will still be `None` because the lifetime of the reference outlives the function call.
 188 The recursive call then acquires the write lock for `*x` and checks that memory is initialized.
 189 This completes validating `x`.
 190 Validation of `y` proceeds exactly the same way.
 191 In particular, validation will acquire a write lock for `*y`.
 192 However, since `x` and `y` alias, this will overlap the write lock that was acquired for `*x`!
 193 This is how we get the UB that we were looking for.
 194
 195 In other words, acquiring locks for everything we can reach through references implicitly checks for aliasing, making sure that mutable references do not overlap with anything else.
 196
 197 #### Validating function calls
 198
 199 The second example demonstrates how validation, and memory locks in particular, let us reason about the actions of unknown external functions:
 200 {% highlight rust %}
 201 // Defined elsewhere: fn fun(x: &i32, z: &mut (i32, i32)) { ... }
 202
 203 fn example(x: &i32, y: &mut i32, z: &mut (i32, i32)) {
 204     Validate(Acquire, [x, y, z]);
 205     let x_val = *x;
 206     let y_val = *y;
 207
 208     Validate(Release, [x, z]);
 209     fun(x, {z});
 210
 211     let x_val_2 = *x;
 212     *y = y_val;
 213 }
 214 {% endhighlight %}
 215 We would like to optimize this code and eliminate the write to `y` at the end of the function, arguing that we are just writing back the value that's already stored there.
 216 The reason we think we can do that is that we hold an exclusive reference to `*y`.
 217 We do not pass `y` to `fun`, and there most not be any other reference to this memory, so `fun` cannot touch `y`.
 218 And indeed, the validation rules make it so that it is UB for `fun` to access `y`.
 219
 220 Furthermore, we want to eliminate the load from `x` at the end, and instead re-use `x_val`, arguing that since `x` is a shared reference, it is UB for `nop` to change its value.
 221
 222 In the beginning of `example`, during acquire validation, we acquire the read lock for `*x` and the write lock for `*y` and `*z`.
 223 The release validation before calling `fun` releases all write locks covered by `x` and `z`.
 224 For `x`, this doesn't actually do anything, but the write lock on `z` is released here.
 225 (We assume there is no implicit reborrowing of `z` going on here. The curly braces make it so that `z` is moved, not borrowed, to `fun`.)
 226 So, when `fun` is called, we hold the read lock for `x` and the write lock for `y`.
 227 Consequently, if `fun` were to access `y` or write to `x`, we know that's UB -- so we can rely on this not to happen and perform our desired optimizations.
 228
 229 Notice that this optimization is extremely hard to perform e.g. in C, because to know that `fun` cannot access `y`, the compiler needs to do whole-program alias analysis -- a notoriously hard problem.
 230 In contrast, Rust's type system allows us to perform such optimizations intra-procedurally, without looking at other functions' code at all.
 231
 232 #### Validating taking a reference
 233
 234 In the last example, we deliberately *moved* `z` to `fun` to side-step any discussions about borrowing.
 235 However, typically, you would actually *reborrow* `z`, leading to additional validation calls around the operation that takes a reference.
 236 We will look at that case now.
 237 Let us also say that `fun` actually returns a reference with the same lifetime as `z`.
 238 Overall, our code now looks like this (with all reborrowing being explicit):
 239 {% highlight rust %}
 240 // Defined elsewhere: fn fun<'z>(x: &i32, z: &'z mut (i32, i32)) -> &'z mut i32  { ... }
 241
 242 fn example(x: &i32, y: &mut i32, z: &mut (i32, i32)) {
 243     Validate(Acquire, [x, y]);
 244     let x_val = *x;
 245     let y_val = *y;
 246
 247     {   // 'inner begins here
 248         Validate(Suspend('z), [z]);
 249         let z_for_fun = &'inner mut z;
 250         Validate(Acquire, [z_for_fun]);
 251
 252         Validate(Release, [x, z_for_fun]);
 253         let z_inner : &'inner mut i32 = fun(x, {z_for_fun});
 254         Validate(Acquire, [z_inner]);
 255
 256         *z_inner = 42;
 257     }   // 'inner ends here
 258
 259     let x_val_2 = *x;
 260     *y = y_val;
 261 }
 262 {% endhighlight %}
 263 Again we would like to eliminate the write to `y` and the read from `x`.
 264 To eliminate the write to `y` as being redundant, we have to prove, in particular, that the write to `z_inner` cannot affect `*y`.
 265
 266 As usual, we start by acquiring the read lock for `*x` and the write lock for `*y` and `*z`.
 267 When `z` gets reborrowed to be passed to `fun` with lifetime `'inner`, we *suspend* our lock of `*z` for this lifetime.
 268 This models the fact that, no matter what we do with the newly created reference, we know that when `'inner` ends, we get our exclusive access back.
 269 After the suspension, we don't hold a lock any more for what is now `*z_for_fun`, so we immediately acquire a write lock again.
 270 However, following the type of `z_for_fun`, we will acquire this lock only for lifetime `'inner`.
 271 (Releasing and then acquiring the write lock may sound somewhat redundant.
 272 However, notice that, if we were to take a shared reference, we would now acquire a read lock rather than a write lock.)
 273
 274 Next, we call `fun`.
 275 At this point, we release our write lock on `*z_for_fun` again.
 276 However, notice that the suspended lock of `*z` stays around -- so while we gave up control of `z` (aka `z_for_fun`) for lifetime `'inner`, we still know that we will get control back when `'inner` ends!
 277 `fun` is now free to do whatever it wants, as long as it does not violate our still-existing write lock of `*y` and the read lock of `*x`.
 278
 279 When `fun` returns, we obtain a new mutable reference.
 280 At this point, we again run acquire validation.
 281 In this case, we want to ensure that `z_inner` does not alias `y`.
 282 The locks ensure that `fun` cannot possibly change `y`, but we (`example`) can -- and if writing to `z_inner` were to change `*y`, that access would come from the stack frame holding the lock on `y`, so it would be permitted!
 283 However, we do want to still optimize away the write to `y` at the end of `example` as being redundant.
 284 To this end, we acquire-validate all data returned to us from other functions, making sure that references we obtain this way to not conflict with other references we already hold.
 285 Following the type of `z_inner`, we only acquire this lock for `'inner`.
 286
 287 Another way to look at acquiring the return value is that we are validating the contract we have with `fun`: The contract says that `fun` should return a reference to us that comes with the uniqueness guarantees of `&mut`, so we better make sure that `fun` actually delivered what it said it would.
 288
 289 Finally, `'inner` ends.
 290 Multiple things happen now:
 291 First, our lock of `z_inner` expires and is hence released.
 292 (We already released the lock of `z_for_fun` ourselves, so nothing is to be done there.)
 293 Next, remember that we suspended our lock of `z` for `'inner`; so now is the time that this lock gets reacquired.
 294 In other words, as the lifetime of the reborrow from `z` expires, we release all locks of memory that is tied to that lifetime, and then re-gain full exclusive access to `z`.
 295
 296 When we come to the write to `*y`, we know that this write is redundant because (a) we held the write lock for `*y` all the time, so nobody else could have written to it without triggering UB, and (b) we know `z_inner` does not alias with `y` (or else we would already have UB), so we ourselves have not written to `*y` either.
 297 Hence, we may optimize away the write.
 298
 299 ### 2.4 `UnsafeCell`
 300
 301 This concludes the examples.
 302 I hope you now have an idea of how memory locks and validation affect the execution when "bad" references are passed around, and how they validate the assumptions we need to make for optimizations that reorder memory accesses.
 303
 304 One topic that did not come up yet is interior mutability.
 305 If a function takes an `x: &Cell<i32>`, following the rules above, it will acquire a read lock of `*x` for the duration of the function call, making `*x` immutable.
 306 Clearly, we do not want to do that -- calling `x.set` *will* actually mutate `*x`, and mutating through a shared reference is exactly the point of using `Cell`!
 307
 308 Lucky enough, the compiler *already* says that interior mutability is only allowed via [`UnsafeCell`](https://doc.rust-lang.org/beta/core/cell/struct.UnsafeCell.html).
 309 We can use this for our purposes: To adjust validation for interior mutability, we *stop* our recursive descent and do not do anything when reaching an `UnsafeCell`.
 310 In particular, no locks are acquired.
 311 This justifies calling `set` on a shared reference and having the value changed.
 312 Of course, it also means we cannot do some of the optimizations we discussed above -- but that's actually exactly what we want!
 313 These optimizations are unsound in the presence of interior mutability.
 314
 315 Notice that for this to work, it is crucial that we acquire locks "at the leaves" of the type structure, and not just when we encounter a reference.
 316 For example, if we validate something at type `&(i32, Cell<i32>)`, we *do* want to acquire a read lock for the first component, but not for the second.
 317
 318 ### 2.5 Unsafe Code
 319
 320 There's one last issue that I want to cover before (finally) concluding this post: Unsafe code.
 321 So far, we have only considered entirely safe code, and we have seen how validation and memory locks enforce the guarantees associated with reference type.
 322 The type checker already ensures that safe code will never trigger a lock failure -- actually, you can see the validation statements as some form of "dynamic borrow checking", enforcing the same rules as the borrow checker does statically.
 323 The only way that these checks could ever fail is for unsafe code to feed "bad" values into safe code, and that is exactly the point:
 324 These proposed "unsafe code guidelines" restrict what unsafe code can do when interfacing with safe code.
 325
 326 However, *within* unsafe code, we have to give programmers a little more leeway.
 327 We cannot always make as strong assumptions as we did in safe code:
 328 For example, let us look at `mem::swap` (somewhat desugared):
 329 {% highlight rust %}
 330 // Defined elsewhere: unsafe fn ptr::swap_nonoverlapping<T>(x: *mut T, y: *mut T) { ... }
 331
 332 pub fn swap<T>(x: &mut T, y: &mut T) {
 333     Validate(Acquire, [x, y]);
 334     let x = x as *mut T;
 335     let y = y as *mut T;
 336     unsafe {
 337         Validate(Release, [x, y]);
 338         ptr::swap_nonoverlapping(x, y, 1);
 339     }
 340 }
 341 {% endhighlight %}
 342 In the beginning, we acquire the write lock for `x` and `y`, establishing that the two pointers do not alias -- this is exactly as expected.
 343 However, when we release the arguments to `ptr::swap_nonoverlaping`, nothing actually happens:
 344 We are only passing raw pointers that do not come with any guarantees.
 345 In particular, we keep the write locks on `x` and `y`!
 346 Now, that's a problem -- `ptr::swap_nonoverlaping` will of course write to both of them when it swaps their content, triggering UB.
 347
 348 First of all, let me note that this example actually demonstrates that we are validating the borrow checking gurantees correctly.
 349 As far as the borrow checker is concerned, `x` and `y` have never been borrowed or moved to anyone else, they were fully owned by `swap` all the time.
 350
 351 The trouble is that the ownership transfer to `ptr::swap_nonoverlaping` is not described by its type, and hence cannot be known to the compiler.
 352 This is possible because `ptr::swap_nonoverlapping` is an *unsafe* function that can make more assumptions than its type says, and rely on the caller to uphold these assumptions.
 353 The compiler has no way to know what these assumptions are.
 354
 355 To accommodate such code, I suggest that we emit fewer validation instruction when building MIR for functions containing unsafe blocks.
 356 For example, we could imagine that if a function contains an unsafe block, instead of the usual validation, we only perform an acquire in the beginning of the function, immediately followed by a release.
 357 This would verify that our arguments are well-formed, but would otherwise acknowledge that we just do not know how ownership is flowing inside such a function.
 358 Of course, this also disables some optimizations -- but maybe these are optimizations that we do not want to perform anyway in unsafe code, because they could end up breaking the program.
 359
 360 Anyway, this part of my proposal is certainly not yet final.
 361 This topic needs input from many people so as to safely cover the unsafe code people write while still (hopefully) permitting the optimizations people expect to happen.
 362 One potential middle-ground here is having a function attribute that controls which validation statements are emitted, or even providing some form of intrinsics for programmers to explicitly perform validation.
 363
 364 ## 3 Conclusion
 365
 366 I have described a mechanism to validate the contract expressed by Rust's types, in particular, the uniqueness and read-only guarantees provided by reference types.
 367 This validation describes what (safe) code can assume about its variables at run-time, and it enables some code motion optimizations that would otherwise require whole-program alias analysis.
 368
 369
 370 I said before that I think providing a way to *detect* UB is crucial, and the framework laid out above is designed with this in mind.
 371 In fact, I have been working on a miri-based implementation of this mechanism:
 372 [This rustc branch](https://github.com/RalfJung/rust/tree/mir-validate) adds the new `Validate` statement to MIR and adds a MIR pass to insert validation statements as described above, while [this miri branch](https://github.com/RalfJung/miri/tree/mir-validate) adds memory locks to miri and implements type-based validation.
 373 (Both of these branches are subject to frequent rebasing, so expect force pushes if you track them.)
 374 The implementation is not yet complete, and some of the terminology changed (it turns out writing things down helps you find good names :), so don't be confused if it looks a little different.
 375 Other than that though, feel free to play around and provide feedback.
 376 I hope the implementation can help us check whether unsafe code follows the rules I laid down here, and whether these rules match programmer intuition.
 377 (In fact, this is how I figured out that `mem::swap` violates the most strict interpretation of the rules -- I did expect a violation somewhere in libstd's unsafe code, but had no idea where it would show up.)
 378 We could even have a test suite to aid in fine-tuning the rules.
 379
 380 I also said that "we should strive for programmers' intuition agreeing with the [...] compiler".
 381 This is of course hard to measure, but one property that I think goes a long way here is that in this proposal, pointers can remain "thin".
 382 I probably should write a whole separate blog post on this topic, but the short version is that in my proposal, if a function has permission to access certain region of memory, then it does not matter how it performs the access -- if there are multiple pointers to this region around, they are all equally valid to use.
 383 This is in contrast to e.g. C's `restrict` or type-based alias analysis, which make a distinction based on *which pointer* is used to access certain regions of memory.
 384 Such models are fairly hard to write checkers for, as witness by there being no sanitizer.
 385 Moreover, they typically (from what I have seen so far) have some trouble supporting integer-pointer casts.
 386 If pointers have an "identity" beyond the address they point to, then which "identity" does a pointer that's casted from an integer obtain?
 387 I am not saying it is impossible to solve this issue, but it is a source of complication.
 388 It is also quickly a source of programmer confusion.
 389 In my experience, programmers that use pointers casted from integers typically like to think of pointers as being "just integers" and not having any "identity" beyond the address they point to (and indeed, the fact that such casts work seems to indicate that they are right).
 390 I would be much less worried about unsafe Rust code being broken by optimizations if we can make it so that this mental model is actually right.
 391 Wouldn't it be great if we could add "Has the simple model of pointers that everybody incorrectly *thinks* C has" to the list of Rust selling points?
 392 I think this aligns well with Rust's slogan "Hacking without fear" -- and without compromise; remember we still get all the optimizations!
 393
 394 Congratulations!
 395 You have reached the end of this blog post.
 396 Thanks a lot for reading.
 397 If you have any comments, please raise them -- the entire purpose of this post is to get the proposal out there so that we can see if this approach can suit the needs of Rust programmers and compiler writers alike.
 398 If there are explanations you think are missing or confusing, please bring that up as well; I may then update this post accordingly.
 399 The concrete proposal will sure need some fixing here and there, but my hope is that the general approach of a contract-like, type-driven validation mechanism ends up being useful.
 400 So, keep the comments flowing -- and safe hacking.
 401
 402 #### Footnotes