X-Git-Url: https://git.ralfj.de/web.git/blobdiff_plain/4d9c13e4b5c8fd488ab78ae72db7f095669fcb67..fe126eb79c2b16fe4bc0d2921ad20f7271ce4451:/personal/_posts/2015-10-12-formalizing-rust.md diff --git a/personal/_posts/2015-10-12-formalizing-rust.md b/personal/_posts/2015-10-12-formalizing-rust.md index 9f5dc7a..43a1a71 100644 --- a/personal/_posts/2015-10-12-formalizing-rust.md +++ b/personal/_posts/2015-10-12-formalizing-rust.md @@ -6,7 +6,7 @@ reddit: /rust/comments/3ofkz6/formalizing_rust/ My current research project -- and the main topic of my PhD thesis -- is about developing a *semantic model* of the [Rust programming language](https://www.rust-lang.org/) and, most importantly, its type system. Rust is an attempt of Mozilla to find a sweet spot in the design space of programming languages: A language that provides low-level resource management (making it a systems language), is convenient for programmers and guards against memory errors and thread unsafety. -Other have [said](https://www.youtube.com/watch?v=O5vzLKg7y-k) and [written](http://www.oreilly.com/programming/free/files/why-rust.pdf) a lot on why we need such a language, so I won't lose any more words on this. +Other have [said](https://www.youtube.com/watch?v=O5vzLKg7y-k) and [written](https://www.oreilly.com/programming/free/files/why-rust.pdf) a lot on why we need such a language, so I won't lose any more words on this. Let me just use this opportunity for a shameless plug: If you are curious and want to learn Rust, check out [Rust-101](https://www.ralfj.de/projects/rust-101/main.html), a hands-on Rust tutorial I wrote. I am going to assume some basic familiarity with Rust in the following. @@ -15,7 +15,9 @@ It's going to be fun! Honestly, that's enough of a reason for me. But there are other reasons: It shouldn't be a surprise that [bugs](https://github.com/rust-lang/rust/issues/24292) have [been](https://github.com/rust-lang/rust/issues/25860) [found](https://github.com/rust-lang/rust/issues/24880) in Rust. There are lots of things that can be done about such bugs -- my take on this is that we should try to *prove*, in a mathematical rigorous way, that no such bugs exist in Rust. This goes hand-in-hand with other approaches like testing, fuzzing and [static analysis](https://homes.cs.washington.edu/~emina/pubs/crust.ase15.pdf). -However, we (at my [research group](http://plv.mpi-sws.org/)) are into formalizing things, so that's what we are going to do. +However, we (at my [research group](https://plv.mpi-sws.org/)) are into formalizing things, so that's what we are going to do as part of the [RustBelt](https://plv.mpi-sws.org/rustbelt/) research project. + +**Update:** Added link to RustBelt website. @@ -37,13 +39,13 @@ The first version is also not going to cover automatic destructors, i.e., the `D This is, of course, a significant gap, even more so since dropck may well be the most subtle part of the type system, and the current version (as of Rust 1.3) is [known to be unsound](https://github.com/rust-lang/rust/issues/26656). But we have to start somewhere, and even without `Drop`, there is a lot to be done. -## So what's left? +## So What's Left? You may wonder now, what are we even doing then? The goal is to prove that programs following Rust's rules of ownership, borrowing and lifetimes are *memory and thread safe*, which, roughly speaking, means that all executed pointer accesses are valid, and there are no data races. I will often just say "memory safety", but concurrency is definitely part of the story. -## The syntactic approach and its limitations +## The Syntactic Approach and Its Limitations We could now be looking at the checks the Rust compiler is doing, and prove directly that these checks imply memory safety. Let us call these checks Rust's type system, then we are looking for the classic result of *type soundness*: Well-typed programs don't go wrong. @@ -54,7 +56,7 @@ In the following, we will consider programs containing `unsafe` as not being wel (We could also consider them well-typed in a more liberal type system that above result does not, and cannot, apply to. This is closer to what the Rust compiler does, but it's not helpful in our context.) Note that this issue is viral: `Vec`, `RefCell`, `Rc` -- all these and many more standard library data structures use `unsafe`, any if your program uses any of them, you're out of luck. -## Semantics to the rescue +## Semantics to the Rescue Intuitively, why do we even think that a program that uses `Vec`, but contains no `unsafe` block *itself*, should be safe? It's not just the compiler checks making sure that, e.g., we don't call `push` on a shared borrow. @@ -74,9 +76,9 @@ For example, to check that a function is semantically well-typed, you check what It doesn't matter *how* the function performs these operations. This is in contrast to the *syntactic* notion of well-typedness of a function, which involves looking at the *code* of the function and checking it against a bunch of rules. Only recently, research has been able to scale up such semantic methods to languages that combine state (i.e., mutable variables, a heap) with higher-order functions -- languages like Rust, where I can take a closure (a `Fn{,Mut,Once}`) and store it in memory, for later use. -Lucky enough, my advisor [Derek Dreyer](http://www.mpi-sws.org/~dreyer/) is one of the world experts in this field, which permits me to intensively study Rust (which I would have done anyways) and call it research! +Lucky enough, my advisor [Derek Dreyer](https://www.mpi-sws.org/~dreyer/) is one of the world experts in this field, which permits me to intensively study Rust (which I would have done anyways) and call it research! -## What we are doing +## What We Are Doing So, that's what we are trying to do: Come up not only with a formal version of Rust's (syntactic) type system, but also with appropriate *semantics* for these types. We then have to show that the syntactic types make semantic sense. This recovers, in a very roundabout way, the result I mentioned above: