From: Ralf Jung
Date: Mon, 8 Aug 2022 01:55:48 +0000 (-0400)
Subject: minirust post
X-Git-Url: https://git.ralfj.de/web.git/commitdiff_plain/75f992257b9846515949093be7b35fcb1a818629?ds=inline;hp=e5957050934a7daa2c9d744f2cdc62525e7ee4e6

minirust post
---

diff --git a/personal/_posts/2022-08-08-minirust.md b/personal/_posts/2022-08-08-minirust.md
new file mode 100644
index 0000000..4d2752b
--- /dev/null
+++ b/personal/_posts/2022-08-08-minirust.md
@@ -0,0 +1,53 @@
+---
+title: "Announcing: MiniRust"
+categories: rust research
+---
+
+I have been thinking about the semantics of Rust -- as in, the intended behavior of Rust programs when executed, in particular those containing unsafe code -- a lot.
+Probably too much.
+But all of these thoughts are just in my head, which is not very useful when someone else wants to try and figure out how some tricky bit of unsafe Rust code behaves.
+As part of the [Unsafe Code Guidelines](https://github.com/rust-lang/unsafe-code-guidelines/) project, we often get questions asking whether a *concrete* piece of code is fine or whether it has Undefined Behavior.
+But clearly, that doesn't scale: there are just too many questions to be asked, and figuring out the semantics by interacting with an oracle with many-day latency is rather frustrating.
+We have [Miri](https://github.com/rust-lang/miri/), which is a much quicker oracle, but it's also not always right and even then, it can just answer questions of the form "is this particular program fine"; users have to do all the work of figuring out the model that *generates* those answers themselves.
+
+
+
+So I have promised for a long time to find some more holistic way to write down my thoughts on unsafe Rust semantics.
+I thought I could do it in 2021, but I, uh, "slightly" missed that deadline... but better late than never!
+At long last, I can finally present to you: [**MiniRust**](https://github.com/RalfJung/minirust).
+
+The purpose of MiniRust is to describe the semantics of an interesting fragment of Rust in a way that is both precise and understandable to as many people as possible.
+These goals are somewhat at odds with each other -- the most precise definitions, e.g. carried out in the Coq Proof Assistant, tend to not be very accessible.
+English language, on the other hand, is not very precise.
+So my compromise solution is to write down the semantics in a format that is hopefully known to everyone who could be interested: in Rust code.
+Specifically, MiniRust is specified by a *reference interpreter* that describes the step-by-step process of executing a MiniRust program, *including* checking at each step whether the program has Undefined Behavior.
+
+"Hold on", I hear someone say, "you are defining Rust in Rust code? Isn't that cyclic?"[^bear]
+Well, yes and no. It's not *really* Rust code.
+It's what I call "pseudo Rust", uses only a tiny fragment of the language (in particular, no `unsafe`), and then extends the language with some conveniences to make things less verbose.
+The idea is that anyone who knows Rust should immediately be able to understand what this code means, but also hopefully eventually if this idea pans out we can have tooling to translate pseudo Rust into "real" languages -- in particular, real Rust and Coq.
+Translating it to real Rust means we can actually execute the reference interpreter and test it, and translating it to Coq means we can start proving theorems about it.
+But I am getting waaaay ahead of myself, these are rather long-term plans.
+
+[^bear]: Is that someone the [cool bear](https://fasterthanli.me/articles/) making an appearance on my blog? We'll never know... and also I'd have to ask fasterthanlime for permission and didn't plan this well enough. ;)
+
+So, if you want to look into my brain to see how I see Rust programs, then please go check out [MiniRust](https://github.com/RalfJung/minirust).
+The README explains the scope and goals, the general structure, and the details of pseudo Rust, as well as a comparison with some related efforts.
+
+In particular I find that the concept of "places" and "values", which can be rather mysterious, becomes a lot clearer when spelled out like that, but that might just be me.
+I hasten to add that this is *very early work-in-progress*, and it is *my own personal experiment*, not necessarily reflecting the views of anyone else.
+It is also *far from feature-complete*, in fact it has just barely enough to be interesting.
+There are lots of small things missing (like integers that aren't exactly 2 bytes in size, or tuples that don't have exactly 2 elements), but the biggest omission by far is the total lack of an aliasing model.
+And unsized types. And concurrency. And probably other things.
+
+On the other hand, there are many things that it *can* explain in full precision:
+- validity invariants, and how they arise from the mapping between a high-level concept of "values" and a low-level concept of "sequences of bytes"
+- the basic idea of provenance tracking the "allocation" a pointer points to, and how that interacts with pointer arithmetic (including `offset` and `wrapping_offset`)
+- how pointer provenance behaves when doing transmutation between pointers and integers
+- what happens when *casting* between pointers and integers
+- padding (that's why tuples can have 2 elements, so there can be padding between them)
+
+If you're not used to reading interpreter source code, then I guess this can be rather jarring, and there is certainly a *lot* of work that could and should be done to make this more accessible.
+But just being able to talk about these questions with precision *at all* has already led to some interesting discussions in the UCG WG (some of which made me change my mind, and change MiniRust accordingly), so for now it is serving its purpose, and maybe some of you can find it useful, too.
+And hopefully we can use it as a starting place for seriously tackling the issue of an *official* specification of Rust.
+More on that soon. :)
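
To make the first bullet in the list above a little more concrete, here is a minimal sketch of how a validity invariant can fall out of the mapping between "values" and "sequences of bytes". It is not taken from the MiniRust repository: the names `Value`, `Type`, `encode`, `decode` and `typed_load` are chosen for this illustration, the little-endian layout is an arbitrary assumption, and the sketch ignores uninitialized bytes and provenance entirely. The idea it tries to show is that decoding is a *partial* function, and a typed load has Undefined Behavior exactly when the bytes do not decode to any value of the type.

```rust
/// A high-level value, as opposed to its representation in memory.
/// (Hypothetical type for this sketch only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Value {
    Bool(bool),
    /// Stand-in for the 2-byte integers mentioned in the post.
    Int(u16),
}

/// The types this toy fragment knows about (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Type {
    Bool,
    U16,
}

/// Turn a value into the bytes that represent it.
/// Little-endian is an arbitrary choice made for this sketch.
pub fn encode(value: Value) -> Vec<u8> {
    match value {
        Value::Bool(b) => vec![b as u8],
        Value::Int(i) => i.to_le_bytes().to_vec(),
    }
}

/// Try to interpret a byte sequence at the given type.
/// `None` means "these bytes do not represent any value of this type".
pub fn decode(ty: Type, bytes: &[u8]) -> Option<Value> {
    match (ty, bytes) {
        // Only 0 and 1 represent a `bool`; every other byte is invalid.
        (Type::Bool, [0]) => Some(Value::Bool(false)),
        (Type::Bool, [1]) => Some(Value::Bool(true)),
        // Every pair of bytes represents some 2-byte integer.
        (Type::U16, [lo, hi]) => Some(Value::Int(u16::from_le_bytes([*lo, *hi]))),
        // Wrong length or invalid bit pattern.
        _ => None,
    }
}

/// One "interpreter step": a typed load that reports Undefined Behavior
/// when the bytes fail to decode at the requested type.
pub fn typed_load(ty: Type, bytes: &[u8]) -> Result<Value, String> {
    decode(ty, bytes)
        .ok_or_else(|| format!("Undefined Behavior: {:?} is not a valid {:?}", bytes, ty))
}
```

With these definitions, `typed_load(Type::Bool, &[1])` yields `Ok(Value::Bool(true))`, while `typed_load(Type::Bool, &[2])` yields an `Err`. The "validity invariant" of `bool` is never stated separately in this sketch; it is simply the set of byte sequences on which `decode` succeeds.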