From: Ralf Jung
Date: Mon, 6 Jul 2015 16:32:54 +0000 (+0200)
Subject: part 09: explain how Rust prevents iterator invalidation
X-Git-Url: https://git.ralfj.de/rust-101.git/commitdiff_plain/bbd22a3d23af1ea8186c7a3bf733a07ba1f8ff70

part 09: explain how Rust prevents iterator invalidation
---

diff --git a/src/part04.rs b/src/part04.rs
index f39df37..301e15a 100644
--- a/src/part04.rs
+++ b/src/part04.rs
@@ -27,7 +27,7 @@ fn work_on_vector(v: Vec<i32>) { /* do something */ }
 fn ownership_demo() {
     let v = vec![1,2,3,4];
     work_on_vector(v);
-    /* println!("The first element is: {}", v[0]); */
+    /* println!("The first element is: {}", v[0]); */ /* BAD! */
 }
 //@ Rust attaches additional meaning to the argument of `work_on_vector`: The function can assume
 //@ that it entirely *owns* `v`, and hence can do anything with it. When `work_on_vector` ends,
@@ -114,7 +114,7 @@ fn mutable_borrow_demo() {
     /* let first = &v[0]; */
     vec_inc(&mut v);
     vec_inc(&mut v);
-    /* println!("The first element is: {}", *first); */
+    /* println!("The first element is: {}", *first); */ /* BAD! */
 }
 //@ `&mut` is the operator to create a mutable borrow. We have to mark `v` as mutable in order to create such a
 //@ borrow. Because the borrow passed to `vec_inc` only lasts as long as the function call, we can still call
diff --git a/src/part05.rs b/src/part05.rs
index 3992fb1..7324d13 100644
--- a/src/part05.rs
+++ b/src/part05.rs
@@ -131,7 +131,7 @@ fn work_on_variant(mut var: Variant, text: String) {
         Variant::Number(ref mut n) => ptr = n,
         Variant::Text(_) => return,
     }
-    /* var = Variant::Text(text); */
+    /* var = Variant::Text(text); */ /* BAD! */
     *ptr = 1337;
 }
 //@ Now, imagine what would happen if we were permitted to also mutate `var`. We could, for example,
diff --git a/src/part09.rs b/src/part09.rs
index 482b293..63bab76 100644
--- a/src/part09.rs
+++ b/src/part09.rs
@@ -17,7 +17,8 @@ use part05::BigInt;
 
 // The only alternative is for the iterator to *borrow* the number.
 // In writing this down, we again have to be explicit about the lifetime of the borrow: We can't just have an
-// `Iter`, we must have an `Iter<'a>` that borrowed the number for lifetime `'a`.
+// `Iter`, we must have an `Iter<'a>` that borrowed the number for lifetime `'a`. This is our first example of
+// a datatype that's polymorphic in a lifetime, as opposed to a type.
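The distinction drawn in the comment above, between a datatype that is polymorphic in a *type* and one that is polymorphic in a *lifetime*, can be seen in a small standalone sketch. It is not part of this commit: the names `Owned` and `Borrowed` are hypothetical, and a plain slice `&'a [u64]` stands in for the borrowed `BigInt` so that the code compiles on its own.

```rust
// Generic over a *type* `T`: the struct owns a value of whatever type it is instantiated with.
struct Owned<T> {
    value: T,
}

// Generic over a *lifetime* `'a`: the struct borrows its digits, and a
// `Borrowed<'a>` may not be used after the borrow for `'a` has ended.
struct Borrowed<'a> {
    digits: &'a [u64],
}

impl<'a> Borrowed<'a> {
    // The result points into the borrowed data, so it carries lifetime `'a`,
    // not the (possibly shorter) lifetime of `&self`.
    fn first(&self) -> Option<&'a u64> {
        let digits: &'a [u64] = self.digits;
        digits.first()
    }
}

fn main() {
    let owned = Owned { value: vec![1u64, 2, 3] };
    let first = {
        let view = Borrowed { digits: &owned.value };
        view.first()
    }; // `view` is gone here, but `first` still borrows `owned.value` for `'a`
    println!("{:?}", first); // prints Some(1)
}
```

Because `first` returns `Option<&'a u64>` rather than a reference tied to `&self`, the result in `main` may outlive `view` itself, as long as `owned` is still around.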
 // `usize` here is the type of unsigned, pointer-sized numbers. It is typically the type of "lengths of things",
 // in particular, it is the type of the length of a `Vec` and hence the right type to store an offset into the vector of digits.
 struct Iter<'a> {
@@ -61,4 +62,64 @@ pub fn main() {
     }
 }
 
+// Of course, we don't have to use `for` to apply the iterator. We can also explicitly call `next`.
+fn print_digits_v1(b: &BigInt) {
+    let mut iter = b.iter();
+    // `loop` is the keyword for a loop without a condition: It runs endlessly, or until you break out of
+    // it with `break` or `return`.
+    loop {
+        // Each time we go through the loop, we analyze the next element presented by the iterator - until it stops.
+        match iter.next() {
+            None => break,
+            Some(digit) => println!("{}", digit)
+        }
+    }
+}
+
+// Now, it turns out that this combination of a loop and a pattern match is fairly common, and Rust
+// provides some convenient syntactic sugar for it.
+fn print_digits_v2(b: &BigInt) {
+    let mut iter = b.iter();
+    // `while let` performs the given pattern match in every iteration of the loop, and exits the loop once the pattern
+    // doesn't match. There's also `if let`, which works similarly, but of course without the loopy part.
+    while let Some(digit) = iter.next() {
+        println!("{}", digit)
+    }
+}
+
+// ## Iterator invalidation and lifetimes
+// You may have been surprised that we had to explicitly annotate a lifetime when we wrote `Iter`. Of
+// course, with lifetimes being present at every borrow in Rust, this is only consistent. But do we at
+// least gain something from this extra annotation burden? (Thankfully, this burden only occurs when we
+// define *types*, and not when we define functions - which is typically much more common.)
+//
+// It turns out that the answer to this question is yes! This particular aspect of the concept of
+// lifetimes helps Rust to eliminate the issue of *iterator invalidation*. Consider the following
+// piece of code.
+fn iter_invalidation_demo() {
+    let mut b = BigInt::new(1 << 63) + BigInt::new(1 << 16) + BigInt::new(1 << 63);
+    for digit in b.iter() {
+        println!("{}", digit);
+        /*b = b + BigInt::new(1);*/ /* BAD! */
+    }
+}
+// If you enable the bad line, Rust will reject the code. Why? The problem is that we are modifying the
+// number while iterating over it. In other languages, this can have all sorts of effects, from inconsistent
+// data or an exception being thrown (Java) to dangling pointers being dereferenced (C++). Rust, however, is able to
+// detect this situation. When you call `iter`, you have to borrow `b` for some lifetime `'a`, and you obtain
+// `Iter<'a>`. This is an iterator that's only valid for lifetime `'a`. Fortunately, we have this annotation available
+// to make such a statement. Now, since we are using the iterator throughout the loop, `'a` has to span the loop.
+// Thus, `b` is borrowed for the duration of the loop, and we cannot mutate it. This is yet another example of
+// how the combination of mutation and aliasing leads to undesired effects (not necessarily crashes, as in Java),
+// which Rust successfully prevents.
+//
+// Technically speaking, there's one more subtlety that I did not explain yet. We never explicitly tied the lifetime `'a` of the
+// iterator to the loop, so how does this happen? The answer lies in the full type of `next()`:
+// `fn<'a, 'b>(&'b mut Iter<'a>) -> Option<u64>`. Since `next()` takes a *borrowed* iterator, there are two lifetimes involved:
+// The lifetime of the borrow of the iterator, and the lifetime of the iterator itself. In such a case of nested lifetimes,
+// Rust implicitly adds the additional constraint that the inner lifetime *outlives* the outer one: The borrow of an iterator
+// cannot be valid for longer than the iterator itself is valid. This means that the lifetime `'a` of the iterator needs
+// to outlive every call to `next()`, and hence the loop. Luckily, this all happens without our intervention.
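The same rejection can be reproduced without `BigInt`. The following standalone sketch is an illustration using the standard library's `Vec<u64>` and its `iter`, `push` and `extend` methods, not code from the tutorial; the helper name `append_scaled` is hypothetical. It keeps the offending line commented out, just like the demo above. Enabling it makes the compiler report that `v` cannot be borrowed as mutable because it is also borrowed as immutable.

```rust
// Prints the elements of `v`, then appends each element multiplied by 10.
fn append_scaled(v: &mut Vec<u64>) {
    for x in v.iter() {
        println!("{}", x);
        // v.push(*x * 10); // BAD! `v` is already borrowed by `v.iter()` for the whole loop.
    }
    // One way out: finish the iteration first (collecting into a new vector), then mutate.
    let scaled: Vec<u64> = v.iter().map(|x| x * 10).collect();
    v.extend(scaled);
}

fn main() {
    let mut v = vec![1, 2, 3];
    append_scaled(&mut v);
    println!("{:?}", v); // prints [1, 2, 3, 10, 20, 30]
}
```

Collecting first is only one way to restructure such code; the point is that the mutation no longer overlaps with a live borrow of the vector.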
+
+
 //@ [index](main.html) | [previous](part08.html) | [next](main.html)
diff --git a/workspace/src/part04.rs b/workspace/src/part04.rs
index 4f2174a..c7969ac 100644
--- a/workspace/src/part04.rs
+++ b/workspace/src/part04.rs
@@ -14,7 +14,7 @@ fn work_on_vector(v: Vec<i32>) { /* do something */ }
 fn ownership_demo() {
     let v = vec![1,2,3,4];
     work_on_vector(v);
-    /* println!("The first element is: {}", v[0]); */
+    /* println!("The first element is: {}", v[0]); */ /* BAD! */
 }
 
 // ## Shared borrowing
@@ -55,7 +55,7 @@ fn mutable_borrow_demo() {
     /* let first = &v[0]; */
     vec_inc(&mut v);
     vec_inc(&mut v);
-    /* println!("The first element is: {}", *first); */
+    /* println!("The first element is: {}", *first); */ /* BAD! */
 }
 
 // ## Summary
diff --git a/workspace/src/part05.rs b/workspace/src/part05.rs
index a2da18f..d3c9544 100644
--- a/workspace/src/part05.rs
+++ b/workspace/src/part05.rs
@@ -72,7 +72,7 @@ fn work_on_variant(mut var: Variant, text: String) {
         Variant::Number(ref mut n) => ptr = n,
         Variant::Text(_) => return,
     }
-    /* var = Variant::Text(text); */
+    /* var = Variant::Text(text); */ /* BAD! */
     *ptr = 1337;
 }
 
diff --git a/workspace/src/part09.rs b/workspace/src/part09.rs
index d6b8bba..6b3f737 100644
--- a/workspace/src/part09.rs
+++ b/workspace/src/part09.rs
@@ -17,7 +17,8 @@ use part05::BigInt;
 
 // The only alternative is for the iterator to *borrow* the number.
 // In writing this down, we again have to be explicit about the lifetime of the borrow: We can't just have an
-// `Iter`, we must have an `Iter<'a>` that borrowed the number for lifetime `'a`.
+// `Iter`, we must have an `Iter<'a>` that borrowed the number for lifetime `'a`. This is our first example of
+// a datatype that's polymorphic in a lifetime, as opposed to a type.
 // `usize` here is the type of unsigned, pointer-sized numbers. It is typically the type of "lengths of things",
 // in particular, it is the type of the length of a `Vec` and hence the right type to store an offset into the vector of digits.
 struct Iter<'a> {
@@ -60,3 +61,63 @@ pub fn main() {
     }
 }
 
+// Of course, we don't have to use `for` to apply the iterator. We can also explicitly call `next`.
+fn print_digits_v1(b: &BigInt) {
+    let mut iter = b.iter();
+    // `loop` is the keyword for a loop without a condition: It runs endlessly, or until you break out of
+    // it with `break` or `return`.
+    loop {
+        // Each time we go through the loop, we analyze the next element presented by the iterator - until it stops.
+        match iter.next() {
+            None => break,
+            Some(digit) => println!("{}", digit)
+        }
+    }
+}
+
+// Now, it turns out that this combination of a loop and a pattern match is fairly common, and Rust
+// provides some convenient syntactic sugar for it.
+fn print_digits_v2(b: &BigInt) {
+    let mut iter = b.iter();
+    // `while let` performs the given pattern match in every iteration of the loop, and exits the loop once the pattern
+    // doesn't match. There's also `if let`, which works similarly, but of course without the loopy part.
+    while let Some(digit) = iter.next() {
+        println!("{}", digit)
+    }
+}
+
+// ## Iterator invalidation and lifetimes
+// You may have been surprised that we had to explicitly annotate a lifetime when we wrote `Iter`. Of
+// course, with lifetimes being present at every borrow in Rust, this is only consistent. But do we at
+// least gain something from this extra annotation burden? (Thankfully, this burden only occurs when we
+// define *types*, and not when we define functions - which is typically much more common.)
+//
+// It turns out that the answer to this question is yes! This particular aspect of the concept of
+// lifetimes helps Rust to eliminate the issue of *iterator invalidation*. Consider the following
+// piece of code.
+fn iter_invalidation_demo() {
+    let mut b = BigInt::new(1 << 63) + BigInt::new(1 << 16) + BigInt::new(1 << 63);
+    for digit in b.iter() {
+        println!("{}", digit);
+        /*b = b + BigInt::new(1);*/ /* BAD! */
+    }
+}
+// If you enable the bad line, Rust will reject the code. Why? The problem is that we are modifying the
+// number while iterating over it. In other languages, this can have all sorts of effects, from inconsistent
+// data or an exception being thrown (Java) to dangling pointers being dereferenced (C++). Rust, however, is able to
+// detect this situation. When you call `iter`, you have to borrow `b` for some lifetime `'a`, and you obtain
+// `Iter<'a>`. This is an iterator that's only valid for lifetime `'a`. Fortunately, we have this annotation available
+// to make such a statement. Now, since we are using the iterator throughout the loop, `'a` has to span the loop.
+// Thus, `b` is borrowed for the duration of the loop, and we cannot mutate it. This is yet another example of
+// how the combination of mutation and aliasing leads to undesired effects (not necessarily crashes, as in Java),
+// which Rust successfully prevents.
+//
+// Technically speaking, there's one more subtlety that I did not explain yet. We never explicitly tied the lifetime `'a` of the
+// iterator to the loop, so how does this happen? The answer lies in the full type of `next()`:
+// `fn<'a, 'b>(&'b mut Iter<'a>) -> Option<u64>`. Since `next()` takes a *borrowed* iterator, there are two lifetimes involved:
+// The lifetime of the borrow of the iterator, and the lifetime of the iterator itself. In such a case of nested lifetimes,
+// Rust implicitly adds the additional constraint that the inner lifetime *outlives* the outer one: The borrow of an iterator
+// cannot be valid for longer than the iterator itself is valid. This means that the lifetime `'a` of the iterator needs
+// to outlive every call to `next()`, and hence the loop. Luckily, this all happens without our intervention.
+
+
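To see the implied outlives constraint at work, here is a standalone sketch with the lifetimes of `next()` written out. It assumes a hypothetical `Digits<'a>` type (a stand-in for `Iter<'a>` that borrows a slice of `u64` digits) and hypothetical free functions `next_explicit`, `peek` and `sum_digits`. Neither `next_explicit` nor `peek` states `'a: 'b`; `peek` nevertheless hands out data of lifetime `'a` as a `&'b u64`, which is only accepted because the compiler derives `'a: 'b` from the argument type `&'b Digits<'a>`.

```rust
// Hypothetical stand-in for the tutorial's `Iter<'a>`: it borrows the digits for `'a`.
struct Digits<'a> {
    digits: &'a [u64],
    idx: usize,
}

impl<'a> Iterator for Digits<'a> {
    type Item = u64;

    // With all lifetimes spelled out, this is `fn next<'b>(self: &'b mut Digits<'a>) -> Option<u64>`.
    fn next(&mut self) -> Option<u64> {
        let digit = self.digits.get(self.idx).copied()?;
        self.idx += 1;
        Some(digit)
    }
}

// Both lifetimes explicit; no `where 'a: 'b` clause is needed or written.
fn next_explicit<'a, 'b>(it: &'b mut Digits<'a>) -> Option<u64> {
    it.next()
}

// Returns a reference into the `'a` data, shortened to `'b`. This type-checks only
// because `'a: 'b` is implied by the argument type `&'b Digits<'a>`.
fn peek<'a, 'b>(it: &'b Digits<'a>) -> Option<&'b u64> {
    it.digits.get(it.idx)
}

// Each call to `next_explicit` borrows `it` mutably for a short `'b`, while the
// borrow of `digits` (lifetime `'a`) has to cover the whole loop.
fn sum_digits(digits: &[u64]) -> u64 {
    let mut it = Digits { digits, idx: 0 };
    let mut total = 0;
    while let Some(d) = next_explicit(&mut it) {
        total += d;
    }
    total
}

fn main() {
    let digits: Vec<u64> = vec![4, 5, 6];
    let it = Digits { digits: &digits, idx: 1 };
    println!("{:?}", peek(&it)); // prints Some(5)
    println!("{}", sum_digits(&digits)); // prints 15
}
```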