// Rust-101, Part 14: Slices, Arrays, External Dependencies
// ========================================================

//@ To complete rgrep, there are two pieces we still need to implement: Sorting, and taking the job options
//@ as arguments to the program, rather than hard-coding them. Let's start with sorting.

// ## Slices
//@ Again, we first have to think about the type we want to give to our sorting function. We may be inclined to
//@ pass it a `Vec<T>`. Of course, sorting does not actually consume the argument, so we should make that a `&mut Vec<T>`.
//@ But there's a problem with that: If we want to implement some divide-and-conquer sorting algorithm (say,
//@ Quicksort), then we will have to *split* our argument at some point, and operate recursively on the two parts.
//@ But we can't split a `Vec`! We could now extend the function signature to also take some indices, marking the
//@ part of the vector we are supposed to sort, but that's all rather clumsy. Rust offers a nicer solution.
//@
//@ `[T]` is the type of an (unsized) *array*, with elements of type `T`. All this means is that there's a contiguous
//@ region of memory, where a bunch of `T` are stored. How many? We can't tell! This is an unsized type. Just like for
//@ trait objects, this means we can only operate on pointers to that type, and these pointers will carry the missing
//@ information - namely, the length. Such a pointer is called a *slice*. As we will see, a slice can be split.
//@ Our function can thus take a borrowed slice, and promise to sort all elements in there.
pub fn sort<T: PartialOrd>(data: &mut [T]) {
    if data.len() < 2 { return; }

    // We decide that the element at 0 is our pivot, and then we move our cursors through the rest of the slice,
    // making sure that everything on the left is no larger than the pivot, and everything on the right is no smaller.
    let mut lpos = 1;
    let mut rpos = data.len();
    /* Invariant: pivot is data[0]; everything with index (0,lpos) is <= pivot;
       [rpos,len) is >= pivot; lpos < rpos */
    loop {
        // **Exercise 14.1**: Complete this Quicksort loop. You can use `swap` on slices to swap two elements. Write a
        // test function for `sort`.
        unimplemented!()
    }

    // Once our cursors have met, we need to put the pivot in the right place.
    data.swap(0, lpos-1);

    // Finally, we split our slice to sort the two halves. The nice part about slices is that splitting them is cheap:
    //@ They are just a pointer to a start address, and a length. We can thus get two pointers, one at the beginning and
    //@ one in the middle, and set the lengths appropriately such that they don't overlap. This is what `split_at_mut` does.
    //@ Since the two slices don't overlap, there is no aliasing and we can have them both mutably borrowed.
    let (part1, part2) = data.split_at_mut(lpos);
    //@ The index operation can not only be used to address certain elements, it can also be used for *slicing*: Giving a range
    //@ of indices, and obtaining an appropriate part of the slice we started with. Here, we remove the last element from
    //@ `part1`, which is the pivot. This makes sure both recursive calls work on strictly smaller slices.
    sort(&mut part1[..lpos-1]);                                     /*@*/
    sort(part2);                                                    /*@*/
}
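//@ To see splitting and slicing in isolation, here is a minimal sketch that is independent of `sort`. The helper
//@ names `double_then_negate`, `all_but_last` and `slice_ref_is_two_words` are made up for this illustration. It
//@ shows that `split_at_mut` yields two mutable slices that can be used at the same time, that a range index
//@ produces a smaller slice, and that a slice reference occupies two machine words (address and length).

```rust
use std::mem::size_of;

/// Hypothetical helper: doubles the first `mid` elements and negates the rest.
/// Panics if `mid > data.len()`, because `split_at_mut` does.
pub fn double_then_negate(data: &mut [i32], mid: usize) {
    // Two non-overlapping mutable slices: indices [0, mid) and [mid, len).
    let (left, right) = data.split_at_mut(mid);
    // Both borrows are live here, and both may be written to.
    for x in left.iter_mut() {
        *x *= 2;
    }
    for x in right.iter_mut() {
        *x = -*x;
    }
}

/// Hypothetical helper: the slice without its last element.
/// An empty slice is returned unchanged.
pub fn all_but_last(data: &mut [i32]) -> &mut [i32] {
    let n = data.len();
    // Slicing with a range borrows a part of the same memory; nothing is copied.
    &mut data[..n.saturating_sub(1)]
}

/// A slice reference stores a start address and a length.
pub fn slice_ref_is_two_words() -> bool {
    size_of::<&[i32]>() == 2 * size_of::<usize>()
}
```

//@ Calling `double_then_negate(&mut v, 2)` on `[1, 2, 3, 4, 5]` leaves `[2, 4, -3, -4, -5]` behind.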
// **Exercise 14.2**: Since `String` implements `PartialOrd`, you can now change the function `output_lines` in the previous part
// to call the sort function above. If you did exercise 13.1, you will have slightly more work. Make sure you sort by the matched line
// only, not by filename or line number!

// Now, we can sort, e.g., a vector of numbers.
fn sort_nums(data: &mut Vec<i32>) {
    //@ Vectors support slicing, just like slices do. Here, `..` denotes the full range, which means we want to slice the entire vector.
    //@ It is then passed to the `sort` function, which doesn't even know that it is working on data inside a vector.
    sort(&mut data[..]);
}
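//@ The following sketch shows the different ways a vector can be handed to a function that expects a slice.
//@ `add_to_all` and `demo_vec_slicing` are hypothetical names used only for this example.

```rust
/// Hypothetical helper: adds `delta` to every element of a slice.
/// It neither knows nor cares whether the data lives in a vector or an array.
pub fn add_to_all(data: &mut [i32], delta: i32) {
    for x in data.iter_mut() {
        *x += delta;
    }
}

pub fn demo_vec_slicing() -> Vec<i32> {
    let mut v = vec![10, 20, 30, 40];
    // Full range: the slice covers the whole vector.
    add_to_all(&mut v[..], 1);
    // Half-open range: only the elements at indices 1 and 2.
    add_to_all(&mut v[1..3], 100);
    // A `&mut Vec<i32>` is also coerced to `&mut [i32]` automatically.
    add_to_all(&mut v, 1000);
    v
}
```

//@ After the three calls, the vector holds `[1011, 1121, 1131, 1041]`.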
// ## Arrays
//@ An *array* in Rust is given by the type `[T; n]`, where `n` is some *fixed* number. So, `[f64; 10]` is an array of 10 floating-point
//@ numbers, stored right next to each other in memory. Arrays are sized, and hence can be used like any other type. But we can also
//@ borrow them as slices, e.g., to sort them.
fn sort_array() {
    let mut array_of_data: [f64; 5] = [1.0, 3.4, 12.7, -9.12, 0.1];
    sort(&mut array_of_data);
}
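//@ As a small illustration of "sized" versus "unsized", the sketch below compares the size of an array type with
//@ the size of a slice reference, and passes arrays of two different lengths to the same slice-taking function.
//@ The names `sum` and `array_facts` are assumptions made for this example.

```rust
use std::mem::size_of;

/// Hypothetical helper: sums a slice of floats of any length.
pub fn sum(data: &[f64]) -> f64 {
    data.iter().sum()
}

pub fn array_facts() -> (usize, usize, f64, f64) {
    let five: [f64; 5] = [1.0, 2.0, 3.0, 4.0, 5.0];
    let two: [f64; 2] = [0.5, 0.25];
    (
        // The array size is known at compile time: 5 elements of 8 bytes each.
        size_of::<[f64; 5]>(),
        // A slice reference is two words wide (address and length), whatever the length is.
        size_of::<&[f64]>() / size_of::<usize>(),
        // `&[f64; 5]` and `&[f64; 2]` are both coerced to `&[f64]`.
        sum(&five),
        sum(&two),
    )
}
```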
// ## External Dependencies
//@ This leaves us with just one more piece to complete rgrep: Taking arguments from the command line. We could now directly work on
//@ [`std::env::args`](http://doc.rust-lang.org/stable/std/env/fn.args.html) to gain access to those arguments, and this would become
//@ a pretty boring lesson in string manipulation. Instead, I want to use this opportunity to show how easy it is to benefit from
//@ other people's work in your program.
//@
//@ For sure, we are not the first to equip a Rust program with support for command-line arguments. Someone must have written a library
//@ for the job, right? Indeed, someone has. Rust has a central repository of published libraries, called [crates.io](https://crates.io/).
//@ It's a bit like [PyPI](https://pypi.python.org/pypi) or [RubyGems](https://rubygems.org/): Everybody can upload their code,
//@ and there's tooling for importing that code into your project. This tooling is provided by `cargo`, the tool we are already using to
//@ build this tutorial. (`cargo` also has support for *publishing* your crate on crates.io; I refer you to [the documentation](http://doc.crates.io/crates-io.html) for more details.)
//@ In this case, we are going to use the [`docopt` crate](https://crates.io/crates/docopt), which creates a parser for command-line
//@ arguments based on the usage string. External dependencies are declared in the `Cargo.toml` file.
//@
//@ I already prepared that file, but the declaration of the dependency is still commented out. So please open `Cargo.toml` of your workspace
//@ now, and enable the two commented-out lines. Then do `cargo build`. Cargo will now download the crate from crates.io, compile it,
//@ and link it to your program. In the future, you can do `cargo update` to make it download new versions of crates you depend on.
//@ Note that crates.io is only the default location for dependencies. You can also give it the URL of a git repository or some local
//@ path. All of this is explained in the [Cargo Guide](http://doc.crates.io/guide.html).
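//@ For orientation, a dependency declaration in `Cargo.toml` generally has the shape sketched below. This is not
//@ the exact content of the prepared file: the version requirement, the git URL and the local path are
//@ placeholders chosen for illustration.

```toml
[dependencies]
# Default: fetch the crate from crates.io (the version requirement is an assumption).
docopt = "1"

# Alternative sources, one form per crate (placeholder URL and path):
# docopt = { git = "https://example.com/your-fork/docopt.git" }
# docopt = { path = "../docopt" }
```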
// I disabled the following module (using a rather bad hack), because it only compiles if `docopt` is linked.
// Remove the attribute of the `rgrep` module to enable compilation.
#[cfg(feature = "disabled")]
pub mod rgrep {
    // Now that `docopt` is linked, we can first add it to the namespace with `extern crate` and then import shorter names with `use`.
    // We also import some other pieces that we will need.
    extern crate docopt;
    use self::docopt::Docopt;
    use part13::{run, Options, OutputMode};
    use std::process;

    // The `USAGE` string documents how the program is to be called. It's written in a format that `docopt` can parse.
    static USAGE: &'static str = "
Usage: rgrep [-c] [-s] <pattern> <file>...

Options:
    -c, --count  Count number of matching lines (rather than printing them).
    -s, --sort   Sort the lines before printing.
";

    // This function extracts the rgrep options from the command-line arguments.
    fn get_options() -> Options {
        // This parses `argv` and exits the program with an error message if it fails. The code is taken from the [`docopt` documentation](http://burntsushi.net/rustdoc/docopt/). <br/>
        //@ The function `and_then` takes a closure from `T` to `Result<U, E>`, and uses it to transform a `Result<T, E>` to a
        //@ `Result<U, E>`. This way, we can chain computations that only happen if the previous one succeeded (and the error
        //@ type has to stay the same). In case you know about monads, this style of programming will be familiar to you.
        //@ There's a similar function for `Option`. `unwrap_or_else` is a bit like `unwrap`, but rather than panicking in
        //@ case of an `Err`, it calls the closure.
        let args = Docopt::new(USAGE).and_then(|d| d.parse()).unwrap_or_else(|e| e.exit());
        // Now we can get all the values out.
        let count = args.get_bool("-c");
        let sort = args.get_bool("-s");
        let pattern = args.get_str("<pattern>");
        let files = args.get_vec("<file>");
        if count && sort {
            println!("Setting both '-c' and '-s' at the same time does not make any sense.");
            process::exit(1);
        }

        // We need to make the strings owned to construct the `Options` instance.
        //@ If you check all the types carefully, you will notice that `pattern` above is of type `&str`. `str` is the type of a UTF-8
        //@ encoded string, that is, a bunch of bytes in memory (`[u8]`) that are valid according to UTF-8. `str` is unsized. `&str`
        //@ stores the address of the character data, and its length. String literals like "this one" are
        //@ of type `&'static str`: They point right to the constant section of the binary, so your code never owns that data.
        //@ However, the borrow is valid for as long as the program runs, hence it has lifetime `'static`. Calling
        //@ `to_string` will copy the string data into an owned buffer on the heap, and thus convert it to `String`.
        let mode = if count {
            OutputMode::Count
        } else if sort {
            OutputMode::SortAndPrint
        } else {
            OutputMode::Print
        };
        Options {
            files: files.iter().map(|file| file.to_string()).collect(),
            pattern: pattern.to_string(),
            output_mode: mode,
        }
    }

    // Finally, we can call the `run` function from the previous part on the options extracted using `get_options`. Edit `main.rs` to call this function.
    // You can now use `cargo run -- <pattern> <files>` to call your program, and see the argument parser and the threads we wrote previously in action!
    pub fn main() {
        run(get_options());                                         /*@*/
    }
}
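//@ The two idioms used in `get_options`, chaining with `and_then` / `unwrap_or_else` and turning `&str` into
//@ `String`, can be tried without `docopt`. The sketch below uses only the standard library. The functions
//@ `parse_positive`, `parse_or_zero` and `to_owned_strings` are hypothetical and not part of rgrep.

```rust
/// Hypothetical helper: parses a string as a strictly positive number.
pub fn parse_positive(input: &str) -> Result<u32, String> {
    input
        .trim()
        .parse::<i32>()
        // `and_then` needs the same error type on both steps, so convert it to `String` first.
        .map_err(|e| format!("not a number: {}", e))
        // This closure only runs if parsing succeeded.
        .and_then(|n| {
            if n > 0 {
                Ok(n as u32)
            } else {
                Err(format!("not positive: {}", n))
            }
        })
}

/// Like `unwrap`, but instead of panicking on `Err`, the closure decides what to do.
pub fn parse_or_zero(input: &str) -> u32 {
    parse_positive(input).unwrap_or_else(|err| {
        eprintln!("falling back to 0: {}", err);
        0
    })
}

/// Copies borrowed string data into owned `String`s on the heap.
pub fn to_owned_strings(items: &[&str]) -> Vec<String> {
    items.iter().map(|s| s.to_string()).collect()
}
```

//@ Unlike the `e.exit()` call in `get_options`, the fallback closure here returns a default value rather than ending the program.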
// **Exercise 14.3**: Wouldn't it be nice if rgrep supported regular expressions? There's already a crate that does all the parsing and matching on regular
// expressions; it's called [regex](https://crates.io/crates/regex). Add this crate to the dependencies of your workspace, add an option ("-r") to switch
// the pattern to regular-expression mode, and change `filter_lines` to honor this option. The documentation of regex is available from its crates.io page.
// (You won't be able to use the `regex!` macro if you are on the stable or beta channel of Rust. But it wouldn't help for our use-case anyway.)

//@ [index](main.html) | [previous](part13.html) | [next](part15.html)