We’ve already set up the development environment, so let’s write some Rust code!
A good place to start is with the bottom layer data structures, known as Str and Ch in the C# version. To review, Ch
is an enumeration representing the characters A-Z (and a None
value == 0), while Str
packs up to 12 Ch
values into 5-bit chunks plus a 4-bit length for a total of 64 bits. Already we have one small point of contention, in that Rust has a very similarly named primitive string slice type called str
. To make things a bit less confusing, we can use the name St
here instead.
Going test first, we will begin by translating the following C# unit test into Rust:
[Fact] public void Empty() { Str s = default(Str); s.Length.Should().Be(0); s.ToString().Should().Be(string.Empty); Enumerable.Range(0, 12).Select(i => s[i]).Should().BeEquivalentTo( Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None, Ch.None); }
Here is one possible reimagining:
#[test] fn empty() { let s = St::empty(); assert_eq!(0, s.len()); assert_eq!("", s.to_string()); let expected = vec![ Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, ]; let actual: Vec<Ch> = (0..12).map(|i| s[i]).collect(); assert_eq!(expected, actual); }
This already gives us a lot to chew on. Going line by line, let’s dive into the details. First, we assign an empty string value using an associated function (acting as our constructor) of the St
struct. Then we ensure the len()
and to_string()
functions return sensible values (0 and empty string, respectively). At the end, we prepare a Vec
of expected Ch
values and use a range to select (i.e. map) each index of the St
and collect this into an “actual” Vec
, asserting that it has the right contents. Note that due to Rust’s excellent type inference, we only have to explicitly use a type for the result of the collect
method.
One possible implementation that satisfies this test is as follows:
use std::{ fmt::{Display, Formatter, Result}, ops::Index, }; #[derive(Clone, Copy, Debug, PartialEq)] pub enum Ch { None, } pub struct St(u64); impl St { fn empty() -> St { St(0) } fn len(&self) -> u8 { 0 } } impl Display for St { fn fmt(&self, f: &mut Formatter) -> Result { write!(f, "{}", "") } } impl Index<u8> for St { type Output = Ch; fn index(&self, _index: u8) -> &Self::Output { &Ch::None } }
Again, we’ll just go through this line by line. The top section is the necessary use declarations to avoid having to fully qualify the various library type names that follow. Next, we have the Ch
enumeration which consists of a single member for now. It is annotated with a derive attribute telling the compiler to produce default implementations of four traits (roughly equivalent to interfaces in other languages). We want the Copy trait because we want to be able to freely pass Ch
by value like any other primitive type. We also need Clone as it is a super-trait of Copy (which will implicitly reuse the “cheap” copy behavior in this case). Debug is a nice-to-have trait for most types, but we are actually forced to use it here because assert macros require it (e.g. in case they have to produce an error message). Similarly, we need PartialEq since we can’t reasonably assert equality without it! We could have also used strong equivalence via Eq, but for now we’re just doing the minimum necessary to pass the tests (or in this case, the compiler!).
The rest is code for the St
struct. We start with the definition which is just about the simplest tuple struct you can create — a single 64-bit unsigned integer. Then we have the implementation block with stubs for the empty
and len
functions. The last two blocks are trait implementations. The first is for Display, which is roughly equivalent to IFormattable in the .NET world — this is what backs the to_string method. The fmt
function here is basically just boilerplate; we use a Formatter with the write macro and format string to produce an empty string. Finally, we have a stub indexer implementation which enables us to use square brackets to return a Ch
item (defined as an associated type, similar to a class-level typedef in C++). Rust doesn’t have direct operator overloading as such, but achieves the same through the traits in std::ops.
The next test introduces the concept of appending a Ch
to create a longer St
:
#[test] fn one_char() { let s = St::empty() + Ch::A; assert_eq!(1, s.len()); assert_eq!("A", s.to_string()); let expected = vec![ Ch::A, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, Ch::None, ]; let actual: Vec<Ch> = (0..12).map(|i| s[i]).collect(); assert_eq!(expected, actual); }
Here we use the + operator, which requires us to implement the Add
trait for St
, as well as a Display implementation for Ch
and more reasonable definitions for len
and fmt
: [see commit on GitHub]
Once we get to the two character test case we end up with most of core implementation code: [see commit on GitHub]
After that, it’s all pretty mechanical, culminating in the 12 character test (the maximum our St
can hold) at which point all Ch
values from A-Z are accounted for: [see commit on GitHub]
While developing the above code, I benefited greatly from the exhaustiveness of the Rust match operator. When adding new enum values, Rust could automatically tell me if I was missing any cases, e.g.:
impl Add<Ch> for St { type Output = St; fn add(self, rhs: Ch) -> Self::Output { let c = match rhs { Ch::None => 0, Ch::A => 1, Ch::B => 2, Ch::C => 3, }; St((self.0 + 1) | (c << (4 + 5 * self.len()))) } }
This code simply doesn’t compile because it only covers four of the possible 27 cases (A-Z + None). Of course, this only works for certain kinds of matches (typically, those involving enums); less help is available for instance in the inverse of the above operation:
impl Index<u8> for St { type Output = Ch; fn index(&self, index: u8) -> &Self::Output { let c = (self.0 >> (4 + 5 * index)) & 0x1F; match c { 1 => &Ch::A, 2 => &Ch::B, 3 => &Ch::C, _ => &Ch::None, } } }
This code would not compile without the _ placeholder, unless we specifically listed out every possible numeric value. This is in theory achievable for a u8
type (“only” 256 values) but would be out of the question for any larger data type. The compiler is great at reducing the kinds of tests we need, but alas we can’t throw them all away.
We have a good start yet there is a lot more work to do. Let’s continue our Rust journey in the next post.
Pingback: Letter Boxed: Rust impl, part 2 (panicking, hashing, parsing) – WriteAsync .NET