Letter Boxed: Rust impl, part 2 (panicking, hashing, parsing)

Spread the love

Previously we began our Rust exploration of Letter Boxed with the core St and Ch structs. Now we’ll complete their functionality.

The next test on our TODO list looks like this in the C# version:

        [Fact]
        public void AppendTooMany()
        {
            Str max = default(Str)
                .Append(Ch.A)
                .Append(Ch.B)
                .Append(Ch.C)
                .Append(Ch.D)
                .Append(Ch.E)
                .Append(Ch.F)
                .Append(Ch.G)
                .Append(Ch.H)
                .Append(Ch.I)
                .Append(Ch.J)
                .Append(Ch.K)
                .Append(Ch.L);

            Action act = () => max.Append(Ch.X);

            act.Should().Throw<InvalidOperationException>();
        }

We’ve already encountered our first roadblock — Rust does not have exceptions. Instead, you either use an error result for things that could reasonably happen in a typical flow (e.g. when parsing external data) or you panic! We’ll choose the latter in this particular case; appending past the end of a buffer is an indication of an incorrect program and not something that we would want to compensate for or recover from.

Here is the Rust version of this test, using the [should_panic] annotation:

    #[test]
    #[should_panic(expected = "Cannot append any more")]
    fn append_too_many() {
        let max = St::empty()
            + Ch::A
            + Ch::B
            + Ch::C
            + Ch::D
            + Ch::E
            + Ch::F
            + Ch::G
            + Ch::H
            + Ch::I
            + Ch::J
            + Ch::K
            + Ch::L;

        let _ = max + Ch::X;
    }

Note the use of the underscore variable since we expect this line to panic and cannot do anything with the value. To pass this test, we only need a small tweak to the add function:

    fn add(self, rhs: Ch) -> Self::Output {
        if self.len() == 12 {
            panic!("Cannot append any more");
        }
        // . . .
    }

Not bad! We can follow this same pattern for the index out of range condition and the new chop function which drops the last Ch from the St.

The end goal with St is to support a fast lookup strategy, so naturally we should reach for the Hash trait next. It is auto-derivable, with the default implementation hashing all fields together. How convenient. There is an important difference in how hashing works in Rust compared to .NET, however. Since Rust has no implementation inheritance and no universal base class, we can’t just make a virtual GetHashCode method and be done with it. Instead, “hashable” types are supposed to produce their hash code when given a “hasher” which allows a bit more modularity between the what (i.e. data to consider) and the how (i.e. the algorithm) of hashing. You can see this difference by comparing the C# and Rust versions of the hash code test:

        [Fact]
        public void HashCode()
        {
            Str empty = default(Str);
            Str a = default(Str).Append(Ch.A);
            Str b = default(Str).Append(Ch.B);
            Str ab = default(Str).Append(Ch.A).Append(Ch.B);
            Str ba = default(Str).Append(Ch.B).Append(Ch.A);
            Str cdefgh = default(Str).Append(Ch.C).Append(Ch.D).Append(Ch.E).Append(Ch.F).Append(Ch.G).Append(Ch.H);

            HashSet<int> codes = new HashSet<int>(new Str[] { empty, a, b, ab, ba, cdefgh }.Select(s => s.GetHashCode()));

            codes.Should().HaveCount(6);
        }

In Rust, we have the extra step of working with a mutable hasher:

    #[test]
    fn hash_code() {
        let empty = St::empty();
        let a = St::empty() + Ch::A;
        let b = St::empty() + Ch::B;
        let ab = St::empty() + Ch::A + Ch::B;
        let ba = St::empty() + Ch::B + Ch::A;
        let cdefgh = St::empty() + Ch::C + Ch::D + Ch::E + Ch::F + Ch::G + Ch::H;
        let values = vec![empty, a, b, ab, ba, cdefgh];

        let codes: HashSet<u64> = values
            .into_iter()
            .map(|s| {
                let mut hasher = DefaultHasher::new();
                s.hash(&mut hasher);
                hasher.finish()
            })
            .collect();
        assert_eq!(6, codes.len());
    }

It’s a slight difference but not all that impactful outside the unit testing context. In a typical program, we are unlikely to directly ask for the hash code (we’d probably just put the item in a HashMap and let the collection do the heavy lifting).

The last requirement is the ability to convert a string like “HELLO” into a St with values Ch::H, Ch::E, Ch::L, Ch::L, Ch::O inside. As you may have guessed, there’s a trait for that — FromStr. Unlike some other traits, the from_str method here is not typically used directly. Rather, you call str::parse which in turn calls from_str. As the docs say, you will more than likely have to use the “turbofish syntax” to ensure Rust understands what the output type should be in context. Here is the unit test showing this in action:

    #[test]
    fn parse_from_string() {
        test_parse("");
        test_parse("A");
        test_parse("BC");
        // . . .
    }

    fn test_parse(expected: &str) {
        let s = expected.parse::<St>().unwrap();
        assert_eq!(expected, s.to_string());
    }

As a final note, you should be aware that Rust considers string parsing to be a fallible conversion. As such, the return value is a Result type that can have either a value or an error. In this case, we don’t want to bother with returning an error and will just assume that converting to St will either succeed or panic. While we cannot change the function signature, we can indicate there is no such thing as an error here by using the unit type (essentially “void”):

impl FromStr for St {
    type Err = ();

    fn from_str(s: &str) -> Result<St, ()> {
        let mut value = St::empty();
        for c in s.chars() {
            let c = match c {
                'A' => Ch::A,
                // . . .
                'Z' => Ch::Z,
                _ => Ch::None,
            };
            value = value + c;
        }

        Ok(value)
    }
}

The type Err = (); line says that our error type is nothing at all, and as a consequence, our function returns Ok in all (non-panic) paths.

With that, our base data structure logic is done: [see code on GitHub]

Going forward, we’ll get into the nuts and bolts of the higher level algorithms and structures of Letter Boxed. See you again soon!

One thought on “Letter Boxed: Rust impl, part 2 (panicking, hashing, parsing)

  1. Pingback: Letter Boxed: Rust impl, part 3 (modules, trie, I/O) – WriteAsync .NET

Leave a Reply

Your email address will not be published. Required fields are marked *