Letter Boxed: Rust impl, part 3 (modules, trie, I/O)

Spread the love

Our Rust-based Letter Boxed code so far has just the core character-based data types. Today we’ll add the trie.

Before we move on, we need to keep our house in order. Rather than have one massive lib.rs file, we should start splitting out some modules. The current lib.rs which has St and Ch will move to a module called core: [see commit on GitHub].

With that out of the way, we can begin a new module trie to hold our next chunk of code for StTrie: [see commit on GitHub]

The trie implementation is pretty much a straight port from the C# version. After several incremental TDD steps, we land on the following implementation:

use crate::core::St;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum NodeKind {
    None,
    Prefix,
    Terminal,
}

pub struct StTrie {
    nodes: HashMap<St, bool>,
    count: usize,
}

impl StTrie {
    fn new() -> StTrie {
        let nodes = HashMap::new();
        let count = 0;
        StTrie { nodes, count }
    }

    fn len(&self) -> usize {
        self.count
    }

    fn insert(&mut self, value: St) {
        if value.len() == 0 {
            return;
        }

        if let Some(&true) = self.nodes.get(&value) {
            return;
        }

        self.count += 1;
        self.nodes.insert(value, true);
        let mut value = value;
        for _ in 1..value.len() {
            value = value.chop();
            if let Some(_) = self.nodes.get(&value) {
                return;
            }

            self.nodes.insert(value, false);
        }
    }

    fn find(&self, value: St) -> NodeKind {
        match self.nodes.get(&value) {
            None => NodeKind::None,
            Some(&true) => NodeKind::Terminal,
            Some(&false) => NodeKind::Prefix,
        }
    }
}

There’s nothing too special here, other than maybe the if let expression when checking if the node already exists.

The last big feature for StTrie is the ability to load words from a stream of text. Thus, we need to get familiar with the Read trait, roughly analogous to the read side of System.IO.Stream in .NET. Of course, the API is rather primitive, so let’s instead start with the easier BufRead derived trait. Most notably, it allows us to use the lines() iterator for a much nicer experience:

impl StTrie {
    // . . .
    fn load<B: BufRead>(stream: &mut B) -> StTrie {
        let mut trie = StTrie::new();
        for line in stream.lines() {
            let word = line.unwrap();
            if word.len() >= 3 && word.len() <= 12 {
                let value = word.parse::<St>().unwrap();
                trie.insert(value);
            }
        }

        trie
    }
    // . . .
}

You might wonder what the MemoryStream equivalent is for the purposes of unit testing. It turns out we can use Cursor for that, as demonstrated in this test helper which converts a bunch of string slices to a big block of bytes:

    fn load_trie(lines: Vec<&str>) -> StTrie {
        let lines = lines.join("\r\n").as_bytes().to_vec();
        let mut stream = Cursor::new(lines);
        StTrie::load(&mut stream)
    }

At this point, we have enough code to write the trie-loading portion of the solver app. So let’s go ahead and do that:

use letter_boxed_solver::trie::StTrie;
use std::{env::args, fs::File, time::Instant};

fn main() {
    let start = Instant::now();
    let args = args();
    if args.len() != 3 {
        println!("Please specify a Letter Boxed puzzle and a word list file.");
        return;
    }

    log(start, "Loading trie...");
    let path = args.skip(2).next().unwrap();
    let mut stream = File::open(path).unwrap();
    let trie = StTrie::load(&mut stream);
    log(start, format!("Loaded {} words.", trie.len()).as_str());
}

fn log(start: Instant, message: &str) {
    println!("[{:07.3}] {}", start.elapsed().as_secs_f64(), message);
}

We’re doing basic measurement with std::time::Instant to see how fast this app really is! Let’s give it a spin now:

[000.000] Loading trie...
[000.059] Loaded 162309 words.

Hmm… this is a problem. The C++ implementation (after some tricky optimizations) was able to finish the trie load in ~45 ms as of the last benchmark. Even the C# implementation got to 52 ms… so honestly 59 ms seems quite slow. Could it be another case of slowdowns due to buffered I/O?

The only way to find out is to write the ugly raw implementation:

    // . . . 
    pub fn load<R: Read>(stream: &mut R) -> StTrie {
        let mut trie = StTrie::new();
        let mut buf = [0u8; 1024];
        let mut s = St::empty();
        loop {
            let result = stream.read(&mut buf[..]);
            match result {
                Ok(n) => {
                    s = StTrie::process_chunk(&buf[0..n], s, &mut trie);
                    if n == 0 {
                        return trie;
                    }
                }
                Err(e) => panic!(e),
            }
        }
    }

    fn process_chunk(chunk: &[u8], mut s: St, trie: &mut StTrie) -> St {
        let mut skip = false;
        for b in chunk {
            match *b as char {
                '\r' | '\n' => {
                    if !skip && s.len() > 2 {
                        trie.insert(s);
                    }

                    skip = false;
                    s = St::empty();
                }
                c => {
                    if skip || s.len() == 12 {
                        skip = true;
                        s = St::empty();
                    } else {
                        s = s + Ch::from(c);
                    }
                }
            }
        }

        if chunk.len() == 0 && !skip && s.len() > 2 {
            trie.insert(s);
        }

        s
    }
    // . . .

It just goes to show that you can write assembly code in any language. Though I have to admit it isn’t as bad as it could be — all hail Rust’s safe slice type. This would have all been for nothing if we can’t squeeze a bit more perf out of the app, though:

[000.000] Loading trie...
[000.040] Loaded 162309 words.

We won again! The Rust solver is now a full 5 ms faster than C++ — so far. Even better, we didn’t have to write a custom hash table this time.

We’ll pick this up again later with the end-to-end solution. Until then, I wish you happy and safe systems programming!

WriteAsync .NET

Testing, coding, in that order.

Letter Boxed: Rust impl, part 3 (modules, trie, I/O)

One thought on “Letter Boxed: Rust impl, part 3 (modules, trie, I/O)”

Leave a Reply Cancel reply