{"id":5828,"date":"2021-04-05T07:00:08","date_gmt":"2021-04-05T14:00:08","guid":{"rendered":"http:\/\/writeasync.net\/?p=5828"},"modified":"2021-04-02T19:42:09","modified_gmt":"2021-04-03T02:42:09","slug":"letter-boxed-rust-impl-part-3-modules-trie-i-o","status":"publish","type":"post","link":"https:\/\/writeasync.net\/?p=5828","title":{"rendered":"Letter Boxed: Rust impl, part 3 (modules, trie, I\/O)"},"content":{"rendered":"<p>Our <a href=\"http:\/\/writeasync.net\/?p=5822\">Rust-based Letter Boxed code<\/a> so far has just the core character-based data types. Today we&#8217;ll add the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Trie\">trie<\/a>.<\/p>\n<p>Before we move on, we need to keep our house in order. Rather than have one massive lib.rs file, we should start <a href=\"https:\/\/doc.rust-lang.org\/book\/ch07-02-defining-modules-to-control-scope-and-privacy.html\">splitting out some modules<\/a>. The current lib.rs which has <code>St<\/code> and <code>Ch<\/code> will move to a module called <code>core<\/code>: <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/9ad20515dd8dea1eca0353856d425b95925fd124\">[see commit on GitHub]<\/a>.<\/p>\n<p>With that out of the way, we can begin a new module <code>trie<\/code> to hold our next chunk of code for <code>StTrie<\/code>: <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/872f3c8b1b9ce51542fe7000c5f0e17a21c7a15d\">[see commit on GitHub]<\/a><\/p>\n<p>The trie implementation is pretty much a straight port from <a href=\"https:\/\/github.com\/brian-dot-net\/games\/blob\/master\/src\/Words.Core\/StrTrie.cs\">the C# version<\/a>. After several incremental TDD steps, we land on the following <a href=\"https:\/\/github.com\/brian-dot-net\/games\/blob\/46338e82df62170398901d8ac0c7c4690c50703d\/letter_boxed_solver\/src\/trie.rs\">implementation<\/a>:<\/p>\n<pre class=\"brush: rust; title: ; notranslate\" title=\"\">\r\nuse crate::core::St;\r\nuse std::collections::HashMap;\r\n\r\n#&#x5B;derive(Debug, PartialEq)]\r\npub enum NodeKind {\r\n    None,\r\n    Prefix,\r\n    Terminal,\r\n}\r\n\r\npub struct StTrie {\r\n    nodes: HashMap&lt;St, bool&gt;,\r\n    count: usize,\r\n}\r\n\r\nimpl StTrie {\r\n    fn new() -&gt; StTrie {\r\n        let nodes = HashMap::new();\r\n        let count = 0;\r\n        StTrie { nodes, count }\r\n    }\r\n\r\n    fn len(&amp;self) -&gt; usize {\r\n        self.count\r\n    }\r\n\r\n    fn insert(&amp;mut self, value: St) {\r\n        if value.len() == 0 {\r\n            return;\r\n        }\r\n\r\n        if let Some(&amp;true) = self.nodes.get(&amp;value) {\r\n            return;\r\n        }\r\n\r\n        self.count += 1;\r\n        self.nodes.insert(value, true);\r\n        let mut value = value;\r\n        for _ in 1..value.len() {\r\n            value = value.chop();\r\n            if let Some(_) = self.nodes.get(&amp;value) {\r\n                return;\r\n            }\r\n\r\n            self.nodes.insert(value, false);\r\n        }\r\n    }\r\n\r\n    fn find(&amp;self, value: St) -&gt; NodeKind {\r\n        match self.nodes.get(&amp;value) {\r\n            None =&gt; NodeKind::None,\r\n            Some(&amp;true) =&gt; NodeKind::Terminal,\r\n            Some(&amp;false) =&gt; NodeKind::Prefix,\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<p>There&#8217;s nothing too special here, other than maybe the <a href=\"https:\/\/stackoverflow.com\/questions\/65684540\/how-to-read-if-let-expressions\">if let expression<\/a> when checking if the node already exists.<\/p>\n<p>The last big feature for <code>StTrie<\/code> is the ability to load words from a stream of text. Thus, we need to get familiar with <a href=\"https:\/\/doc.rust-lang.org\/std\/io\/trait.Read.html\">the Read trait<\/a>, roughly analogous to the read side of <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.io.stream?view=net-5.0\"><code>System.IO.Stream<\/code> in .NET<\/a>. Of course, the API is rather primitive, so let&#8217;s instead start with the <a href=\"https:\/\/doc.rust-lang.org\/std\/io\/trait.BufRead.html\">easier BufRead derived trait<\/a>. Most notably, it allows us to use <a href=\"https:\/\/doc.rust-lang.org\/stable\/rust-by-example\/std_misc\/file\/read_lines.html\">the <code>lines()<\/code> iterator<\/a> for <a href=\"https:\/\/github.com\/brian-dot-net\/games\/blob\/0e294d0634ba627198cf93aa5bad481d5d436c72\/letter_boxed_solver\/src\/trie.rs#L23\">a much nicer experience<\/a>:<\/p>\n<pre class=\"brush: rust; title: ; notranslate\" title=\"\">\r\nimpl StTrie {\r\n    \/\/ . . .\r\n    fn load&lt;B: BufRead&gt;(stream: &amp;mut B) -&gt; StTrie {\r\n        let mut trie = StTrie::new();\r\n        for line in stream.lines() {\r\n            let word = line.unwrap();\r\n            if word.len() &gt;= 3 &amp;&amp; word.len() &lt;= 12 {\r\n                let value = word.parse::&lt;St&gt;().unwrap();\r\n                trie.insert(value);\r\n            }\r\n        }\r\n\r\n        trie\r\n    }\r\n    \/\/ . . .\r\n}\r\n<\/pre>\n<p>You might wonder what the <a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/system.io.memorystream?view=net-5.0\"><code>MemoryStream<\/code><\/a> equivalent is for the purposes of unit testing. It turns out we can use <a href=\"https:\/\/doc.rust-lang.org\/std\/io\/struct.Cursor.html\"><code>Cursor<\/code><\/a> for that, as demonstrated in this test helper which converts a bunch of string slices to a big block of bytes:<\/p>\n<pre class=\"brush: rust; title: ; notranslate\" title=\"\">\r\n    fn load_trie(lines: Vec&lt;&amp;str&gt;) -&gt; StTrie {\r\n        let lines = lines.join(&quot;\\r\\n&quot;).as_bytes().to_vec();\r\n        let mut stream = Cursor::new(lines);\r\n        StTrie::load(&amp;mut stream)\r\n    }\r\n<\/pre>\n<p>At this point, we have enough code to write the trie-loading portion of the solver app. So let&#8217;s <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/925c0a03015f3061426902503ac6c555fb4de55f\">go ahead and do that<\/a>:<\/p>\n<pre class=\"brush: rust; title: ; notranslate\" title=\"\">\r\nuse letter_boxed_solver::trie::StTrie;\r\nuse std::{env::args, fs::File, time::Instant};\r\n\r\nfn main() {\r\n    let start = Instant::now();\r\n    let args = args();\r\n    if args.len() != 3 {\r\n        println!(&quot;Please specify a Letter Boxed puzzle and a word list file.&quot;);\r\n        return;\r\n    }\r\n\r\n    log(start, &quot;Loading trie...&quot;);\r\n    let path = args.skip(2).next().unwrap();\r\n    let mut stream = File::open(path).unwrap();\r\n    let trie = StTrie::load(&amp;mut stream);\r\n    log(start, format!(&quot;Loaded {} words.&quot;, trie.len()).as_str());\r\n}\r\n\r\nfn log(start: Instant, message: &amp;str) {\r\n    println!(&quot;&#x5B;{:07.3}] {}&quot;, start.elapsed().as_secs_f64(), message);\r\n}\r\n<\/pre>\n<p>We&#8217;re doing basic measurement with <a href=\"https:\/\/doc.rust-lang.org\/std\/time\/struct.Instant.html\"><code>std::time::Instant<\/code><\/a> to see how fast this app really is! Let&#8217;s give it a spin now:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;000.000] Loading trie...\r\n&#x5B;000.059] Loaded 162309 words.\r\n<\/pre>\n<p>Hmm&#8230; this is a problem. The C++ implementation (<a href=\"http:\/\/writeasync.net\/?p=5650\">after some tricky optimizations<\/a>) was able to finish the trie load in ~45 ms <a href=\"http:\/\/writeasync.net\/?p=5802\">as of the last benchmark<\/a>. Even the C# implementation got to 52 ms&#8230; so honestly 59 ms seems quite slow. Could it be another case of <a href=\"http:\/\/directcp.sourceforge.net\/files\/performance.pdf\">slowdowns due to buffered I\/O<\/a>?<\/p>\n<p>The only way to find out is to write the <a href=\"https:\/\/github.com\/brian-dot-net\/games\/commit\/d74a06e27a7f90ab13189f00aaefebce268415d8\">ugly raw implementation<\/a>:<\/p>\n<pre class=\"brush: rust; title: ; notranslate\" title=\"\">\r\n    \/\/ . . . \r\n    pub fn load&lt;R: Read&gt;(stream: &amp;mut R) -&gt; StTrie {\r\n        let mut trie = StTrie::new();\r\n        let mut buf = &#x5B;0u8; 1024];\r\n        let mut s = St::empty();\r\n        loop {\r\n            let result = stream.read(&amp;mut buf&#x5B;..]);\r\n            match result {\r\n                Ok(n) =&gt; {\r\n                    s = StTrie::process_chunk(&amp;buf&#x5B;0..n], s, &amp;mut trie);\r\n                    if n == 0 {\r\n                        return trie;\r\n                    }\r\n                }\r\n                Err(e) =&gt; panic!(e),\r\n            }\r\n        }\r\n    }\r\n\r\n    fn process_chunk(chunk: &amp;&#x5B;u8], mut s: St, trie: &amp;mut StTrie) -&gt; St {\r\n        let mut skip = false;\r\n        for b in chunk {\r\n            match *b as char {\r\n                '\\r' | '\\n' =&gt; {\r\n                    if !skip &amp;&amp; s.len() &gt; 2 {\r\n                        trie.insert(s);\r\n                    }\r\n\r\n                    skip = false;\r\n                    s = St::empty();\r\n                }\r\n                c =&gt; {\r\n                    if skip || s.len() == 12 {\r\n                        skip = true;\r\n                        s = St::empty();\r\n                    } else {\r\n                        s = s + Ch::from(c);\r\n                    }\r\n                }\r\n            }\r\n        }\r\n\r\n        if chunk.len() == 0 &amp;&amp; !skip &amp;&amp; s.len() &gt; 2 {\r\n            trie.insert(s);\r\n        }\r\n\r\n        s\r\n    }\r\n    \/\/ . . .\r\n<\/pre>\n<p>It just goes to show that you <a href=\"https:\/\/softwareengineering.stackexchange.com\/questions\/97181\/is-it-true-that-real-programmers-can-write-assembly-code-in-any-language\">can write assembly code in any language<\/a>. Though I have to admit it isn&#8217;t as bad as it could be &#8212; all hail Rust&#8217;s <a href=\"https:\/\/doc.rust-lang.org\/std\/slice\/index.html\">safe slice type<\/a>. This would have all been for nothing if we can&#8217;t squeeze a bit more perf out of the app, though:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n&#x5B;000.000] Loading trie...\r\n&#x5B;000.040] Loaded 162309 words.\r\n<\/pre>\n<p>We won again! The Rust solver is now a full 5 ms faster than C++ &#8212; so far. Even better, we didn&#8217;t have to write a custom hash table this time.<\/p>\n<p>We&#8217;ll pick this up again later with the end-to-end solution. Until then, I wish you happy and <a href=\"https:\/\/cacm.acm.org\/magazines\/2021\/4\/251364-safe-systems-programming-in-rust\/fulltext\">safe systems programming<\/a>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Our Rust-based Letter Boxed code so far has just the core character-based data types. Today we&#8217;ll add the trie. Before we move on, we need to keep our house in order. Rather than have one massive lib.rs file, we should&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[91,101,41],"tags":[],"class_list":["post-5828","post","type-post","status-publish","format-standard","hentry","category-design","category-native","category-tdd"],"_links":{"self":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5828"}],"version-history":[{"count":3,"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5828\/revisions"}],"predecessor-version":[{"id":5831,"href":"https:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/5828\/revisions\/5831"}],"wp:attachment":[{"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5828"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5828"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}