Previously, we gave a send-off to the New York Times “Digits” game with a custom solver app. Unfortunately, the solver took quite some time (several hundred milliseconds) and even more space (several hundred megabytes) to analyze the solutions for one puzzle. I know we can do better than that!
Let’s start by looking more closely at the constraints of the actual game. You have at most six numbers on the board, all starting within the range of 1 – 25, and you won’t be given repeats. That means the biggest value you’d ever have to deal with would be a case where the starting board is 20, 21, 22, 23, 24, 25 and you choose to multiply all of them. That would yield 127,512,000 — a quantity that fits comfortably in a 32-bit integer.
Taking this further, if you consider that the starting numbers are all 5-bit integers (i.e., between 0 and 31), no combination of operations could ever lead to more than a 30-bit quantity at the end. (The product of an n-bit number and an m-bit number can have at most (n+m) bits.) And in fact, as you combine the numbers, you are combining their “bit space” — you start with six individual 5-bit slots and reduce down to (as an upper bound) one 30-bit slot.
Great, but how does this help? Well, this goes back to the question of allocations and its outsized effect on performance in these sorts of recursive algorithms. We can reduce allocation to almost nothing if we never have to deal with heap memory at all. So, what I’m getting at is that if we can think of a way to reduce the problem to manipulation of value types (C# structs), we might get a big performance win.
Let’s see if we can re-envision the entire board as a bounded bit space where you don’t need to deal with variable sized lists or arrays at all — the major source of allocation in this particular problem. Instead, you have to deal with a continually varying boundary between the numbers in the space.
Consider this conceptual memory layout, using the 20-25 example above (and wasting a few bits to encode each number in a whole byte):
|20 |21 |22 |23 |24 |25 |00010100|00010101|00010110|00010111|00011000|00011001
This could work as a board representation for this case, but only if we implicitly know each number is one byte long. As we continue to perform operations, the numbers could grow. Thus, we need to encode additional information about where each byte stream stops and where a new number begins.
This is where the variable-length quantity comes in. We can treat integers as variable length by using the top bit of each byte to mark whether we should append the next one or not. Since we are using only seven bits per byte to encode the numeric value, this is sometimes called Base-128. There is no change to the first example because each number already fits within the 7-bit limit. But let’s say we multiply 24 and 25 = 600, reducing the board down to five numbers. For variable-length encoding, we need to split 600 into 7-bit segments = 1011000 0000100
(assuming a little-endian byte layout), form these into normal 8-bit bytes =
x1011000 x0000100
and then finally set the continuation bit ‘x’ of the lower bytes to 1, and 0 for the highest byte. Our board would then look like this:
|20 |21 |22 |23 |600 |00010100|00010101|00010110|00010111|11011000 00000100
We still fit in six bytes overall. This will continue to be true no matter which operations we apply. The product 127,512,000 would look like this:
|127,512,000 |11000000 11011011 11100110 00111100
In this case, we actually have only four bytes to deal with.
This is a good start, but we will still need to track the total length of this number list. Since we can assume a maximum of six numbers using a maximum of six bytes, we can add a length byte prefix and still fit within a 64-bit integer. The overall memory layout for the 20-25 example would be thus:
|L=6 |20 |21 |22 |23 |24 |25 |(extra) | |00000110|00010100|00010101|00010110|00010111|00011000|00011001|00000000|
(Note that we have an extra padding byte at the end, to keep the structure as a fixed size of 64 bits.)
What about indexing this list? We have no easy way to jump to, say, the fifth item without starting from the beginning and counting the variable length quantities in the previous four. Is there a constant time and constant space solution to this problem, such that we can retain the O(1) lookup we get with a traditional list type?
First, let’s think of the possible starting bytes for each index:
n[0] | 1 |
n[1] | 2, 3, 4, 5, 6 |
n[2] | 3, 4, 5, 6 |
n[3] | 4, 5, 6 |
n[4] | 5, 6 |
n[5] | 6 |
The element n[0] has to occupy the first byte. However, it can be as short as one byte or as long as six (if the length is exactly one). That means n[1] can start anywhere from the second to the final (sixth) byte. This pattern continues until we get to the element n[5], which, if present, must be the sixth byte. So now we have the beginning of a formula for translating an element index to a byte position:
index switch { 0 => 1, 1 => 2 + /* 0, 1, 2, 3, 4 ??? */, 2 => 3 + /* 0, 1, 2, 3 ??? */, 3 => 4 + /* 0, 1, 2 ??? */, 4 => 5 + /* 0, 1 ??? */, 5 => 6, }
The problem now boils down to keeping a delta index ranging from 0 to 4 to correct our “guess” of where the number should be. We only need this delta for four cases, and the deltas get steadily smaller. In the spirit of frugality, we can keep aside three bits for the n[1] delta, two bits each for the n[2] and n[3] deltas, and a single bit for the n[4] delta. What luck, 3 + 2 + 2 + 1 = 8 bits, so we can scavenge this from the extra trailing byte.
Here is the final memory layout for one case:
|L=4 |20 |21 |506 |600 |deltas | |00000110|00010100|00010101|11111010 00000011|11011000 00000100|00100000|
The deltas are 0, 0, and 1 for n[1], n[2], and n[3] respectively.
Now we have a theoretically faster data type as the basis for our new solver:
public struct List6 { // The raw 8-byte layout private ulong raw; // Return the element at the given index (0 - 5) public int this[byte index] { get => /* TODO */ } // Return the length of the list (0 - 6) public byte Length { get => /* TODO */ } // Return a new list with the value appended public List6 Add(int value) => /* TODO */ }
How will this do in practice? We will find out next time.
Pingback: Digits: faster in practice? – WriteAsync .NET