Recently I encountered a situation where a seemingly innocuous change in the structure of a dictionary key had a noticeable negative performance impact (more on this later). This got me thinking: how exactly do different types of keys measure up…
An even faster hash table
Last week, in my continuing exploration of a C++ Letter Boxed solver, I had to build a faster hash table to improve performance. Today we’ll make it even faster, hoping to catch up to the performance of my .NET Core…
A faster hash table
Last time, I wrote about a native C++ implementation of the Letter Boxed solver. Vexingly, this native implementation was actually slower than the corresponding .NET Core app. Today we’ll try to fix that. As I mentioned, some basic profiling revealed…
Native code: always faster?
Last month, I explored some performance optimizations for a C# Letter Boxed solver. It was a fair bit of effort, but in the end I could solve a typical puzzle in ~80 ms. Success! However, you might rightly point out…
Letter Boxed: perf gains and losses
After successfully optimizing the trie loading path, we apparently “pessimized” the solution path. Let’s see if we can find out why. I’ll start by building a benchmark for the word-finding algorithm (the FindWords benchmark). Note that this benchmark eliminates all…
Letter Boxed: optimizing the trie
Last time, I demonstrated a relatively fast Letter Boxed solver. It took about 15 ms to solve the puzzle, but that was after a delay of over 200 ms while loading the trie. Surely we can squeeze some more performance…
A simple message bus: Java edition
In the previous post, we looked at competing message bus implementations in C# and C++. How about we give Java a try now? Converting to Java syntax and style conventions, the SendOneSubscriber test should look like this: Alas, there is…
A simple message bus
The message bus is a typical pattern to allow loosely coupled software components to communicate, usually in an event-driven manner. Don’t let the enterprise-y description deter you — a simple message bus for a single process scenario can be quite…
Even more performance experiments: queues and threads
Last time, we concluded that a simple producer/consumer pattern using BlockingCollection topped out at around 2200K items per second. But the profiler revealed that the Throttle itself was one major contributor to the total CPU time. Let’s first address this…
More performance experiments: queues and threads
Continuing from our previous performance experiment, I would like to see if there are any easy optimizations to apply to squeeze more throughput out of this producer/consumer queue. One possible angle of attack is to replace the implicit synchronization primitives…