Let’s do DHCP: sockets

After our foray into DHCP options, we are ready for the last piece of the puzzle — actually sending and receiving data on a socket.

First, we need to encode the high-performance receive pattern. I’ve decided on an async DhcpReceiveLoop class with the following shape (signatures only):

    public sealed class DhcpReceiveLoop
    {
        public DhcpReceiveLoop(IInputSocket socket);

        public Task RunAsync(Memory<byte> buffer, IDhcpReceiveCallbacks callbacks, CancellationToken token);
    }

The idea is that you pass a single socket to be shared (we know from before that the socket is thread safe) and start as many receive loops as you like. Rather than use a raw socket, we have a simple abstraction:

    public interface IInputSocket
    {
        ValueTask<int> ReceiveAsync(Memory<byte> buffer, CancellationToken token);
    }

(We’re able to use ValueTask here because the underlying Socket implementation does as well.)

To control the custom receive behavior we have a callbacks interface:

    public interface IDhcpReceiveCallbacks
    {
        ValueTask OnReceiveAsync(DhcpMessageBuffer message, CancellationToken token);

        ValueTask OnErrorAsync(DhcpError error, CancellationToken token);
    }

I chose an interface over raw delegates because we have a group of related methods and it’s nice to give them names. We use ValueTask again because these methods will often complete synchronously, yet we still keep the flexibility for asynchronous implementations (e.g. if you want to send a network reply right after the receive).
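To make the receive pattern concrete, here is a minimal sketch of what such a loop could look like. It is not the actual DhcpReceiveLoop: the `ReceiveLoopSketch` and `IReceiveCallbacks` names are hypothetical, and raw bytes and `Exception` stand in for DhcpMessageBuffer and DhcpError, whose shapes are not shown in this post.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IInputSocket
{
    ValueTask<int> ReceiveAsync(Memory<byte> buffer, CancellationToken token);
}

// Hypothetical stand-in for IDhcpReceiveCallbacks (assumed shape).
public interface IReceiveCallbacks
{
    ValueTask OnReceiveAsync(ReadOnlyMemory<byte> message, CancellationToken token);

    ValueTask OnErrorAsync(Exception error, CancellationToken token);
}

public sealed class ReceiveLoopSketch
{
    private readonly IInputSocket socket;

    public ReceiveLoopSketch(IInputSocket socket) => this.socket = socket;

    // Each loop owns its buffer; several loops can share one socket.
    public async Task RunAsync(Memory<byte> buffer, IReceiveCallbacks callbacks, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                int length = await this.socket.ReceiveAsync(buffer, token);

                // Hand over only the bytes that were actually received.
                await callbacks.OnReceiveAsync(buffer.Slice(0, length), token);
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                break; // Cooperative shutdown, not an error.
            }
            catch (Exception e)
            {
                // A sketch-level policy: report the failure and keep receiving.
                await callbacks.OnErrorAsync(e, token);
            }
        }
    }
}
```

Because the buffer is passed in by the caller, starting several loops against the same socket just means giving each one its own buffer and calling RunAsync once per loop.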

This handles the receive side but for completeness we need send and socket disposal. No problem, we’ll just create two more interfaces:

    public interface IOutputSocket
    {
        Task SendAsync(ReadOnlyMemory<byte> buffer, IPEndpointV4 endpoint);
    }

    public interface ISocket : IInputSocket, IOutputSocket, IDisposable
    {
    }

Once we start exploring how to actually create and send on a socket, we run into an interesting challenge. These operations both require an IPEndPoint instance, which is a reference type. If we had to create a new instance on every DHCP send, we would really cramp our zero-allocation style. Instead, we can create an IPEndpointV4 struct which combines our IPAddressV4 with a port number (16-bit of course, no bytes wasted!) and an ObjectCache which maps the cheap struct value to a pre-allocated instance (using ConcurrentDictionary for thread safety).

The specific EndpointCache implementation doesn’t need to do anything complicated. Constructing an IPEndPoint has no side effects, so it is safe to new one up at any time, even if we race with another invocation (remember that ConcurrentDictionary.GetOrAdd may invoke its value factory more than once under contention, although only one result ends up stored). To prove that we aren’t allocating any unnecessary memory or wasting CPU cycles, we have an EndpointCache benchmark:
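As a rough illustration of the struct-key-to-cached-instance idea, consider the sketch below. The names `EndpointKey` and `EndpointCacheSketch` are hypothetical, and a raw `uint` stands in for IPAddressV4; the real IPEndpointV4 and EndpointCache may well differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;

// Hypothetical stand-in for IPEndpointV4. The address is assumed to be stored
// in the layout IPAddress(long) expects (first octet in the low byte).
public readonly struct EndpointKey : IEquatable<EndpointKey>
{
    public EndpointKey(uint address, ushort port)
    {
        this.Address = address;
        this.Port = port;
    }

    public uint Address { get; }

    public ushort Port { get; }

    // IEquatable<T> lets the dictionary compare keys without boxing.
    public bool Equals(EndpointKey other) => this.Address == other.Address && this.Port == other.Port;

    public override bool Equals(object obj) => obj is EndpointKey other && this.Equals(other);

    public override int GetHashCode() => HashCode.Combine(this.Address, this.Port);
}

public sealed class EndpointCacheSketch
{
    // A cached delegate avoids allocating a new one on every lookup.
    private static readonly Func<EndpointKey, IPEndPoint> Factory = Create;

    private readonly ConcurrentDictionary<EndpointKey, IPEndPoint> cache =
        new ConcurrentDictionary<EndpointKey, IPEndPoint>();

    // GetOrAdd may run Factory more than once if threads race on a new key,
    // but every caller gets back the single instance that was stored.
    public IPEndPoint Get(EndpointKey key) => this.cache.GetOrAdd(key, Factory);

    private static IPEndPoint Create(EndpointKey key) =>
        new IPEndPoint(new IPAddress((long)key.Address), key.Port);
}
```

After the first lookup for a given key, later lookups return the same IPEndPoint instance.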

|   Method |     N |      Mean |    Error |   StdDev |  Gen 0 |  Gen 1 | Gen 2 | Allocated |
|--------- |------ |----------:|---------:|---------:|-------:|-------:|------:|----------:|
| Cache256 |   256 |  45.75 us | 1.002 us | 1.268 us | 8.3618 | 1.2817 |     - |  64.23 KB |
| Cache256 |  2048 | 116.75 us | 1.019 us | 0.851 us | 8.3008 | 1.2207 |     - |  64.23 KB |
| Cache256 | 16384 | 686.10 us | 3.293 us | 2.571 us | 7.8125 | 0.9766 |     - |  64.23 KB |

In this benchmark, we can see that caching 256 endpoints and then requesting them again has a fixed memory cost (~256 bytes per instance) and surprisingly low overhead (~180 ns per instance on initial add, ~40 ns on retrieve).

Now we’re ready to build the concrete DhcpSocket class. It is almost a pass-through implementation of the interfaces above with a bit of logic to translate the underlying socket exception contract into a domain-specific DhcpException class. Putting it all together, we have a DhcpReceiveLoopBenchmark class that implements a simple localhost scenario with a real socket:

| Method |    N |        Mean |     Error |    StdDev |   Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |----- |------------:|----------:|----------:|--------:|------:|------:|----------:|
|    Run |   10 |    244.1 us |   4.84 us |   7.24 us |  0.9766 |     - |     - |   9.21 KB |
|    Run |  100 |  1,495.9 us |  32.32 us |  35.93 us |  9.7656 |     - |     - |  78.05 KB |
|    Run | 1000 | 13,261.3 us | 288.20 us | 512.27 us | 93.7500 |     - |     - | 735.14 KB |

All told, this (relatively synthetic) scenario amortizes to ~13 us per operation (13,261.3 us for 1000 operations) with the only memory overhead being whatever .NET decides to allocate behind the scenes (e.g. Task instances).
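For a sense of what "pass-through plus exception translation" could mean on the receive side, here is a hedged sketch. `UdpInputSocketSketch` and `DhcpExceptionSketch` are hypothetical names standing in for DhcpSocket and DhcpException; in the design above the class would implement IInputSocket (and the send and dispose interfaces), which is left out here to keep the example short.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the domain-specific DhcpException.
public sealed class DhcpExceptionSketch : Exception
{
    public DhcpExceptionSketch(string message, Exception inner)
        : base(message, inner)
    {
    }
}

public sealed class UdpInputSocketSketch : IDisposable
{
    private readonly Socket socket;

    public UdpInputSocketSketch(IPEndPoint localEndpoint)
    {
        this.socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        try
        {
            this.socket.Bind(localEndpoint);
        }
        catch (SocketException e)
        {
            this.socket.Dispose();
            throw new DhcpExceptionSketch($"Bind failed: {e.SocketErrorCode}", e);
        }
    }

    // Useful when binding to port 0 and letting the OS pick a port.
    public EndPoint LocalEndPoint => this.socket.LocalEndPoint;

    // Pass-through to Socket.ReceiveAsync; only the exception contract changes.
    public async ValueTask<int> ReceiveAsync(Memory<byte> buffer, CancellationToken token)
    {
        try
        {
            return await this.socket.ReceiveAsync(buffer, SocketFlags.None, token);
        }
        catch (SocketException e)
        {
            throw new DhcpExceptionSketch($"Receive failed: {e.SocketErrorCode}", e);
        }
    }

    public void Dispose() => this.socket.Dispose();
}
```

Binding to `new IPEndPoint(IPAddress.Loopback, 0)` and sending a datagram to `LocalEndPoint` from a second UDP socket is enough to exercise this on localhost.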

Also, as an addendum to the previous blog post, I decided after all to implement the enumerator pattern for options. The callback approach was just too weird and inconvenient. Of course, we have to keep performance in mind, so instead of using IEnumerable I implemented a struct which meets all the requirements for foreach (i.e. it has a “public parameterless GetEnumerator method whose return type is either class, struct, or interface type” which “has the public Current property and the public parameterless MoveNext method whose return type is Boolean”). Compared with the old callback-based version, the new LoadWithOptions benchmark looks like “normal” code:

    // Before: callback-based option reading
    public long LoadWithOptions()
    {
        this.totalSize = 0;
        this.buffer.ReadOptions(this, (o, t) => t.totalSize += o.Data.Length);
        return this.totalSize;
    }

    // After: struct enumerator consumed with foreach
    public long LoadWithOptions()
    {
        long totalSize = 0;
        foreach (DhcpOption option in this.buffer.Options)
        {
            totalSize += option.Data.Length;
        }

        return totalSize;
    }

Sadly, the timing is ~270 ns now compared to ~180 ns before (still zero memory allocations, though!). This is also after a bunch of optimizations (see DhcpOptionsBuffer.cs for the gory details). Oh well… I suppose I can trade ~90 nanoseconds for the improved usability.
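The duck-typed enumerator pattern quoted above can be sketched as follows. This is not the code from DhcpOptionsBuffer.cs: `OptionSketch` and `OptionSequence` are hypothetical names, and the parsing assumes the standard DHCP option layout (one tag byte, one length byte, then data, with Pad = 0 and End = 255).

```csharp
using System;

// Hypothetical stand-in for DhcpOption.
public readonly struct OptionSketch
{
    public OptionSketch(byte tag, ReadOnlyMemory<byte> data)
    {
        this.Tag = tag;
        this.Data = data;
    }

    public byte Tag { get; }

    public ReadOnlyMemory<byte> Data { get; }
}

// No IEnumerable here: foreach only needs GetEnumerator/MoveNext/Current,
// so the enumerator stays a struct and nothing is boxed or heap-allocated.
public readonly struct OptionSequence
{
    private readonly ReadOnlyMemory<byte> raw;

    public OptionSequence(ReadOnlyMemory<byte> raw) => this.raw = raw;

    public Enumerator GetEnumerator() => new Enumerator(this.raw);

    public struct Enumerator
    {
        private readonly ReadOnlyMemory<byte> raw;
        private int position;
        private OptionSketch current;

        internal Enumerator(ReadOnlyMemory<byte> raw)
        {
            this.raw = raw;
            this.position = 0;
            this.current = default;
        }

        public OptionSketch Current => this.current;

        public bool MoveNext()
        {
            ReadOnlySpan<byte> span = this.raw.Span;

            // Pad options (tag 0) are single bytes with no length field.
            while (this.position < span.Length && span[this.position] == 0)
            {
                this.position++;
            }

            // Stop at the End option (tag 255) or when no tag+length pair remains.
            if (this.position + 2 > span.Length || span[this.position] == 255)
            {
                return false;
            }

            byte tag = span[this.position];
            int length = span[this.position + 1];
            int start = this.position + 2;
            if (start + length > span.Length)
            {
                return false; // Truncated option; a real parser would report this.
            }

            this.current = new OptionSketch(tag, this.raw.Slice(start, length));
            this.position = start + length;
            return true;
        }
    }
}
```

With this in place, `foreach (OptionSketch option in new OptionSequence(bytes))` compiles against the struct enumerator directly, which is what keeps the loop allocation-free.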

With that, our DHCP journey has concluded. It just goes to show that .NET Core makes for a great low-overhead, high-performance server platform in a variety of scenarios — if you’re willing to put the work in.
