{"id":3541,"date":"2015-04-15T13:00:51","date_gmt":"2015-04-15T13:00:51","guid":{"rendered":"http:\/\/writeasync.net\/?p=3541"},"modified":"2015-04-12T22:42:33","modified_gmt":"2015-04-12T22:42:33","slug":"orchestrating-race-conditions","status":"publish","type":"post","link":"http:\/\/writeasync.net\/?p=3541","title":{"rendered":"Orchestrating race conditions"},"content":{"rendered":"<p>Many a programmer has struggled with unit tests and those pesky race conditions that are <a href=\"http:\/\/stackoverflow.com\/questions\/176636\/unit-testing-deadlocks-and-race-conditions\"><br \/>\nseemingly<\/a> <a href=\"http:\/\/www.reddit.com\/r\/programming\/comments\/75lky\/unit_testing_deadlocks_and_race_conditions\/\">immune<\/a> to them. Is it even possible to verify concurrency correctness using TDD? I am going to tell you that it is &#8212; sometimes. There are two prerequisites: you have to know that a particular race condition is possible, and you have to be able to deterministically expose it.<\/p>\n<p>Awareness of a race can arise in several ways. It could be a thought exercise based on careful analysis of the code. Other times it is through a separate activity such as a multi-threaded load test. No matter how it comes up, a TDD practitioner upon learning of the race will not want to leave reproducibility to chance. But being able to expose such a race in a unit test will obviously require a &#8220;hook&#8221; to orchestrate a specific execution sequence. Basically, we need what <a href=\"https:\/\/michaelfeathers.silvrback.com\/\">Michael Feathers<\/a> calls a <a href=\"http:\/\/www.informit.com\/articles\/article.aspx?p=359417&#038;seqNum=2\">seam<\/a>.<\/p>\n<p>There are a few examples of this on the web already. For example, <a href=\"http:\/\/vance.com\/\">Stephen Vance<\/a>&#8216;s article <a href=\"http:\/\/www.informit.com\/articles\/article.aspx?p=1882162\">Reproducing Race Conditions in Tests<\/a> shows a Java-based approach using <a href=\"http:\/\/logging.apache.org\/log4j\/\">log4j<\/a> as an injection mechanism. Fortunately in asynchronous .NET code, a method returning a <code>Task<\/code> already provides a pretty good seam as a starting point, assuming we have some control of the creation of the task.<\/p>\n<p>Consider a simple example of a process that is restarted on demand if it has crashed or exited. Let&#8217;s call this <code>ImmortalProcess<\/code>. The first few behavioral specifications are pretty simple: we need the process to be created on first use, reused on subsequent calls, and recreated on the next use after a fault (notified by an event). The code and associated tests might look something like this:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\npublic class ImmortalProcess\r\n{\r\n    private readonly Func&lt;Task&lt;IProcess&gt;&gt; createProcessAsync;\r\n\r\n    private IProcess cachedProcess;\r\n\r\n    public ImmortalProcess(Func&lt;Task&lt;IProcess&gt;&gt; createProcessAsync)\r\n    {\r\n        this.createProcessAsync = createProcessAsync;\r\n    }\r\n\r\n    public async Task&lt;int&gt; GetIdAsync()\r\n    {\r\n        if (this.cachedProcess == null)\r\n        {\r\n            this.cachedProcess = await this.createProcessAsync();\r\n            this.cachedProcess.Faulted += this.OnProcessFaulted;\r\n        }\r\n\r\n        return this.cachedProcess.Id;\r\n    }\r\n\r\n    private void OnProcessFaulted(object sender, EventArgs e)\r\n    {\r\n        this.cachedProcess = null;\r\n    }\r\n}\r\n\r\npublic interface IProcess\r\n{\r\n    int Id { get; }\r\n\r\n    event EventHandler Faulted;\r\n}\r\n\r\n&#x5B;TestClass]\r\npublic class ImmortalProcessTest\r\n{\r\n    &#x5B;TestMethod]\r\n    public void ShouldCreateProcessOnFirstCall()\r\n    {\r\n        ImmortalProcess process = new ImmortalProcess(() =&gt; Task.FromResult&lt;IProcess&gt;(new ProcessStub() { Id = 0xA }));\r\n\r\n        Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n\r\n        Assert.IsTrue(firstCall.IsCompleted);\r\n        Assert.AreEqual(0xA, firstCall.Result);\r\n    }\r\n\r\n    &#x5B;TestMethod]\r\n    public void ShouldReuseCreatedProcessOnSecondCall()\r\n    {\r\n        int currentId = 0;\r\n        ImmortalProcess process = new ImmortalProcess(() =&gt; Task.FromResult&lt;IProcess&gt;(new ProcessStub() { Id = ++currentId }));\r\n\r\n        Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n        Assert.IsTrue(firstCall.IsCompleted);\r\n\r\n        Task&lt;int&gt; secondCall = process.GetIdAsync();\r\n\r\n        Assert.IsTrue(secondCall.IsCompleted);\r\n        Assert.AreEqual(1, currentId);\r\n        Assert.AreEqual(1, secondCall.Result);\r\n    }\r\n\r\n    &#x5B;TestMethod]\r\n    public void ShouldRecreateProcessOnNextCallAfterFault()\r\n    {\r\n        int currentId = 0;\r\n        ProcessStub currentProcess = null;\r\n        ImmortalProcess process = new ImmortalProcess(() =&gt; Task.FromResult&lt;IProcess&gt;(currentProcess = new ProcessStub() { Id = ++currentId }));\r\n\r\n        Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n        Assert.IsTrue(firstCall.IsCompleted);\r\n\r\n        Assert.IsNotNull(currentProcess);\r\n\r\n        currentProcess.RaiseFaulted();\r\n\r\n        Task&lt;int&gt; secondCall = process.GetIdAsync();\r\n\r\n        Assert.IsTrue(secondCall.IsCompleted);\r\n        Assert.AreEqual(2, currentId);\r\n        Assert.AreEqual(2, secondCall.Result);\r\n    }\r\n\r\n    private sealed class ProcessStub : IProcess\r\n    {\r\n        public int Id { get; set; }\r\n\r\n        public event EventHandler Faulted;\r\n\r\n        public void RaiseFaulted()\r\n        {\r\n            this.Faulted(null, null);\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<p>This code is not thread-safe whatsoever. So let&#8217;s introduce a new requirement &#8212; the process creation must be done once only by the first caller. By inspection we know the current code won&#8217;t meet this condition, so let&#8217;s write the test to expose the issue. We can take advantage of the natural seam provided by the <code>createProcessAsync<\/code> delegate to control the lifetime of all returned <code>Task<\/code>s:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n&#x5B;TestMethod]\r\npublic void ShouldCreateProcessOnceOnlyByFirstConcurrentCaller()\r\n{\r\n    TaskCompletionSource&lt;IProcess&gt; currentTask = null;\r\n    Func&lt;Task&lt;IProcess&gt;&gt; createProcessAsync = delegate\r\n    {\r\n        currentTask = new TaskCompletionSource&lt;IProcess&gt;();\r\n        return currentTask.Task;\r\n    };\r\n\r\n    ImmortalProcess process = new ImmortalProcess(createProcessAsync);\r\n\r\n    Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n    Assert.IsFalse(firstCall.IsCompleted);\r\n    Assert.IsNotNull(currentTask);\r\n\r\n    TaskCompletionSource&lt;IProcess&gt; firstTask = currentTask;\r\n    currentTask = null;\r\n\r\n    Task&lt;int&gt; secondCall = process.GetIdAsync();\r\n    Assert.IsFalse(secondCall.IsCompleted);\r\n    Assert.IsNull(currentTask); \/\/ FAILS\r\n\r\n    firstTask.SetResult(new ProcessStub() { Id = 0x12ACE });\r\n\r\n    Assert.IsTrue(firstCall.IsCompleted);\r\n    Assert.AreEqual(0x12ACE, firstCall.Result);\r\n\r\n    Assert.IsTrue(secondCall.IsCompleted);\r\n    Assert.AreEqual(0x12ACE, secondCall.Result);\r\n}\r\n<\/pre>\n<p>With the code as written, the assertion marked <code>\/\/ FAILS<\/code> will not pass because both callers are incorrectly allowed to create the process. Given that the process creation method is async, we will fix this by introducing an asynchronous exclusive lock (see <a href=\"http:\/\/writeasync.net\/?p=501\">&#8220;Building an async exclusive lock&#8221;<\/a> for an implementation).<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\npublic class ImmortalProcess\r\n{\r\n    private readonly ExclusiveLock exclusiveLock;\r\n    \/\/ ...\r\n    public ImmortalProcess(Func&lt;Task&lt;IProcess&gt;&gt; createProcessAsync)\r\n    {\r\n        \/\/ ...\r\n        this.exclusiveLock = new ExclusiveLock();\r\n    }\r\n\r\n    public async Task&lt;int&gt; GetIdAsync()\r\n    {\r\n        if (this.cachedProcess == null)\r\n        {\r\n            ExclusiveLock.Token token = await this.exclusiveLock.AcquireAsync();\r\n            try\r\n            {\r\n                if (this.cachedProcess == null)\r\n                {\r\n                    this.cachedProcess = await this.createProcessAsync();\r\n                    this.cachedProcess.Faulted += this.OnProcessFaulted;\r\n                }\r\n            }\r\n            finally\r\n            {\r\n                this.exclusiveLock.Release(token);\r\n            }\r\n        }\r\n\r\n        return this.cachedProcess.Id;\r\n    }\r\n    \/\/ ...\r\n}\r\n<\/pre>\n<p>First race condition down! (Note: For brevity, I&#8217;ve omitted two tests that would show the lock is released properly on synchronous and asynchronous exceptions from <code>createChannelAsync<\/code>.)<\/p>\n<p>There is still a very subtle race condition here, though. Imagine that the <code>Faulted<\/code> event is raised sometime between creation and first use of the process. Since we set the cached process to null on fault, we would probably see a <code>NullReferenceException<\/code>! Instead, what we&#8217;d rather do in this case is use the initially created process; the call would likely still fail but with a more sensible error (e.g. the process has exited\/is not available). But now we&#8217;re faced with a problem: how can we cause the fault happen deterministically after creation and event subscription, but before the first call we make?<\/p>\n<p>The only code between these two events right now is a call to release the lock, which might be an interesting place for a seam. So let&#8217;s introduce a new locking pattern via a <code>LockAsync<\/code> delegate to return an <code>IDisposable<\/code> representing the lock scope. By default we&#8217;ll wire it up to <code>ExclusiveLock<\/code>, but via <a href=\"http:\/\/jeremybytes.blogspot.com\/2014\/01\/dependency-injection-property-injection.html\">property injection<\/a> the caller can redirect it to a custom implementation:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\npublic class ImmortalProcess\r\n{\r\n    \/\/ ...\r\n    public ImmortalProcess(Func&lt;Task&lt;IProcess&gt;&gt; createProcessAsync)\r\n    {\r\n        \/\/ ...\r\n        this.LockAsync = this.AcquireAsync;\r\n    }\r\n\r\n    public Func&lt;Task&lt;IDisposable&gt;&gt; LockAsync { get; set; }\r\n\r\n    public async Task&lt;int&gt; GetIdAsync()\r\n    {\r\n        if (this.cachedProcess == null)\r\n        {\r\n            using (await this.LockAsync())\r\n            {\r\n                if (this.cachedProcess == null)\r\n                {\r\n                    this.cachedProcess = await this.createProcessAsync();\r\n                    this.cachedProcess.Faulted += this.OnProcessFaulted;\r\n                }\r\n            }\r\n        }\r\n\r\n        return this.cachedProcess.Id;\r\n    }\r\n    \/\/ ...\r\n    private async Task&lt;IDisposable&gt; AcquireAsync()\r\n    {\r\n        ExclusiveLock.Token token = await this.exclusiveLock.AcquireAsync();\r\n        return new TokenWrapper(this.exclusiveLock, token);\r\n    }\r\n\r\n    private sealed class TokenWrapper : IDisposable\r\n    {\r\n        private readonly ExclusiveLock parent;\r\n        private readonly ExclusiveLock.Token token;\r\n\r\n        public TokenWrapper(ExclusiveLock parent, ExclusiveLock.Token token)\r\n        {\r\n            this.parent = parent;\r\n            this.token = token;\r\n        }\r\n\r\n        public void Dispose()\r\n        {\r\n            this.parent.Release(this.token);\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<p>Now we can write the failing test:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n&#x5B;TestMethod]\r\npublic void ShouldUseInitiallyCreatedProcessIfFaultedBeforeEndOfFirstCall()\r\n{\r\n    int count = 0;\r\n    ProcessStub currentProcess = null;\r\n    ImmortalProcess process = new ImmortalProcess(() =&gt; Task.FromResult&lt;IProcess&gt;(currentProcess = new ProcessStub() { Id = ++count }));\r\n    TaskCompletionSource&lt;IDisposable&gt; tcs = new TaskCompletionSource&lt;IDisposable&gt;();\r\n    process.LockAsync = () =&gt; tcs.Task;\r\n    DisposableStub disposable = new DisposableStub() { OnDisposing = () =&gt; currentProcess.RaiseFaulted() };\r\n\r\n    Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n    Assert.IsFalse(firstCall.IsCompleted);\r\n\r\n    tcs.SetResult(disposable);\r\n\r\n    Assert.IsTrue(firstCall.IsCompleted);\r\n    Assert.AreEqual(1, firstCall.Result);\r\n}\r\n<\/pre>\n<p>As expected, it will fail when getting Result at the end because the task will be faulted due to a null reference. Now for the fix which uses a local variable that will remain unaffected no matter what happens to the <code>cachedProcess<\/code> reference:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\npublic async Task&lt;int&gt; GetIdAsync()\r\n{\r\n    IProcess current = this.cachedProcess;\r\n    if (current == null)\r\n    {\r\n        using (await this.LockAsync())\r\n        {\r\n            current = this.cachedProcess;\r\n            if (current == null)\r\n            {\r\n                current = await this.createProcessAsync();\r\n                current.Faulted += this.OnProcessFaulted;\r\n                this.cachedProcess = current;\r\n            }\r\n        }\r\n    }\r\n\r\n    return current.Id;\r\n}\r\n<\/pre>\n<p>Race condition two vanquished! Are we done yet? Sadly, no, as there is one more <em>extremely<\/em> subtle race. What if the process is faulted just before we subscribe to the <code>Faulted<\/code> event? No matter how hard we try we could miss this event, so we need to do a last-chance check before we decide to publish the value. Failing to do this would result in a &#8220;perma-faulted&#8221; process object which we would never recreate.<\/p>\n<p>Here is the failing test, which requires a new <code>IsFaulted<\/code> property:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n&#x5B;TestMethod]\r\npublic void ShouldRecreateProcessOnNextCallIfFaultedBeforeSubscribed()\r\n{\r\n    int count = 0;\r\n    ImmortalProcess process = new ImmortalProcess(() =&gt; Task.FromResult&lt;IProcess&gt;(new ProcessStub() { IsFaulted = true, Id = ++count }));\r\n\r\n    Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n    Assert.IsTrue(firstCall.IsCompleted);\r\n\r\n    Task&lt;int&gt; secondCall = process.GetIdAsync();\r\n    Assert.IsTrue(secondCall.IsCompleted);\r\n    Assert.AreEqual(2, secondCall.Result);\r\n}\r\n<\/pre>\n<p>It will fail as the <code>count<\/code> will forever be stuck at <code>1<\/code>. In the fix, we push the <code>IsFaulted<\/code> property definition to the <code>IProcess<\/code> interface and make use of it in <code>ImmortalProcess<\/code>:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\npublic async Task&lt;int&gt; GetIdAsync()\r\n{\r\n    IProcess current = this.cachedProcess;\r\n    if (current == null)\r\n    {\r\n        using (await this.LockAsync())\r\n        {\r\n            current = this.cachedProcess;\r\n            if (current == null)\r\n            {\r\n                current = await this.createProcessAsync();\r\n                current.Faulted += this.OnProcessFaulted;\r\n                if (!current.IsFaulted)\r\n                {\r\n                    this.cachedProcess = current;\r\n                }\r\n            }\r\n        }\r\n    }\r\n\r\n    return current.Id;\r\n}\r\n<\/pre>\n<p>Say goodbye to race condition three. But in a cruel twist of fate, we have <em>introduced<\/em> new race condition four. Consider the case where the <code>Faulted<\/code> event is raised just as we are checking the <code>IsFaulted<\/code> flag. The <code>cachedProcess<\/code> would be set to null by the event handler and then we would set it right back again to the now-faulted <code>current<\/code> process. To reproduce this race, we need to use the <code>IsFaulted<\/code> property as a seam:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\nprivate sealed class ProcessStub : IProcess\r\n{\r\n    private bool isFaulted;\r\n    \/\/ ...\r\n    public Action OnGetIsFaulted { get; set; }\r\n\r\n    public bool IsFaulted\r\n    {\r\n        get\r\n        {\r\n            if (this.OnGetIsFaulted != null)\r\n            {\r\n                this.OnGetIsFaulted();\r\n            }\r\n\r\n            return this.isFaulted;\r\n        }\r\n\r\n        set\r\n        {\r\n            this.isFaulted = value;\r\n        }\r\n    }\r\n    \/\/...\r\n}\r\n<\/pre>\n<p>Now the unit test can inject a call to <code>RaiseFaulted()<\/code> in the getter:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n&#x5B;TestMethod]\r\npublic void ShouldRecreateProcessOnNextCallIfFaultedAfterSubscribed()\r\n{\r\n    int count = 0;\r\n    Func&lt;Task&lt;IProcess&gt;&gt; createProcessAsync = delegate\r\n    {\r\n        ProcessStub currentProcess = new ProcessStub() { Id = ++count };\r\n        currentProcess.OnGetIsFaulted = () =&gt; currentProcess.RaiseFaulted();\r\n        return Task.FromResult&lt;IProcess&gt;(currentProcess);\r\n    };\r\n\r\n    ImmortalProcess process = new ImmortalProcess(createProcessAsync);\r\n\r\n    Task&lt;int&gt; firstCall = process.GetIdAsync();\r\n    Assert.IsTrue(firstCall.IsCompleted);\r\n\r\n    Task&lt;int&gt; secondCall = process.GetIdAsync();\r\n    Assert.IsTrue(secondCall.IsCompleted);\r\n    Assert.AreEqual(2, secondCall.Result);\r\n}\r\n<\/pre>\n<p>This test fails as we had expected with <code>count<\/code> stuck at <code>1<\/code>. To fix this, we need to do sort of the inverse of the current code. We will publish the value to <code>cachedProcess<\/code> as early as possible and set it to null if it happens to end up faulted. This makes the operation essentially idempotent &#8212; no harm will be done if we race with the event handler and set to null again:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\npublic async Task&lt;int&gt; GetIdAsync()\r\n{\r\n    IProcess current = this.cachedProcess;\r\n    if (current == null)\r\n    {\r\n        using (await this.LockAsync())\r\n        {\r\n            current = this.cachedProcess;\r\n            if (current == null)\r\n            {\r\n                current = await this.createProcessAsync();\r\n                this.cachedProcess = current;\r\n\r\n                current.Faulted += this.OnProcessFaulted;\r\n                if (current.IsFaulted)\r\n                {\r\n                    this.cachedProcess = null;\r\n                }\r\n            }\r\n        }\r\n    }\r\n\r\n    return current.Id;\r\n}\r\n<\/pre>\n<p>That concludes the removal of the fourth and (I think) final harmful race condition. In practice, it is difficult to be <em>certain<\/em> that your code will operate correctly in all possible concurrency situations. Armed with a few design strategies and cooperative test doubles, however, your async code can be shown &#8220;<a href=\"http:\/\/martinfowler.com\/bliki\/SpecificationByExample.html\">correct by example<\/a>&#8221; in a straightforward way.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many a programmer has struggled with unit tests and those pesky race conditions that are seemingly immune to them. Is it even possible to verify concurrency correctness using TDD? I am going to tell you that it is &#8212; sometimes.&hellip; <\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[21,61,91,41],"tags":[],"class_list":["post-3541","post","type-post","status-publish","format-standard","hentry","category-async","category-concurrency","category-design","category-tdd"],"_links":{"self":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/3541","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3541"}],"version-history":[{"count":0,"href":"http:\/\/writeasync.net\/index.php?rest_route=\/wp\/v2\/posts\/3541\/revisions"}],"wp:attachment":[{"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3541"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3541"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/writeasync.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3541"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}