Oversubscribe now!

For small chunks of compute-bound work that must be offloaded to the background, you can choose from several APIs and patterns in .NET. In code written before .NET 4.0, you would probably use ThreadPool.QueueUserWorkItem. In modern-day apps, you might instead reach for Task.Factory.StartNew.
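For reference, the older pattern looks roughly like this (a minimal sketch; the message text and the use of a ManualResetEvent for completion signaling are illustrative choices, not part of any particular API contract):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            // Pre-4.0 style: hand a WaitCallback straight to the thread pool.
            ThreadPool.QueueUserWorkItem(delegate (object state)
            {
                Console.WriteLine("Work item running on a pool thread.");
                ((ManualResetEvent)state).Set();
            }, done);

            done.WaitOne(); // block until the work item completes
        }
    }
}
```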

Here is a simple method demonstrating the latter approach:

private static void RunTasks(int taskCount, bool longRunning)
{
    Task[] tasks = new Task[taskCount];
    Stopwatch stopwatch = Stopwatch.StartNew();

    using (CountdownEvent count = new CountdownEvent(taskCount))
    {
        // Each task signals that it has started, then blocks until every
        // other task has started too, so all tasks must run concurrently.
        Action action = delegate
        {
            Console.WriteLine("[{0:000.000}] {1} running.", stopwatch.Elapsed.TotalSeconds, Task.CurrentId);
            count.Signal();
            count.Wait();
        };

        for (int i = 0; i < taskCount; ++i)
        {
            tasks[i] = Task.Factory.StartNew(action, longRunning ? TaskCreationOptions.LongRunning : TaskCreationOptions.None);
        }

        Console.WriteLine("[{0:000.000}] All tasks ({1}) scheduled.", stopwatch.Elapsed.TotalSeconds, taskCount);
        Task.WaitAll(tasks);
        Console.WriteLine("[{0:000.000}] Done.", stopwatch.Elapsed.TotalSeconds);
    }
}

Pay close attention to the longRunning flag. If you pass false, the sample output looks like this for 1000 tasks (note that the specific results will vary depending on how many cores your system has):

[000.005] 4 running.
[000.005] 8 running.
[000.005] 9 running.
[000.005] 10 running.
[000.006] 12 running.
[000.005] 11 running.
[000.005] 1 running.
[000.005] 3 running.
[000.005] 2 running.
[000.005] 5 running.
[000.005] 7 running.
[000.005] 6 running.
[000.005] All tasks (1000) scheduled.
[001.004] 13 running.
[002.005] 14 running.
[003.004] 15 running.
  . . .
[987.004] 999 running.
[988.005] 1000 running.
[988.011] Done.

The first few tasks start quickly but soon begin lagging behind, with the final one running 988 seconds later. What's going on? Really, this is just the thread pool doing its normal job. My system has 12 logical cores, so the pool can immediately assign tasks to that many threads. When faced with oversubscription (more tasks to run than cores available), it prefers to wait rather than create more threads, since in a typical application workload the existing threads are likely to free up soon. In my pathological case the existing worker threads never become free, so the pool falls back to throttled growth: it injects roughly one new thread per second to service the remaining tasks, which is exactly the one-second cadence visible in the output above.
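One way to mitigate the injection delay without abandoning the pool is to raise the pool's minimum worker-thread count, which it will create on demand without throttling. A sketch (the value 64 is an arbitrary illustrative number, not a recommendation):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        int workerMin, ioMin;
        // Ask how many threads the pool creates on demand with no injection delay.
        ThreadPool.GetMinThreads(out workerMin, out ioMin);
        Console.WriteLine("Default minimums: {0} worker, {1} I/O", workerMin, ioMin);

        // Raising the minimum lets the pool spin up that many threads immediately.
        if (ThreadPool.SetMinThreads(64, ioMin))
        {
            Console.WriteLine("Minimum worker threads raised to 64.");
        }
    }
}
```

Note that raising the minimum trades the latency problem for the same oversubscription overhead discussed below, so it is a tuning knob, not a cure.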

If you need to force more responsive oversubscription behavior, you can use the TaskCreationOptions.LongRunning option, as shown in the sample code above when longRunning is true. The output in that case for 1000 tasks looks like this:

[000.005] 1 running.
[000.005] 5 running.
[000.005] 4 running.
[000.005] 9 running.
[000.005] 10 running.
[000.005] 3 running.
[000.005] 2 running.
[000.006] 11 running.
[000.006] 12 running.
[000.005] 6 running.
[000.005] 7 running.
[000.006] 13 running.
[000.006] 14 running.
[000.005] 8 running.
[000.006] 15 running.
[000.006] 16 running.
[000.006] 17 running.
[000.006] 18 running.
 . . .
[000.290] 998 running.
[000.290] 999 running.
[000.290] All tasks (1000) scheduled.
[000.290] 1000 running.
[000.316] Done.

The LongRunning option essentially bypasses the thread pool: the default scheduler creates a dedicated thread for each such task instead of queuing it to the pool. The tasks now start almost immediately, and the whole batch is running within a few hundred milliseconds.
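In effect, this is roughly what you would get by creating the thread yourself (a minimal sketch; the message text is illustrative):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        // What LongRunning effectively does under the default scheduler:
        // a dedicated thread per work item, bypassing the pool's throttled injection.
        Thread worker = new Thread(delegate ()
        {
            Console.WriteLine("Dedicated thread running.");
        });
        worker.IsBackground = true; // don't keep the process alive on its account
        worker.Start();
        worker.Join();
    }
}
```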

While it seems that the second example "solves" the obvious latency issues, it is not a great approach for general scalability. A well-designed system that deals with somewhat unpredictable, dynamic workloads must limit the number of active threads in play to minimize overhead such as context switching. James Rapp has a helpful visualization of this in his post "Oversubscription: a Classic Parallel Performance Problem." And as Brad Werth of Intel points out in "Getting the Most from your Middleware", work pulling (akin to the use of async tasks and continuations in .NET 4.5) and work pushing (like in the ThreadPool model) are often better patterns for achieving consistently higher performance.
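For completeness, a common way to bound concurrency instead of oversubscribing is to throttle task starts with a SemaphoreSlim. This is a sketch of the general pattern, not something taken from the posts cited above:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int maxConcurrency = Environment.ProcessorCount;
        SemaphoreSlim throttle = new SemaphoreSlim(maxConcurrency);
        Task[] tasks = new Task[100];

        for (int i = 0; i < tasks.Length; ++i)
        {
            throttle.Wait(); // block until one of the in-flight slots frees up
            tasks[i] = Task.Factory.StartNew(delegate
            {
                // ... compute-bound work here ...
            }).ContinueWith(delegate (Task t) { throttle.Release(); });
        }

        Task.WaitAll(tasks);
        Console.WriteLine("All work completed with at most {0} tasks in flight.", maxConcurrency);
    }
}
```

The pool never sees more than maxConcurrency queued work items at once, so it has no reason to grow beyond the core count.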
