In the last post, we began focusing on testability with COM interface stubs. It was a lot of work, and a good reminder of why backfilling test coverage via a test-last approach is not ideal. Still, it shows that through a largely mechanical process, you too can build a nearly complete test suite for those ugly legacy COM apps.
Now, we’ll finish up the tests. The procedure is not much different than before, so we’ll just highlight a few changes that deserve further explanation.
When building the test for add_time_trigger, the stub data started to require information for multiple layers. For example, we needed to control ITimeTrigger::put_Id, which is called on a trigger created by ITriggerCollection::Create, on a collection that was itself returned by ITaskDefinition::get_Triggers. We’re three levels deep, and the single stub data struct we have now looks like this:
```cpp
struct Data
{
    HRESULT get_RegistrationInfo_result{};
    HRESULT put_Author_result{};
    HRESULT get_Principal_result{};
    HRESULT put_LogonType{};
    HRESULT get_Settings_result{};
    HRESULT put_StartWhenAvailable_result{};
    HRESULT get_IdleSettings_result{};
    HRESULT put_WaitTimeout_result{};
    HRESULT get_Triggers_result{};
    HRESULT ITriggerCollection_Create_result{};
    HRESULT ITimeTrigger_put_Id_result{};
};
```
The hint that something is not quite right is the artificial namespacing in those last two fields (“ITriggerCollection_Create”, “ITimeTrigger_put_Id”). Let’s listen to our tests and use that feedback to fix the monolithic design of the stub data. The solution, obvious in retrospect, is to have a data struct for each stub and compose them one level at a time:
```cpp
Stub::TaskDefinitionData data{
    .get_Triggers_result = E_FAIL,
    .TriggerCollection = {
        .Create_result = E_FAIL,
        .TimeTrigger = {
            .put_Id_result = E_FAIL,
            .put_EndBoundary_result = E_FAIL,
            .put_StartBoundary_result = E_FAIL,
        },
    },
};
```
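For reference, the per-stub data structs behind an initializer like that look roughly as follows. This is a sketch of the shape only; the real structs also carry the remaining HRESULT knobs from the old monolithic Data (RegistrationInfo, Principal, Settings, and so on) and their own nested data:

```cpp
namespace Stub
{
    // One small data struct per stubbed interface, nested to mirror how the
    // real objects are reached (definition -> trigger collection -> trigger).
    struct TimeTriggerData
    {
        HRESULT put_Id_result{};
        HRESULT put_StartBoundary_result{};
        HRESULT put_EndBoundary_result{};
    };

    struct TriggerCollectionData
    {
        HRESULT Create_result{};
        TimeTriggerData TimeTrigger{};
    };

    struct TaskDefinitionData
    {
        HRESULT get_Triggers_result{};
        TriggerCollectionData TriggerCollection{};
        // ...plus the RegistrationInfo, Principal, Settings, etc. knobs elided here.
    };
}
```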
As the tests started to incorporate this multi-level logic, the next challenge was keeping the behavior realistic. With the real Task Scheduler API, if we ask for the task settings, we get the same ITaskSettings instance every time. But the test stub violated this expectation (file this one under the principle of least surprise) by unconditionally creating a new object on every call. Since our stubs already track the pointers to every object they return, it’s an easy fix to stop recreating them; we’ll sketch that change after the example below. Note, however, that this new, more true-to-life behavior requires the tests to cope with partial state changes in the face of intermediate failures, e.g. if IActionCollection::Create succeeds but the inner call fails:
```cpp
ASSERT_THROW(task.add_exec_action(L"X:\\act2.exe"), wil::ResultException);

auto expected =
    L"<Task>"
        L"<Actions>"
        L"</Actions>"
    L"</Task>";
assert_xml(task, expected);

data.ActionCollection.Create_result = S_OK;
ASSERT_THROW(task.add_exec_action(L"X:\\act3.exe"), wil::ResultException);

expected =
    L"<Task>"
        L"<Actions>"
            L"<ExecAction></ExecAction>"
        L"</Actions>"
    L"</Task>";
assert_xml(task, expected);
```
Here we have a partially created IExecAction in our list with no path defined. This is, in theory, what would really happen against the Task Scheduler API, though it might be an undesirable side effect. Seeing this true-to-life but arguably “quirky” behavior asserted in the test, we could opt to build a more transactional API that attempts to clean up these botched objects. But we’ll simply document the quirk (in the “tests as documentation” sense) and leave any fixes for another time.
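As promised, here is roughly what the “same instance every time” fix looks like on the stub side. This is a minimal sketch rather than the exact code from the repo: TaskDefinitionStub, its data and m_settings members, and the make_settings_stub helper are stand-ins for the stub machinery built up in the previous post.

```cpp
// Sketch only: the surrounding stub class and helper names are assumed.
// The point is the caching, not the plumbing.
STDMETHODIMP TaskDefinitionStub::get_Settings(ITaskSettings** settings)
{
    RETURN_IF_FAILED(data.get_Settings_result);   // honor the configured failure, if any

    if (!m_settings)                              // first call: create the stub and remember it
    {
        m_settings = make_settings_stub(data.Settings);
    }

    m_settings.copy_to(settings);                 // every later call hands back the same instance
    return S_OK;
}
```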
The final test we needed was in the TaskService facade. It is an entry point of sorts which clients are expected to use to begin working with the Task Scheduler. But we have a snag here, as there is some code we simply cannot run in our test:
```cpp
TaskService TaskService::connect()
{
    TaskService service(wil::CoCreateInstance<ITaskService>(CLSID_TaskScheduler, CLSCTX_INPROC_SERVER));
    THROW_IF_FAILED_MSG(service.m_service->Connect({}, {}, {}, {}), "ITaskService::Connect failed");
    return service;
}
```
Do you see it? We are trying to call CoCreateInstance, passing the actual Task Scheduler CLSID. Short of using some magical mocking framework to intercept this Win32 API call, we have few good options given the code as-is. But there is a minor change we can make that still achieves the needed encapsulation without ruining the interface (we wouldn’t want callers to be stuck with a constructed but unconnected client). We can make the constructor protected rather than private and move the connect logic into it. Now we can simply ignore the connect named constructor and write our test in terms of a test derived class. It’s not a perfect solution — we have a bit of untested code here — but it hits our general testability targets without substantially (some would say unnecessarily) complicating the design.
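Here is a sketch of the reshaped class under those constraints. The member name and the use of wil::com_ptr are assumptions carried over from the snippet above, and TestTaskService is our own illustrative name, not something from the repo:

```cpp
// Requires <taskschd.h> plus WIL's com and result headers.
class TaskService
{
public:
    // The only code the tests skip: creating the real Task Scheduler object.
    static TaskService connect()
    {
        return TaskService(wil::CoCreateInstance<ITaskService>(CLSID_TaskScheduler, CLSCTX_INPROC_SERVER));
    }

protected:
    // The connect logic now lives in the protected constructor, so a caller can
    // never hold a constructed-but-unconnected client.
    explicit TaskService(wil::com_ptr<ITaskService> service) : m_service(std::move(service))
    {
        THROW_IF_FAILED_MSG(m_service->Connect({}, {}, {}, {}), "ITaskService::Connect failed");
    }

private:
    wil::com_ptr<ITaskService> m_service;
};

// In the tests, a derived class bypasses connect() and injects the stub service:
struct TestTaskService : TaskService
{
    explicit TestTaskService(wil::com_ptr<ITaskService> stub) : TaskService(std::move(stub)) {}
};
```

The derived class lives only in the test binary, so production callers still go through connect() exactly as before.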
So what did all of this cost us? Let’s go by the numbers, using bytes of source code as our useless metric of the day:
- 8.9 KB of production code
  - 1.9 KB of application code (exe)
  - 1.7 KB of headers (inc)
  - 5.3 KB of library code (lib)
- 100 KB of test code
  - 38 KB of direct test logic
  - 62 KB of generic stubs
That does seem a bit out of proportion, but the numbers are easily explained by the sheer breadth of the interfaces we are forced to use (many of which have 10+ methods). Even if our client code only requires two or three method calls, every stub must be built up from the entire interface definition. If we really had to game this metric, we could choose not to count the generic stubs at all — perhaps by moving them into a common library or generating them? But any way you slice it, test code tends to outweigh the product under test, especially when aiming for high coverage.
The conclusion? Writing fully tested COM code may be difficult but it is possible, and all you need are a few simple, mechanical recipes. Start typing!