In the previous post, I introduced a simple distributed service and some considerations that might drive a test planning effort. In this initial drill down, I will take a look at the tradeoffs of testing this system “from up high” — i.e. broad coverage, larger scale, lower precision.
First, let’s talk principles. When designing higher-level tests, we should typically expect that the underlying components have already been properly validated at lower levels. From a pure cost/benefit standpoint, this makes sense. Subjecting a largely untested system to a barrage of high-level tests will likely uncover a variety of trivial issues. Certainly this is better than not finding them at all, but the effort required to find them this way will be much greater than it would have been at a lower level. This in turn steals time from more valuable testing activities that are a better fit for a higher-level testing environment, such as load tests and end-to-end scenarios.
Unfortunately, practices in the real world of software do not always match our principles. There will no doubt be untested or poorly tested code at low levels, rearing its ugly head at inopportune moments. What is a pragmatic tester to do?
For one, we should realize that “high-level” is relative. As I alluded to in the previous post, starting from the statement “every test will require a production-scale environment” is not likely to be efficient or effective. It is perfectly reasonable for a high-level test to operate in a “one-box” environment where scale is minimized. (I will delve into this topic in a future post, but for now you can read up on what Google testers call “hermetic servers” for a good overview.) This will at minimum reduce the time needed to set up and tear down the system under test and offer some investigation wins — it’s much easier to attach a debugger to a process or two running in one place as opposed to a geo-distributed server farm.
With a few of these well-placed “middle ground” tests to bridge the gaps, the surrounding tests can now be more effective. So what exactly makes an effective high-level test? At its core, the test must be unapologetically high-level.
A test at the upper layers of the system which also tries to probe the depths of each component is ill-advised, at best. If it is not inordinately complex (e.g. due to the difficulties of coordinating and orchestrating every test action), it will be overly simplistic (e.g. due to reliance on a lock-step, no-deviation-from-the-norm execution sequence).
Let’s return to the distributed service example to illustrate these points. One requirement stated that the service should respond with an error if a client requests a nonexistent resource. Now imagine an end-to-end test where we perform these steps:
- Create resource ‘R’.
- Delete resource ‘R’.
- Request resource ‘R’; expect “resource not found” failure.
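To make the three steps concrete, here is a minimal sketch in Python. It runs against a hypothetical in-memory stand-in for the service; the names `FakeService`, `ResourceNotFound`, and `linear_not_found_test` are my own assumptions for illustration, not the API of the service from the previous post.

```python
class ResourceNotFound(Exception):
    """Hypothetical error raised when a requested resource does not exist."""


class FakeService:
    """Hypothetical in-memory stand-in for the distributed service."""

    def __init__(self):
        self._resources = {}

    def create(self, name, data=b""):
        self._resources[name] = data

    def delete(self, name):
        if name not in self._resources:
            raise ResourceNotFound(name)
        del self._resources[name]

    def request(self, name):
        if name not in self._resources:
            raise ResourceNotFound(name)
        return self._resources[name]


def linear_not_found_test(service):
    """Create, delete, then request 'R'; return True if the expected error occurs."""
    service.create("R")
    service.delete("R")
    try:
        service.request("R")
    except ResourceNotFound:
        return True   # the "resource not found" failure we expected
    return False      # the service handed back a deleted resource


if __name__ == "__main__":
    print("passed" if linear_not_found_test(FakeService()) else "FAILED")
```

Note how little there is to it: one client, one resource, one fixed order of operations.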
This falls into the overly simplistic category. The test actions are very linear and could only hope to expose surface-level issues in the service. Besides, we should have already covered this in at least our “middle ground” tests. We can do a lot better by thinking about more realistic scenarios which cover broader interactions, e.g.:
- There could be a race condition in the handling of simultaneous requests and deletion of resources.
- Requests for nonexistent resources may slow down other requests in the system.
- A request for a resource that never existed may be handled differently than a request for a resource that existed but was later deleted.
- Resources that were previously deleted and then recreated with the same name may be improperly handled.
All of these behaviors warrant investigation through unapologetic high-level testing, the kind that eschews “one input/one output” style results. Try running multiple clients in parallel against the same service, sending thousands of conflicting requests; the minimum expectation is that the service should never crash or return malformed responses. You can do slightly better by splitting the clients into several groups, some of which will intentionally cause conflicts and some of which never will. A “conflict client” in this test may see and tolerate error responses, but a “good client” should not.
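As a rough sketch of the two-group idea, the Python below runs “conflict clients” and “good clients” as threads against a hypothetical in-memory stand-in for the service. Everything named here (`FakeService`, `ResourceNotFound`, `conflict_client`, `good_client`, `run_stress`, the shared resource names) is an assumption made for illustration; a real test would point the same client logic at the actual service endpoint.

```python
import random
import threading


class ResourceNotFound(Exception):
    """Hypothetical error for a missing resource."""


class FakeService:
    """Hypothetical lock-guarded in-memory stand-in for the service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._resources = {}

    def create(self, name, data):
        with self._lock:
            self._resources[name] = data

    def delete(self, name):
        with self._lock:
            if name not in self._resources:
                raise ResourceNotFound(name)
            del self._resources[name]

    def request(self, name):
        with self._lock:
            if name not in self._resources:
                raise ResourceNotFound(name)
            return self._resources[name]


def conflict_client(service, shared_names, iterations, seed, failures):
    """Randomly create/delete/request shared names; 'not found' is tolerated."""
    rng = random.Random(seed)
    for _ in range(iterations):
        name = rng.choice(shared_names)
        action = rng.choice(("create", "delete", "request"))
        try:
            if action == "create":
                service.create(name, b"x")
            elif action == "delete":
                service.delete(name)
            else:
                service.request(name)
        except ResourceNotFound:
            pass  # expected outcome of an intentional conflict
        except Exception as exc:  # anything else counts as a failure
            failures.append(("conflict", repr(exc)))


def good_client(service, client_id, iterations, failures):
    """Work only on a private name; any error at all is a failure."""
    name = f"private-{client_id}"
    for i in range(iterations):
        payload = f"{client_id}:{i}".encode()
        try:
            service.create(name, payload)
            if service.request(name) != payload:
                failures.append(("good", client_id, "wrong payload"))
            service.delete(name)
        except Exception as exc:
            failures.append(("good", client_id, repr(exc)))


def run_stress(n_conflict=4, n_good=4, iterations=1000):
    """Run both client groups in parallel and return the recorded failures."""
    service = FakeService()
    failures = []  # list.append is safe to call from multiple threads in CPython
    shared = ["R1", "R2", "R3"]
    threads = [
        threading.Thread(target=conflict_client,
                         args=(service, shared, iterations, i, failures))
        for i in range(n_conflict)
    ] + [
        threading.Thread(target=good_client,
                         args=(service, i, iterations, failures))
        for i in range(n_good)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


if __name__ == "__main__":
    print(run_stress() or "no failures recorded")
```

The only verdict is whether the failure list is empty. The test makes no claim about which request lost which race, only that conflict clients saw nothing worse than “not found” and good clients saw no errors at all.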
These kinds of tests are imprecise and proud of it! If you want precision, you want a low-level test — stay tuned for the next post where I will talk more about that.