Test environments are a liability. Sometimes they’re so good, they’re bad. Though most of the time they’re just bad. So what then? Do we just test in production exclusively? Obviously it’s not that simple, and there is a more nuanced viewpoint behind my rhetoric.
Sometimes a test environment is so well-maintained and accessible that it is an absolute joy to use. The hardware is as new as, or newer than, what is currently in production. Reserving capacity is quick and easy, and there is low contention so no one has to wait long for their turn. In this situation, the average engineer won’t hesitate to deploy and run thorough integration tests using the development builds they produce as part of their daily rhythm.
Why is this a bad thing? In practice, I see two big problems here. One is the cost of maintaining such an environment. Environments like this require lots of people and lots of hardware, which means lots of money. It is also pretty common to see separate teams on each side of the test/production split, which further reduces economies of scale. In general, you want to sell the capacity you build out to people who will pay you, and no one (directly) pays for a test environment.
The other problem is the seemingly paradoxical result of more testing leading to lower quality. The “in the large” style of integration testing that typically occurs in a high-fidelity test environment can easily become a substitute for the “in the small” basic design and correctness validation that should have occurred long before this stage. Even if you can avoid the common pitfalls of overzealous integration testing, the test environment is another stall in the delivery pipeline. More time spent in the test environment means less time for the more realistic and arguably more meaningful feedback that can really only start once you hit production.
Still, testing everything in production is clearly a non-starter. Some system components need validation that can only come from an environment we can be sure has no customer exposure. A few examples might be device drivers for completely new hardware SKUs, sensitive areas like billing and payment systems, validation of disaster recovery procedures, and the like. A test environment can be very helpful on occasion, depending on what we mean by “test environment.”
An environment that is built out in production, using production hardware and resources, but not yet available to paying customers, can be a test environment. A single machine from a production datacenter taken out of production rotation can be a test environment. A one-box version of the complete software system can be a test environment.
There are many ways to destroy the bad, time-and-money-losing test environments without destroying the legitimate ability to test what is needed. Consider a production-centric approach wherever possible.