One thing I noticed while researching back then, and which remains true today, is the relatively sparse selection of published resources on the topic. One of the first web search results on long-haul specifically is a write-up (which I referenced in my paper) about the approach as used in the Windows CE testing strategy. Repeating the search now, I did find one fairly comprehensive treatment of endurance testing: a 2009 Sticky Minds article by Steven Woody. (If I had been aware of this one at the time, I definitely would have referenced it; either I overlooked it or it was somehow not readily available on the web.) I suspect the scarcity is partly a terminology issue; in the industry at large, “long-haul testing” seems somewhat less common than terms like “soak testing” or “endurance testing”.
An aspect that I would give even more emphasis to today is the quality of the long-haul feedback loop. Broad, long-running tests are at their best when they act as gap detectors. As I said recently in “Destroy all* tests”, it is good to have tests which can periodically uncover new issues. It is decidedly not good, however, to see a new issue surface and keep moving forward without making some sort of change in your testing strategy. Nearly any bug a long-haul test can find could also be found in a faster, more precise test, if only you knew to look. A single worrisome defect might open your mind to whole categories of issues which are simply awaiting the right set of unit tests to flush them out.
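To make the gap-detector idea concrete, here is a minimal sketch of that feedback loop in Python. Everything in it is hypothetical and invented for illustration: the `Counter` class stands in for the system under test and carries a deliberately planted bug, `long_haul` is a toy long-running randomized test, and `focused_check` is the kind of fast, precise test the finding should lead to. Recording the seed and step is one possible way to make a long-haul failure replayable, not something the paper prescribes.

```python
import random


class Counter:
    """Toy system under test (hypothetical) with a deliberately planted bug."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        # Planted defect: the total silently resets once it passes 1000.
        if self.value > 1000:
            self.value = 0


def long_haul(seed, steps):
    """Apply random operations; return details of the first mismatch, or None."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    sut, expected = Counter(), 0
    for step in range(steps):
        n = rng.randint(1, 10)
        sut.add(n)
        expected += n
        if sut.value != expected:
            return {"seed": seed, "step": step,
                    "expected": expected, "actual": sut.value}
    return None


def focused_check():
    """The fast test the finding points to: cross the 1000 boundary directly."""
    c = Counter()
    c.add(1000)
    c.add(1)
    return c.value == 1001


if __name__ == "__main__":
    print("long-haul finding:", long_haul(seed=1, steps=5000))
    print("focused check passes:", focused_check())
```

The long-haul run only stumbles onto the defect after many operations, while the focused check hits it in two calls. The point is the second function: once the gap is known, the cheap test should exist, and the long-haul run goes back to hunting for the next unknown.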
In my paper, I mentioned only in passing the execution environment of a long-haul test, giving a few examples of test topologies (multi-threaded, multi-machine, etc.). The Me of 2015 would say, loudly, do as much as you can as close to production as you can. In today's more cloud-focused world, trying to maintain distinct and specialized test environments can be a liability. If your software already runs on a distributed server farm, it is worth exploring whether you can reserve real production capacity for validation purposes, e.g. by generating synthetic load. You would need a secure and reliable way of isolating the impact of this traffic and perhaps a partitioning strategy to ensure your test load lands on the right nodes. If using “real prod” is not in the cards, a fully maintained staging environment would be the next best option. Note that it is a good idea to treat staging just like prod and use all the same deployment, access control, and incident management strategies; you don’t want to go down the road of “it worked on my [environment]” only to find that everything is different (and broken) in the real world.
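As a rough illustration of the partitioning idea, the Python sketch below tags synthetic requests and routes them only to a reserved pool of nodes. The header name `X-Synthetic-Load`, the node names, and the functions `tag_synthetic` and `pick_node` are all assumptions made up for this example; a real system would do this in its load balancer or service mesh, and would need to authenticate the tag rather than trust a plain header.

```python
import hashlib

# Hypothetical marker for test traffic; not a standard header.
SYNTHETIC_HEADER = "X-Synthetic-Load"


def tag_synthetic(headers):
    """Return a copy of the headers marked as synthetic test traffic."""
    tagged = dict(headers)
    tagged[SYNTHETIC_HEADER] = "1"
    return tagged


def pick_node(request_key, headers, live_nodes, reserved_nodes):
    """Choose a node: synthetic traffic goes only to the reserved pool."""
    is_synthetic = headers.get(SYNTHETIC_HEADER) == "1"
    pool = reserved_nodes if is_synthetic else live_nodes
    if not pool:
        raise ValueError("no nodes available for this traffic class")
    # Stable hash so the same key always lands on the same node in its pool.
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


if __name__ == "__main__":
    live = ["prod-a", "prod-b", "prod-c"]   # illustrative node names
    reserved = ["prod-x", "prod-y"]         # capacity set aside for validation
    print(pick_node("user-42", {}, live, reserved))
    print(pick_node("user-42", tag_synthetic({}), live, reserved))
```

This only covers the "lands on the right nodes" half of the problem. Isolating side effects (writes, billing, notifications) triggered by synthetic requests is a separate concern that the sketch does not attempt.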
Would I still recommend long-haul testing? Absolutely! I would probably just “under-engineer” it a bit compared to how I described it in the paper, while stressing the above tenets.