The bane of any team that uses test-driven development is the intermittent test. If your engineering process doesn’t address them, they’ll slow you to a crawl and destroy your morale. In this post, find out what an intermittent test is, why it’s dangerous, and how you can fight back.
What’s an intermittent test?
Your team is practicing test-driven development. At the beginning, it’s going great. You’re more agile than you ever imagined. Everyone writes tests, tests pass, and everyone’s confidence level in the codebase is high. When someone accidentally breaks something that has test coverage, a test fails, and they can rerun the test to see what they broke and how to fix it.
You’re in flow. You wrote some code, with proper test coverage, and now you’re ready to commit. Your tests pass locally. Then you commit your new code to trunk, trigger a test build, and a test fails.
Now something weird happens. Your first step in investigating is to rerun the test that failed, but this time it doesn’t fail. What? Then why did it fail in the first place? You poke around for a while, but since you have no reproduction path, you can’t figure out why the test failed. You shrug, start a new build, and this time, everything turns green.
If this doesn’t sound familiar, I promise, it’ll happen to you soon.
No big deal, right? Something borked for a second, but you fixed it by restarting. Same advice tech support gives to customers. Everything’s back to green, everyone’s unblocked, and process resumes as normal.
This approach will not scale. That test is intermittent, and someday it’s going to fail again.
Why it’s dangerous
In the above story, when the test failed, you paid two costs. First, you spent time investigating the failure and waiting for the new build. Second, you lost (a little) confidence in your tests. You will pay both of these costs again every time an intermittent test fails.
As you begin to practice TDD on a grand scale, you’ll be adding many, many tests. No matter how good you are at testing, some of them will fail once per million runs or more often. If you don’t find and fix the causes of intermittent failures, the number of intermittent tests in your codebase will strictly increase, and so will the chance that any given build fails. You pay the time cost and the confidence cost again and again, with increasing frequency. Eventually your builds will fail every other time for no reason, and nobody will trust the tests. Every engineer loses an ever-increasing share of his or her time and concentration chasing incessant, meaningless test failures.
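To see how fast this compounds, here’s a rough back-of-the-envelope sketch. The test counts and failure rates are illustrative assumptions, not measurements from any real suite:

```python
# Sketch (hypothetical numbers): how a few rarely-failing tests compound
# into frequent build failures when each test fails independently.

def build_pass_probability(num_tests: int, per_test_failure_rate: float) -> float:
    """Probability that a full build passes, assuming independent test failures."""
    return (1.0 - per_test_failure_rate) ** num_tests

# 10,000 tests that each fail once per million runs: builds almost always pass.
print(round(build_pass_probability(10_000, 1e-6), 3))

# Let flakiness creep up to once per 10,000 runs, and roughly two builds
# in three now fail for no reason.
print(round(build_pass_probability(10_000, 1e-4), 3))
```

The point of the arithmetic: the suite’s failure rate scales with the *number* of flaky tests, so a per-test failure rate that looks negligible in isolation still wrecks the build once you have thousands of tests.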
How you can fight back: organization-level changes
Every time a test fails intermittently, somebody commits to figuring out why it failed and getting it to a state where he or she believes it will not fail intermittently again. The details of your process may vary, but at some point somebody has to make this commitment, or your team will eventually fail.
When you fix an intermittent test, don’t stop at the one test you worked on. Given that every intermittent test will eventually fail, the end state of this system is that you fix intermittent tests at the same rate as you write them, which is faster than you think. Given the high difficulty and cost of debugging a failure you can’t reliably reproduce, this solution, while scalable, will be crying out for improvement.
Every time an intermittent test fails, you should commit not just to fixing the test but to learning a lesson. Find out why the test was intermittent, fix all tests that have the same flaw, and make it impossible for future tests to have the same problem. Do deep root-cause analysis. For more on this, see Eric Ries on Five Whys.
How you can fight back: an intermittent test is like a crime scene
Credit where credit is due: this section is cribbed from an IMVU internal document that was written by Eric Prestemon.
First of all, a failed intermittent test leaves behind evidence. It’s like a crime scene. You want to reconstruct the events of the failure exactly as they happened, in every detail. You can’t do this by just re-running the test; it’s an intermittent, after all.
In the case of crashes, try to figure out the last line that was run. Did all of the stuff that happens for every test run? Everything in the test’s setup? OK, now line by line in the test, do you have evidence either way that a given line was executed? This takes time, but it will usually narrow things down so you can focus on where the test is crashing.
In the case of assertion errors, the data that was wrong was usually pulled from somewhere (a web get, memcache, a database). It should still be there for you to study. You then work backwards to figure out how the wrong data got there (it may be from way earlier in the test, or even leftover from another test).
In either case, rerunning the test over and over to see what happens is worse than useless. It’s like trying to solve the crime by walking back and forth all over the crime scene for two hours and then giving up because you haven’t been mugged. All you’ve done is destroy the evidence that was there when you started.
How you can fight back: classes of intermittent tests
At IMVU, we’ve found that most intermittent tests fall into one of several categories. Here they are, along with some guidelines for diagnosing and fixing them.
- Hidden dependence – Tests can have hidden data dependence. If there’s a test in your codebase that depends on condition X, but it’s not explicitly setting condition X, then it will fail whenever condition X doesn’t happen to be set. This can happen when a test slowly uses up its entire solution space, or when a previous test explicitly unsets condition X, etc. The bottom line here is that you need to force all your preconditions to be true for every test, then assert that they’re true before running the code that’s being tested.
- Time-sensitive tests – Often a test author doesn’t consider what happens if the system clock passes midnight during the test. The test started running on Tuesday and finished on Wednesday, but the test assumed that the date wouldn’t change over the course of the test. Tests are also notorious for failing during the extra hour of daylight saving time, or on February 29th. You need to find a way to mock out time, and have control over what time your test thinks it is.
- Unstable third-party software – If you integrate with third-party code, like Apache or memcached, you won’t have the ability to modify its source code. Its bugs are your bugs, and you can’t fix them except possibly by upgrading. You have to fix these by poking at the third-party application like a black box. Try to figure out what makes it die, and avoid those cases. Third-party failures are the only class of failure you may eventually have to shrug and give up on – when this happens, as a last resort, keep a list of which third-party failures are currently considered unfixable.
- Intentional randomness in code – If your codebase is ever expected to do anything random, then this introduces randomness into all tests touching that area of the code. These dependencies can be indirect and difficult to detect. A common cause of this is A/B testing. This is simply inexcusable in tests. Find and mock out every random function call, so that your tests never reach them. Write error-checking code that fails as loudly and as early as possible every time a test does something random.
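The time-sensitivity and intentional-randomness classes above share one cure: make the clock and the random number generator injectable, so tests can pin both. A minimal sketch; `daily_variant` and its parameters are hypothetical code, not from the post:

```python
# Sketch (hypothetical code under test): inject the date and the random roll
# so a test can never straddle midnight or depend on a coin flip.
import datetime
import random
import unittest

def daily_variant(today=None, roll=None):
    """Code under test: result depends on the date and a random A/B coin flip."""
    today = today or datetime.date.today()                 # real clock in production
    roll = roll if roll is not None else random.random()   # real RNG in production
    return f"{today.isoformat()}:{'A' if roll < 0.5 else 'B'}"

class TestDailyVariant(unittest.TestCase):
    def test_leap_day_variant_a(self):
        # Pin the clock to the worst case (Feb 29) and the coin to one branch.
        result = daily_variant(today=datetime.date(2024, 2, 29), roll=0.25)
        self.assertEqual(result, "2024-02-29:A")

    def test_leap_day_variant_b(self):
        result = daily_variant(today=datetime.date(2024, 2, 29), roll=0.75)
        self.assertEqual(result, "2024-02-29:B")
```

With the defaults, production code still uses the real clock and RNG; only tests pass explicit values. The same effect can be had with `unittest.mock.patch` when you can’t change the code’s signature.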