Handling Intermittence: How to Survive Test Driven Development

Overview

The bane of any team that uses test-driven development is the intermittent test. If your engineering process doesn’t address them, they’ll slow you to a crawl and destroy your morale. In this post, find out what an intermittent test is, why it’s dangerous, and how you can fight back.

What’s an intermittent test?

Your team is practicing test-driven development. At the beginning, it’s going great. You’re more agile than you ever imagined. Everyone writes tests, tests pass, and everyone’s confidence level in the codebase is high. When someone accidentally breaks something that has test coverage, a test fails, and they can rerun the test to see what they broke and how to fix it.

You’re in flow. You wrote some code, with proper test coverage, and now you’re ready to commit. Your tests pass locally. Then you commit your new code to trunk, trigger a test build, and a test fails.

Now something weird happens. Your first step in investigating is to rerun the test that failed, but this time it doesn’t fail. What? Then why did it fail in the first place? You poke around for a while, but since you have no reproduction path, you can’t figure out why the test failed. You shrug, start a new build, and this time everything turns green.

If this doesn’t sound familiar, I promise, it’ll happen to you soon.

No big deal, right? Something borked for a second, but you fixed it by restarting. It’s the same advice tech support gives to customers. Everything’s back to green, everyone’s unblocked, and work resumes as normal.

This approach will not scale. That test is intermittent, and someday it’s going to fail again.

Why it’s dangerous

In the above story, when the test failed, you paid two costs. First, you spent time investigating the failure and waiting for the new build. Second, you lost (a little) confidence in your tests. You will pay both of these costs again every time an intermittent test fails.

As you begin to practice TDD on a grand scale, you’ll be adding many, many tests. No matter how good you are at testing, some of them will fail one run in a million, or more often. If you don’t find and fix the causes of intermittent failures, the number of intermittent tests in your codebase will strictly increase, and so will the chance that any given build fails. You pay the time cost and the confidence cost again and again, with increasing frequency. Eventually your builds will fail every other time for no apparent reason, nobody will trust the tests, and every engineer will lose an ever-increasing share of their time and concentration chasing incessant, meaningless test failures.

How you can fight back: organization-level changes

Every time a test fails intermittently, somebody must commit to figuring out why it failed and getting it to a state where they believe it will not fail intermittently again. The details of your process may vary, but at some point somebody has to make this commitment, or your team will eventually fail.

When you fix an intermittent test, don’t just fix the one test you worked on. Given that every intermittent test will eventually fail, the end state of that approach is that you fix intermittent tests at the same rate as you write them, and you write them faster than you think. Given the high difficulty and cost of debugging a failure you can’t reliably reproduce, this solution, while scalable, will be crying out for improvement.

Every time an intermittent test fails, you should commit not just to fixing the test but to learning a lesson. Find out why the test was intermittent, fix all tests that have the same flaw, and make it impossible for future tests to have the same problem. Do deep root-cause analysis. For more on this, see Eric Ries on Five Whys.

How you can fight back: an intermittent test is like a crime scene

Credit where credit is due: this section is cribbed from an IMVU internal document that was written by Eric Prestemon.

First of all, a failed intermittent test leaves behind evidence. It’s like a crime scene. You want to reconstruct the events of the failure exactly as they happened, in every detail. You can’t do this by just re-running the test; it’s intermittent, after all.

In the case of crashes, try to figure out the last line that ran. Did all of the code that runs before every test finish? Did everything in the test’s setup run? Now go through the test line by line: do you have evidence either way that a given line was executed? This takes time, but it will usually narrow things down enough that you can focus on where the test is crashing.

In the case of assertion errors, the data that was wrong was usually pulled from somewhere (a web GET, memcache, a database). It should still be there for you to study. Work backwards from it to figure out how the wrong data got there; it may have been written much earlier in the test, or even left over from another test.

In either case, rerunning the test over and over to see what happens is worse than useless. It’s like trying to solve the crime by walking back and forth all over the crime scene for two hours and then giving up because you haven’t been mugged. All you have done is destroy the evidence that was left for you to work with when you started.
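One way to make the assertion-error case above tractable is to have the test capture its evidence at the moment of failure, so the crime scene survives after the build moves on. What follows is a minimal sketch in Python’s unittest, not IMVU’s actual tooling; snapshot_external_state is a hypothetical hook you would fill in with whatever your test actually reads (memcache keys, database rows, the body of a web GET):

    import json
    import time
    import unittest


    class EvidencePreservingTestCase(unittest.TestCase):
        """Base class whose assertions dump relevant external state on failure."""

        def snapshot_external_state(self):
            # Hypothetical hook: return the memcache entries, database rows,
            # or HTTP responses this particular test depends on.
            return {}

        def assert_with_evidence(self, condition, message=""):
            if condition:
                return
            # Preserve the crime scene: write the evidence somewhere durable
            # before reporting the failure.
            evidence_path = "/tmp/test-evidence-%d.json" % int(time.time())
            with open(evidence_path, "w") as handle:
                json.dump(self.snapshot_external_state(), handle, indent=2, default=str)
            self.fail("%s (evidence written to %s)" % (message, evidence_path))

A subclass overrides snapshot_external_state and calls assert_with_evidence instead of a bare assert, so a one-in-a-million failure leaves behind a file to study instead of a mystery.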

How you can fight back: classes of intermittent tests

At IMVU, we’ve found that most intermittent tests fall into one of several categories. Here they are, along with some guidelines for diagnosing and fixing them.

  • Hidden dependence – Tests can have hidden data dependence. If a test in your codebase depends on condition X but doesn’t explicitly set condition X, it will fail whenever condition X doesn’t happen to be set. This can happen when a test slowly uses up its entire solution space, or when a previous test explicitly unsets condition X. The bottom line is that you need to force all of your preconditions to be true for every test, then assert that they’re true before running the code that’s being tested (see the precondition sketch after this list).
  • Race conditions – If you’re testing in an environment with multiple threads or multiple processes, you’re going to have races. A good example is Selenium tests, which drive an actual browser: your Selenium test runner is often racing the JavaScript that needs to load to handle the page. Warning! If these race conditions are in the code you ship, then your tests are intermittent because your product is intermittent, and you need to fix your product, not your tests. Assuming this isn’t the case, your product is racing your tests. Figure out what your tests need to wait for, and add waitFor statements; you may need to add hooks for them to wait on (see the Selenium sketch after this list). DO NOT have your tests sleep for fixed amounts of time. The test is still intermittent, albeit less so, and now it’s slower too.
  • Time-sensitive tests – A test author often doesn’t consider what happens if the system clock passes midnight during the test: the test started running on Tuesday and finished on Wednesday, but assumed the date wouldn’t change along the way. Tests are also notorious for failing during the extra hour of daylight saving time, or on February 29th. You need a way to mock time out, so you control what time your test thinks it is (see the clock-mocking sketch after this list).
  • Unstable 3rd party software – If you integrate with third-party code, like Apache or memcached, you won’t have the ability to modify their source code. Their bugs are your bugs, and you can’t fix them except possibly by upgrading. You have to fix these failures by poking at the third party’s application like a black box: try to figure out what makes it die, and avoid those cases. Third-party failures are the only class of failures that you may eventually have to shrug and give up on; when that happens, as a last resort, keep a list of which third-party failures are currently considered unfixable.
  • Intentional randomness in code – If your codebase is ever expected to do anything random, that randomness leaks into every test touching that area of the code, and the dependencies can be indirect and hard to detect. A common cause is A/B testing. Randomness is simply inexcusable in tests. Find and mock out every random function call so that your tests never reach them, and write error-checking code that fails as loudly and as early as possible whenever a test does something random (see the randomness sketch after this list).
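The precondition sketch referenced in the hidden-dependence bullet: a minimal, hypothetical example (the inventory is stand-in data) showing the shape of the fix, which is to force every precondition in setUp and assert it before exercising the code under test.

    import unittest


    class PurchaseTest(unittest.TestCase):
        def setUp(self):
            # Force condition X explicitly instead of inheriting whatever
            # state an earlier test happened to leave behind.
            self.inventory = {"widget": 3}

        def test_purchase_decrements_inventory(self):
            # Assert the precondition before running the code under test, so
            # a violated assumption fails loudly here, not intermittently later.
            self.assertGreater(self.inventory["widget"], 0)
            self.inventory["widget"] -= 1
            self.assertEqual(self.inventory["widget"], 2)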
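The Selenium sketch referenced in the race-conditions bullet: a hypothetical example using Selenium’s Python bindings (the URL and element id are made up) of waiting for the condition the test actually needs instead of sleeping for a fixed time.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    driver.get("http://example.com/checkout")  # hypothetical page under test

    # Don't do this: time.sleep(5) is still a race, just a slower one.
    # Instead, wait for the specific thing the test needs: that the JavaScript
    # wiring up the button has finished before we click it.
    buy_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.ID, "buy-now"))  # hypothetical element id
    )
    buy_button.click()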
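The clock-mocking sketch referenced in the time-sensitive bullet: receipt_date is a hypothetical function under test, and the injectable clock is just one way to put the test in control of what time the code thinks it is.

    import datetime
    import unittest


    def receipt_date(clock=datetime.date.today):
        """Hypothetical code under test: stamps a receipt with 'today'."""
        return clock().isoformat()


    class ReceiptDateTest(unittest.TestCase):
        def test_receipt_date_is_stable(self):
            # Pin the clock so the test behaves identically at 23:59, across
            # a daylight-saving change, or on February 29th.
            fake_today = lambda: datetime.date(2012, 2, 29)
            self.assertEqual(receipt_date(clock=fake_today), "2012-02-29")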
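The randomness sketch referenced in the intentional-randomness bullet: a hypothetical base test case that patches random.random so any test reaching un-mocked randomness fails as loudly and as early as possible; patching other entry points (random.choice, your A/B-experiment helper) follows the same pattern.

    import unittest
    from unittest import mock


    class NoRandomnessTestCase(unittest.TestCase):
        """Base class for tests that must never reach un-mocked randomness."""

        def setUp(self):
            def forbid_random(*args, **kwargs):
                raise AssertionError(
                    "Test reached random.random(); mock the randomness "
                    "(e.g. the A/B experiment choice) explicitly instead."
                )

            patcher = mock.patch("random.random", side_effect=forbid_random)
            patcher.start()
            self.addCleanup(patcher.stop)


    class ExampleTest(NoRandomnessTestCase):
        def test_randomness_fails_loudly(self):
            import random
            with self.assertRaises(AssertionError):
                random.random()  # any un-mocked path to randomness dies here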

Responses to Handling Intermittence: How to Survive Test Driven Development


  1. Fidel says:

    Very, very interesting. At my organization we’re just starting to adopt test-driven development; I hope this helps us identify these pitfalls early.

    I think “Unstable 3rd party software” deserves a caveat. The cited examples, Apache and memcached, are open source, so you don’t have to treat them like a “black box”: you could debug the internals, find the bug in the original source code, and fix it yourself. Of course I concede that this is quite rare and most people don’t or can’t do it, but the possibility is there with free software, unlike proprietary software.


  2. joblivious says:

    That’s a good point, Fidel. You could fix 3rd party software yourself, if it’s open source. This has another hidden cost beyond the labor: now you’re running a branch of their software instead of the canonical version, so you will likely have to throw your labor away before you can apply any of the original author company’s updates.

    We’ve tried both approaches, and each has its merits. It has to be a case-by-case judgment call.

  3. Ben McGraw says:

    After time, “intermittent” will become a curse word.

    E.g., those intermittenting intermitters in Washington really know how to intermit things up.

  4. We have been fighting this battle at Mozilla for quite some time now. We have tens of thousands of tests, running across three platforms (Windows, OS X, Linux), and the machines running the tests are VMs, so they’re very sensitive to timing issues. You can see the huge list of dependent bugs over here:
    https://bugzilla.mozilla.org/show_bug.cgi?id=438871

    Thanks for posting this, this is valuable stuff! We as an organization need to get to a point where we have zero tolerance for intermittent tests.




  5. I noticed that this isn’t the first time you’ve written about this topic. Why did you choose it again?

    • joblivious says:

      Thanks for noticing! I guess the real reason is that I deal with intermittent tests so often and nobody seems to write about them online.
      The first post was more about getting A/B experiments to work with automated tests, but I felt that intermittent tests were worth their own article, since there appeared to be no resources anywhere on the internet on the subject.

      • georgejo says:

        That’s odd, because while I’ve never yet tried formal TDD, “intermittent problems” was the first thing I googled. Often while coding directly, we can code defensively. Scale, and forcing people to relearn, are definitely concerns I have. I worked on a project with 1,200 unit test cases. That code was complicated and full of “intermittent problems”. The application was real-time, coded “directly”, not TDD. The system in question was too complicated for any one person to understand; long story. Call processing is complicated, …
