I heard a phrase I liked recently: software testing is ensuring goodness. [Alexander Tarlinder on Software Engineering Radio]

How do we ensure that our software is good, and further, what does good even mean? For a first approximation, let’s say that good software does what it’s supposed to.

How do we define what the software is supposed to do? That could mean it does what the design says it does, that it does what the user wants, or that it does what the user actually needs. This gives us the first few things to test against: we check that the design accurately reflects the users’ wants and needs, and that the software accurately reflects the design.

Of course, there’s a lot involved in writing software that isn’t generally covered in the design. Ideally we’ve specified how the program will react to any given class of input, but in practice users tend to do things that don’t make any sense. I had a bug recently that only appeared if the user opened a section of the page, clicked on a table row, opened a second section, opened a third section, hit the button to revert all the changes, reopened the third section, and then clicked a row in that section. There was certainly nothing in the design stating that if this particular sequence of events was to happen, the activity would not crash!

Ok, so that’s reaching a bit – of course we assume that the software shouldn’t crash (provided we’re not making a kart racing game). The design covers the expected responses to user stimuli, but we assume that it will not crash, freeze up, turn the screen all black, etc. Unfortunately, for a non-trivial piece of software the number of possible things to try quickly becomes exponential. How do we ensure that we’ve tested completely enough to have a reasonable chance of catching any critical bugs?

Finding the Bugs

Photograph of a mantis shrimp.

Photograph of a mantis shrimp. Public domain, National Science Foundation.

At some point, we have to determine how much effort to put into finding (or better yet, avoiding) bugs in our software. The more mission-critical the program is, obviously, the more time and money it’s worth investing into finding bugs and the less disruptive a bug has to be to be worth finding and fixing.

I’m a strong believer in the value of having a separate, independent quality assurance team to test the software. Testing is a completely separate skill from coding – rather than trying to build software that works, you’re trying to find all the ways that it could possibly break. So I think it’s valuable to have people skilled in the art of creative destruction, who can approach the software from the perspective of the end user, and who have the authority to stop code changes from moving forward if they believe those changes to be damaging to the quality of the code.

At the same time, there’s no guarantee that a few QAers will be able to try all the weird things that thousands of users might do once your software is out in the wild, which is why we also need code review (or PQA, programmer quality assurance). In code review we have a better chance of catching the one-time-in-a-million bugs that will never show up in testing and yet, somehow, will always pop up in the released code. One of the senior developers on my team was really good at this; I hated having him review my code because I knew he would nitpick every little thing, but I would still choose him to do PQA on my development for the same reason – he was really good at finding anything that might end up being a problem.

How to not code bugs

Speaking for myself, I’m not a fan of doing PQA – it gets boring really fast. Ironically, the better the code is the more boring PQA can be: with the junior developers there tends to be a lot of things you can make suggestions on, but when the code you’re looking at is very well done, you can spend an awfully long time examining it without finding anything wrong, and it takes an effort to get more in depth and concentrate more on finding subtle logic errors and race conditions rather than correcting bad habits and obvious errors in less developed code. Not that you don’t look for the subtle errors in the less developed code too, of course, but you’re not spending an hour looking through the code without finding anything.

On the other side of that, of course, you want to be the person writing the code where your PQAers can’t find anything to complain about. I have not yet figured out how to do this – not even close – but there are a few things that help.

  • Testing. Ok, this one is obvious, but it can be surprising how often someone makes a very minor change and then doesn’t go back and test afterwards. Even a change that can’t possibly break anything, often does.
  • Following standards. My team has a set way of doing things, and yours probably does as well. Why? Because we’ve broken stuff in the past, and figured out how to not break them the same way in the future.
  • Refactoring. See below.

So we’ve been doing unit testing for about the last year and a half, now. Most of the unit tests so far are in code that hasn’t really been updated a lot since the tests were written, so they aren’t yet serving one of the primary purposes of unit tests: letting the next person to work on your code (who might be you) know if he breaks it. What they are doing is making the code better.

When I’m writing unit tests for a function, sometimes I run into problems:

  • It’s not obvious what the function is doing. This generally means that the function (and possibly some parameters) needs to be renamed so that it’s very clear what will happen when it’s called.
  • The function is doing too much. In this case, I split it up into multiple functions – even when the subfunctions will only be called from one place. This gives me several functions that only do one thing, which makes it easier to verify that the logic behind each piece of functionality is correct.
  • The function depends on something being set prior to being called (generally uncovered by a crash when the unit test first calls it with no prep work). This is a good time to verify that under no circumstances can the function ever be called without its prerequisites ever being set (and possibly document the preconditions).
  • The function contains logic which is actually wrong. In this case, I’ve just uncovered a bug, and can fix it, before my code goes to testing.

Most of the time when I’m refactoring a function there aren’t actually any bugs present, yet… but conditions exist that make it easier for bugs to be inserted. By cleaning up the code, I then improve my chances of being able to make changes without introducing new bugs in the code. Further, because the code is now easier to read, not only are future changes less likely to break it but they’ll be made faster and with less frustration.

So how do we ensure goodness? Very carefully, I suppose. In my experience, the best way to do this is just to make it easy to do the right thing. Make it clear from the names what functions do, keep the code as simple and straightforward as possible, and be sure you understand exactly what the code is doing before you make changes (or, if you’ve written it, before you commit it).

Of course, having a can of RAID on hand never hurts.