The Hidden Power of Testing
Originally posted on Waterbearlang blog.
Not long ago, when programming was just a hobby, I thought there was nothing more to testing code than:
- writing some code
- checking that it compiles
- running it
- checking that it did what you expected it to do.
And if steps 2–4 failed, go back to 1 and try again.
With a little more programming experience, and having had hands-on experience working on various projects with other actual human beings, I have a greater appreciation for testing and the role it plays in development. For the Waterbear team, I was responsible for setting up our unit tests and our continuous integration environment, and I designed acceptance tests for developing the debugging feature.
This post is for the beginner or intermediate programmer. Testing, whether it’s unit tests or acceptance tests, is not only a way of making sure your code works; it’s also a way to discover how to design software.
And because I cannot imagine any topic quite as dry as “Eddie extols the virtues of testing” I’ll be filling this post up with tangentially related animated GIFs.
So let’s get started!
But Eddie! What is testing?
The word “testing” itself is pretty intuitive. You have a thing. You check if it works. Boom. Testing! Let’s go for a celebratory pint!
But what you’re really doing is ensuring the quality of the program. Testing is not only checking that the code works, but that it works well, that the software behaves as expected, and—and this one takes a while to learn—that the code is resilient to future modifications. Resilient, like the extremophiles of the phylum Tardigrada, the hardy waterbears!
What I’ve come to find is that this process of defining and having a clear vision of what it means for your code to be right gives you a clear vision of how to write your code right. It might just be that when you take on a project, you have nowhere to start. The task of programming may seem daunting, but it is even more so when there’s no clear place to begin. In this case… may I suggest testing?
Despite the abundance of methods, methodologies, and testing fanboys (yes, even bigger fanboys than myself), there is no one way to do testing The Right Way™. The right way is the way that benefits you, the programmer. That said, testing does come in many forms, and there are a lot of buzzwords and funny language that people use when talking about it. I’m going to try to clear up a few major concepts: unit testing and acceptance testing.
Unit testing!
This is probably what most people refer to when they’re talking about “the tests”. It might be that you think unit testing is the only kind of testing that is considered to be Real Testing™ (spoilers: nope).
Unit tests are all about:
- breaking down your program into little tiny, self-contained pieces (the so-called units),
- and making sure those itty-bitty pieces work well on their own.
The astute reader may notice something: this assumes that the problem can be nicely broken down into little tiny, self-contained pieces. For a number of reasons, this is not always an easy thing to accomplish, and the benefits of breaking your program down into testable units may not fully justify the effort. For example, if your program needs to deal with external systems, you have to go through the effort of emulating the external system and all of its possible conditions, which is not unheard of, but certainly requires considerable effort. Nevertheless, unit testing is a must-have in your testing arsenal.
How do you do it?
Unit testing is so prevalent that most languages have standard or de facto ways of doing it. For example, Java has jUnit, Python has the unittest module, and for client-side JavaScript there are many options, including qUnit. Basically, if you want a unit testing framework, look for a thing for your language that ends in “unit”. Of course, these aren’t the only options, but without putting in much effort, you’ve probably got something that will easily integrate with your system.
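For a taste of what that looks like, here is a minimal sketch using Python’s unittest module, including a fake stand-in for an external system as mentioned above (the function, the fake weather client, and the tests are all invented for illustration; unittest.mock has been in the standard library since Python 3.3):

import unittest
from unittest import mock


def describe_temperature(client, city):
    """Describe a city's temperature, using an external weather service."""
    degrees = client.current_temperature(city)  # normally a network call
    return "hot" if degrees >= 30 else "not hot"


class DescribeTemperatureTest(unittest.TestCase):
    def test_hot_city(self):
        # The mock plays the part of the external service, so the unit
        # can be tested on its own, with no network in sight.
        fake_client = mock.Mock()
        fake_client.current_temperature.return_value = 35
        self.assertEqual(describe_temperature(fake_client, "Cairo"), "hot")

    def test_mild_city(self):
        fake_client = mock.Mock()
        fake_client.current_temperature.return_value = 12
        self.assertEqual(describe_temperature(fake_client, "Edmonton"), "not hot")


if __name__ == '__main__':
    unittest.main()

Here, describe_temperature() is the unit; the mock means the tests never have to touch the real external system.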
But that misses the point. The most important part of unit testing is identifying the units. You gotta find the smallest pieces of code that form whole, cohesive units, and only then can you assert that those units work. As long as you know the small parts work, you’ll have the confidence to assemble them into a larger system. But, er… how do you identify the units?
Maybe the better option is to figure out what’s easy and obvious to test before you write the code for it. Then it becomes clear what you need to code, and what makes a good chunk of code to call a “unit”.
Take this actual example of actual Python programming that I actually wrote. For this script, I wanted to do a syntax check on the Python files that it downloaded. So I started by defining a function along with its docstring. Docstrings, for those uninformed, are a super nifty language feature in Python. Wanna document a function, class, or module? Then the first line of said function, class, or module should be a string, which will automatically become associated with the __doc__ special attribute of that object. This is super rad, you guys! But it gets even radder.
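Here’s a tiny sketch of that mechanic before we get to the radder part (greet is a made-up function, not part of my script):

def greet(name):
    """Return a cheery greeting for name."""
    return "Hello, " + name + "!"

# The docstring is now attached to the function object itself:
print(greet.__doc__)  # prints: Return a cheery greeting for name.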
def syntax_ok(contents):
"""
Given a source file, returns True if the file compiles.
"""
Okay, so we’ve got the purpose down. That’s one step of the process of understanding what task you have to accomplish as a programmer. But how will you know that this works? Enter the doctest. Beginning Python programmers are familiar with the interactive console, where they can try lines of code and see their result. The prompt for the interactive console is three right-facing arrows: >>>. Say I’m testing the completed syntax_ok() function. A sample session in the interactive console may look like this:
>>> syntax_ok('print("Hello, World!")')
True
>>> syntax_ok('import java.util.*;')
False
>>> syntax_ok('\x89PNG\x0D\x0A\x1A\x0A\x00\x00\x00\x0D')
False
>>> syntax_ok(r"AWESOME_CHAR_ESCAPE = '\x0G'")
False
So, why not put what you expect to happen on the console into your documentation and call it a test? That’s exactly what a doctest is. Now we know what we want both in human terms and in computer terms (in this case, the return value of this function). Additionally, since we embedded the test in the documentation, we have precise documentation for how to use the function in the future. And if it passes the tests, then we know our documentation is correct. I told you it got even raddererer! Putting it together, it looks like this:
def syntax_ok(contents):
r"""
Given a source file, returns True if the file compiles.
>>> syntax_ok('print("Hello, World!")')
True
>>> syntax_ok('import java.util.*;')
False
>>> syntax_ok('\x89PNG\x0D\x0A\x1A\x0A\x00\x00\x00\x0D')
False
>>> syntax_ok(r"AWESOME_CHAR_ESCAPE = '\x0G'")
False
"""
At this stage, I knew what I wanted, but didn’t know how to accomplish it, so I just looked up appropriate documentation. But the important part was that I now knew where to start. I implemented it thusly:
try:
compile(contents, '<unknown>', 'exec')
# why does compile throw so many generic exceptions...? >.<
except (SyntaxError, TypeError, ValueError):
return False
return True
There we go! As soon as I implemented it, I had working tests that I could run like this¹:
python -m doctest ghdwn.py
In standard, counterintuitive Unix fashion, no output means that all of my tests passed!
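If you’d rather see some reassuring output, one option is to have the module run its own doctests verbosely when executed directly. This is just a sketch, not something that was in ghdwn.py:

if __name__ == '__main__':
    import doctest
    # verbose=True prints every example as it runs, plus a pass/fail summary.
    doctest.testmod(verbose=True)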
But lo! As syntax_ok() was being called in a long-running script that checked the syntax of many, many, many Python files, an enormous flaw soon became apparent. After a while, my script would crash with a MemoryError, indicating that my program had somehow run out of memory. Evidently, calling compile() cached the compiled code, which I never used, since I merely called compile() for its side-effect of reporting whether a file contained syntax errors. As a result, I had to fix this dreaded memory leak to stop it from randomly crashing my long-running script.
This is where having unit tests really pays off. The inevitable occurred: I had to modify my code and make sure that it still did the same thing. Luckily, I could assert that my code behaved the same, since I had basic tests in place. Now all I had to do was figure out how to patch that memory leak. It occurred to me that I can’t have a memory leak in a process that exits immediately, so using some Unix voodoo, I fork’d my process into a parent (the process expecting an answer) and a child (the process that would compile the code, cache the result, and promptly exit, destroying the cached results of compilation along with it). My finished product looked like this. Note the doctest within the documentation:
def syntax_ok(contents):
r"""
Given a source file, returns True if the file compiles.
>>> syntax_ok('print("Hello, World!")')
True
>>> syntax_ok('import java.util.*;')
False
>>> syntax_ok('\x89PNG\x0D\x0A\x1A\x0A\x00\x00\x00\x0D')
False
>>> syntax_ok(r"AWESOME_CHAR_ESCAPE = '\x0G'")
False
"""
pid = os.fork()
if pid == 0:
# Child process. Let it crash!!!
try:
compile(contents, '<unknown>', 'exec')
except:
# Use _exit so it doesn't raise a SystemExit exception.
os._exit(-1)
else:
os._exit(0)
else:
# Parent process.
child_pid, status = os.waitpid(pid, 0)
return status == 0
In this way, my simple unit test helped me:
- Figure out what task I needed to accomplish. This became my “unit”.
- Determine what would be the correct output of said unit.
- Document how to use my function.
- When it came time to change my function, ensure that its behaviour would stay the same.
How we use unit tests in Waterbear
In Waterbear, we use unit testing to ensure that the underlying block implementations—the runtime functions—return the proper results. Since this is the code that ultimately runs when a block is used in a Waterbear program, we must ensure that its behaviour remains consistent as development progresses. For this, we created a QUnit test suite, which can be run in a browser such as Chrome or Firefox. In addition to testing in a browser, we can also run the suite in a headless browser like PhantomJS. This allows us to run tests from the command line, and even on a remote server every time we update some code.
Enter continuous integration. Whenever we push code to GitHub, a worker on TravisCI clones a fresh copy of the new code and runs our unit tests. Whenever an update fails any of the tests, the team gets a notification. This lets us know that the update is definitely not quite ready yet, and allows us to take action to make sure the fresh code meets our standards before we pull it into the working copy we share with our users.
One of my first tasks on Waterbear was setting up TravisCI for our unit tests. Much like a Pokémon trainer in Kanto, any open source project worth its salt is nothing without a veritable boatload of build badges. Obviously, my most important contribution to Waterbear was to place the much coveted build badge on our README.
It was a hard job, but I managed to pull it off.
A word on methodologies
Some people call what I did in my Python example test-driven development (TDD), or, if you really want to be pedantic, test-first development. Either way, your code lives to serve the test: under this framework, your code’s only purpose is to ensure that the tests pass. Some people are really adamant about this process, and assert that the only way to know your code will end up working properly is to write the tests first. I… remain skeptical. It’s certainly a nifty technique, and one I use often, but it’s not the only way to do things. Another method is acceptance testing.
Huzzah! Acceptance Testing!
Acceptance testing is simply defining the behaviour we expect, and the circumstances under which we can say that a thing fulfils its duty. Wait… this is sounding familiar… didn’t we just talk about this? Well, kind of. While unit testing focuses on the smaller parts, acceptance testing is much more high level—often done by a living, breathing human, rather than run automatically by a continuous integration robot.
Testing doesn’t have to be automatic or focused on specific, small units of code. Definitely, it’s nice when we can test individual units of code—plus, it’s generally indicative of a cleaner, more modular design—but this doesn’t necessarily say anything about the ultimate awesomeness of the software. Besides, sometimes it’s just straight up difficult to break the code up perfectly in this way. It happens.
Don’t beat yourself up over it. There are a lot of design decisions to make, and you’re never gonna get everything perfect; a one-to-one correspondence between code units and functions is not the ultimate goal of programming: the goal is a system that works! …uh. Whatever that means.
The point of acceptance testing is to define scenarios that capture the requirements of your code, i.e., what your code must ultimately achieve to be considered “good”. You define any conditions that must be set up prior to the test, the tangible steps that a person has to take to make the scenario happen, and the criteria for saying “yep, this sure did work”. It’s like a checklist: you check off all of the steps, and at the end, you know that the code works.
This is a relatively new concept for me, so I don’t apply any serious formality to it. I did find myself using it in Waterbear recently to determine whether I was writing the right thing for the Waterbear debugger. Before I started writing any substantial amount of code for the debugger, I had no idea where to start. I legitimately struggled for a while with a file in my editor that just read // debugger. It was… embarrassing.
But a chance encounter with my UCOSP supervisor, Eleni Stroulia, reminded me about acceptance testing—something I had only ever practiced once. So I got to it! I looked at the informal list of requirements that we collected on our issue tracker. Then I edited these initial feature requests into a feature list that was a bit more fleshed out. After this, I got started writing the tests!
The template I followed contained the following sections:
- Setup: how to get the system into the state necessary to begin the test. For the Waterbear debugger, most of these came along with an example script that would demonstrate the desired phenomenon.
- Preconditions: any special state that the system must be in prior to the test.
- Test: the steps that a user would take to accomplish the given task.
- Acceptance criteria: the checklist of things that should happen.
Of course, as this type of testing is usually performed by a human and not automatically by a computer, care should be taken in sucking any subjectivity, vagueness, and ambiguity out of the script.
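To give a flavour of the format, here is a made-up acceptance test written against that template (the script and the steps are invented for illustration, not lifted from the actual Waterbear test plan):
- Setup: open an example script that counts from 1 to 10 and prints each number.
- Preconditions: the script runs to completion when started normally.
- Test: start the script, then pause it partway through using the debugger controls.
- Acceptance criteria: execution actually stops; the output printed so far is unchanged while paused; resuming continues counting from where it left off rather than starting over.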
Once everything has been defined, go do it! In my case, writing even half of the tests helped me think of the tasks that I had to accomplish and gave me a good idea of the architecture I had to build in order to reach that goal. After writing enough code to fulfil even one of these tasks, I’d go test it. And out of the process, I got this illuminating state diagram that displays all of the ways that a user can plausibly move through execution states—it turned out to be way more involved than I expected.
(Apologies for the criminal unfunniness of the last still image.)
The end result is a list that should be clear to follow in order to check that the debugger is working properly. And hey! We now have a clear (or rather, clearer) definition of what it means when we say “the debugger is working properly.”
The process of writing the acceptance tests gave me a fresh look at the problem I had, and allowed me to think about it and visualize it in different ways; it allowed me to organize the complexity of the task. And for that reason alone, I’d recommend writing an acceptance test when you have a large task and no idea how to tackle it.
Conclusion
In the end, testing may seem like a mindless process—something that hoity-toity software engineering types (of which I definitely am one) are always goading other coders into doing. But the fact is, despite the obvious motivation of “checking that it works right”, testing also yields a method for discovering how to solve a problem. And I think that’s pretty neat.
¹ Bonus PROTIP: I like to automatically run my doctests whenever I save my work. I use pytest with the xdist plugin. Install them like so:

pip install pytest pytest-xdist

Then, to start running tests continuously, I open a new terminal and type the following in the same directory as the file I’m working on:

py.test -f --doctest-mod

Alternatively, to save myself a bit of typing, I put this in my .aliases file (if you don’t know what this is, you probably want to put it in your ~/.bash_profile instead):

alias doctest='python -m doctest'

which allows me to run the doctests of any file by simply typing:

doctest FILE.py