Writing code that humans can understand

I’m often asked why I put so much emphasis on writing code that contains relevant comments, code that is readable, and code that is maintainable. After all, isn’t it enough to get the project to build properly and pass any required test cases? Who cares about comments or formatting? The compiler just throws them away. Who cares about variable names? The compiler just reduces them to memory addresses. Who cares if the code is easy to maintain? I’ll be long gone from this job before anyone has to look at the code.

Who is your audience?

Martin Fowler, arguably the father of code refactoring, said:

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”

The source code we write has two target audiences:

  1. The compiler (or interpreter or assembler), of course, which must understand our code well enough to carry out our precise instructions and correctly accomplish the task at hand.
  2. Actual human beings, including the future you, your boss, your team, your employees, potential future employers, and all the people who need to read, comprehend, fix, enhance, refactor, or otherwise maintain your code, now and far into the future.

We use the syntax and semantics of the programming language to express what we want the system to do. But the programming language is both very strict, in terms of its vocabulary and syntax, and very loose, in terms of how we can name entities, format the code, and annotate the code. Thus, in most general-purpose programming languages, a developer can follow all the rules of the language to the letter and still manage to produce source code that’s virtually unreadable by any human being, even by the original developer!

How Not to Code

If you want to see examples of unreadable source code, have a look at the entries in the International Obfuscated C Code Contest. There you’ll find purposely clever abuses of the C language to produce working-but-virtually-unreadable source code. Clever, yes. Recommended for real-world production code, no.

The maintenance phase of a large software project is commonly estimated to account for most of the project’s overall cost, with figures around 80% often cited. Developers who crank out non-human-readable code neglect this fact and incur a huge technical debt that someone will have to pay later on. Sometimes that debt is ultimately paid by completely abandoning the code and starting over from scratch, because code that no one can read efficiently is simply too costly to maintain.

Excuses, Excuses

Here, in no particular order, are a few of the justifications I hear most frequently, from individual developers, team leads, and managers, that de-emphasize the importance of writing human-readable code:

  1. “Paying attention to readability takes too much time. We just need to get this beast working.” Beast is a great characterization: that’s the voice on your shoulder you’re listening to when you come up with an excuse like this one. The fact is that, while writing readable source code might feel awkward at first, it soon becomes second nature, and the added cost becomes negligible. And even if there is a measurable up-front cost, it will be far outweighed by the resulting cost savings in the maintenance phase of the project.
  2. “Anyone who knows the language will be able to figure out what the code is doing.” I can’t completely disagree with this one, as far as it goes. The question is really one of cost. How much time and effort and frustration will the person have to invest and endure to reach that point of complete understanding? How quickly will they reach a high level of confidence that their understanding is thorough enough to modify the code without breaking anything? Just knowing the syntax and semantics of the language doesn’t guarantee they won’t have to spend significant brain power to decipher what your code is trying to do.
  3. “No one but me ever has to read this code.” Unless you’re working on something trivial that will never see the light of day, this is the most misguided excuse of all. In the real world, code is reviewed, read, and maintained over time by many people. Every line of code you write is part of a message in a bottle that someone downstream is going to have to read and understand someday. And even if it’s true that you’ll be the one maintaining your own code, ask yourself this: “If I had to pick up this code seven years from now, having not looked at it in all that time, would I be able to immediately understand it and make changes to it without breaking anything?”
  4. “This is just a one-off program. It won’t ever need to be maintained.” This can end up being a silly prediction, on the order of “I think there is a world market for maybe five computers,” a remark widely attributed to IBM chairman Thomas J. Watson in 1943. You just never know if the simple one-off project you’re working on today might turn into a much bigger deal later. If a colleague or your boss gets wind of it, you could end up with an actual product on your hands. So, you need to code as if you’re developing something others will use and maintain.

You Just Never Know

When I was a software design engineer at Texas Instruments, I developed a small remote administration utility to save myself the time and effort of going across the street and suiting up to enter our semiconductor factory clean-room. I was young and innocent at the time, so I didn’t consider the possibility that someone else might benefit from it or want to extend it. Thankfully, the code I wrote was readable and easily extensible. I ended up getting a cost-savings award for its development, and it was used and extended by many people over a long period of time. You just never know the ultimate destiny of the little project you’re working on today.

What should this variable name be?

I teach and mentor many people who are learning to program — both college students and working engineers. One of the surprisingly common frustrations I see among new programmers is their struggle to choose good variable names. In fact, even among those who clearly have an aptitude for creative problem solving and writing otherwise clear and correct code, I hear complaints that it takes “way too much time, thought, and creativity” to come up with good variable names that tell the reader what the variables represent. As a result, they’ll often choose arbitrary single letters, arbitrary single letters followed by arbitrary numbers, or ridiculously random or silly names for their variables.

For example, I had one student choose the variable names artichoke, jaguar, and chocolate for variables that would contain a velocity, a launch angle, and the acceleration due to Earth’s gravity, respectively. I suggested using the names velocity, launchAngle, and g, the latter being the conventional symbol in physics for gravitational acceleration. “But that’s a single letter!” Well, single-letter variables are not inherently evil, as long as they aren’t arbitrary and they make complete sense in the context where the variable is actually used. If you’re writing code that involves the quadratic formula, in which a, b, and c are commonly used, it’s perfectly all right to use a, b, and c as variable names. Likewise, using a variable such as i as a simple integer loop index is fine, but if you’ve got nested loops running through the indexes of a two-dimensional array, don’t use i and j for row and column. Use row and column, so that the purpose is clear to the human reader.
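To make the naming advice concrete, here is a minimal sketch in Python. The projectile-range formula, the function names, and the default value 9.81 are assumptions chosen for illustration; they are not taken from the student's actual program. Because the sketch is in Python, launchAngle is spelled launch_angle to follow that language's usual convention.

```python
import math

# Hard to read: the names say nothing about what the values mean.
def f(artichoke, jaguar, chocolate=9.81):
    return artichoke ** 2 * math.sin(2 * jaguar) / chocolate

# Easier to read: each name states what the value represents.
# g is a single letter, but it is the conventional symbol for
# gravitational acceleration, so it is not arbitrary.
def projectile_range(velocity, launch_angle, g=9.81):
    """Horizontal range on level ground, ignoring air resistance.

    velocity is in m/s, launch_angle in radians, g in m/s^2.
    """
    return velocity ** 2 * math.sin(2 * launch_angle) / g

# Nested loops over a two-dimensional list: row and column say
# which index is which, where i and j would not.
def count_live_cells(grid):
    """Count the truthy entries in a list of lists."""
    live_cells = 0
    for row in range(len(grid)):
        for column in range(len(grid[row])):
            if grid[row][column]:
                live_cells += 1
    return live_cells
```

Both range functions compute exactly the same number; only the second one tells the reader what that number is.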

Think in terms of a human being reading the code out loud. What variable name will really help tell the story, making the purpose of the variable and the surrounding code crystal clear? Before you name a variable, actually take a moment to think about it. Using between one and three words, describe what will be stored in that variable. If you can come up with those words, then name the variable accordingly, using those words. If you can’t come up with the words, perhaps because the variable is being reused for different purposes, or because you don’t fully understand what the variable will be used for, it’s time to stop coding, step back, and think it through. Perhaps you need multiple variables, each with a specific purpose.
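One way to picture the reused-variable problem is the hypothetical Python sketch below. The function names and the distance and time quantities are invented for illustration. In the first version, no one- to three-word description fits x, because it holds two different things; in the second, each variable has a single purpose that its name states.

```python
def average_speed_unclear(distances_km, hours):
    # x first holds a total distance, then is reused to hold a speed.
    x = sum(distances_km)
    x = x / hours
    return x

def average_speed(distances_km, hours):
    # One variable per purpose; each name can be read aloud and understood.
    total_distance_km = sum(distances_km)
    speed_km_per_hour = total_distance_km / hours
    return speed_km_per_hour
```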

Good comments can enhance understanding

Good comments enhance the reader’s understanding of the code, the intent of the developer, and the bigger picture that might not be immediately evident just looking at the program statements. Comments that don’t do this should be avoided. For example, avoid adding comments that simply restate what the code is doing:
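A minimal sketch of such a comment, written in Python with a hypothetical velocity variable and starting value (both are assumptions made for illustration):

```python
velocity = 12.0  # hypothetical starting value, in m/s

# Add 1 to velocity.
velocity += 1
```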

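A comment such as “add 1 to velocity”, attached to a statement that does exactly that, might help someone who is still learning the language, but it tells any other reader nothing the code doesn’t already say. A comment explaining why the velocity is being incremented would be more useful.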
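A comment that explains the reason might look like the Python sketch below. The sensor and its offset are invented for this example; the point is only that the comment carries information the statement itself cannot.

```python
velocity = 12.0  # hypothetical raw sensor reading, in m/s

# The (hypothetical) speed sensor is known to read 1 m/s low, so
# correct for that offset before the value is used anywhere else.
SENSOR_OFFSET = 1.0
velocity += SENSOR_OFFSET
```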

Likewise, comments that are out of date (i.e., not in sync with the current program statements) are worse than no comments at all, because they mislead the reader and contradict what the code is actually doing. It’s important to update the comments whenever you change the program statements.
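As a small illustration in Python, with function names and limits that are purely hypothetical, the first function below carries a comment that no longer matches its code. The second shows one possible way to reduce the risk: keep the number in a single named constant, so the comment has no figure that can go stale.

```python
def is_valid_password_stale(password):
    # Passwords must be at least 6 characters long.
    # (Stale: the check was later changed to 12, the comment was not.)
    return len(password) >= 12

MIN_PASSWORD_LENGTH = 12

def is_valid_password(password):
    # Reject passwords shorter than the configured minimum length.
    return len(password) >= MIN_PASSWORD_LENGTH
```

A reader who trusts the first comment will expect an eight-character password to pass, and the code will reject it.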

But don’t comments slow things down?

In an interpreted language, comments can indeed make a program’s execution slower, because the comments have to be scanned (and thrown away) every time you run the program. For example, if you have an interpreted BASIC program which contains many REM statements (comments) inside a loop, those REM statements have to be detected and skipped on every pass through the loop when the program is executed. Likewise, JavaScript comments have to be detected and discarded each time a script is loaded and parsed, and they make the file larger, which can lead to a longer download time.

In a compiled language, comments are detected by the compiler (or preprocessor, in some languages), and are thrown away. Their presence can slow the compilation (or preprocessing) step, but that’s a small one-time cost. Once you have an executable file, the comments are long gone, and will have absolutely no effect on the execution time of the program whenever you run it.
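The same effect can be observed on a small scale in Python, used here only as a convenient illustration: the standard CPython implementation compiles source text to bytecode before running it. The sketch below, with made-up source snippets, compiles the same loop with and without comments using the built-in compile function and compares the resulting instructions through the code object's co_code attribute.

```python
with_comments = """
total = 0
for n in range(10):  # walk through 0..9
    total += n       # accumulate the running sum
"""

without_comments = """
total = 0
for n in range(10):
    total += n
"""

def compiled_instructions(source):
    """Compile source text and return its raw bytecode instructions."""
    return compile(source, "<example>", "exec").co_code

# True when the comments left no trace in the compiled instructions.
same_bytecode = (
    compiled_instructions(with_comments)
    == compiled_instructions(without_comments)
)
print(same_bytecode)
```

If the two instruction sequences compare equal, the comments cost something only while the source was being compiled, not while the loop runs.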

That said, it’s better to include relevant comments that improve code readability and maintainability than to worry about their effect on compilation or interpreted execution. Code that’s difficult to read and maintain tends to have a much shorter lifespan. Minifiers that strip out comments are available for some interpreted languages, but you still want to maintain your original source code with its useful comments.

A matter of style

Religious wars have been fought over indentation policy, brace placement, the use of spaces vs. tabs, and other source code formatting issues. Rather than discussing style specifics here, I’ll just offer two thoughts:

  1. A reasonable approach to source code formatting is important to achieving readability and maintainability.
  2. Consistency of style within each source file, across the entire project, and across the entire team is more important than which specific style is used. By being consistent, you avoid forcing the reader to continually shift gears mentally from one style to another. And you can avoid embarrassing eye-rolling observations like, “Oh, this part must have been written by Steve.” In this situation, conformity is better than individual whimsicality.

Not an afterthought

Don’t leave “making the code readable” as an extra task, after you get the code working. First, there may not be time at the end of the project, because of the pressure to ship it. Second, if you do manage to carve out time to do it at the end, it will take longer because you have to figure out all over again what the unreadable code is doing before you can do the right things to make it readable. And you may have forgotten important details that should be commented.

The right time to make your code readable is when you’re first writing it. Only then do you have all the details, intentions, and assumptions right there in your short-term memory.

Job insecurity?

There’s a common sarcastic view that writing unreadable code is the key to job security, because only the original developer can ever maintain it, if they can remember how it works. In reality, writing unreadable code costs a project extra time and money in the long run, and can ultimately lead to cancelled projects, complete rewrites, poor reputations, and lost jobs. I’ve seen all of these happen as a result of writing code that isn’t readable by humans.
