Loop unrolling as a symptom of premature optimization

I was asked a question recently that went something like this:

“If a for loop is faster when it’s unrolled, why does my boss always want me to make my code shorter by using for loops, when my way (loop unrolling) is faster and easier to read?”

There are a few issues to address in this question, including:

  • Is an unrolled loop always faster than the equivalent for loop?  (Spoiler alert…no.)
  • Is an unrolled loop easier to read than the equivalent for loop? (Spoiler alert…no.)
  • Is an unrolled loop easier to maintain than the equivalent for loop? (Spoiler alert…no.)
  • Is the boss right?  (Spoiler alert…yes.)

I suggested that this software developer start by hanging this Donald Knuth quote on the office wall (if office policy allows hanging things on the wall):

“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.”

You really shouldn’t even think about loop unrolling in your source code, unless you have actual performance measurement data for your specific program that shows this loop to be a performance hot spot. If you don’t have the actual performance data from your actual program, you’re very likely wasting your time optimizing an area of the code that won’t make any significant overall difference.

Even if you have the performance measurement data that shows this loop to be a hot spot, check to make sure that your compiler’s optimization options have been set appropriately. Most modern, mature optimizing compilers perform many complex optimizations, including loop unrolling in the generated code where they judge it worthwhile. So, make sure you’re using the tools properly before you unroll the loop in your source code.

In fact, unrolling the loop in your source code could actually result in slower code than the compiler would generate from the for loop. Knowing that it’s a loop and understanding exactly what’s inside the loop can help the compiler perform all sorts of optimizations (e.g., better register allocation, more efficient reordering of instructions for the target architecture, etc.) that it wouldn’t be able to discern when compiling your unrolled source code.
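To make the comparison concrete, here’s a minimal sketch in C of the same computation written both ways. The names (sum_loop, sum_unrolled, SAMPLE_COUNT) and the array size are made up for illustration; they aren’t from the original question.

```c
#include <stddef.h>

#define SAMPLE_COUNT 8  /* hypothetical fixed size, chosen for illustration */

/* Loop version: the trip count and the access pattern are explicit,
   so the compiler can see that this is a simple reduction over an array. */
int sum_loop(const int samples[SAMPLE_COUNT])
{
    int total = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        total += samples[i];
    }
    return total;
}

/* Hand-unrolled version: same result, but the "this is a loop" structure
   is now spread across eight near-identical statements. */
int sum_unrolled(const int samples[SAMPLE_COUNT])
{
    int total = 0;
    total += samples[0];
    total += samples[1];
    total += samples[2];
    total += samples[3];
    total += samples[4];
    total += samples[5];
    total += samples[6];
    total += samples[7];
    return total;
}
```

One way to see what your compiler actually does with each version is to build with your real optimization options and read the generated assembly (the -S option in GCC and Clang writes it out). Whether the two come out identical, or one comes out faster, depends on the compiler, the options, and the target, which is exactly why you measure instead of assuming.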

Of course, the unrolled version will likely produce more machine code than the for loop version, and the larger code size can also hurt performance in some cases, for example by putting more pressure on the instruction cache.

As for readability and maintainability of the code, a for loop is always going to be more readable and more maintainable than the equivalent unrolled source code. (If the for loop version isn’t more readable and maintainable, something is very, very wrong with your for loop.)

Consider the maintainer fixing a bug in that section of code. How many changes will they have to apply to the unrolled loop code? Potentially many. How many changes will they have to apply to the for loop version? Probably one. Every change during maintenance involves risk. And when code is duplicated, there’s always a chance that some bug fixes won’t get applied to every copy of the code. Are you 100% sure that every maintainer of the code for the rest of time will always catch every place in the unrolled loop that has to be changed? The correct answer is “no.”

Unrolled loops execute a fixed number of iterations. When it’s time to change that number during maintenance, is it:

  • (A) easier and less error-prone to change the number of iterations in the for loop, or
  • (B) easier and less error-prone to have to change (potentially) multiple lines of code, in the right place, to add an iteration to the unrolled source code?

The answer, of course, is (A). The for loop will win this debate every time. The unrolled version will be more brittle during maintenance.
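Here’s a small sketch of what that maintenance burden looks like in C. Everything in it (scale_loop, scale_unrolled, CHANNEL_COUNT, LEVEL_MAX, and the scale-and-clamp logic itself) is hypothetical and only there to show the shape of the problem.

```c
#include <stddef.h>

#define CHANNEL_COUNT 4   /* hypothetical; the one number to change in the loop version */
#define LEVEL_MAX 255     /* hypothetical clamp limit */

/* Loop version: the scale-and-clamp logic exists exactly once.
   Supporting a fifth channel means changing CHANNEL_COUNT. */
void scale_loop(int levels[CHANNEL_COUNT], int gain)
{
    for (size_t i = 0; i < CHANNEL_COUNT; i++) {
        int scaled = levels[i] * gain;
        levels[i] = (scaled > LEVEL_MAX) ? LEVEL_MAX : scaled;
    }
}

/* Unrolled version: the same logic is pasted once per channel.
   A fix to the clamp must be repeated in every copy, and a fifth
   channel needs CHANNEL_COUNT changed AND a new pair of lines added
   with the right index. Forgetting the new lines still compiles. */
void scale_unrolled(int levels[CHANNEL_COUNT], int gain)
{
    int scaled;

    scaled = levels[0] * gain;
    levels[0] = (scaled > LEVEL_MAX) ? LEVEL_MAX : scaled;

    scaled = levels[1] * gain;
    levels[1] = (scaled > LEVEL_MAX) ? LEVEL_MAX : scaled;

    scaled = levels[2] * gain;
    levels[2] = (scaled > LEVEL_MAX) ? LEVEL_MAX : scaled;

    scaled = levels[3] * gain;
    levels[3] = (scaled > LEVEL_MAX) ? LEVEL_MAX : scaled;
}
```

Both functions behave the same today. The difference shows up the first time someone has to change the clamp rule or the channel count, and has to get every copy and every index right.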

I’ve written and optimized a lot of code over the past several decades, and have managed many software projects. The boss is correct. Use a for loop.

By commonly cited estimates, 80% or more of the cost of the software you’re developing occurs in the maintenance phase. Anything you do to slow down maintenance has a huge impact on the overall cost of the project. Always write readable, maintainable source code. Writing code that isn’t maintainable severely shortens the life of your code…it will probably be completely rewritten at the first opportunity.

It’s a common rookie mistake to go down the source code optimization road. Be less of a rookie, and learn from your boss and from the senior people in your team.

Update: A few people have mentioned Duff’s Device in C as a proven mechanism for loop unrolling. Interleaving a loop with a switch statement, and replicating the body of the loop across multiple switch cases, is indeed a form of loop unrolling, but it can actually lower performance, depending on the specific compiler implementation, optimization options, the target CPU’s pipelining and branch prediction mechanisms, etc. There have been cases in which this technique resulted in slower execution than functionally equivalent, far more readable, traditional loops.
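For anyone who hasn’t seen it, here’s a sketch of the technique next to the plain loop it replaces. This is the memory-to-memory copy variant that’s commonly shown; the function names copy_duff and copy_loop are mine, and the early return for a zero count is an addition so the sketch is safe to call with any length.

```c
#include <stddef.h>

/* Duff's Device style copy: the switch jumps into the middle of an
   8-way unrolled do-while to handle the leftover (count % 8) bytes,
   then the loop runs whole groups of 8. */
void copy_duff(char *to, const char *from, size_t count)
{
    if (count == 0) {
        return;  /* the do-while below would otherwise copy 8 bytes */
    }

    size_t passes = (count + 7) / 8;  /* number of trips through the do-while */

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}

/* The functionally equivalent traditional loop. */
void copy_loop(char *to, const char *from, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        to[i] = from[i];
    }
}
```

Put the two side by side and ask which one you’d rather debug at 2 a.m. Whether copy_duff is any faster on your compiler and CPU is something only a measurement can tell you.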

The point is, before you go down the path of loop unrolling in source code, you should measure performance and focus on hot spots. And if you do attempt to apply something like Duff’s Device, be sure to measure performance before and after, using your intended final optimization options and your target CPU. If you’re targeting a range of CPUs, be sure to test all of them before and after. Otherwise, you could be making things worse.
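As a rough illustration of “measure before and after,” here’s a minimal timing sketch in C that runs two candidate implementations through the same harness. All of the names (sum_fn, sum_simple, sum_by_four, time_candidate) are hypothetical, and clock() from the standard library is used only because it’s portable; it reports processor time at fairly coarse resolution, so a real profiler or a platform-specific timer will usually give you better data.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature shared by the candidates being compared. */
typedef long (*sum_fn)(const int *data, size_t count);

/* Candidate 1: the plain loop. */
long sum_simple(const int *data, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += data[i];
    }
    return total;
}

/* Candidate 2: hand-unrolled by four, with a cleanup loop for the
   leftover elements when count is not a multiple of four. */
long sum_by_four(const int *data, size_t count)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < count; i++) {
        total += data[i];
    }
    return total;
}

/* Calls fn `repeats` times and returns the elapsed processor time in
   seconds. The last result is stored in *result so the caller can
   confirm that both candidates compute the same answer. */
double time_candidate(sum_fn fn, const int *data, size_t count,
                      int repeats, long *result)
{
    volatile long sink = 0;  /* volatile so the stores are not discarded */
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        sink = fn(data, count);
    }
    clock_t stop = clock();
    *result = sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

To use it, call time_candidate once with sum_simple and once with sum_by_four on the same data, check that the two results match, and compare the times. Build with the optimization options you intend to ship, run on each CPU you’re targeting, and repeat the runs a few times, because a single number from a single run on your development machine doesn’t tell you much.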