The choice of the most appropriate and efficient algorithm is the most important optimization
there is. And it needs to be implemented correctly, of course. A compiler can’t do this for you. It
requires a human. And most of the time, that’s really all the optimization you need to do.
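As a rough illustration of why algorithm choice matters more than low-level tuning, here is a minimal sketch with two ways to detect a duplicate in a list. The function names and the sample input are hypothetical, and Python is used only for brevity; the point is language-independent. The first version compares every pair of elements, so its work grows quadratically with input size. The second remembers values it has already seen in a set, so its expected work grows linearly (assuming the items are hashable).

```python
def has_duplicate_quadratic(items):
    """Compare every pair of elements: O(n^2) comparisons in the worst case."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_linear(items):
    """Track values already seen in a set: O(n) expected time.

    Assumes the items are hashable.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    # Hypothetical input with no duplicates, which is the worst case for
    # both versions because neither can return early.
    data = list(range(2000))
    print(has_duplicate_quadratic(data), has_duplicate_linear(data))
```

Both functions return the same answer for any input. Only the amount of work differs, and that gap widens as the input grows, which is something no optimization setting is likely to close for you.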
Once we’re past that hurdle, optimizers in most modern compilers are extremely good at what
they do, assuming you enable the appropriate optimization options. Typically, many person-years of
effort have gone into the development of today’s optimizers. But they’re only as good as
the programmer’s understanding and use of the options available. Some IDEs default to little
or no optimization, to aid in the debugging process. So, to get effective compiler optimization,
the programmer needs to learn about the implementation’s optimization options, and use them
appropriately.
That said, a good compiler optimizer is not better than a human at optimizing code if:
- the human is a highly experienced assembly language programmer in the specific CPU
target architecture, and is aware of all available instructions, and
- the human fully understands all the standard optimization techniques a good compiler
optimizer uses, and
- the human has intimate knowledge of specific nuances of the CPU target, including branch
prediction fetches, instruction reordering and scheduling, pipeline stalls, cache line issues,
instruction timing, instruction size, temporal and spatial locality, etc., and
- the human has enough time to devote to doing this, including striking the right balance
between size and speed, without increasing the size enough to reduce overall performance.
As CPUs become more complex, the human assembly language programmer has to know more
than they did in the past to beat a good compiler optimizer. The optimizer (whether it’s a human
or a compiler) has to be aware of a wide variety of factors, and has to take many of them into
account simultaneously.
It is indeed possible to write assembly code that’s more efficient than optimized compiled code.
How difficult it is to do depends on what you already know and how much time and effort you’re
willing to invest. And, of course, you’ll want to do performance measurements first to see if the
code you’re looking at actually needs any more optimization. Without knowing where the
trouble spots are, you can waste a lot of effort with little or no impact on overall performance.
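One way to act on the "measure first" advice is to time each stage of a program separately before deciding what to optimize. The following is a minimal sketch under stated assumptions: the two stages (`load_values` and `sum_of_squares`) and the helpers `time_stage` and `profile_pipeline` are hypothetical stand-ins, not part of any real program discussed here. For real work, a dedicated profiler (for example, the `cProfile` module in Python's standard library) is likely to give more detail.

```python
import time


def time_stage(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def load_values(n):
    """Hypothetical stage: build the input data."""
    return list(range(n))


def sum_of_squares(values):
    """Hypothetical stage: do some arithmetic on every element."""
    return sum(v * v for v in values)


def profile_pipeline(n):
    """Time each stage separately and return {stage_name: seconds}."""
    timings = {}
    values, timings["load_values"] = time_stage(load_values, n)
    _, timings["sum_of_squares"] = time_stage(sum_of_squares, values)
    return timings


if __name__ == "__main__":
    timings = profile_pipeline(1_000_000)
    # List the stages from slowest to fastest so the hot spot comes first.
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {seconds:.4f} s")
```

Single-run timings like these are noisy, so treat them as a first indication of where the time goes, not as a precise measurement.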