Do all programming languages compile into the same machine code?

I hear this question frequently.

The short answer is no: programming languages do not all compile into the same machine code.

Of course, the whole story has some twists and turns.  Here’s a quick rundown of the possibilities:

  • Compiling to Machine Code — Some programming languages do typically compile into machine code, but each CPU architecture has its own unique machine code. So, for languages that can be compiled into machine code, the actual machine code generated depends on the target CPU architecture.
    • In some cases, a different compiler is used to generate the machine code for each targeted CPU architecture.
    • In other cases, a single compiler can generate machine code for several different CPU architectures; the desired target is simply specified as one of the compiler’s options.
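To make the point concrete, here is a small Python sketch that plays the role of a compiler’s target option. It returns hand-assembled machine code for the C function `int add(int a, int b) { return a + b; }` on two architectures. Each byte sequence is one valid encoding (x86-64 System V: `mov eax, edi; add eax, esi; ret`; AArch64: `add w0, w0, w1; ret`); a real compiler may choose different instructions, and the name `machine_code_for_add` is purely hypothetical.

```python
# Hypothetical "back end" lookup: one C function, two target architectures.
# C source being "compiled":  int add(int a, int b) { return a + b; }
# The bytes are hand-assembled; real compilers may emit other sequences
# (for example, x86-64 compilers often use `lea eax, [rdi+rsi]` instead).

ADD_FUNCTION = {
    # x86-64, System V calling convention: args in edi/esi, result in eax
    "x86-64": bytes([
        0x89, 0xF8,  # mov eax, edi
        0x01, 0xF0,  # add eax, esi
        0xC3,        # ret
    ]),
    # AArch64: args in w0/w1, result in w0; each instruction is a 32-bit little-endian word
    "aarch64": (0x0B010000).to_bytes(4, "little")   # add w0, w0, w1
             + (0xD65F03C0).to_bytes(4, "little"),  # ret
}


def machine_code_for_add(target: str) -> bytes:
    """Return the machine code for `add` on the requested target (illustrative only)."""
    try:
        return ADD_FUNCTION[target]
    except KeyError:
        raise ValueError(f"unsupported target: {target}") from None


if __name__ == "__main__":
    for target in ("x86-64", "aarch64"):
        print(f"{target:8} {machine_code_for_add(target).hex(' ')}")
    # x86-64   89 f8 01 f0 c3
    # aarch64  00 00 01 0b c0 03 5f d6
```

The same source function yields entirely different bytes for each CPU, which is why a machine code program built for one architecture cannot run natively on another.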

The Faces of Intermediate Code

Historically, intermediate code was used internally within a compiler to decouple the front end (lexical analyzer and parser) from the back end (code generator and optimizer). It is still used this way: a single front end can be implemented once, and separate back ends can be developed independently to target different CPU architectures.
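A toy Python sketch of that split, assuming a made-up expression language: the front end turns a nested-tuple expression into three-address intermediate code, and two independent back ends translate the same intermediate code for two invented targets (a one-accumulator machine and a three-operand register machine). The mnemonics are illustrative and do not belong to any real CPU.

```python
from itertools import count

# Invented three-address IR instruction: (op, dest, left, right), meaning dest = left op right.
OPCODE_NAMES = {"+": "ADD", "*": "MUL"}


def front_end(expr):
    """Front end: turn an expression tree into three-address intermediate code.

    An expression is a variable name (str) or a tuple (op, left, right).
    Returns (ir, name_holding_the_result).
    """
    ir = []
    temps = count(1)

    def gen(node):
        if isinstance(node, str):
            return node                  # a variable needs no instructions
        op, left, right = node
        a, b = gen(left), gen(right)     # generate operands first (post-order)
        dest = f"t{next(temps)}"         # fresh temporary for this result
        ir.append((op, dest, a, b))
        return dest

    result = gen(expr)
    return ir, result


def back_end_accumulator(ir):
    """Back end for an invented single-accumulator machine."""
    asm = []
    for op, dest, a, b in ir:
        asm += [f"LOAD {a}", f"{OPCODE_NAMES[op]} {b}", f"STORE {dest}"]
    return asm


def back_end_three_operand(ir):
    """Back end for an invented three-operand register machine."""
    return [f"{OPCODE_NAMES[op]} {dest}, {a}, {b}" for op, dest, a, b in ir]


if __name__ == "__main__":
    ir, result = front_end(("+", "a", ("*", "b", "c")))  # a + b * c
    print("IR:", ir, "result in", result)
    print(back_end_accumulator(ir))
    print(back_end_three_operand(ir))
```

Neither back end knows anything about the source syntax, and the front end knows nothing about either target; adding a third target would mean writing only one more back-end function.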

As early as the mid-1960s, some compilers began emitting intermediate code as their output instead of native machine code. The idea was to develop a single compiler, and then a separate runtime interpreter for the intermediate code on each target CPU architecture. The goal was to “write once, run anywhere,” as long as “anywhere” has a runtime environment available to perform the interpretation. Notable examples include:

  • UCSD Pascal, which requires the UCSD p-System to interpret the compiler’s generated p-code
  • Java, which requires the JVM to interpret the compiler’s generated bytecode
  • C#, which requires the CLR to execute the compiler’s generated CIL
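The compile-once, interpret-per-target model can be sketched in a few lines of Python. The bytecode format below (`PUSH`, `LOAD`, `ADD`, `MUL`) is invented for illustration and is far simpler than p-code, JVM bytecode, or CIL, but the shape is the same: the compiler emits portable instructions, and only the interpreter loop has to exist on each target.

```python
# Compiler: expression tree -> list of (opcode, operand) pairs for a stack machine.
# An expression is an int literal, a variable name (str), or a tuple (op, left, right).
def compile_expr(expr, out=None):
    out = [] if out is None else out
    if isinstance(expr, int):
        out.append(("PUSH", expr))
    elif isinstance(expr, str):
        out.append(("LOAD", expr))
    else:
        op, left, right = expr
        compile_expr(left, out)       # push left operand
        compile_expr(right, out)      # push right operand
        out.append(({"+": "ADD", "*": "MUL"}[op], None))
    return out


# Runtime interpreter: the only piece that would need to be ported to each CPU.
def run(bytecode, env):
    stack = []
    for opcode, operand in bytecode:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "LOAD":
            stack.append(env[operand])
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return stack.pop()


if __name__ == "__main__":
    program = compile_expr(("+", "x", ("*", 2, "y")))  # x + 2 * y
    print(program)
    print(run(program, {"x": 3, "y": 4}))
```

Note that `run` decodes every instruction each time the program executes, which is exactly the overhead that motivates JIT compilation.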

Interpreting every intermediate code instruction each time it executes proved to be a performance problem in some cases, so many runtime environments have been fitted with just-in-time (JIT) compilers. A JIT compiler translates chunks of intermediate code into native machine code at run time, typically when they are first executed (or, in some runtimes, once they prove to be frequently executed), so later executions skip interpretation. This approach can significantly improve performance, but it adds machine-specific complexity to the runtime environment.
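A toy illustration of the JIT idea in Python, assuming the same kind of invented stack bytecode (`PUSH`, `LOAD`, `ADD`, `MUL`). Here Python closures stand in for native machine code; a real JIT emits actual CPU instructions. The first time a named chunk runs, it is translated once and cached, and later runs call the cached translation without decoding any bytecode. `JitRuntime` and its methods are hypothetical names.

```python
def _const(value):
    return lambda env: value


def _load(name):
    return lambda env: env[name]


def _binary(opcode, left, right):
    if opcode == "ADD":
        return lambda env: left(env) + right(env)
    return lambda env: left(env) * right(env)


def _translate(bytecode):
    """Translate stack bytecode into one Python callable (stand-in for native code)."""
    stack = []
    for opcode, operand in bytecode:
        if opcode == "PUSH":
            stack.append(_const(operand))
        elif opcode == "LOAD":
            stack.append(_load(operand))
        elif opcode in ("ADD", "MUL"):
            right, left = stack.pop(), stack.pop()
            stack.append(_binary(opcode, left, right))
        else:
            raise ValueError(f"unknown opcode {opcode}")
    (fn,) = stack  # a well-formed chunk leaves exactly one value
    return fn


class JitRuntime:
    def __init__(self):
        self._cache = {}       # chunk name -> translated callable
        self.translations = 0  # how many chunks have been translated

    def execute(self, name, bytecode, env):
        fn = self._cache.get(name)
        if fn is None:         # first execution: translate once and cache
            fn = _translate(bytecode)
            self._cache[name] = fn
            self.translations += 1
        return fn(env)


if __name__ == "__main__":
    chunk = [("LOAD", "x"), ("PUSH", 2), ("LOAD", "y"), ("MUL", None), ("ADD", None)]
    rt = JitRuntime()
    print(rt.execute("f", chunk, {"x": 3, "y": 4}))  # translated, then run
    print(rt.execute("f", chunk, {"x": 1, "y": 1}))  # reuses the cached translation
    print("translations:", rt.translations)
```

The extra machinery (the translator and the cache) is the added complexity mentioned above; in a real runtime that translator must know how to generate code for each CPU.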

  • Compiling to Intermediate Code — Some programming languages don’t typically compile into machine code at all, but instead compile into an intermediate language that doesn’t correspond to any actual hardware. When that intermediate code is executed, a runtime environment on the target CPU is required to interpret it and carry out the operations it specifies. For example, the bytecode compiled from a Java program requires a JVM (Java Virtual Machine) on the target CPU, and the CIL compiled from a C# program requires the CLR (Common Language Runtime) on the target CPU.
    • To improve performance, some of these runtime environments have been equipped with a JIT (just-in-time) compiler, which converts the intermediate code into machine code for the target CPU the first time that section of the intermediate code is executed.
  • Interpreting (not compiling at all) — Some programming languages are not typically compiled at all. Instead, the source code is interpreted at the time the program is run. The interpreter performs the actions specified by the interpreted source code.
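By contrast, a pure interpreter acts on the source text directly. This Python sketch evaluates a tiny invented expression syntax (integers, variable names, `+`, `*`, and parentheses) while parsing it, producing neither machine code nor intermediate code. The function name `interpret` and the grammar are assumptions for illustration.

```python
import re


def interpret(source, env):
    """Evaluate an expression straight from its source text (no code generation)."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def expr():  # expr := term ('+' term)*
        value = term()
        while peek() == "+":
            take()
            value += term()
        return value

    def term():  # term := atom ('*' atom)*
        value = atom()
        while peek() == "*":
            take()
            value *= atom()
        return value

    def atom():  # atom := number | name | '(' expr ')'
        token = take()
        if token.isdecimal():
            return int(token)
        if token == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if re.fullmatch(r"[A-Za-z_]\w*", token):
            return env[token]  # look up the variable's current value
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


if __name__ == "__main__":
    print(interpret("x + 2 * (y + 1)", {"x": 3, "y": 4}))
```

Every run re-reads and re-parses the source text, which is simple and portable but usually slower than executing compiled code.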

You may have noticed that I said “typically” in each of the above scenarios. Programming languages themselves are defined as a set of syntax rules, semantic rules, lexical rules, etc. A specific programming language is not necessarily always compiled into machine code, or always compiled into an intermediate code, or always interpreted.

For example, many dialects of the BASIC language have traditionally been (and still are) interpreted, but there are many BASIC compilers available that generate machine code, and Visual Basic .NET compiles to CIL (aka MSIL) intermediate code. Likewise, the Java language is typically compiled into an intermediate code known as bytecode, but there are some Java compilers that will generate machine code, and there are even a few interpreters for Java out there.