I hear this question frequently.
The short answer is no: programming languages do not all compile into the same machine code.
Of course, the whole story has some twists and turns. Here’s a quick rundown of the possibilities:
- Compiling to Machine Code
  Some programming languages do typically compile into machine code, but each CPU architecture has its own unique machine code. So, for languages that can be compiled into machine code, the actual machine code generated depends on the target CPU architecture.
  - In some cases, a different compiler is used to generate the machine code for each targeted CPU architecture.
  - In other cases, a single compiler is capable of generating the machine code for several different targeted CPU architectures, simply by specifying the desired CPU target as one of the compiler’s options.
- Compiling to Intermediate Code
  Some programming languages don’t typically compile into machine code at all, but instead compile into an intermediate language that doesn’t correspond to any actual hardware. Then, when that intermediate-language program is executed, a runtime environment is required on the target CPU to interpret the intermediate code and cause the right things to happen. For example, the compiled bytecode from a Java program requires the JVM (Java Virtual Machine) on the target CPU; the compiled CIL from a C# program requires the CLR (Common Language Runtime) on the target CPU. (A minimal sketch of this workflow follows this list.)
  - To improve performance, some of these runtime environments have been equipped with a JIT (just-in-time) compiler, which converts the intermediate code into machine code for the target CPU the first time that section of the intermediate code is executed.
- Interpreting (no compiling at all)
  Some programming languages are not typically compiled at all. Instead, the source code is interpreted at the time the program is run. The interpreter performs the actions specified by the interpreted source code.
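To make the intermediate-code scenario concrete, here is a minimal sketch using Java, since that’s the example named above. The class name Hello is just an illustration; the commands in the comments are the standard JDK tools.

```java
// Hello.java: a minimal sketch of the "compile to intermediate code" path.
//
// Compile the source into bytecode (a Hello.class file, which is not
// machine code for any particular CPU):
//   javac Hello.java
//
// Run the bytecode on whatever JVM is installed for the local CPU:
//   java Hello
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello from bytecode running on the JVM");
    }
}
```

The same Hello.class file can be copied to any machine that has a JVM; it is the JVM built for that CPU architecture, not the .class file, that deals in the local machine code.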
You may have noticed that I said “typically” in each of the above scenarios. Programming languages themselves are defined as a set of syntax rules, semantic rules, lexical rules, etc. A specific programming language is not necessarily always compiled into machine code, or always compiled into an intermediate code, or always interpreted.
For example, many dialects of the BASIC language have traditionally been (and still are) interpreted, but there are many BASIC compilers available that generate machine code, and Visual Basic .NET compiles to CIL (aka MSIL) intermediate code. Likewise, the Java language is typically compiled into an intermediate code known as bytecode, but there are some Java compilers that will generate machine code, and there are even a few interpreters for Java out there.
The Faces of Intermediate Code
Historically, intermediate code was used internally within a compiler, to make it easier to decouple the front end of the compiler (lexical analyzer and parser) from the back end (code generator and optimizer). It is still used this way, so that a single front end can be implemented and many different back ends can be developed separately to target different CPU architectures.
As early as the mid-1960s, the use of intermediate code was extended to become the output of some compilers, instead of native machine code. The idea was to develop a single compiler and then develop separate runtime interpreters for the intermediate code on each target CPU architecture: “write once, run anywhere,” as long as “anywhere” has an available runtime environment to perform the interpretation. Notable examples include:
- UCSD Pascal, which requires the UCSD p-System to interpret the compiler’s generated p-code
- Java, which requires the JVM to interpret the compiler’s generated bytecode
- C#, which requires the CLR to interpret the compiler’s generated CIL
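For a concrete peek at what one of these intermediate codes looks like, the JDK ships a disassembler, javap. The sketch below uses Java only because it’s one of the examples above; the class name Add is just an illustration.

```java
// Add.java: compile with "javac Add.java", then disassemble the resulting
// bytecode with "javap -c Add" to see the intermediate code itself.
public class Add {
    static int add(int a, int b) {
        return a + b;
    }
}
```

For a method like this, javap prints stack-machine instructions along the lines of iload_0, iload_1, iadd, ireturn: two loads, an add, and a return for an abstract stack machine, with no mention of any particular CPU’s registers or opcodes.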
Having to reinterpret every intermediate code instruction at execution time proved to be a performance issue in some cases, so many runtime environments have been fitted with just-in-time (JIT) compilers, so that chunks of intermediate code are compiled into native machine code the first time they’re encountered. This approach can significantly improve performance, but adds more machine-specific complexity to the runtime environment.
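If you want to watch a JIT compiler at work, HotSpot (the JVM implementation shipped with most JDKs) has a flag that logs each method as it gets compiled to native code. The sketch below is only an illustration: HotLoop and square are hypothetical names, while -XX:+PrintCompilation is a standard HotSpot flag.

```java
// HotLoop.java: a small, hypothetical example for watching the JIT at work.
//
// Compile and run with:
//   javac HotLoop.java
//   java -XX:+PrintCompilation HotLoop
//
// On a HotSpot JVM, the output should include lines naming HotLoop.square
// once the method has run often enough to be compiled to native machine code.
public class HotLoop {
    static long square(long n) {
        return n * n;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 1_000_000; i++) {
            sum += square(i);   // a hot call site, and therefore a JIT candidate
        }
        System.out.println(sum);
    }
}
```

Each line of that log corresponds to a chunk of bytecode being translated into machine code for the local CPU, which is exactly the trade-off described above: faster execution in exchange for a more machine-specific runtime environment.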