When we talk about C source code portability, we’re talking about writing the code so that it
can be easily moved (ported) to another environment and, after recompiling and relinking,
behave the same way it did originally (ideally without any changes to the source code
itself, but in practice with only minimal changes).
What do we mean by “another environment”?
- Moving to a different operating system on the same hardware.
- Moving to a different version of the same operating system on the same hardware.
- Moving to a different variant/flavor/distribution of an operating system.
- Moving to a different CPU hardware architecture.
- Moving to a different C compiler on the same hardware and operating system.
- Moving to a different version of the same C compiler on the same hardware and
operating system. (Yes, code can break between versions of a compiler offered by the
same compiler vendor.)
Why do we want portable code?
The act of porting the code takes time and effort (and therefore has a cost), in terms of
understanding exactly what has to change, making the change(s), and testing the modified
code. If we can reduce the number of required changes to zero, or to a very small number of
isolated changes, we can reduce the porting effort and project cost.
How do we strive to achieve portability?
Here is a partial list to give you some idea of what to worry about:
- Don’t assume the size of any data type.
Data type sizes can and do vary from one environment to another. For example, an int
might be 16 bits, 32 bits, 64 bits, or more. It might vary between compilers for the exact
same hardware. It might change from one compiler version to another. The size of an int
may or may not have any relationship to the natural word size of the CPU hardware.
- Don’t assume that a pointer (to anything) is the same size as an int or the same size as
any other data type. A pointer is sometimes the same size as an int, but often it is not.
For example, in many popular compilers, building for a 32-bit target gives you a 32-bit int
and a 32-bit pointer, but building the same code with the same compiler for a 64-bit
target gives you a 32-bit int and a 64-bit pointer.
- Don’t make calls directly to the operating system.
Instead, use standard library functions. If none are available, then, if possible, define a
platform-agnostic abstraction and have it call into the operating system internally.
- Don’t make assumptions about the underlying hardware, including speed, memory size,
memory map, I/O map, number of registers, etc.
- Don’t assume a specific endianness (byte ordering) of the target system.
Not only can endianness vary from one CPU architecture to another, but some CPU
architectures allow switching between big endian and little endian.
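- Avoid bit fields in structures if you’re relying on a specific packing/ordering of the bit fields.
The order in which bit fields are allocated within a storage unit, and whether a bit field that
doesn’t fit may straddle two units, are implementation-defined, so handling varies between implementations.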
- Don’t assume that structure packing/padding will be the same in all environments.
Packing and padding behaviors can and do vary between compiler implementations,
even when targeting the same CPU hardware.
- Don’t embed assembly language in the source code. By definition, the code will break if you try to port it to a different CPU architecture.
- Don’t use compiler intrinsics or implementation-specific keywords and pragmas.
Obviously, not every compiler implementation will have these features, so for maximum
portability, avoid them.
- Avoid the use of newer language features that have not been widely adopted.
Some people are really taken aback by this rule, but it has a very practical purpose. For
example, variable-length arrays have been part of the C standard since C99. But many
compiler implementations have never supported the feature, so porting code that uses
this feature becomes a problem. (The C11 standard has demoted this language feature
to optional, so it’s even more likely that many compilers will never implement the
feature.)
- Avoid all other undefined behavior and implementation-defined behavior.
Your code might appear to work in one environment and fall apart as soon as you try to
port it to another environment. Avoiding this requires some common sense and a knowledge of
what is undefined and implementation-defined. Many compilers produce helpful
warnings when code ventures into these areas, but many don’t say anything at all.
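A minimal sketch of the first two rules, assuming a C99-or-later compiler with <stdint.h>: ask the compiler for sizes (sizeof, CHAR_BIT) instead of hard-coding them, and when a pointer must be stored in an integer, use uintptr_t rather than int. The function names are hypothetical. Strictly speaking, exact-width types such as int32_t and the type uintptr_t are optional in the standard, although mainstream implementations provide them.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Non-portable: `int handle = (int)p;` silently truncates the pointer on
   targets where pointers are wider than int (e.g., typical 64-bit builds). */

/* Portable: uintptr_t can hold any valid void pointer, and converting it
   back yields a pointer that compares equal to the original. */
uintptr_t pointer_to_integer(void *p)
{
    return (uintptr_t)p;
}

void *integer_to_pointer(uintptr_t value)
{
    return (void *)value;
}

/* Report the sizes this particular build actually uses, rather than
   assuming them. %zu is the C99 format specifier for size_t. */
void print_type_sizes(void)
{
    printf("bits per char: %d\n", CHAR_BIT);
    printf("short:  %zu bytes\n", sizeof(short));
    printf("int:    %zu bytes\n", sizeof(int));
    printf("long:   %zu bytes\n", sizeof(long));
    printf("void *: %zu bytes\n", sizeof(void *));
}
```

Running print_type_sizes for two different targets is a quick way to see the size differences described above for yourself.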
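For the endianness rule, one common approach is to fix the byte order of any external data (files, network messages, data shared with other devices) and convert with shifts and masks. Shifts operate on values rather than on memory, so they behave the same on big-endian and little-endian hosts. This sketch assumes a big-endian external format; store_u32_be and load_u32_be are hypothetical names.

```c
#include <stdint.h>

/* Non-portable: memcpy(out, &value, 4) copies the bytes in whatever order
   the host stores them, so the output differs between CPU architectures. */

/* Write a 32-bit value as 4 bytes, most significant byte first. The masks
   keep each result to 8 bits even where CHAR_BIT is larger than 8. */
void store_u32_be(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Read 4 big-endian bytes back into a 32-bit value. Each byte is widened
   to uint32_t before shifting, so no shift lands in a promoted int's sign bit. */
uint32_t load_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) |
           (uint32_t)in[3];
}
```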
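Bit fields and structure padding can be handled the same way: keep ordinary members in the in-memory struct, and write the external representation one field at a time with explicit masks and shifts, instead of writing the struct’s raw bytes with something like fwrite(&r, sizeof r, 1, fp). The record layout below (a 16-bit id, a 32-bit value and a flags byte, little-endian, 7 bytes in total) is a made-up format chosen for illustration, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Non-portable alternative (not used here):
       struct flags { unsigned ready : 1; unsigned error : 1; unsigned mode : 4; };
   Bit-field allocation order and padding between members are left to the
   implementation, so the raw bytes can differ between compilers. */

#define FLAG_READY        0x01u   /* bit 0 */
#define FLAG_ERROR        0x02u   /* bit 1 */
#define MODE_SHIFT        2u
#define MODE_MASK         0x3Cu   /* bits 2..5 hold a 4-bit mode */
#define RECORD_WIRE_SIZE  7u      /* 2 (id) + 4 (value) + 1 (flags) */

struct record {
    uint16_t id;
    uint32_t value;
    int ready;        /* plain members; the compiler may insert padding */
    int error;
    unsigned mode;    /* expected range 0..15 */
};

/* Writes exactly RECORD_WIRE_SIZE bytes to out (which must have room for
   them), independent of padding and host byte order; returns the count. */
size_t record_serialize(const struct record *r, unsigned char *out)
{
    unsigned flags = 0;

    if (r->ready)
        flags |= FLAG_READY;
    if (r->error)
        flags |= FLAG_ERROR;
    flags |= (r->mode << MODE_SHIFT) & MODE_MASK;

    out[0] = (unsigned char)(r->id & 0xFFu);            /* id, low byte first */
    out[1] = (unsigned char)((r->id >> 8) & 0xFFu);
    out[2] = (unsigned char)(r->value & 0xFFu);         /* value, low byte first */
    out[3] = (unsigned char)((r->value >> 8) & 0xFFu);
    out[4] = (unsigned char)((r->value >> 16) & 0xFFu);
    out[5] = (unsigned char)((r->value >> 24) & 0xFFu);
    out[6] = (unsigned char)flags;
    return RECORD_WIRE_SIZE;
}
```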
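To stay clear of variable-length arrays, a sketch like the following allocates a run-time-sized buffer with malloc instead of declaring double sorted[n]; as a side benefit, a large n cannot overflow the stack. The median function is a hypothetical example, and it assumes the input contains no NaN values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison for doubles; returns -1, 0 or 1 without subtraction. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Computes the median of n values without modifying the input.
   A VLA version would declare `double sorted[n];`, which some compilers
   reject. Returns 0 on success, -1 on bad arguments or allocation failure. */
int median(const double *values, size_t n, double *result)
{
    double *sorted;

    if (n == 0 || result == NULL || n > SIZE_MAX / sizeof *sorted)
        return -1;
    sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;

    memcpy(sorted, values, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);
    if (n % 2 == 1)
        *result = sorted[n / 2];
    else
        *result = (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

    free(sorted);
    return 0;
}
```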
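As one example of steering clear of undefined behavior: signed integer overflow is undefined in C, so rather than computing a + b and then inspecting the result, a portable sketch checks against the limits in <limits.h> before doing the arithmetic. checked_add is a hypothetical name.

```c
#include <limits.h>

/* Stores a + b in *sum and returns 1 if the sum fits in an int;
   returns 0 (leaving *sum untouched) if the addition would overflow.
   The comparisons themselves can never overflow. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```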
If you must violate these rules – and there are sometimes excellent reasons to do so – it’s best
to isolate that non-portable code into a separate module, so that the porting work is isolated
and minimized.
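A minimal sketch of that isolation, which also serves as the kind of platform-agnostic abstraction mentioned earlier: the only OS-specific knowledge (here, the native path separator) lives in one small module, and the rest of the program calls platform_join_path. The function and macro names are hypothetical, and testing _WIN32 is just one common way to detect a Windows target; a real module would typically hold many more such wrappers.

```c
#include <stdio.h>

/* platform.c: the only file that knows about operating system differences. */
#if defined(_WIN32)
#define PLATFORM_PATH_SEPARATOR '\\'
#else
#define PLATFORM_PATH_SEPARATOR '/'   /* POSIX-style systems */
#endif

/* Joins a directory and a file name with the native separator.
   Returns 0 on success, -1 if the result would not fit in out. */
int platform_join_path(char *out, size_t size, const char *dir, const char *file)
{
    int n = snprintf(out, size, "%s%c%s", dir, PLATFORM_PATH_SEPARATOR, file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Porting to a new system then means revisiting this one file rather than searching the whole code base for hard-coded separators.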
Even if you don’t ever intend to port your code beyond its initial target, it’s a good idea to keep
portability in mind in all projects – you just never know where your code is going to end up.
Portability vs Efficiency – An Ancient Struggle
Portability is not a new concept, nor is it unique to the C programming language. In a source
code portability experiment performed in 1960 involving two competing COBOL compiler
vendors, it was found that only a minimal number of modifications were required, to account for
slight differences between the two compiler implementations. One member of the team said that
COBOL does not simultaneously preserve efficiency and compatibility across machines. The
same could be said today, nearly 60 years later, for many general-purpose high-level languages,
including C. If you can’t use compiler intrinsics and other non-portable local performance
enhancers in your source code, you may be trading portability for efficiency.
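One common compromise, sketched here as an illustration rather than as a recommendation from the discussion above, is to confine an intrinsic to a single function, guard it with a preprocessor check, and keep a portable fallback. The sketch assumes the __builtin_popcount builtin that GCC and Clang provide; count_bits and count_bits_portable are hypothetical names. Whether the builtin is actually faster depends on the target and on compiler flags.

```c
/* Portable fallback: clears the lowest set bit until none remain,
   counting how many times that happens. */
unsigned count_bits_portable(unsigned x)
{
    unsigned count = 0;

    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Counts the set bits in x. The non-portable part is confined here:
   compilers that define __GNUC__ (GCC, Clang) get the builtin, and
   every other compiler gets the portable loop. */
unsigned count_bits(unsigned x)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_popcount(x);
#else
    return count_bits_portable(x);
#endif
}
```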