Linux Portability Tips

Adam Greenblatt

Last Revised 17 July 2023

I use multiple versions of Linux on x86_64 CPUs. To the extent possible, it is nice to be able to build software on one version of Linux and run it on another. Throughout much of the 1990s and 2000s, careful attention to cross-version compatibility in key Linux system libraries and kernel APIs made this pretty straightforward. As long as you didn't explicitly use exotic system calls or arcane third-party libraries, code built on, say, a 2007 Ubuntu machine would run perfectly well on a 2011 Red Hat machine, or vice versa.

In the modern era this is no longer the case: instabilities in the C library's API, and in the way the C library uses the kernel's API, have made cross-version compatibility much more complicated. Generally speaking, forward compatibility still works well: software built on Fedora 14 will run without modification on Fedora 35. The converse is very much not true: even the simplest "hello world" C program built on Fedora 35 will not run on Fedora 34 by default. But all is not lost, as the following concrete example shows.

Suppose we have three (in my case virtual) machines:
vf14 (running Fedora 14, released in November 2010, with Linux kernel 2.6.35 and glibc 2.12.90)
vf34 (running Fedora 34, released in April 2021, with Linux kernel 5.14.14 and glibc 2.33)
vf35 (running Fedora 35, released in November 2021, with Linux kernel 5.14.14 and glibc 2.34)

Consider the following minimal C program built on vf35:

vf35 $ echo "int main () { return 1; }" > hello.c
vf35 $ scp hello.c adam@vf34:
vf35 $ scp hello.c adam@vf14:
vf35 $ gcc -o hello-vf35 hello.c
vf35 $ ./hello-vf35
vf35 $ echo $?
1
vf35 $ scp hello-vf35 adam@vf34:
vf35 $ scp hello-vf35 adam@vf14:
It will not run on vf34 due to changes in glibc:
vf34 $ ./hello-vf35
./hello-vf35: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by ./hello-vf35)
vf34 $
What's missing? Well:
vf35 $ readelf --wide -a ./hello-vf35 | grep GLIBC_2\.34
 0000000000403ff0  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 __libc_start_main@GLIBC_2.34 + 0
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.34 (2)
    50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.34
  000:   0 (*local*)       2 (GLIBC_2.34)    1 (*global*)
  0x0010:   Name: GLIBC_2.34  Flags: none  Version: 2
readelf: ./hello-vf35: Warning: Gap in build notes detected from 0x401021 to 0x401045
readelf: ./hello-vf35: Warning: Gap in build notes detected from 0x401021 to 0x40104f
readelf: ./hello-vf35: Warning: Gap in build notes detected from 0x401047 to 0x401054
vf35 $
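It looks like glibc changed an undocumented internal API involved in program initialization between glibc 2.33 and glibc 2.34: __libc_start_main is called by the startup code (crt1.o) that gcc links into every executable, so even this trivial program depends on it. The same code built on Fedora 34 looks like this: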
vf34 $ gcc -o hello-vf34 hello.c
vf34 $ readelf --wide -a ./hello-vf34 | grep __libc_start_main
0000000000403ff0  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
    81: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5
vf34 $ scp hello-vf34 adam@vf14:
So __libc_start_main changed in glibc 2.34, but before that it had remained unchanged all the way back to glibc 2.2.5 (the oldest symbol version on x86_64), which predates even Fedora 14 by some years. Sure enough, the binary built on Fedora 34 runs on Fedora 14 just fine:
vf14 $ ./hello-vf34
vf14 $ echo $?
1
vf14 $
How can we fix this? The traditional technique is to statically link the application, so that it carries its own copy of glibc instead of using each OS's native glibc. This works fine for Fedora 35 and Fedora 34:
vf35 $ gcc -o hello-vf35-static hello.c -static
vf35 $ scp hello-vf35-static adam@vf34:
vf35 $ scp hello-vf35-static adam@vf14:
On Fedora 34:
vf34 $ ./hello-vf35-static
vf34 $ echo $?
1
vf34 $
But on Fedora 14:
vf14 $ ./hello-vf35-static
FATAL: kernel too old
Aborted (core dumped)
vf14 $
A little digging turns up the fact that glibc 2.26 raised the minimum required Linux kernel version from 2.6.x to 3.2. (That minimum is still in place today, with glibc 2.34.) So static linking won't work if we need to run on vintage 2010 2.6 kernels: we must use dynamic linking so that the program gets a C library compatible with the kernel it is running on.
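
The "kernel too old" failure comes from a version check that the statically linked glibc performs at startup. A rough sketch of a comparable check follows; it is not glibc's code, the helper names are hypothetical, and it assumes the release string reported by uname() begins with "major.minor" (as "2.6.35" and "5.14.14" do).

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: report whether a kernel release string such as
   "2.6.35" or "5.14.14" (optionally followed by a distribution suffix)
   is at least want_major.want_minor.  Unparseable strings count as "no". */
int kernel_release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    /* Only the leading "major.minor" matters for this comparison. */
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;
    return major > want_major || (major == want_major && minor >= want_minor);
}

/* Hypothetical helper: check the running kernel against the 3.2 minimum
   that glibc 2.26 and later require. */
int running_kernel_meets_glibc_minimum(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return kernel_release_at_least(u.release, 3, 2);
}
```

By this test vf14's 2.6.35 kernel fails and vf34/vf35's 5.14.14 kernel passes, matching the behavior of hello-vf35-static above.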

I first ran into this problem back in 2015 with memcpy instead of the private __libc_start_main. The solution in that case was to insert an assembler .symver directive into my source code so that the linker records references to memcpy as references to memcpy@GLIBC_2.2.5, which the dynamic loader then resolves to that old version at run time. That is the version of memcpy supplied in all versions of glibc prior to glibc 2.14. This works when all references to memcpy originate in code I control. See here for details.
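
A minimal sketch of the symver approach, assuming x86_64 and the GNU assembler and linker; copy_bytes is a hypothetical helper, and this is not necessarily the exact code from the linked details:

```c
#include <stddef.h>
#include <string.h>

/* x86_64 only: make every memcpy reference in this translation unit bind to
   memcpy@GLIBC_2.2.5 (present in every x86_64 glibc) instead of the default
   memcpy@GLIBC_2.14 that a newer glibc would otherwise record. */
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");

/* Hypothetical helper: the memcpy call below needs only the old symbol
   version at run time. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}
```

Running readelf --wide -a on the result, as above, should show memcpy@GLIBC_2.2.5 rather than memcpy@GLIBC_2.14. The directive only affects the translation unit it appears in, which is the limitation discussed in the next paragraph.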

More complex programs use external libraries such as libz that in turn call memcpy themselves. A symver directive will not fix those references: I'm not building libz from source, I'm using the already built version supplied with the OS. I used to patch the final ELF executable for programs like that to reroute *any* calls to memcpy to the older version -- see here for the gory details.

That was good enough through Fedora 28 / glibc 2.27. Newer glibc versions have introduced numerous additional incompatibilities, some of which cannot be fixed by simple symbol renaming. For example, in older glibc, stat is not an exported function at all: the headers and libc_nonshared.a turn each stat call into a call to an internal __xstat function. Starting with glibc 2.33, stat is an ordinary exported function. When building with newer glibc, we need some way to force all stat calls to be rewritten as calls to __xstat, since that is all that is available when running under older glibc.

Fortunately, binutils and the dynamic loader come to the rescue: the linker, ld, supports a --wrap option that produces a binary that reroutes all calls to a given function foo -- including any calls to foo from an external library like libz -- into calls to a new function named __wrap_foo that we must write. (The linker can't do this by itself -- this requires support from ld-linux-x86-64.so.2, the dynamic loader, too. Luckily, this support has been around for a long time, all the way back to Fedora 14.)
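
As an illustration of --wrap applied to the stat problem, here is a minimal sketch, assuming x86_64 and a link step that passes -Wl,--wrap=stat. It calls __xstat@GLIBC_2.2.5 directly, using the x86_64 value 1 for the old _STAT_VER constant. It is not taken from glibcwrap.c, and a complete solution would also need similar wrappers for lstat, fstat and friends.

```c
#include <sys/stat.h>

/* Older glibc's entry point for stat.  Headers from glibc 2.33 and later no
   longer declare it, so declare it here (same signature as the old headers). */
extern int __xstat(int ver, const char *path, struct stat *buf);

/* x86_64 only: bind the reference to the baseline version, which every
   x86_64 glibc exports (as a compatibility symbol since 2.33). */
__asm__(".symver __xstat,__xstat@GLIBC_2.2.5");

/* Value of _STAT_VER (_STAT_VER_LINUX) on x86_64 in pre-2.33 headers. */
#define LEGACY_STAT_VER 1

/* With -Wl,--wrap=stat, undefined references to stat in the objects being
   linked resolve to __wrap_stat, so they end up calling __xstat instead. */
int __wrap_stat(const char *path, struct stat *buf)
{
    return __xstat(LEGACY_STAT_VER, path, buf);
}
```

Something like gcc -o prog main.c wrapstat.c -Wl,--wrap=stat (file names hypothetical) would route the stat calls in main.c through this wrapper; readelf on prog should then show __xstat@GLIBC_2.2.5 rather than stat@GLIBC_2.33.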

The details of how each __wrap function must be implemented vary; the implementation of __wrap___libc_start_main was particularly tricky. For more details, read glibcwrap.c.