Rambles around computer science

Diverting trains of thought, wasting precious time

Thu, 16 Jul 2015

ELF introspection, robustly and portably

I often find myself writing code that does introspection at the ELF level. This is for things like looking up symbols, walking the link map, figuring out which shared object a particular memory address belongs to, which memory regions are executable, and so on. These are fairly common things to want to do in ptrace()- or LD_PRELOAD-based tools, or binary interpreter-like (Valgrind-style) ones. This post is about how to do these things robustly from within a process, where “robustly” brings some particular restrictions.

What introspection?

What does our address space look like? To spare myself the pain of drawing diagrams, we can ask Linux to dump a somewhat-pictorial view of a simple cat process by doing cat /proc/self/maps. I have reversed the order of the lines so that higher addresses go higher up, and tweaked the formatting slightly.

          address range           flgs  offset   dev   inode                  pathname or other name   
------------------------------------------------------------------------------------------------------------
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

    7fff301fe000-    7fff30200000 r-xp 00000000 00:00 0                  [vdso]

    7fff301a5000-    7fff301c7000 rw-p 00000000 00:00 0                  [stack]

    7f4a574bc000-    7f4a574bd000 rw-p 00000000 00:00 0 
    7f4a574bb000-    7f4a574bc000 rw-p 00023000 08:01 889388             /lib/x86_64-linux-gnu/ld-2.19.so
    7f4a574ba000-    7f4a574bb000 r--p 00022000 08:01 889388             /lib/x86_64-linux-gnu/ld-2.19.so
    7f4a57493000-    7f4a574ba000 rw-p 00000000 00:00 0 

    7f4a57298000-    7f4a572bb000 r-xp 00000000 08:01 889388             /lib/x86_64-linux-gnu/ld-2.19.so
    7f4a57293000-    7f4a57298000 rw-p 00000000 00:00 0 
    7f4a57291000-    7f4a57293000 rw-p 001bf000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a5728d000-    7f4a57291000 r--p 001bb000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a5708e000-    7f4a5728d000 ---p 001bc000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a56ed2000-    7f4a5708e000 r-xp 00000000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a56c09000-    7f4a56ed2000 r--p 00000000 08:05 2750795            /usr/lib/locale/locale-archive
        00838000-        00859000 rw-p 00000000 00:00 0                  [heap]
        0060c000-        0060d000 rw-p 0000c000 08:01 286233             /bin/cat
        0060b000-        0060c000 r--p 0000b000 08:01 286233             /bin/cat
        00400000-        0040c000 r-xp 00000000 08:01 286233             /bin/cat

This picture is a refinement of the usual OS textbook diagram of a Unix address space. As usual, we have the executable's text segment (bottom), read-only and then writable data (next two up), heap (next up), a stack (much higher) and the kernel region (up top). (Actually the kernel region is not shown, except for one small part, the vsyscall page, which is the only user-accessible area.) The biggest difference is that we have not only the executable but some libraries—primarily the C library (since cat doesn't use others), but also the dynamic linker (itself a library) and the mysterious vdso. All these are ELF objects. In general, a large subset of a process's structure is actually defined by the collection of loaded ELF objects and a little per-process structure also defined by ELF. Our problem is now: how can we acquire a view on this ELF-level structure, from within the program itself at run time, efficiently and robustly?

(The above listing also shows a few other flavours of mapping that won't concern us here: a memory-mapped file /usr/lib/locale/locale-archive, and two anonymous mappings, beginning at 0x7f4a57493000 and 0x7f4a57293000, which are almost certainly heap areas—we can have many heaps, not just the one that grows out of the executable's data segment. And of course, although we see only one mapping labelled [stack], in a large multithreaded process there will be many such.)

What's ELF?

The basics of ELF linking were first defined in System V Release 4 in 1988, and eventually adopted by non-commercial Unices starting in the mid-1990s (1995 for Linux, with kernel 1.2; 1998 for FreeBSD, with version 3; NetBSD and OpenBSD in 2001 and 2003 respectively). The “classic” picture of a loadable ELF file (direct from the spec) looks like the following.

**Execution View**
ELF Header
Program header table required
Segment 1
Segment 2
Segment 3
...
Section header table optional

But actually, that omits some of the most interesting stuff! Let's try again.

**Execution View**
ELF Header
Program header table required
Dynamic linking metadata
Dynamic symbol table
Dynamic relocation table
Segment 1
Segment 2
Segment 3
...
Section header table optional
Symbol table optional

Dynamic symbols, program headers and the miscellaneous dynamic linking metadata are what we're interested in introspecting on. The POSIX libdl API offers one interface for doing introspection at the ELF level, at least for looking up symbols by name. Non-portable extensions, like SunOS/Solaris's dladdr() and the later-added dladdr1(), together with GNU's dlvsym() and dl_iterate_phdr(), get us a bit further, walking program headers and looking up symbols by address or version. We get further still if we allow ourselves OS-specific features, like Linux's helpful maps file in the /proc filesystem that generated the dump above, together with the auxv file or its associated library call getauxval(). If we're happy to parse the ELF file itself, by doing file I/O, we can get at anything we like.

What's “robustly”?

There are problems with all of the preceding approaches. Portability is lacking from many—only the POSIX APIs and ELF file I/O are widely portable. Robustness is lacking from all of them, in the sense that it may not be safe or reliable to use them from “sensitive contexts” where ELF introspection may be required. The main kind of “sensitive context” is one where making system calls or library calls would cause unwanted reentrancy. For example, if we want access to metadata from instrumentation logic that runs in the middle of a malloc() call, calling out to a library which might itself call malloc() is a bad idea. If we're instrumenting the code paths that make system calls (ask me how!), calling out to the OS might also be a bad idea. These unintended reentrancies can easily cause infinite loops, deadlock or state corruption. There are also security reasons why system calls and library calls are less reliable than working entirely within the process: the filesystem might have changed since the filenames in our maps or link map were recorded, for example.

So, how much ELF introspection can we do without the help of system calls, library calls or non-portable code? The answer surprised me: quite a lot, but not quite everything. There is some low-hanging fruit, then some that we can just about reach (provided a couple of assumptions which, although not strictly portable, probably hold for most platforms), then some that absolutely no portable method can let us get (without resorting to file I/O).

Let's state our problem slightly more precisely. Our address space includes a selection of objects (the executable and all loaded libraries), mapped at particular addresses. We'd first like to enumerate these mappings, giving us a look-up from load addresses to the object's originating binary file (pathname)—more-or-less the structure we saw in the /proc/self/maps listing earlier. Furthermore, each object defines some symbols. We'd like to be able to resolve symbols in each object, as well as any other useful other metadata, like their soname, the libraries they depend on, the locations of support structures like procedure linkage table (PLT), and the relocations performed when they were loaded. This is what we saw in the elaborated ELF file picture. The symbols-and-addresses stuff is the most important; the relocation and miscellaneous information isn't useful as often, but we'd like to have it anyway. Finally, there's some useful metadata that the operating system has passed down about the process, in an ELF-defined structure called the auxiliary vector. It turns out that we'll use this to get at some of the other metadata, but some other parts of it are useful in its own right—they tell us the process ID, page size, and various properties of the hardware.

The link map

To enumerate the mapped objects, we need access to a structure called the link map. The portable interface for walking the link map is the mysterious r_debug protocol. This is surprisingly murky in its origins, but it probably comes from SunOS; it was definitely included in System V release 4 (the only solid reference I know is the relevant Programmer's Guide). It's actually not specified in most ABIs, although some seem to do so. The r_debug protocol defines a structure containing pointers to the link map. From an executable, it's easy to find this structure: we usually have a DT_DEBUG entry in PT_DYNAMIC, and the dynamic linker updates to points to a r_debug structure it allocates. This contains a pair of pointers to the link map, represented as a doubly-linked list.

What about from a shared library? Getting a pointer to the r_debug is trickier here, because shared libraries don't usually have a DT_DEBUG entry. This is a chicken-and-egg problem: we need the executable's r_debug to get to other objects, but if we're not starting in the executable, how do we get there? So here comes our first somewhat-non-portable assumption: that a global symbol _r_debug is available, and marks the r_debug structure in the dynamic linker. This symbol is commonly present, but I don't believe it's standard—POSIX generally specifies programmer-facing APIs, not linker-level symbols. (As we'll see shortly, another method for getting at it is available, but it is also non-portable and imposes some other requirements. Anyway, by combining these two options, we have a pretty good chance that we can find the link map. So, let's continue on the assumption that we have the link map.)

Symbols

Once we have the link map, we'd like to do symbol lookups in each object. For this, we need (at least) the dynamic symbol tables, which we can find via the PT_DYNAMIC segments of each object. (We probably also want the symbol hash tables; the method is similar.) The link map helpfully keeps a pointer to this specific segment (rather than to the overall program header table). The segment's content consists of key-value pairs, and it's the pair whose key is DT_SYMTAB that points to the dynamic symbol table. (On most ABIs, a short cut to get a pointer to the containing object's PT_DYNAMIC segment is to make a link-time reference to the symbol _DYNAMIC. But this only lets us get the containing object's PT_DYNAMIC, whereas walking the link map gets us all of them.)

One caveat about symbol lookups is that only dynamic symbols are looked up. The other symbols, if there are any (not needed for dynamic linking, but useful for debugging), live in a different table, .symtab, which needn't be mapped at run time. Depending on how your object was linked, perhaps most symbols became dynamic symbols (--export-dynamic) or perhaps only a minimal selection did. If you've ever tried to look up stat() in the GNU C library, using dlsym(), and been surprised not to find it, this is why. The stat() function is one of a few that are always linked statically, even when the rest of the C library is linked dynamically. In general, not all code or data in our dynamic objects is actually described by dynamic symbols. I'll return to this shortly.

The auxiliary vector

The link map gave us a list of loaded objects and their base addresses, but it doesn't tell us where each individual segment is mapped. This is where we begin our voyage into the not-really-portable. In the case of the executable, we can get the program headers via another useful metadata structure: the auxiliary vector. This is created by the operating system to supply various information of use to startup code (typically the bowels of the C library) and the dynamic linker (which, if one is used, also mutates the metadata to make it look as though the executable was loaded directly). (If we failed to get the link map using the previous _r_debug symbol method, we can also use the auxiliary vector to get to the executable's DT_DEBUG, and hence to the link map.) Here's a very nice article all about auxiliary vectors.

The auxiliary vector lives high on the initial thread's stack, a fixed offset above the stack pointer when the program's first instruction receives control. If we could walk the main thread's stack right to the top, we could find the auxiliary vector easily. However, nowadays, since we lack frame pointers, that means making a library call to something like libunwind, and that might allocate memory. Even then, it's not guaranteed that we can walk the stack all the way to the top, since unwind information might be patchy up there. So we'll need a different approach.

I devised a nasty but ultimately pleasing hack for this. When we say “auxiliary vector” we really mean specifically a list of key-value pairs containing the useful metadata I mentioned. But there's also a bunch of other things up there, in a large contiguous structure (see the helpful article for a diagram): the environment strings (a blob of characters), argument strings (ditto), and vectors of pointers into these blobs (our initial environ and argv). Instead of walking the stack, can we get hold of a pointer into one of these blobs, and then parse the surrounding memory to find the start of the key-value pairs? It's very likely that we can, since we can expect to find a handy global variable, environ, that points into them.

Now, it's important to note that this isn't completely robust. The auxiliary vector only records the initial environment. A program can modify the current environment strings to its heart's content by using C library calls like putenv() and getenv(). These will mutate and, if necessary, reallocate the vector pointed at by environ. However, unless a program deletes its entire initial environment, and assuming the C library doesn't eagerly copy things out of it, that vector should always contain one or more pointers into the initial environment blob. So, we want to walk environ and look for such a pointer.

The next problem: how will we recognise such a pointer when we see it? For this, we need to fix some more very-likely-true assumptions. Firstly, we assume either that the initial stack is the highest user mapping in the process, or that if something is higher, it's a shared object (perhaps Linux's vdso) which we can identify in the link map. (Put differently: what's not allowed is for there to be a memory-mapped file above the initial user stack, other than a shared object. This doesn't seem too restrictive; we could even imagine an ABI-level guarantee from the kernel that it will never create such mappings.) This assumption gives us an upper bound to compare our environment pointers against. What about a lower bound? For that, we assume that the caller can supply us one, in the form of a pointer higher than any other stack-allocated string in environ. The address of a local variable in main() could be one such pointer. In fact, any address on the initial stack will probably be good enough, since putting a stack-allocated string in your environment would be a very quirky thing to do. Anyway, we suppose that our caller, the client code which wants to get hold of the auxiliary vector, can give us a such a pointer.

Now we're ready to go. We scan *environ for pointers to strings within these bounds. Then we navigate downwards in memory to find the end of the auxiliary vector, then keep going to find its beginning, after which we'll see the terminator of the argument or environment pointer vector.

Once we have the auxiliary vector, we have the executable's program headers, from which it's easy to get at most other things we need, again using the link map to access other objects.

Program headers

One challenge remains for us: getting at shared objects' program headers. The problem here is that these headers needn't, in principle, be found anywhere in memory. (The ELF format allows a file to require that its program headers are mapped, using PT_PHDR, but it's not required and its use is uncommon.) Typically a dynamic linker will always keep each object's program headers in memory somewhere, but it's not obliged to do so. Even if it does, that memory might be anywhere. The GNU dynamic linker sometimes stores just a malloc'd copy of the data, perhaps originally read through a file descriptor rather than a memory mapping. Either way, the pointer it maintains to the header data lives in private state (in extra non-public fields in the link map structure); there's no portable access to it. So, to get shared objects' program headers there's no avoiding the use of some platform-specific (and loader-specific) code to fish it out.

The major limitation of doing without program headers is that we don't have a complete map of the address space. The link map lets us enumerate objects and their start addresses, and we can then use the symbol tables to do a “forward lookup” from symbols to addresses. We can invert the symbol tables to give us a partial backward lookup from addresses to symbols. But we can't do quite as complete a backward lookup as the dladdr() function can. If you give dladdr() an address that falls within an object's mappings but isn't covered by an exported symbol, it can still tell you which object it's in. Only the program headers contain the information necessary for this. Another thing we can't figure out is which memory access permissions a segment was mapped with—again, that means looking at the relevant program header.

Maximising ELF introspectability

Armed with these observations, we could imagine dynamically rewriting the binaries on our system slightly to optimise for ELF introspectability. Firstly we would insert a PT_PHDR, and define a symbol (maybe _PHDR), roughly analogous to _DYNAMIC), to help us find it. Secondly, going back to the restriction that only dynamic symbols were available for lookup, we could export all .symtab entries as “hidden” dynamic symbols. The obvious objection to this is that it will slow down linking. However, symbol lookup by name happens via the hash table (DT_HASH or GNU_HASH). I haven't checked all the details yet, but it appears that adding them to the symbol table needn't necessarily mean including them in the hash table. There has to be a hash-chain “next pointer” per symbol, but it's permitted not to hook that entry into any bucket. Since link-time lookups occur via the hash table, leaving the hidden symbols out of the hash table seems to defeat the objection. They'd be present for a “deep” introspector to find, by scanning the symbol table. But they wouldn't interfere with the symbol lookup process, nor with code using dlsym().

Code

If you'd like to use this introspection goodness, I've coded it all up in a single header, relf.h, which is fairly self-contained and should be self-explanatory. Please let me know if you find it useful!

[/devel] permanent link contact

validate this page