Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 03 Aug 2022

How and why to do link-time symbol wrapping (or not?)

It's sometimes useful to hook functions at link time. On ELF platforms (among others) a GNU-style linker offers the --wrap option which seems to be exactly what we want.

           Use a wrapper function for symbol.  Any undefined reference to symbol will be
           resolved to "__wrap_symbol".  Any undefined reference to "__real_symbol" will be
           resolved to symbol.

Unfortunately there are several problems that arise when using this. In this post I'll cover some ways to solve these problems. Some of the ways I've found, others I've created. At the end, I'll discuss whether we should be doing any of this.

Problem 1: wrapping file-internal references

As the helpful manual page states, --wrap is not a complete solution.

Only undefined references are replaced by the linker.  So, translation unit
internal references to symbol are not resolved to "__wrap_symbol".  In the next
example, the call to "f" in "g" is not resolved to "__wrap_f".

       f (void)
         return 123;

       g (void)
         return f();

Why not? The text implies that it's because there's no undefined reference to f. This means a relocation record referring to a symbol that is undefined (UND) in the referring file. So, if we move either f or g into a separate file, we can ask objdump -rd to show us the two files.

$ objdump -rd f.o g.o
f.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 7b 00 00 00          mov    $0x7b,%eax
   9:   5d                      pop    %rbp
   a:   c3                      retq 

g.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   e8 00 00 00 00          callq  9 
                        5: R_X86_64_PLT32       f-0x4
   9:   5d                      pop    %rbp
   a:   c3                      retq 

As expected, we see a reference to f from g, marked by a R_X86_64_PLT32 relocation record. In this case, --wrap will work perfectly, by binding instead to __wrap_f (assuming we supply one). This works because f is an undefined symbol in g.o.

$ objdump -t g.o | grep 'f$'
0000000000000000         *UND*  0000000000000000 f

By contrast, if we keep f and g in the same file, we get something that looks similar but is actually quite different.

$ objdump -rd testwrap.o
testwrap.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 7b 00 00 00          mov    $0x7b,%eax
   9:   5d                      pop    %rbp
   a:   c3                      retq 

000000000000000b :
   b:   55                      push   %rbp
   c:   48 89 e5                mov    %rsp,%rbp
   f:   e8 00 00 00 00          callq  14 
                        10: R_X86_64_PLT32      f-0x4
  14:   5d                      pop    %rbp
  15:   c3                      retq 

To see the difference we have to look at the symbol for f that the relocation record is referencing.

$ objdump -t testwrap.o | grep 'f$'
0000000000000000 g     F .text  000000000000000b f

This time it's a defined symbol. That means g's reference to f is already bound to the definition of f. There is no separate identity for “the thing, called f within the code of g, that g calls”. There is just “the definition called f”. Worse, that's directly how things came out of the assembler. Since no link-time action binds this reference, link time is too late to interfere.

This is an awkward non-compositionality when assembling and linking ELF or most other object code: if we group multiple parts into a single file, such as by refactoring what was once many source files into one, at link time we get a less descriptive model of the referential structure of the whole. Binding has occurred eagerly, and we have no way to talk about “references requesting a thing called f” as distinct from “references to a definite thing, namely f”. Since --wrap relies on the distinction, it can't have any effect.

From now on I'll refer to this pre-binding issue as “problem 1a”.

In fact there is another way for --wrap to go wrong, which is if we want to wrap a static (file-local) function. Although wrapping a static function is less common, it might still be desirable. Reliably wrapping local calls to statics is intractable without going back to the source, because they will often have been inlined. But when a static function is address-taken, it might end up being called from far and wide within the program; we might reasonably want to instrument this so that actually, a wrapper is the thing that gets address-taken. Problem 1a guarantees that this cannot possibly work: references to static functions are always intra-file, and therefore to a defined symbol. However, even if problem 1a were solved, we'd have another problem: at the point where we reference static function f, on many architectures, the compiler can use a relative jump or call. Within the same text section, such jumps don't even need a relocation record, so the linker has no clue that reference is even happening. This is problem 1b. Later on I'll discuss ways to solve it, but let's worry about the basic non-static case initially.

Naive solution 1a: symbol unbinding

To solve problem 1a, intuitively we want some way to “unbind” these assembler-bound references. There is no off-the-shelf way to do this (trust me! but I discuss this further below). Way back in 2008 I wrote a patch (unfortunately buggy graduate-student code) to GNU objcopy that extends it to perform unbinding, adding an option --unbind-sym SYM.

$ ./objcopy --help
     --unbind-sym             Separate <sym> into  __def_<sym> and __ref_<sym>

For wrapping, I would use this followed by a second objcopy to change __def_f back to f and __ref_f to __wrap_f, so that references will go to the wrapper, just as we wanted.

I submitted my patch on the Bugzilla in July 2008. (Intriguingly, to this day it still shows as “NEW”. Clearly the Bugzilla was the wrong place to post it.) Some years later I even fixed the code, integrated it into a forked binutils tree which I then kept vaguely up-to-date for a few years. The story doesn't end there, but it's a viable start. So let's return to problem 1b.

Naive solution 1b: -ffunction-sections and objcopy --globalize-symbol

I mentioned that wrapping a static function is sometimes desired, but that a call to a static function might not even have a relocation record. There's no hope of diverting to a wrapper function at link time if the linker doesn't have any relocation to apply. Consider this code.

static void *myalloc(size_t size)
        return malloc(size);
void *(*get_allocator(void))(size_t)
        return myalloc;

If we use objdump -rd on the resulting object code, we see this.

Disassembly of section .text:

0000000000000000 :
  (snip code of myalloc)

000000000000001a :
  1a:   55                      push   %rbp
  1b:   48 89 e5                mov    %rsp,%rbp
  1e:   48 8d 05 db ff ff ff    lea    -0x25(%rip),%rax        # 0 
  25:   5d                      pop    %rbp
  26:   c3                      retq 

Notice the lea whose argument points back to the definition of myalloc. It is just referring to a point earlier in the same text section; no relocation is needed.

The fix is to compile with -ffunction-sections. Since each function gets its own section, calls to a static function are nevertheless exiting the section, so require relocation (sections are the unit of placement in memory). Even references to static functions will come with a relocation record.

(In contexts where you don't control compilation, -ffunction-sections might not be an option, but you could in principle write a tool that walks over the binary, decodes its instructions and re-adds relocs at instructions that address in-section memory locations.)

If we do this, we get the following.

Disassembly of section .text.myalloc:

0000000000000000 :
  (snip code for myalloc)

Disassembly of section .text.get_allocator:

0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b 
                        7: R_X86_64_PC32        .text.myalloc-0x4
   b:   5d                      pop    %rbp
   c:   c3                      retq 

In other words we now have two different sections and a relocated reference between them. If we compile like this, then promote the local symbol to a global, using objcopy --globalize-symbol we should be able to wrap it like any other... job done? Not quite!

Real solution 1b: preferring non-section relocs

Notice that the relocation record above isn't relative to a symbol called myalloc; it's relative to the section symbol .text.myalloc. So the reference won't be unbound if we ask to unbind myalloc! We also can't just name .text.myalloc as the symbol to unbind since (among other problems) the linker doesn't perform wrapping on section symbols anyway.

As a quick fix, I provided a way to eliminate use of section symbols in relocs where a named function or object symbol was available and could be used instead.

$ ./objcopy --help
     --prefer-non-section-relocs   Avoid section symbols in relocations, where possible

This is progress... although it seems nasty. The nastiness does not end there. Let's keep track of what we've learned about how to handle these tricky questions.

Uses of a definition should be wrapped, so force them to use an undefined symbol, unbinding reference from definition if both are present in the same file.
Uses of a definition should be wrapped, and only ordinary symbol-based relocs can be wrapped, so force these references to use ordinary-symbol-based relocs, not section-symbol-based relocs.

Real solution 1a: refinements to unbinding

Even without worrying about static functions, some similar issues emerged in the course of using my naive unbinding tool. Just as above we worried about “section references”, there are also “self-reference”, and “meta-reference” cases.

Meta-references are (my name here for) references from debugging information, or similar metadata, to “base-level” code or data. This doesn't just affect debugging per se; unwinding information also includes these meta-references, and is critical to C++-style exception handling. Naturally, metadata about compiled code wants to refer to specific places in that code. And naturally, to do so it uses relocation records bound to defined symbols. For example, the .eh_frame unwind information section for a hello.o, when dumped by readelf, looks like this:

00000018 000000000000001c 0000001c FDE cie=00000000 pc=0000000000000000..0000000000000017
  DW_CFA_advance_loc: 1 to 0000000000000001
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r6 (rbp) at cfa-16
  DW_CFA_advance_loc: 3 to 0000000000000004
  DW_CFA_def_cfa_register: r6 (rbp)
  DW_CFA_advance_loc: 18 to 0000000000000016
  DW_CFA_def_cfa: r7 (rsp) ofs 8

... but actually the binary also includes relocation records for the section, to refer to the actual address of its text section.

Raw data contents of section .eh_frame:
  0000 14000000 00000000 017a5200 01781001  .........zR..x..
  0010 1b0c0708 90010000 1c000000 1c000000  ................
  0020 00000000 17000000 00410e10 8602430d  .........A....C.
                        20: R_X86_64_PC32       .text
  0030 06520c07 08000000                    .R......      

This is what ensures that if you dump a fully linked hello binary, the line showing pc=0000000000000000 above will instead actually mention the real address of the main function. (If you're wondering: stock objdump can't dump data sections with relocation records interspersed like I've shown above. But my objdump-all script can.)

The whole idea of “unbinding” is to wrench apart bindings to defined symbols. But clearly we don't want to unbind debugging information from the code is describes! We just want to unbind calls and address-takes. Unfortunately, at the object code level, it's not trivial to distinguish these from meta-references. In my patch, the best I could do was the following.

+  /* What is a self reloc? It's either
+   * - in a debug section;
+   * - in .eh_frame (HACK);
+   * - within the dynamic extent of the symbol to which it refers. */
+  bfd_boolean is_debug_or_eh_sect =
+    (bfd_get_section_flags (ibfd, isection) & SEC_DEBUGGING)
+         || 0 == strcmp(bfd_get_section_name (ibfd, isection), ".eh_frame");

We then avoid unbinding references coming from a debug or unwind section. For better or worse, BFD doesn't count .eh_frame as a debugging section (certainly one wouldn't want strip -g to strip it) so I had to recognise .eh_frame by name.

Another way one might think we could have got the same effect is simply by exploiting the behaviour that gave us problems with statics earlier: we don't unbind relocations going via section symbols. We saw above that the relocation used the .text symbol. Maybe debugging relocs always use section symbols? If so, we'd be all clear. Sadly, this is mostly true but it is not always true. If you grep the objdump of a large build tree, you'll find some debugging sections using ordinary symbol relocs. We could imagine handling these with a converse of --prefer-non-section-relocs, which for debugging sections prefers the section-symbol-based relocs. I haven't coded this up in binutils, but it would be worth having; let's add it to our rules.

Meta-references should not be wrapped. They are always local within a file, so force them to use section-symbol-based relocs, which won't be wrapped.

One reason I haven't implemented it yet is that we'll soon see some cases it can't help with. (However, actually I have coded it up in a standalone tool; more below.)

As well as meta-references, there are self-references. Many cases of self-reference are actually, like meta-reference, referencing the internals of a function. For example, on architectures with only absolute branches, a goto would generate a relocation record. Again, we don't want to unbind it so that it gets redirected to the same offset within a wrapper function; that would be meaningless! Modern architectures use relative not absolute branches in the direct case. But once we start doing indirect branches we usually do start needing relocations. Here is some GNU C code that generates a self-reference on many architectures (albeit still not x86-64) by taking an internal address.

#include <stdio.h>
int main(void)
        printf("A function-internal address: %p\n", &&resume);
        return 0;

On 32-bit x86 we get the following.

Disassembly of section .text:

00000000 <main>
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  -0x4(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 04                sub    $0x4,%esp
  11:   83 ec 08                sub    $0x8,%esp
  14:   68 26 00 00 00          push   $0x26
                        15: R_386_32    .text

At the end of the listing we see the rightmost argument to printf, which is pushed first, being formed by a self-referencing reloc against the .text section.

We clearly don't want to unbind this sort of thing. It's tempting to think that we should just let all self-references go undisturbed. But what about this?

int call_me(void *arg)
    if (...) return 0; else return call_me(...);

Or this?

int is_me(void *arg)
    return (arg == is_me);

What should wrapping mean in the case of self-reference via an ordinary symbol? Both of the above are compiled as referencing an actual public definition, via its function symbol, rather than referencing internals via a section symbol. The compiled code for is_me looks like this.

0000000000000000 <is_me>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
   8:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # is_me <is_me+0xf>
                        b: R_X86_64_PC32        is_me-0x4
   f:   48 39 45 f8             cmp    %rax,-0x8(%rbp)
  13:   0f 94 c0                sete   %al
  16:   0f b6 c0                movzbl %al,%eax
  19:   5d                      pop    %rbp
  1a:   c3                      retq 

Intuition suggests that probably the recursive call should be wrapped, but the address take not. But why?

To explore just how tricky the semantics of identity can get, let's divert ourselves to ask: can the is_me function above ever return false when passed itself? On dynamically linked ELF platforms (or in fact any SunOS-style dynamic linking system), the answer is yes, because is_me can be overridden by another object (e.g. the executable or a preloaded library). In such a context, the identity comparison is then done against this other definition, i.e. the function isn't really checking for itself at all. That's a bug: the code really doesn't want to refer to “the active definition of a thing called ‘is_me’”. It wants to really refer to itself, really! To make it reliable under dynamic linking, one would change the code above to instead test against a hidden alias, which can not be overridden.

extern int really_me(void *arg) __attribute__((alias("is_me"),visibility("hidden")));

What about the analogous problem with wrapping? If we really want to reference ourselves, there should be some wrap-proof way to do so. Luckily, in my current objcopy unbind logic, I don't unbind self-references at all.

+              /* Okay, the reloc refers to the __ref_ symbol. Is the relocation site
+               * in the same section as the __def_ symbol? */
+              bfd_boolean is_self_section = nu->def_sym && nu->def_sym->section == isection;
+              bfd_boolean is_self = is_debug_or_eh_sect || is_self_section;
+              if (!is_self) continue;

Here the code treats specially any intra-section references. If we compile with -ffunction-sections, only true self-references will be intra-section. In more general contexts, the code above it will leave alone any calls that are bound to a target within the same section, whether to the same function or not. (Note that in relocatable code, even ordinary symbols are defined relative to a section. The section of the defined symbol is what matters in the test above; the symbol itself could be a section symbol or an ordinary symbol.)

Since we currently exclude self-calls from unbinding, then by analogy with our dynamic linking case, the effect is equivalent to “all self-reference is as if to a hidden alias”. In short, self-reference is non-interposable by --wrap.

This, however, is the wrong thing. In the earlier example of a recursive call, we said that intuitively, we might well want the self-reference to be unbound and (hence) wrapped, at least in some cases. The function is simply using itself; it can happily do so indirectly via the wrapper. I went with the simpler no-unbind behaviour only because I'm lazy and I didn't face such cases in my application, which is wrapping allocation functions. These are non-recursive by nature. However, my application does have code similar to is_me: a function that wants to be able to recognise its own address. Since unbinding that self-reference broke my code—the function ended up erroneously looking for its wrapper's address—I just stopped unbinding self-reference, by adding the is_self_section predicate above.

If that's wrong, what is right? We need a more nuanced understanding of reference. It's the nature of wrapping that a reference to plain f has become ambiguous; we have to resolve the ambiguity one way or another, either to the wrapper or to the real thing. The old linguist's distinction of “use” versus “mention” is a good fit here. Meta-references, which we've already dealt with, are pretty clearly mentions, not uses. Self-knowledge, like with is_me, is also a mention. But there is also self-use; a recursive call would count. Correct wrapping requires telling these apart, so that self-use can go via a wrapper but self-mention can be direct. This is important.

Self-references may be self-mentions or self-uses.

How can we distinguish self-reference from self-use at the object code level? Unlike with interposition in dynamic linking, an ELF-level protected or hidden-visibility self-alias would not help, because --wrap does not pay any attention to symbol visibilities. It happily wraps a reference to a hidden definition, and/or from a hidden UND. We could layer a rule on top, particular to our unbinding tool, say skipping self-unbinding in the case of hidden symbols. However, this is an abuse of hidden and would stop us ever wrapping a hidden symbol, which we might still want to do.

Instead of a protected or hidden alias, there's another alias which is tailor-made for this ambiguity-resolving purpose: the “__real_” alias! It's an alias that is magically defined only when wrapping, and it's exactly what we want. To do self-reference that is not wrapped, this wrapped function f needs to refer to to __real_f instead of just f. It does mean that the function needs to be written in the knowledge that it's going to be wrapped (or might be). If it wasn't written with this knowledge, a custom fixup pass could be done on its assembly or object code.

The program must disambiguate self-mention from self-use, by using __real_ for self-mention, and leaving self-use the default.

Finally, self-reference has yet another case to worry about, which I'll call “self-reference at a distance”. and arises if inlining is involved. It poses a real problem for correct wrapping, even without unbinding. Imagine an is_me-style function that gets inlined into its caller. This will generate an outgoing reference to is_me, a symbol defined elsewhere, maybe in the same file or maybe further afield. Now suppose we decide to wrap is_me. If it is defined in the same file, we will unbind this outgoing reference (it's not a self-reference) and wrap it. If it isn't—say if is_me is an inline and its out-of-line copy is instantiated in a different file—we don't need unbinding; plain old wrapping will divert the reference into the wrapper. Both of these are wrong! It's still a mention, not a use. The function really wants to get hold of its own identity, not its wrapper's identity. The only change is that the reference now appears to be non-self; it is coming from somewhere distant, thanks to the inlining. As before, the solution is that it must refer to __real_f. In short: in the presence of inlining, any self-mentioning function must at source level use __real_ if it is possibly going to be wrapped. The compiler will know how to emit that reference whatever the inlining context.

Here we can further critique our earlier not-quite-right idea of leaving section-symbol-relative relocs undisturbed but always wrapping ordinary-symbol-relative ones. I mentioned that debugging information sometimes makes reference via ordinary symbols. The cases we just considered must use an ordinary symbol reference, both with or without the use of __real_, because it may refer to a definition in another file. In non-inline cases, instead of using __real_ we could imagine asm hackery inside a function to get hold of a section-relative reference to its own start address. In fact, I tried it. But if inlining might happen, this won't work. The moment this asm trick gets inlined into a caller, it will at best generate a reference to the entry point of the enclosing non-inlined caller, which is not right. My attempt generated a completely spurious offset in the middle of the enclosing function's code, which was not fun to debug. (The intersection of self-referencing functions and inline functions is small, so this issue will arise rarely. But I managed to find it.)

What's to say this use-versus-mention distinction is only an issue for self-reference? There might be mentions that are non-self, i.e. one function mentioning another, but not using it, meaning that semantically it should not be wrapped. An example might be a function that inspects the on-stack return address in order to behave differently depending on which function is calling it. If the caller happens to be wrapped, it still wouldn't be correct to test for the wrapper's address range, since the wrapper is never going to be the immediate caller. Since introspecting on one's caller is a kind of metaprogramming, we could bucket this as another kind of meta-reference; either way, it's clearly a mention not a use. This practice is rather perverse and I haven't run into such cases. But the same solution would apply: mentions of a possibly-wrapped function need to reference __real_.

Another issue with meta-levels is that there can be more than two. I have in the past had cause to write functions of the form __wrap___real_f, i.e. linking with both --wrap f and --wrap __real_f. Perhaps surprisingly, this works to an extent—the first wrapper, which appears to call __real_f, does indeed call into the second, __wrap___real_f. Annoyingly though, referencing __real___real_f doesn't work. This is because there never was a genuine __real_f symbol definition; binding of a __real_-prefixed reference relies on a fiction in the linker, and that is not applied recursively to generate a fictional __real___real_f and so on. Instead I resort to finding the real definition through an alias I created specially earlier.

This is suspiciously starting to resemble an alternative way of doing wrapping without --wrap, as I'll cover next.

Solutions 1a′ and 1b′: doing it in standalone tools

At some point it became clear that my objcopy patches weren't going to be mergeable into GNU binutils. The treatment of debugging sections alone is a giant hack—especially the unwind sections, which I could recognise only by name. I did persist with maintaining my patch for a while (or even letting someone else do it) but maintaining a fork of binutils is no fun. Its long build time was a burden for people wanting to try out my project.

I spent ages trying to find a way to get the desired unbinding effect using only stock objcopy and ld, but concluded that it's not possible. While objcopy lets you add and remove and rename symbols, none of these is right. Adding a new symbol won't make relocs use it, and in any case it won't let us add a new undefined symbol, only a defined one. Renaming a symbol simply leaves all the relocs pointing at this now-renamed symbol. And removing a symbol isn't what we want—in fact objcopy will refuse to do it, on the grounds that there are relocation records depending on it. We want to somehow “make undefined” the existing symbol, so that previously bound relocs will become unbound. Even if we could do this by first adding a new undefined symbol (which we can't), we'd need a way to “switch over” relocation records from one symbol to another. Sadly, objcopy gives us no way to manipulate relocation records as distinct from the symbols they reference.

I therefore wrote a really minimal tool, sym2und, which looks for a symbol of a given name in a given .o file, and changes it to be undefined. With that, we can use the stock tools to complete the job by first adding __def_SYM as an alias of SYM (using ld -r or objcopy --define-sym), then changing SYM to be undefined (using sym2und), then renaming the undefined symbol to __ref_SYM (using objcopy again).

What about the static function case (problem 1b)? For that, as before, we need -ffunction-sections and objcopy --globalize-symbol. And we want a standalone tool in place of my patch's --prefer-non-section-relocs option, to ensure an address-taken function is referenced using the symbol name we expect, rather than a section symbol that we don't. This is again quite simple to write as a standalone tool over ELF files; my version is called normrelocs. Just like my patched objcopy, it needs to skip references in debug sections, to avoid hooking meta-references that happen to point to the entry point of the static function.

What about meta-references, self-references and self reference at a distance?

Meta-references via ordinary (non-section) symbols will be disrupted if the symbol is wrapped, because they will keep referencing the old symbol, which becomes undefined and is later resolved elsewhere. We can use the same debugging-specific hack to extend normrelocs with a converse case which flips from-debug references to use the section symbol.

Self-references that are “mentions” need to be directed at __real_, which will work fine (not wrapped). Self-uses will be diverted to __ref_ which mens they will be wrapped, as we intend.

Self-mentions-at-a-distance simply need to use the __real_ symbol. That is never defined locally in the file, so these will not be wrapped, which is correct.

Solutions 1a′′ and 1b′′: an equivalent?

The approaches we just saw worked by very local surgery to single .o files, in order that a later link-time --wrap option would take fuller effect.

There is a way to reduce (but not eliminate) the need for these custom tools, if we don't mind abandoning --wrap and using another linker feature to do the wrapping. It's a bit fragile, because it is sensitive to the order of .o files on the command line, which normally doesn't matter and which build systems and compiler drivers can make surprisingly difficult to control. It's a big hammer, because it suppresses what would otherwise be link errors. And it doesn't escape the need for an earlier fixup pass, both to handle the statics issue (by normrelocs or similar) and also to create an alias serving the purpose of __real_SYM. It does, however, avoid any need for unbinding, since it works even if SYM is already defined and used in the same .o file. So in some ways it's cleaner and shows that the --wrap option didn't really need to exist.

The feature is the -z muldefs command-line option, which tells the linker to ignore the original symbol definition and use a new one. This does not remove any content from the link—content meaning sections, not symbols. It just affects how that content is labelled, i.e. which symbols end up decorating particular positions or ranges within it, and hence how the content becomes interconnected when relocations are applied.

Instead of --wrap SYM after unbinding, we do the following.

  1. add a fresh name for SYM; we choose __real_SYM by analogy with --wrap
  2. create the wrapper definition called simply SYM (not __wrap_SYM), using __real_SYM if it wants, in a file wrapper.o
  3. include wrapper.o before the original definition on the command line, meaning -z muldefs will give it precedence

We use link order and muldefs to take control of SYM, but the code from its old definition still gets included in the link output; it is just “de-symbol'd” since the earlier definition takes priority. But since earlier we created an alternative symbol name for it, we can still refer to it. Unlike --wrap, this approach happily diverts references that were previously bound to that definition—by providing a new definition of the same name, replacing the old one and acquiring its relocations in the process.

Using this approach is quite a disruptive change, if you're used to using --wrap. You now have to pull out the wrappers into one or more separate objects, so they can be specially placed before the “real” definitions in link order. Previously, your __wrap_ functions might be defined in any of the input .o files. To arrange this “placed before” command-line property in a makefile, my best attempt would be for the link target to set something like CC := $(CC) -Wl,wrapper.o (and hope that no other part of the build logic clobbers CC in a conflicting way). This isn't ideal.

I don't know a way to avoid that problem, but we can at least make things more compatible with --wrap, by changing our wrapper.o to be like the wrappers before: define __wrap_SYM and put it anywhere on the command line. Then we augment it with a linker script wrapper.lds, which does have to go first on the command line, but says just the following.

SYM = __wrap_SYM;

This must be placed first in the command line. This brings us two benefits: we can still call our wrapper __wrap_SYM, rather than SYM, and we can have the wrapper logic originate in any object on our command line. We only require the new wrapper.lds to appear before the other, now-wrapped definition of SYM, whichever object that happens to be in. (This is especially handy if the wrapper is in an archive; usually, moving an archive to the front of the command line is not an option.)

What language is the above snippet in? It's the linker script language. There are two main reasons to write a linker script: to provide additional link inputs or definitions (often also doable via the command line, but unwieldy), and to control the memory layout of the final binary (not doable via the command line). These are more-or-less disjoint. The latter is handled by the “default linker script”. The former can be done by naming additional text files in the linker command, each containing declarations in the linker script language as we saw above. (The boundary between these two use cases is a bit fuzzy; I've found that a dropped-in linker script can also contain SECTIONS rules, which are normally the preserve of the default script. From what I've seen, the semantics seems to be that these are concatenated with those of the default script's SECTIONS rules, although I could be wrong.)

You might ask whether we could avoid our pre-aliasing step that created __real_SYM, by instead making the linker script say something like this.

__real_SYM = SYM;   /* let __real_SYM refer to whatever SYM refers to */
SYM = __wrap_SYM;        /* now let SYM refer to __wrap_SYM? */

This seems like a nice pithy definition of what wrapping should do. Unfortunately it doesn't work. Despite the semicolons and equals signs, this isn't an imperative language. Symbol definitions are treated equationally: the defining expressions are all evaluated at the same time. So what the above really does is set SYM and __real_SYM both to equal __wrap_SYM. This breaks the wrapper, making it recurse infinitely.

(In our use case, therefore, we might like have a “predefine symbol” feature, whose defining expressions are evaluated in an “old” context that is then updated, giving a two-stage semantics. Of course that is still limiting; a fuller solution, with arbitrary dependencies between definitions, would require the linker's algorithm to change from a single “build symbol table” pass into something more complicated and with more failure cases—effectively embedding a “proper” programming language, whether imperative or declarative.)

How does our new approach fare with meta-references, self-reference and self-reference at a distance?

As before, it will screw with debug info in the occasional cases where that's relocated via an ordinary (non-section) symbol. So, we really need our version of normrelocs that does the converse rewriting in the case of debug info sections, making them always use section-symbol-based relocations.

With self-reference, again it will wrap unless you reference the __real_ symbol, for both apparent and at-a-distance kinds of self-reference. As we decided earlier, this is the right thing.

Recap of the methods and test cases

We accumulated the following principles.

Uses of a definition should be wrapped, so force them to use an undefined symbol, unbinding reference from definition if both are present in the same file.
Uses of a definition should be wrapped, and only ordinary symbol-based relocs can be wrapped, so force these references to use ordinary-symbol-based relocs, not section-relative relocs.
Meta-references are mentions and should not be wrapped. They are always local within a file, so force them to use section-symbol-based relocs, which won't be wrapped.
Self-references may be self-mentions or self-uses. The program must disambiguate these, by using __real_ for self-mention, and leaving self-use the default.

Meanwhile, we saw various approaches to realising wrapping, which we developed in several stages.

Can we systematically test and compare these against each other? I'll happily rule out testing those stages with asterisks, since they need some kind of manual intervention (and I never wrote a patch to objcopy that will prefer section-based relocs for meta-references). In our tests we simply assume that -ffunction-sections is already in use if desired and any symbol that's to be wrapped is already a global.

That still leaves us many cases! In total there are eight.

a reference is made to code, ...e.g.?
1inter-sectionfrom codeby named defnormal call or address-take or self-mention/use at a distance
2inter-sectionfrom codeby section(nothing realistic I can think of)
3intra-sectionfrom codeby named defself-call
4intra-sectionfrom codeby sectionself-mention; address-taken label
5inter-sectionfrom databy named deffunction pointer initializer
6inter-sectionfrom databy section(almost nothing realistic I can think of)
7inter-sectionfrom debuginfoby named defsome debuginfo references
8inter-sectionfrom debuginfoby sectionmost debuginfo references

We can write a test that is a single assembly file. Semantically, the executable it builds is nonsense and would crash immediately if run. But we just care about what it looks like after linking. The assembly code includes eight references. For each one, we want to see whether it gets wrapped or not.

.globl callee
.globl caller
.section .text.caller
# (1) inter-section, from code to code, by named def
    .quad callee
# (2) inter-section, from code to code, by section
    .quad .text.99callee
.section .text.99callee, "ax"
# (3) intra-section, from code to code, by named def
    .quad callee
# (4) intra-section, from code to code, by section
    .quad .text.99callee
.section .data, "aw"
# (5) inter-section, from data to code, by named def
    .quad callee
# (6) inter-section, from data to code, by section
    .quad .text.99callee
.section .debug_info.text.99callee, "",@progbits
# (7) inter-section, from debuginfo to code, by named def
    .quad callee
# (8) inter-section, from debuginfo to code, by section
    .quad .text.99callee

The correct outcome, according to our principles, is that the all the odd number cases (named definitions) except 7 (debug) get wrapped, as do 2 and 6 (inter-section but non-debug) but not 8 (inter-section but debug) or 4 (intra-section by section symbol, so assumed to be a self-mention). So, the wrapped should be 1, 2, 3, 5 and 6.

We can test it for each of our six approaches. To do a test run, we will assemble it and link it, generating a nonsense executable which we interrogate using gdb.

as -L -o test.o test.s
echo 'SECTIONS { .text : { *(*) } }' > allsects.lds # script to link into one big blob
ld  -Ttext=0x0 -e 0x0 -Bstatic -o test test.o allsects.lds

If our file is linked correctly, we should find our test cases labelled in ascending address order, contiguously from 0x8.

$ objdump -t test | grep '^0.*L' | sort -n -k1,1
0000000000000008 l       .text  0000000000000000 L1
0000000000000010 l       .text  0000000000000000 L2
0000000000000018 l       .text  0000000000000000 L3
0000000000000020 l       .text  0000000000000000 L4
0000000000000028 l       .text  0000000000000000 L5
0000000000000030 l       .text  0000000000000000 L6
0000000000000038 l       .text  0000000000000000 L7
0000000000000040 l       .text  0000000000000000 L8

Now we can script gdb to print out the values of all eight references after linking.

$ echo >/tmp/gdbscript <<EOF
file test
set $n=1
while $n<=8
 eval "x/1ga L%d", $n
 set $n=$n+1
$ gdb -q test -x /tmp/gdbscript | tail -n+2 | column -t

On a null run, i.e. of method Z, we get the following.

0x8   <caller>:     0x18  <callee>
0x10  <caller+8>:   0x18  <callee>
0x18  <callee>:     0x18  <callee>
0x20  <callee+8>:   0x18  <callee>
0x28  <callee+16>:  0x18  <callee>
0x30  <callee+24>:  0x18  <callee>
0x38  <callee+32>:  0x18  <callee>
0x40  <callee+40>:  0x18  <callee>

In other words, all references go to the callee and nothing is wrapped. If we test each of our other approaches in turn, we get the results we would expect. Approach A also does no wrapping because without unbinding, intra-file references aren't wrapped. Approach B starts to get it but covers only the easy cases. Approach C gets all except number 3 correct, since it ignores all self-reference whereas case 3 (named symbol) is a self-use that should be unbound. Approaches D and E both wrap exactly the cases our principles agree should be wrapped: 1, 2, 3, 5 and 6. E does it using the muldefs trick, avoid --wrap entirely.

testD  0x0   <caller>:         0x20  <__wrap_callee>
testD  0x8   <L2>:             0x20  <__wrap_callee>
testD  0x10  <__def_callee>:   0x20  <__wrap_callee>
testD  0x18  <L4>:             0x10  <__def_callee>
testD  0x28  <L5>:             0x20  <__wrap_callee>
testD  0x30  <L6>:             0x20  <__wrap_callee>
testD  0x38  <L7>:             0x10  <__def_callee>
testD  0x40  <L8>:             0x10  <__def_callee>
testE  0x8   <caller>:         0x0   <__wrap_callee>
testE  0x10  <L2>:             0x0   <__wrap_callee>
testE  0x18  <__real_callee>:  0x0   <__wrap_callee>
testE  0x20  <L4>:             0x18  <__real_callee>
testE  0x28  <L5>:             0x0   <__wrap_callee>
testE  0x30  <L6>:             0x0   <__wrap_callee>
testE  0x38  <L7>:             0x18  <__real_callee>
testE  0x40  <L8>:             0x18  <__real_callee>

The test is packaged as a single makefile that you can get here, for which you will want my elftin (that contains normrelocs and sym2und) and patched binutils repositories.

Problem 2 (next time, mostly): wrapping cross-DSO references

Comparing muldefs with --wrap, another difference is that with --wrap, the __real_ symbol was never actually defined; it was just a magic alias that could be used to refer to the original unqualified symbol. The __wrap_ symbol did exist, so, in the final executable we have symbols __wrap_SYM and SYM. In contrast, using the muldefs symbol replacement approach, the final executable gives us SYM (our wrapper) and __real_SYM (which we created as a bona fide symbol definition, aliasing the original function). In our “more compatible” approach above we also get __wrap_SYM, but only because we made it an alias of SYM, which isn't essential.

Of these two outcomes, the latter symbol replacement case plays much more nicely with dynamic linking, since it gives the wrapper the identity of the original function SYM. In other words, the wrapped SYM, if exported, will naturally be what an external DSO binds to when it makes reference to SYM. That's always what we want. (Meta-references never cross DSOs, and self-mentions need to use __real_ as before.) By contrast, with ordinary wrapping, SYM would be left referring to the unwrapped symbol, and that will be what gets exported—so cross-DSO calls won't get wrapped.

As hinted at above, it's also useful to do wrapping only at dynamic link time, without using --wrap at all but relying on LD_PRELOAD or, more generally, link order. Of course this only suffices when wrapper and wrappee are in different DSOs. Even then, it is hard to arrange and even harder to do reliably... a topic for another time.

Is this sensible?

We've taken it as a given that messing with symbols at link time is a reasonable thing to want to do. Yet also, we saw pretty early on with static functions that without going back to source, we were limited in what we could do. Even worse, we couldn't be sure that what were doing is even meaningful in a source-level sense. So maybe all this is just a terrible idea?

The issue comes down more generally to whether link-time artifacts—relocatable object code, essentially—have any meaning to the programmer. Are they a notation that can legitimately be manipulated?

Traditionally, one could feel reasonably comfortable doing this because compilers were less clever than they are today. Long ago, most programmers worked close enough to the metal that object code had a place in their mental model. To systems programmers today, this largely remains the case, but to application programmers, it doesn't. Meanwhile, to compiler developers, it is very inconvenient to have this extra intermediate level of abstraction hanging around, at which programmers might want to intervene and mess with the source-level semantics, which to a compiler author are all that matters.

However, since we as “end programmers” often want to plug together code coming from different languages or different toolchains, it should be clear that the compiler author's take isn't really all that matters. Plugging stuff together is a case of integration, and by its nature, integration often involves stepping beyond the view of any single compiler or language implementation.

I've written before, with colleagues, about how linking is used in semantically significant ways, not just as a mechanism for separate compilation. However, POSIX did not standardise linker interfaces and they continue to be unportable, obscure and unreliable. (Witness my colleague Laurie Tratt's recent travails embedding a binary blob into a built binary, which is still only portably doable via a compiler or, slightly less portably, an assembler.) Meanwhile, link-time optimizers (LTO) routinely break link-time interventions of the kind this post has been describing, including symbol wrapping. These optimizers assume that they control the entire path from source down to the final binary. The approach of the LTO systems I've seen is to issue fake intermediate object files containing opaque compiler-specific blobs (serialized IR). At link time, the compiler is re-run in disguise (as a linker plug-in) to reprocess these blobs, discarding the fake contents (which could be used by a non-LTO-aware linker, but normally get ignored). This mirrors the compiler author's perspective: only source semantics matter, and “the usual” intermediates can validly be faked out as a compatibility hack. Of course, if you do funky stuff like symbol wrapping on these intermediates, it will not affect the compiler blobs and so will be discarded.

I believe that object code has value as an integration layer. It has deliberately moved away from the details of source-level semantics, but exposes a “wiring and plumbing” abstraction. This is a convenient level for humans to express compositions—including wrapping, but definitely not limited to that. (My PhD thesis did rather more elaborate stuff at link time, to adapt between binary interfaces. It was influenced by a nice earlier seam of work on linking and configuration languages such as Alastair Reid's Knit.)

Even better, object code actually doesn't lack semantics at all. There is invariably an established, standard and elaborate system for describing its structure and meaning, relating it not just downwards to the machine (a fairly easy step) but upwards to source language features. This system is called debugging information. This is a misnomer since it has many applications besides debugging. All this could certainly be cleaner and tidier—for example, our latent criteria for distinguishing “meta-references” above could and should be a lot more explicit. Also, renaming a symbol in an object file usually won't rename a definition in the corresponding debug info; again, it should. And there is a lot more semantic information that could usefully be included. Nevertheless, the practice of bundling some such information is well established. (So is the practice of unbundling it and throwing it away, unfortunately.) Although many language implementers no doubt see this as an inconvenience that gets in the way of fancy optimisations. a world with good tools for working with object code is a world closer to realising the goal of software as workable ‘stuff’. This is precisely because, counterintuitively, that world is necessarily one where no single tool is so over-powerful as to exclude others. (Readers of Ivan Illich will recognise this idea, of course.)

[/devel] permanent link contact

Powered by blosxom

validate this page