Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 10 Oct 2022

Understanding C99 inlines

For a long time I struggled to remember the rules for using inline in C. I think I've cracked it now, though. As often with C, the trick is to think about how it's implemented.

It doesn't help that in C++, the programmer-facing semantics of inline are much more straightforward. If you want to hint that a function should be inlined, put inline (or, for a member function, define it within its class).

inline int foo() { return 42; } // this is C++, not C!

The toolchain takes care of the rest. That's because C++ implementations demand a smarter linker that can provide “link once” semantics, sometimes called COMDAT, for inline functions. Each compilation output includes its own out-of-line copy of the function, and all but one are later thrown away by the linker.

In C99 and later, inline has been specified, but in such a way that the linker need not support COMDAT. The resulting rules are a bit weird—they care about cases where we have multiple declarations of the function, and whether we use inline and extern consistently across each of them. We can still write things like this....

inline int foo() { return 42; } // also C! but semantics different from C++...

... but what they mean is different than in C++. In what follows, I'll dig into this in rather too much detail.

C inlining

Our snippet above is valid C, but its meaning varies depending on what else appears in the same translation unit. To quote from the C11 spec....

If a function is declared with an inline function specifier, then it shall also be defined in the same translation unit. If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.

The above is telling us that if I write

inline int foo();
inline int foo() { return 42; } // also C! but semantics different from C++...

... it's different from

int foo();
inline int foo() { return 42; } // also C! but semantics different from C++...

... and perhaps also different from the following.

extern inline int foo();
inline int foo() { return 42; } // also C! but semantics different from C++...

There is a logic to all this. If we write inline in a function's declaration in a given compilation unit, it's reasonable that we have to provide a definition for it too. We can't inline something if there's no definition. The question is whether that definition is also emitted for out-of-line use. In C++ the answer is “always, although it may get thrown away later”. In C the answer is “it depends on what is declared”.

We can have one or more declarations for the function in the translation unit. Let's consider two cases. First, there's the case where we always put inline on the prototype. That means none of the declarations can be mistaken for a normal externally-visible function. And hey presto, this does indeed mean that such a compilation unit “does not provide an external definition for the function”. The definition we provide is only for inlining.

However, the standard also tells us that this “does not forbid an external definition in another translation unit”. And in fact, the compiler is allowed to assume one exists! As usual, it doesn't have to inline the function in the generated code; it could generate an out-of-line call. So you'd better have another compilation unit that does things differently....

This is where it helps that the standard also tells us that “a file-scope declaration with extern creates an external definition”. In other words, by including a prototype declared extern, we instantiate the inline in that translation unit. Of course, for any particular function we should only do this in one translation unit, otherwise we'll get “multiple definition” link errors.

How do we normally use all this? Typically, we put the inline definition in a header file, and then in a single .c file we include an extern prototype to instantiate the out-of-line copy in that file. Hey presto, we have avoided COMDAT linking, by forcing the programmer to decide which compilation unit gets the out-of-line copy. That's the main idea.
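As a rough sketch of that layout, here is a hypothetical header foo.h and its one instantiating file foo.c squashed into a single listing, so that it compiles as one translation unit. The file names and the caller use_foo are made up for illustration; only the placement of inline and extern matters.

```c
/* ---- foo.h (hypothetical): included by every file that calls foo() ---- */
/* Inline definition: on its own it provides no external symbol. */
inline int foo(void) { return 42; }

/* ---- foo.c (hypothetical): the single file that instantiates foo ---- */
/* In a real project this file would #include "foo.h" first. */
/* A declaration with extern turns the definition above into an
 * external definition, emitted by this translation unit only. */
extern inline int foo(void);

/* ---- any caller ---- */
int use_foo(void)
{
    /* The compiler may inline this call or call the out-of-line foo;
     * either way the link succeeds because foo.c provides the symbol. */
    return foo() + 1;
}
```

Dropping the extern line (and having no other file provide foo) is exactly the situation where an unoptimized build can end in an undefined reference.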

It's pretty confusing though. Normally extern on a declaration means “not defined in this compilation unit”. By contrast, on an inline function it means the opposite. Even aside from that, the whole set-up is a bit wacky in that it's relying on the inconsistent use of decorators to carry information. Normally we'd perceive an inconsistency like this as a code smell and expect the semantics to be, if not an error, then as if the many declarations were somehow merged into a single canonical signature that is “what you could have written”. But here we are required to use multiple signatures just to convey the intended semantics.

One final quirk is that there is also static inline. This is much more straightforward because although it might generate an out-of-line copy, it would be a local symbol specific to the file in question, so does not cause link-time complexity about missing or multiple definitions. It's a good option if you just want some quick utility functions, but risks code bloat if the functions are large and/or often-used.
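For illustration, a minimal sketch of the static inline style; round_up and padded_size are hypothetical utility functions, not taken from any real codebase.

```c
#include <stddef.h>

/* static inline: if the compiler emits an out-of-line copy, it is a
 * local symbol, so every file including this gets its own private copy. */
static inline size_t round_up(size_t n, size_t align)
{
    /* Assumes align is a power of two. */
    return (n + align - 1) & ~(align - 1);
}

/* An ordinary external function that uses the helper. */
size_t padded_size(size_t n)
{
    return round_up(n, 16);
}
```

No extern prototype is needed anywhere, which is the attraction; the cost is one possible copy per including file.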

Adding GNU C

Of course, things get even more complicated because before C99, GNU C introduced inline with slightly different semantics. These can be seen in a third case where we mark a definition as extern inline (above it was just a declaration where we put extern). In GNU C, this means the definition is only available for inlining, never for emission as a visible symbol. It's in fact just like prototyping it as inline in C99 and never using extern.

These old-style GNU semantics can still be requested of GCC by a command-line option (-fgnu89-inline), even when ISO dialects are requested otherwise. They can also be chosen on a per-function basis using the gnu_inline attribute. If __GNUC_STDC_INLINE__ is defined, the file-level default is the “standard” semantics, otherwise we get the GNU-style one.
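A small sketch of checking which default is in force, using only the __GNUC_STDC_INLINE__ macro mentioned above. The function name using_stdc_inline is made up, and on a non-GNU compiler the macro is simply absent, so a result of 0 there does not really mean “GNU semantics”.

```c
/* Returns 1 if the compiler advertises ISO C99 inline semantics as the
 * file-level default, 0 otherwise (old GNU semantics, or a compiler
 * that does not define the macro at all). */
int using_stdc_inline(void)
{
#ifdef __GNUC_STDC_INLINE__
    return 1;
#else
    return 0;
#endif
}
```

Building the same file with and without -fgnu89-inline under GCC should flip the result.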

It's common to see codebases use extern inline combined with gnu_inline and always_inline to get the effect of a macro: the inline function is always inlined and never emitted as an out-of-line function. (However, it turns out that even with these GNU semantics, if we have an “extern inline” definition that we elsewhere also declare as “[regular non-extern] inline”, this “only for inlining” property goes away, and an out-of-line instance is generated.)
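Here is a hedged sketch of that macro-like idiom. The function names twice and use_twice are hypothetical, and the static inline fallback for compilers without GNU attributes is my own assumption rather than something the idiom requires.

```c
#if defined(__GNUC__)
/* GNU semantics requested for this one function: the definition is
 * usable only for inlining, and always_inline asks for that to happen
 * even when optimization is off. */
extern inline __attribute__((gnu_inline, always_inline))
int twice(int x) { return 2 * x; }
#else
/* Assumed fallback for compilers lacking these attributes. */
static inline int twice(int x) { return 2 * x; }
#endif

int use_twice(int x)
{
    /* A direct call, so it can be expanded in place like a macro. */
    return twice(x) + 1;
}
```

Because no out-of-line twice exists under the GNU branch, taking its address or calling it indirectly would be expected to fail at link time.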

A pattern that doesn't work

All this complicates how we deal with our header files. Ideally we'd have one set of header files, which we can freely include where we need them, and the issue of where to instantiate out-of-lines is a matter for .c files only. That ideal mostly works: we can use extern inline in a single .c file to do the instantiation.

However, some more complex scenarios turn out not to work. What if we want a function to be “opportunistically inlinable”: some compilation units will have an inlinable definition to use, but others won't and should emit an out-of-line call. This might be because the definition itself is change-prone, so is better left in a private header but not pushed to clients, who should see only the out-of-line prototype in a shared public header. Makes sense? Unfortunately this shared public header sets up an unwinnable situation: if its declaration is marked inline then this will be wrong for the external compilation units, at best generating a spurious compiler warning (inline declaration but no definition). If the public declaration is not marked inline, then all the “private” compilation units, which do include an inline definition, will instantiate the out-of-line copy, causing “multiple definition” link errors later. The only solution I know to this is conditional compilation: define a macro IN_MYPROJ in the private codebase, and prototype the function like so.

#ifdef IN_MYPROJ
inline
#endif
int myfunc(int arg);

External clients then won't know it's inlinable, but a private header can define the function inlineably and a (unique) internal compilation unit can instantiate it in the usual way.
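To make the arrangement concrete, here is a sketch with the three pieces squashed into one listing so that it compiles as a single translation unit. The file names and the body of myfunc are invented for illustration; IN_MYPROJ would normally come from the private build's command line (e.g. -DIN_MYPROJ) rather than a #define.

```c
#define IN_MYPROJ 1   /* stands in for -DIN_MYPROJ in the private build */

/* ---- myproj.h (hypothetical public header) ---- */
#ifdef IN_MYPROJ
inline                /* only the private codebase sees the inline specifier */
#endif
int myfunc(int arg);

/* ---- myproj_private.h (hypothetical private header) ---- */
/* Inlinable definition, visible only inside the project. */
inline int myfunc(int arg) { return arg + 1; }

/* ---- myfunc.c (hypothetical): the unique instantiating file ---- */
/* The extern declaration makes this unit emit the out-of-line copy
 * that external clients link against. */
extern inline int myfunc(int arg);
```

An external client compiles myproj.h without IN_MYPROJ, sees a plain prototype, and just emits a call.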

A test suite

I wrote a single-file “test suite” that elaborates the space of inline and extern inline usage and tests what a given compiler does with all of them.

Consider a function with three declarations: the one on the definition, and two standalone ones. Each of these can be independently set to inline, extern inline, or unqualified. This creates 27 distinct cases. In each case, we test at run time whether the function really was inlined, at a test call site (using some funky introspection from within the call). In cases where we expect an out-of-line copy not to be generated, we generate one in assembly, so that any additional copy would cause a link error. (This applies only to some of the GNU cases and the ISO “inline, inline, inline” case.) In cases where this copy is not expected to be called, because inlining should happen at our call site, we give it a body that will abort with an error.

In the actual test file, which you should take a look at if you've got this far, the cases are heavily macroised so they really just look like counting up in a weird ternary number system where the digits are inline, (empty) or extern inline.

CASEn(1, inline, inline, inline, assert(!OOL_EXISTS); assert(!AM_INLINED); /* UND ref created! */ )
CASEo(1, inline, inline, inline, assert(!OOL_EXISTS); assert(AM_INLINED);)
CASEn(2, , inline, inline, assert(OOL_EXISTS); assert(!AM_INLINED); )
CASEo(2, , inline, inline, assert(OOL_EXISTS); assert(AM_INLINED);)
CASEn(3, extern inline, inline, inline, assert(OOL_EXISTS); assert(!AM_INLINED); )
CASEo(3, extern inline, inline, inline, assert(OOL_EXISTS); assert(AM_INLINED);)
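The counting scheme can be sketched as a decoder from case number to qualifiers. This is only my illustration, not code from the test file: it assumes the leftmost macro argument is the fastest-varying digit and that the digit order is inline, (empty), extern inline, which matches cases 1 to 3 above but is a guess for the remaining cases. All names here are hypothetical.

```c
/* Digit values, in the order the listed cases count up. */
enum qual { Q_INLINE = 0, Q_NONE = 1, Q_EXTERN_INLINE = 2 };

static const char *const qual_text[] = { "inline", "", "extern inline" };

/* Qualifier at position pos (0 = leftmost macro argument, 0..2)
 * for case number n (1..27), read as a base-3 number. */
enum qual case_qualifier(int n, int pos)
{
    int v = n - 1;          /* cases are numbered from 1 */
    while (pos-- > 0)
        v /= 3;             /* shift to the requested ternary digit */
    return (enum qual)(v % 3);
}

/* Same, but as the text that would appear in the declaration. */
const char *case_qualifier_text(int n, int pos)
{
    return qual_text[case_qualifier(n, pos)];
}
```

Under these assumptions, case 27 decodes to extern inline in all three positions.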

Optimization affects the expected semantics, so there are actually 54 cases: each of these 27 tests is instantiated once with optimization turned on at the call site (the “o” case) and once without (the “n” case). We make the simplifying assumption that, since our test inline functions are small, optimization implies that it will be inlined. (I did not attempt to test the GNU always_inline attribute, but its behaviour is basically to swap the expected behaviour from “unoptimized” to “optimized”.) In eleven of the 27 pairs, the expected behaviour is different with old GNU-style semantics (e.g. from -fgnu89-inline) than with ISO semantics, so we use conditional compilation to assert the right things.

If 54 cases sounds like a lot, then arguably they are a bit redundant. Of the 27 pairs, one is degenerate (no inlining at all), and of the others, arguably those that duplicate a set of qualifiers between two declarations are redundant. For simplicity I chose to expand them all out anyway.

The test code itself is pretty unportable, so do read the caveats at the top of the file, and take care that you build it with the necessary link options to allow the funky introspection to work (principally -Wl,--export-dynamic).

I should add that my motivation for fully getting my head around all this came when I found that inline semantics in CIL were subtly broken in some cases. In short, CIL's output was not reproducing function declarations faithfully with respect to the input, so it could unwittingly change the inlining behaviour of code as it passed through. I've fixed this in my personal branch, to which I have also added my test case. It's on my list to contribute a lot of my CIL improvements back to mainline. I have a lot of this to do....
