Rambles around computer science

Diverting trains of thought, wasting precious time

Mon, 12 Dec 2022

Interoperability: what's rich is still poor

Is there good interoperability with C from C++? I could have named any two languages, and for most pairs the answer would be a clear “no”. But in the case of working with C code from C++, the situation is a lot better than normal. In this post I'll observe why it's still not good, through the lens of a particular programming task I carried out recently. I'll end with some notes on what I think good interoperability ought to mean.

My task was writing some code to target a C-style plugin interface. (It's the plugin interface of the GNU linkers, as it happens.) It follows a common C style, which is to write a bunch of callbacks and ‘register’ them.

enum ld_plugin_status new_input_handler(const struct ld_plugin_input_file *file)
{
    fprintf(stderr, "new input handler called ()\n");
    // ...
    return LDPS_OK;
}
enum ld_plugin_status cleanup_handler(void)
{
    fprintf(stderr, "cleanup handler called ()\n");
    // ...
    return LDPS_OK;
}
...
enum ld_plugin_status onload(struct ld_plugin_tv *tv)
{
...
    register_new_input_handler(new_input_handler);
    register_cleanup_handler(cleanup_handler);
...
}

If these operations need to keep any private state, we have to keep it in globals or at least file-level variables—they can be static, a.k.a. private to the file. (A slightly more sophisticated C callback idiom threads an extra void* argument to each callback, with the pointer supplied at callback registration time. This allows the private state to be dynamically allocated. But in this task that's not the case.)
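To make the contrast concrete, here is a minimal sketch of that void*-threaded idiom, with invented names (register_input_handler and friends are not part of any real API):

```cpp
#include <assert.h>
#include <stdio.h>

/* Invented API: registration takes an extra void*, which is handed back
 * verbatim on every invocation of the callback. */
typedef int (*input_handler_fn)(const char *filename, void *arg);

static input_handler_fn the_handler;
static void *the_handler_arg;

static void register_input_handler(input_handler_fn fn, void *arg)
{
    the_handler = fn;
    the_handler_arg = arg;
}

/* The private state can now be dynamically allocated and passed in,
 * rather than living at file scope. */
struct my_state { int files_seen; };

static int my_handler(const char *filename, void *arg)
{
    struct my_state *s = (struct my_state *) arg;
    ++s->files_seen;
    fprintf(stderr, "saw input: %s\n", filename);
    return 0;
}
```

Here the state travels with the registration, so nothing needs to live at file scope; but as noted, the interface I was targeting offers no such argument.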

Now let's imagine I have a family of plugins with some commonality. Maybe their handle_foo() is shared. Or maybe one handle_foo() needs to call the basic handle_foo() and then do some additional stuff before or after. The extended plugin should be defined in a separate file, but it can easily refer back to the base one. You can do this all with straightforward link-time plumbing: define a new extended_handle_foo() calling handle_foo(), say.

enum ld_plugin_status extended_new_input_handler(const struct ld_plugin_input_file *file)
{
    fprintf(stderr, "extended new input handler called ()\n");
    // ...
    // now just do the basic version
    return new_input_handler(file);
}

If we need some extended state—a new bunch of variables—then again, our ‘extended’ handler can just be accompanied with some additional file-level variables; and we use link-time reference to get at the basic ones if we need them.

What I've just enumerated are the “static-state” (link-time) analogues of some standard object-oriented patterns: method overriding (defining extended...handler()), delegation to a superclass or similar (linking back to the basic handler), holding state in fields a.k.a. member variables (the file-level variables) and extending that state in a derived object (new file-level variables).

But one quirk is that in this scenario there is no notion of class... each instantiated plugin “object” is implicitly one of a kind. It has its own code and its own data; no way is provided, or needed, to share the same code among many distinct instances of the state. Each piece of code binds directly, at link time, to the state it uses, and what we are building is a single object—albeit perhaps by composition with a base definition.

Since our C extended-plugin code is conceptually extending an object, not a class, there's still one set of state, albeit now split between the file-level variables in the “base” and “extension” parts. So when we want an actual plugin that we can run, we instantiate either the base or extended version, by choosing what code we link together. Typically this logic exists in makefiles, say for building plugin-base.so or plugin-extended.so by invoking the linker with different combinations of object files.

C++ classes provide a nicer notation for this “base” and “extension” idea. Maybe we should use C++ to write our plugins? If we did this, our extended plugins could explicitly declare their relationship with the code being extended; there would be no need to invent a naming convention like extended_blah.

struct extended_plugin : basic_plugin
{
...
    enum ld_plugin_status new_input_handler(const struct ld_plugin_input_file *file)
    {
        fprintf(stderr, "extended new input handler called ()\n");
        // ...
        // now just do the basic version
        return this->basic_plugin::new_input_handler(file);
    }
};

We could also explicitly declare and group the state that is basic versus the state that is extended, in the member variables of the different classes. And instead of co-opting the linkage namespace and using “static” (local symbols) for the private stuff, there are language features tailored to the specific use cases (protected or private or public).

The semantics of the whole thing are a bit different though. Rather than resolving references at link time, in C++ the binding to plugin state must be dynamic if we're to use these nicer notations. Additional pointers, of course named “this”, are threaded through each member function, pointing at the state to be used. This is to allow multiple instances of a given class. But it means a member function cannot have a signature that is compatible with the plain C function that our callback registration API expects; the extra argument gets in the way. More generally, the fallout of C++'s dynamic binding assumption is that the “static objects” style of interface cannot be modelled using objects in C++. All objectwise interactions need to involve an explicit “this” pointer.
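To see the incompatibility concretely, here is a small sketch (all names invented): a non-static member function cannot be used where a C function pointer is expected, and the usual workaround, a static member function, works only by baking in one fixed object.

```cpp
#include <cassert>

typedef int (*c_callback)(int);   // the signature the C API wants

struct plugin
{
    int count;
    int handle(int x) { return count += x; }   // needs a 'this'
    /* A static member function has no 'this', so it *can* match the C
     * signature -- but only by hard-wiring one particular object,
     * here a global instance. */
    static plugin the_instance;
    static int handle_static(int x) { return the_instance.handle(x); }
};
plugin plugin::the_instance = { 0 };

// c_callback cb = &plugin::handle;      // error: extra 'this' argument
c_callback cb = &plugin::handle_static;  // fine: an ordinary function pointer
```

The static-member workaround is exactly the "writing C-style code in C++" fallback: it gives up on having many instances, which is the assumption C++'s member functions are built around.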

This hinders stylistic interoperability with our static-state plugin interface. We can't simply code against this interface using these C++ features! To use C++ as intended, we would have to change the interface profoundly. For example, in the C code, although a notion of “object state” does still exist conceptually, it is not necessarily localised in memory, so there is no single pointer to it—no natural this pointer. Rather, the linker takes care of placing the various parts of it and wiring it all together; the state might be spread around in memory.

What if we really really want to use these C++ language features to write a plugin usable by the provider of this C-style plugin API? It's not easy, and most people would just fall back on writing C-style code in C++—or maybe just forget C++ entirely.

I'm not most people, so I was not so easily dissuaded. I can see at least two ways to do it: we can package the C++ code (client) in the native style of C (API), or we can package the C code in the native style of C++. Either way, I want to automate the process so that the adaptation burden largely doesn't fall on the programmer. Let's talk about the first approach here: packaging a C++ plugin to fit the C-flavoured style of the API.

To get a C-style “static” “direct binding” plugin, from a C++ class-based implementation which necessarily indirects through a “this” pointer, we need to get plain function pointers that privately access statically allocated state. We can use libffi's closure API for this. For each method, let's generate a closure that can be passed as a plain function pointer. We want to be able to write code something like this.

    MyClass obj;
    auto foo_p = generate_closure(&MyClass::handle_foo, &obj);
    plugin_api_register_foo_handler(foo_p);

Notice the use of a C++ pointer-to-member function. A pointer to member function isn't a closure because it doesn't come with an object. To call it, we have to supply the object specially, via something like this.

    (obj.*handle_foo)(arg);

In our case, once we've generated our closure, we can just do

    foo_p(arg);

because we have bound the handle_foo function together with its object obj. This works by generating a trampoline that bakes in obj as the this argument to the member function, pointing to the object that we identified when we generated the closure. In our case this is a statically allocated instance of MyClass. This does mean our plugin state now resides in a localised object, rather than a rag-bag of global variables; we have had to group our state into a single link-time object definition, but we can still use class derivation to split the declaration of that state across multiple (header) files.
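As a self-contained illustration (names invented): a pointer-to-member needs its object supplied at each call, and while a capturing lambda does bind the two together into a closure, the result still cannot convert to a plain C function pointer, which is exactly why we resort to generating trampolines.

```cpp
#include <cassert>

struct my_class
{
    int base;
    int handle_foo(int x) { return base + x; }
};

inline int demo()
{
    my_class obj{ 40 };
    // A pointer-to-member binds no object; one is supplied at the call site.
    int (my_class::*pmf)(int) = &my_class::handle_foo;
    int r1 = (obj.*pmf)(2);

    // A capturing lambda *is* a closure over obj and pmf...
    auto bound = [&obj, pmf](int x) { return (obj.*pmf)(x); };
    // ...but it cannot convert to a plain C function pointer:
    // int (*fp)(int) = bound;   // error: the lambda captures state
    return r1 + bound(2);
}
```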

If you can track down the mailing list posts (1, 2) that document libffi's closure API, it's not too hard to get closure generation working. It does, however, involve enough boilerplate that you wouldn't want to write it once per member function of even a smallish plugin. The good news is that in C++ it is possible to use template metaprogramming to generate the libffi boilerplate needed for this. My solution is now in libsrk31c++, my rag-bag library of useful bits of C++ code. (I found a page by Murray Cumming especially useful to get my head around tuples and “argument packs”.) I'll walk through it briefly.

We use a class template to group together all the various stepping-stone definitions we'll need, all parameterised by the class whose method we're turning into a closure, and the signature of that method.

/* Closure glue code specific to a particular class type and member function
 * signature is generated by this template. */
template <typename ClassType, typename RetType, typename... MemberFunArgs>
struct ffi_closure_s
{
    /* This is the type of (a pointer to) the member function that we
     * want to generate a closure for. It's any member function. */
    typedef RetType (ClassType::*MemberFunPtrType)(MemberFunArgs...);
...

The main piece of code we want to generate is a function that conforms to libffi's expectations. All generated closures do a similar thing: they slurp their arguments from the calling context's stack/registers, pack them into a block of memory, create an array of pointers to that memory, and dispatch to a function that looks like the below. That function then must unpack them and do the actual thing the closure is intended to do, e.g. our member function call, then write the return value on the end of another pointer. It's a lot of packing/unpacking.

    template <MemberFunPtrType member_fun_ptr>
    static void
    /* FN will receive the arguments
     * CIF (the original cif),
     * RVALUE (a pointer to the location into which to store the result),
     * AVALUE (an array of pointers to locations holding the arguments) and
     * DATA (some user defined data that the callee can use to determine what
     * to do about this call.)
     *
     * It might look like this, to dispatch to a concrete C++ member function.
        int ret = reinterpret_cast<myclass *>(data)->myfunction(
            *reinterpret_cast<float*>(avalue[0]),
            *reinterpret_cast<bool*>(avalue[1]));
        memcpy(rvalue, &ret, sizeof ret);
     */
    the_fun(ffi_cif *cif, void *rvalue, void **avalue, void *data);
...
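Stripping libffi itself away, the packing/unpacking dance in that comment can be sketched self-containedly (myclass and myfunction are invented, and the ffi_cif parameter is omitted):

```cpp
#include <cassert>
#include <cstring>

struct myclass
{
    int scale;
    int myfunction(float f, bool b) { return b ? (int)(scale * f) : 0; }
};

/* The shape of the dispatch target: unpack the arguments from AVALUE,
 * make the member call on the object identified by DATA, then write
 * the result through RVALUE. */
static void dispatch(void *rvalue, void **avalue, void *data)
{
    int ret = reinterpret_cast<myclass *>(data)->myfunction(
        *reinterpret_cast<float *>(avalue[0]),
        *reinterpret_cast<bool *>(avalue[1]));
    std::memcpy(rvalue, &ret, sizeof ret);
}
```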

To unpack arguments from the array of pointers into our actual bona fide C++ member function call, we use some template magic. First we generate a std::tuple from the array. Doing so is deceptively simple thanks to std::tie and the miracle of the ‘...’ pattern expansion operator. Our caller will instantiate ‘Is’ with an ascending sequence of integers from zero. (Of course, there'll be a way to generate that too.)

    template <std::size_t... Is>
    static
    std::tuple<MemberFunArgs...>
    get_tuple(void **avalue, std::index_sequence<Is...> seq)
    {
        return std::tie<MemberFunArgs...>(
            *reinterpret_cast<MemberFunArgs *>(avalue[ Is ])...
        );
    }

In the above, at first it's a bit mysterious how seq gets expanded in unison with MemberFunArgs. I think this is just the semantics of ‘...’-expansion of expressions with two or more template argument packs (here Is and MemberFunArgs)—they get expanded in lock-step.
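A stripped-down version of the same trick, runnable on its own (unpacker is an invented stand-in for the enclosing class template):

```cpp
#include <cassert>
#include <tuple>
#include <utility>

template <typename... Args>
struct unpacker
{
    template <std::size_t... Is>
    static std::tuple<Args...>
    get_tuple(void **avalue, std::index_sequence<Is...>)
    {
        // Is and Args expand in lock-step: avalue[0] as the first Arg, etc.
        return std::tie(*reinterpret_cast<Args *>(avalue[Is])...);
    }
};
```

Note that std::tie yields a tuple of references; returning it as std::tuple<Args...> copies the referenced values out.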

Now we need a helper that can call a function given a tuple, i.e. make it into a bona fide argument pack. Again the ascending sequence of integers helps us, this time getting the nth item from the tuple. Notice the use of std::index_sequence to generate the needed ascending sequence of integers.

    template <std::size_t... Is>
    static
    RetType call_member_fun_with_tuple(
        ClassType *obj,
        MemberFunPtrType member_fun,
        const std::tuple < MemberFunArgs ... >& tuple,
        std::index_sequence<Is...> seq
    )
    {
        return (obj->*member_fun)(std::get<Is>(tuple)...);
    }
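Here is the same helper in stand-alone form (adder is invented):

```cpp
#include <cassert>
#include <tuple>
#include <utility>

struct adder
{
    int base;
    int add(int x, int y) { return base + x + y; }
};

template <typename C, typename R, typename... Args, std::size_t... Is>
R call_with_tuple(C *obj, R (C::*mf)(Args...),
                  const std::tuple<Args...> &t, std::index_sequence<Is...>)
{
    // expand the tuple back into an argument pack, one get<Is> per argument
    return (obj->*mf)(std::get<Is>(t)...);
}
```

Since C++17, std::apply packages up this exact expansion for ordinary callables; a small lambda wrapper extends it to the member-function case.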

Now we plumb these together into the actual function we promised in the forward declaration above. (Wart: the code we saw above doesn't actually work as a forward declaration! Forward-declaring templates is clearly beyond my C++ pay grade right now, so in the real code I skip the forward declaration... but I hope it was useful for exposition.)

    template <MemberFunPtrType member_fun_ptr>
    static void
    the_fun(ffi_cif *cif, void *rvalue, void **avalue, void *data)
    {
        ClassType *obj = reinterpret_cast<ClassType *>(data);
        *reinterpret_cast<RetType*>(rvalue) = call_member_fun_with_tuple(
            obj,
            member_fun_ptr,
            get_tuple(avalue, std::index_sequence_for<MemberFunArgs...>()),
            std::index_sequence_for<MemberFunArgs...>()
        );
    }

If we're dynamically creating closures, how do they get destroyed? It seems reasonable to use std::unique_ptr. One quirk here is that in the libffi API, the closure pointer (the start of the allocated blob) is distinct from the code pointer (the actual callable instructions). It's the closure pointer we need to free, but it's the code pointer that is most useful to client code. For now, we abuse the deleter... we use the deleter object to remember the closure pointer and any other state we need. Then the unique_ptr we give to clients can point directly to the function, and we can give it a pointer-to-function type.
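A miniature version of this deleter trick, with libffi replaced by an invented fake_closure, shows the shape: the unique_ptr holds the code pointer, with pointer-to-function type, while the deleter remembers what actually needs freeing.

```cpp
#include <cassert>
#include <memory>

static int frees = 0;
struct fake_closure { int dummy; };

/* The deleter remembers the closure pointer (and could carry any other
 * state that must outlive the closure). */
struct demo_deleter
{
    fake_closure *closure = nullptr;
    void operator()(int (*)(int)) const
    { if (closure) { ++frees; delete closure; } }
};

static int the_code(int x) { return x + 1; }

using holder = std::unique_ptr<int(int), demo_deleter>;

inline holder make_demo()
{
    // the code pointer goes in the unique_ptr; the closure pointer in the deleter
    return holder(&the_code, demo_deleter{ new fake_closure{ 0 } });
}
```

The real closure_deleter below additionally owns the cif and the type vector; the principle is the same.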

    struct closure_deleter {
        ffi_closure *closure;
        std::unique_ptr< std::vector<ffi_type *> > typevec;
        std::unique_ptr< ffi_cif > p_cif;
        closure_deleter() : closure(nullptr), typevec(nullptr), p_cif(nullptr) {}
        closure_deleter(ffi_closure *closure,
            std::unique_ptr< std::vector<ffi_type *> > &&typevec,
            std::unique_ptr<ffi_cif>&& p_cif)
         : closure(closure), typevec(std::move(typevec)), p_cif(std::move(p_cif)) {}
        void operator()( RetType(*fp)(MemberFunArgs...) ) const
        { if (closure) ffi_closure_free(closure); }
    };

Now we actually have the ingredients necessary to call libffi and set up a brand new closure, passing the relevant template instance that generates our the_fun.

    template <MemberFunPtrType member_fun>
    static std::unique_ptr< RetType(MemberFunArgs...), closure_deleter > make_closure(ClassType *obj)
    {
        RetType(*fp)(MemberFunArgs...);
        ffi_closure *closure = reinterpret_cast<ffi_closure*>(
            ffi_closure_alloc(sizeof (ffi_closure), (void**) &fp)
        );
        ffi_status status;
        auto p_atypes = std::make_unique<std::vector<ffi_type *> >(
            (std::vector<ffi_type *>) {ffi_type_s<MemberFunArgs>::t()...}
        );
        auto p_cif = std::make_unique<ffi_cif>();
        status = ffi_prep_cif(&*p_cif,
            /* ffi_abi abi */ FFI_DEFAULT_ABI,
            /*unsigned int nargs */ p_atypes->size(),
            /* ffi_type *rtype */ ffi_type_s<RetType>::t(),
            /* ffi_type **atypes */ &(*p_atypes)[0]
        );
        if (status != FFI_OK) return nullptr;
        status = ffi_prep_closure_loc(closure, &*p_cif, &the_fun<member_fun>, obj,
            reinterpret_cast<void*>(fp));
        if (status != FFI_OK) return nullptr;
        return std::unique_ptr< RetType(MemberFunArgs...), closure_deleter >
            (fp, closure_deleter(closure, std::move(p_atypes), std::move(p_cif)));
    }
};

The code is a bit rough and ready. I'm not sure it's free of resource leaks, and the rag-bag of state on the deleter rather extravagantly makes two separate heap allocations, since the cif and the type info must live as long as the closure itself. Anyway, it'll do for now and is much nicer than using libffi directly!

What have we really learned from the above? Even though on paper, C++ and C have “good” “interoperability”, the facilities one might expect for taking an interface defined in one language and using it from the other are not there “out of the box”. We just had to build one for ourselves, and the result looks quite outlandish. Even I would admit that dynamically generating trampolines using libffi is pretty hairy and not very efficient.

I mentioned a second, dual way to do it. This would work instead by fabricating a C++ view of the plain-C state, by generating a class definition that includes the relevant state and functions, and using a linker script to group the state contiguously so that its starting address can be the this pointer and its linked layout matches the struct-level layout of the class definition. That's even more hairy and I haven't attempted to make it work yet, although it's the sort of thing my dwarfidl project can help with, specifically the dwarfhpp tool that can generate C++ class or struct definitions from binary interfaces (albeit not covering this particular case yet).

I also mentioned that calling the original API “C-style” is an abuse—it's valid C++, and other styles are common in C. I'd go so far as to say that languages are not the most illuminating way to look at interoperability problems; we should think about style instead. Style is about choices; languages are about hard rules. I'm caricaturing slightly, but a PL-minded person would observe the mess we saw above, with static versus dynamic “object” notions, and say that the answer is to define a new language in which we eliminate the in-hindsight-foolhardy distinction between statically and dynamically created state. Gilad Bracha—there he is again!—has done this in Newspeak, for example. One can quibble about how best to eliminate that distinction (and a long time ago, before I had met Gilad and also before I had learned to write even somewhat graciously, I did). The language creator's view is one of identifying the bad pattern and then forbidding it! This is natural because doing such things is within a language designer's power.

My take today sidesteps most of that. Rather than eliminating distinctions, I want to embrace them. I am fine with distinctions existing as long as I can easily bridge between them. That is almost never the case today! Yet this change of viewpoint is necessary if we're to have any hope of genuinely good interoperability. Firstly, we need to stop telling everyone to use a different language! Secondly, we need to actively sanction the need for bridging, or integration, and to make it easy, rather than trying to eliminate it. We need the maturity to accept that yes, different languages will come along, and they will embed distinctions as their creators see fit, wise or otherwise. In general, even when two languages are concerned with “the same” conceptual distinction, they are still likely to pick two somewhat different lines to split it down. For example, many languages have a notion of “constant” or “non-mutable” data, but no two are ever exactly alike in what you can do with such data. Trying to define diversity out of existence by creating a perfect language is an unwinnable game of whack-a-mole. We need to embrace diversity.

If embracing “found” diversity is step one, what is step two? It's about asserting your right to be different! In the main part of this post I hope I convinced you that the apparent “good” “interoperability” between C and C++ isn't really good at all. You can call any C from C++, but not vice-versa... and even when you do, you do it by writing C! Good interoperability would instead mean writing unapologetic, idiomatic C++; there would then need to be a way for the system to map it to the other interface, here the C-style one. This is basically never possible at present, for any pair of languages. For example, many FFI approaches, such as Python CFFI, effectively do the same thing as with C and C++: they embed C in the higher-level language. Although it may be fractionally less painful to write the resulting Python-style C than to write the C code directly, this still isn't interoperability! It's just C in disguise. “Including one language in the other” is always a poor person's interoperability. It has proven successful for C++ because the alternative is even worse. Step two means going beyond that. When a coding task involves using code written in another language, we should be able to work against it in our own language's native style, not just in its surface syntax.

For a preview of what this “stylistic interop” might look like for Python, you could have a read of the VMIL workshop paper Guillaume Bertholon and I wrote a few years back, about Guillaume's very impressive internship project with me at Kent. The concept is a CPython extension module that makes native libraries accessible in a natural Pythonic style, with no FFI code required. Among many smaller problems, the main problem right now is that to maintain reference counts, we have to instrument the native code with a pointer write barrier; that is neither convenient nor fully reliable, and it slows things down more than necessary. However, I have some ideas on how to do better, and am very interested in reviving this strand of work—please get in touch if interested. Prospective PhD students might like to know that I do have a funded opportunity (caveat caveat! it's harder to fund non-UK students with these but it can sometimes be done).
