Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 21 Sep 2016

Project suggestion: run-time type information in the FreeBSD kernel

It's past time I wrote up some Part II project suggestions. This post is about a new one. See also one from last year about a debuggable implementation of OCaml, also my suggestions from 2014, another from 2014 and other standing suggestions.

My liballocs project is a library and toolchain extension that allows tracking of run-time allocation types in large C programs, precisely and efficiently. On top of this, the libcrunch system uses this information to do run-time type checking, such as checking that a pointer cast matches the type of the actual pointed-to object. The idea is to make C coding a lot more easily debuggable, by providing clean failure messages at run time, rather like ClassCastException or ArrayIndexOutOfBoundsException in Java. There are various other applications in debugging and program comprehension, since liballocs effectively adds an oracle capable of answering the question “what is on the end of this pointer?”.

Both of these systems have been developed for user-space code but could usefully be applied in the kernel. The toolchain extensions, which gather type information from binaries and analyse allocation sites in C source code, should work with kernel code mostly unchanged. However, the liballocs runtime is written against user-level interfaces and cannot be used as-is.

There are two obvious approaches to creating a liballocs runtime suitable for the kernel.

One is simply to modify the existing liballocs runtime to support kernel-side interfaces. This should yield a liballocs.a library that can be linked into the kernel, adding code to observe the kernel's allocations and maintain type metadata on them as it executes. The main differences will be how the runtime allocates memory, and how it locates and loads type information. Likely, the latter will be trivial (the type information can simply be linked in to the kernel), except perhaps for loadable modules (they must be created with care, so that they contain any additional type information). The former is more complex, since the current runtime makes extensive use of Linux's unreserved virtual memory (via MAP_NORESERVE) which is not available as such in the FreeBSD kernel, although equivalent alternatives are likely possible with some hacking.

The other option is to use DTrace to write a non-invasive “monitor” process, started early during boot, that captures equivalent information to the liballocs runtime. One could then implement a client library which mimics the liballocs API and supports the same kinds of query. This approach is likely simpler to get started, and allows for a completely external, non-invasive client that can be used without recompiling the kernel. However, it is less reliable because it is likely to miss certain kernel allocation events, such as those happening early during kernel initialisation, or those that DTrace drops because of buffer contention. These will lead to incomplete type information. It is also unlikely to be able to answer queries coming from the kernel itself, so is useful for debugging and comprehension tasks (queried by a user-space program) but not for run-time type checking.

Both versions will require careful understanding of the FreeBSD kernel's allocation routines, for which some familiarisation time must be allowed. The DTrace option would be a good vehicle for exploring this, after which an executive decision could be made about whether to switch to the full in-kernel option.

[/research] permanent link contact

Powered by blosxom

validate this page