Rambles around computer science

Diverting trains of thought, wasting precious time

Tue, 02 Jan 2024

How to “make” a shell script

This post is about the following strange program I wrote recently.

dummy=; define () { true; }
define dummy
echo "Hello from shell; PATH is ${PATH}"
return 0 2>/dev/null || exit 0

.PHONY: say-hello
    @echo "Hello from make; makevar PATH is $(PATH), envvar PATH is $${PATH}"

One of my guilty secrets is that ‘make’ is one of my favourite programming languages. It has persistence and goal-driven incremental execution... so high-level! Although it is string-based, it has much less esoteric quoting behaviour than the shell.

Another guilty secret is that I also write a lot of shell scripts. Despite not really liking the shell as a language, I've grown used to its quirks. When the filesystem is your database, the shell is an essential tool.

A third not-so-guilty secret is that I believe in the power of the one-filer: a piece of software I can deploy as a single short-ish file. The above program is a demo of how to combine shell and ‘make’ code in the same file, increasing the space of one-file programs you can write.

Of course, ‘make’ already embeds the shell, but in a very limited way: recipes embed single-line shell commands within them. Normally you can't (usefully) define shell functions within ‘make’ for example. But now, thanks to my handy trick, you can! Whereas normally you'd have to put them in a separate file that you source from some shell commands in the makefile, the “separate sourced file” can now be the same file as the makefile itself. That's just one benefit to have shell-and-‘make’ code bundled together.

This implied use of source is why the shell code ends as it does: if we're being sourced we want to return rather than exit. Since return will provoke a complaint and an error status if we're not being sourced, we silence the complaint and fall back to exit.

Why does all this work? The shell is more-or-less line-oriented: it reads your script incrementally and will not read further in the file than it needs to. So, the file itself can easily contain non-script data at the end, following an exit command. The self-extracting shell script is a well-known trick exploiting this fact.

‘make’, however, as an interpreter is file-oriented: it will read a whole makefile before it does anything. That would seem to be a problem if we want to embed non-‘make’ content in a makefile. However, there is a way.

The first three lines consist of punny shell-or-make code. It means something different, but valid, to both the shell and ‘make’. These meanings are such that ‘make” will ignore the shell code and the shell will ignore the ‘make’ code. The main trick is use of the multi-line lazy “define” feature of ‘make’. This wraps around the shell code. Even if the shell code contains things that look like make expansions, the laziness ensures they will not get evaluated. Rather, ‘make’ just scans forwards looking for “endef”, which appears only after the shell has exited.

Conversely, to allow the shell to pass over this “define” decoration harmlessly, we create a function called define (supplanting any command of the same name that might be on the PATH). The ‘;’ trick exploits how

blah=x; something

associates differently in ‘make’ versus the shell. In ‘make’ the part ‘; something’ is included in the string that is assigned, since semicolons are not a terminator. In the shell they are, allowing our something to be a definition of a no-op function called define, which we use to make the shell skip over the define intended for ‘make’. Similarly, define in ‘make’ is a multi-line construct and the body is expanded only lazily (i.e. not at all, since we don't use it). We use this to skip over an arbitrary number of shell lines.

Before landing on this solution, I tried various things that didn't work. These included variations using the form $(eval ...) which is syntactically valid in both ‘make’ and the shell but has slightly different semantics. It didn't work because ‘make’'s eval is tricksy. Unlike shell expansion, we can't use it to inject arbitrary strings or even arbitrary token sequences; it seems only to work when if it generates a whole phrase of valid “make’ syntax. This means that we can't, for example, have the shell skip over the define simply by having define be expanded from an application of ‘make’'s eval'd that expands to empty in the shell.

Why do I want all this? My motivation was writing moderately simple web applications that could be hosted easily on a machine where I have only basic privileges. If I can run CGI scripts, I can use the shell and the filesystem. I may not be able to (or want to) run PHP or similar. And the same goes for databases... even if I could run a database, they're a lot less useful than files in our current (impoverished) software ecosystem. Programs have to be written specially to interface with a database, but files have the nice property that most programs know how to access them. My application's data doesn't need to be “exported” to them, which anyway brings the complexity of copying. Instead I can have direct access to files that my web application also accesses. My demo application is a form-filling system a bit like Doodle, but called Doddle. Instead of a database, I have files. Instead of PHP, I have the shell. No special privileges are needed, beyond the ability to run CGI programs. More about that anon....

[/devel] permanent link contact

Mon, 11 Sep 2023

A process by any other name

On Linux, what is a process's name? I've found this to be more complicated than I thought it would be. Clearly it's non-trivial because it seems top and ps can disagree about a process's name. As usual I have Firefox and some of its worker processes cheerfully burning some CPU for no apparent reason, so let's use one of those.

top - 14:53:48 up 5 days,  3:20, 28 users,  load average: 0.47, 0.82, 0.74
Tasks: 629 total,   1 running, 543 sleeping,   0 stopped,   4 zombie
%Cpu(s):  1.9 us,  0.8 sy,  0.0 ni, 97.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16064256 total, 15800628 used,   263628 free,    55752 buffers
KiB Swap:  9291780 total,  6922152 used,  2369628 free.  5019028 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                      
 3172 stephen   20   0 2493792 145184  91804 S   6.0  0.9   0:07.80 Isolated Web Co                                              
 1685 stephen   20   0 2693232 264100  95080 S   3.6  1.6   0:28.86 Isolated Web Co                                              
19803 stephen   20   0 13.552g 6.042g 2.603g S   2.6 39.4  37:22.35 firefox-esr                                                  
12266 stephen   20   0 2653868 670136 582816 S   1.7  4.2 103:31.80 Xorg                                                         

Here it looks like process 3172 has a name ‘Isolated Web Co’. But this isn't the name that shows up in ps xf.

$ ps xf | grep [3]172
 3172 pts/2    Sl     0:09  |   |               \_ /usr/lib/firefox-esr/firefox-esr -contentproc -childID 218 -isForBrowser -prefsLen 39352 -prefMapSize 227223 -jsInitLen 277276 -parentBuildID 20230403141754 -appDir /usr/lib/firefox-esr/browser 19803 true tab

Nor is it what is visible in /proc/pid/cmdline.

$ cat /proc/3172/cmdline | tr '\0' '\n'

In fact we can get top to flip between these views by presssing the ‘c’ key.

       -c  :Command-line/Program-name toggle
            Starts  top  with  the last remembered 'c' state reversed.  Thus, if top was dis?
            playing command lines, now that field will show program names,  and  visa  versa.
            See the 'c' interactive command for additional information.

So the process name is not the same as its argv[0], and in our case it is the string ‘Isolated Web Co’. I'd like to know where this string is stored. It seems this is called the “program name”, at least by the top manual page. Later in the manual page it talks about “command names” as the same thing as “program names” and observes that even a process without a command line, like kthreadd, has a program name. So where does it live? The /proc filesystem does know about these names.

$ cat /proc/3172/status | head
Name:   Isolated Web Co
Umask:  0022
State:  S (sleeping)

In libbsd there is a setproctitle() call. It says it affects the name used by ps, which for us was just the command line, not the other name. We can request that ps show this name, although this seems not to be the default.

$ ps --pid=3172 -o comm
Isolated Web Co

And indeed a simpler way to get it to show this is ps -a (the minus is significant!).

$ ps -a | grep [3]172
 3172 pts/2    00:00:47 Isolated Web Co

This actually checks out. The use of - (dash) in ps options signifies “Unix options” and not BSD options, whereas it is the BSD setproctitle() call that claims to change the name used by ps. The name used by a “BSD options” invocation of ps, such as ps a, is indeed just the command, not the program name. So it looks like on BSD, historically there was just one “program name” and it was stored at argv[0], but somehow a separate “program name” has also arrived in Linux.

Glancing at the implementation of setproctitle() in libbsd, it seems more-or-less to overwrite the contents of the buffer pointed to by argv[0]. This is a behaviour I consider broken and had previously reported as a bug in Chromium, since it may overwrite existing command-line arguments besides the name (when overwriting the argv[0] string with something longer) and/or erase the framing of them. The latter is, or was, the problem in Chromium's implementation, which concatenates all arguments, joined by a space, into one big new argv[0] string. This renders ambiguous the case of spaces within arguments. (It's also unnecessary, since the goal of the exercise is presumably to update argv[0] while leaving later arguments untouched. Although I haven't tried it, if the new argv[0] string doesn't fit in place, can presumably just be stored elsewhere and argv[0] updated to point to it. In extremis, as we'll see shortly, it is possible for the entire argvenviron structure to be rebuilt and re-declared to the kernel... although this requires permissions that an ordinary process typically does not have.)

The source code of the BSD function's reimplementation in Chromium reveals the intricacies of this apparently simple operation on Linux. Linux has overhauled its handling of process names, introducing some bugs that Chromium needed to work around between kernel versions 4.18 and 5.2. The latest major kernel commit affecting this explains more about how things are supposed to work. It splits the logic into two cases: in get_mm_proctitle, with the comment...

    If the user used setproctitle(), we just get the string from
    user space at arg_start, and limit it to a maximum of one page.

... and in get_mm_cmdline with the following confusing comment:

    We allow setproctitle() to overwrite the argument
    strings, and overflow past the original end. But
    only when it overflows into the environment area
    .. and limit it to a maximum of one page of slop.

But all this seems to suggest that setproctitle() will change both the “command line” and the “process title”. Yet what we're seeing is that the two exist independently: one says Isolated Web Co, the other firefox-esr .... What is going on?

A big of digging led me to discover that there is a 16-byte field comm in struct task_struct roughly here. Unlike argv[0], this exists entirely on the kernel side. The comm field turns out to be the “process title” that top is (optionally) displaying—but actually it's a thread title, since there is one per schedulable task.

Use of strace reveals that top itself reads it from the /proc filesystem. Could this by the only interface on Linux that exposes the “process title” as distinct from the command line? Not quite! There is also prctl(PR_GET_NAME). (Interestingly, the PR_SET_MM family of prctl() calls also let you update the various kernel-side pointers that mark the bounds of the argv and envp vectors and their backing character data. You can also modify the position of brk, and so on. And one can even change the whole lot in one shot by passing a struct prctl_mm_map, although only if we were built with CONFIG_CHECKPOINT_RESTORE. A process needs to have CAP_SYS_RESOURCE if it is to be allowed to perform any of these updates. This makes sense: being able to completely re-plumb your process's command line and environment like this is a feature rarely needed, but would be needed in a “user-space exec()” or image replacement of the kind a checkpoint-restore system might perform. Meanwhile, tweaking the argv bounds could be seen as modifying the stack size, which might need resource-override permissions. (I find the latter a bit tenuous, since it is really just setting two kernel-side pointers, and they only usefully point into memory that still has to be allocated by the calling process.)

Going back to the mysterious Linux kernel distinction between get_mm_proctitle and get_mm_cmdline, confusingly, get_mm_proctitle still has nothing to do with the thread name (the 16-character thing), which top called the “process name”, which a reasonable person might think is the same as the “process title”. It's just an alternative semantics for reading from the argv area: instead of limiting the read to the argument strings according to their original bounds, it allows overflow into the environment area. In both cases it's only reading the command line, never the 16-byte comm string. It exists only to make allowances for how a setproctitle() call might clobber the environment strings, thereby tacitly redefining a part of that region as part of the argument area.

One might also wonder why on detecting this clobbering, Linux doesn't take the opportunity to move the process's env_start and arg_end pointers, as if the necessary prctl() calls had been made as discussed above. That might be considered “too clever” I suppose, and risk creating further changes observable to certain user-space programs that might assume the status quo.

All this has consequences for a user-space runtime that wants to understand what is stored at the top of the initial stack, in the argument strings area. When a BSD-style setproctitle() happens, the effective boundary of the argument strings may change. So, remind me to patch liballocs to override setproctitle() with something that does the necessary extra bookkeeping. That probably means tracking separately the kernel's arg_end and “the place the kernel will claim is the end of argv[0]”. And of course for setproctitle() calls that don't overflow into the environment strings, we might still want to track when they've trashed argument strings.

Just to make things more confusing, the implementation of setproctitle() in libbsd does not seem to overwrite the argument strings in all cases. I can successfully set the process title to a longer string without it spilling over into later arguments. I could dig into this, but my guess it it's using the “separate storage” trick I mentioned as an option for Chromium, above... i.e.  it's no longer doing the overwriting that the kernel's heuristic approach is hacking around.

Of course, after finding all this out I found a nice blog post by Tycho Andersen covering basically the same material.

[/devel] permanent link contact

Tue, 29 Aug 2023

Teaching loads in computer science: it's not just lectures

About four years ago, when I had spent a year at the University of Kent in my first job as a lecturer (or “Assistant Professor” for North American-style readers), I had to fill in a probation review form. One question asked me to reflect on my teaching load over the previous year, which had consisted of 1.4 courses or “modules”. One module lasts for a 12-week teaching term, with one reading week, and students typically take four modules at a time. I said that in my view, 1.4 modules had been a heavy load for a new lecturer trying also to do research at the same time. I still think I was correct to say so.

My head of department's response was unsympathetic. “You only had to give thirty lectures. Normally it'd be thirty-three!” This made me unhappy, partly on a personal level—why ask for a reflection if you already know the “right answer”?—but also because counting lectures is “obviously” not a good measure of load. Since then, my move from Kent to King's has provided another study in this.

(It's been a couple of years since I wrote about my then-impending move. I realise I haven't yet written anything directly about my teaching experiences at King's. I'll get into that a little in this piece, but not a lot.)

To set the record straight, here I'll enumerate some other dimensions by which teaching loads differ. These might be useful for those weighing up job offers: it's worth getting a feel for how the institutions compare on as many of the dimensions as you can.

All this deserves some caveats. Of course there will be plenty of other ways in which institutions differ. Also, the loads at any given department are a moving target... I'm told my department at Kent has managed to reduce loads significantly since I left, while King's had been reducing them significantly immediately prior to my joining. At King's right now, the load is generally favourable compared to my time at Kent, both on a lecture-count basis and on many but not all of the other factors. However, there is one big exception, bringing me to the first of my many dimensions....

Number of students, as reflected in assessment and marking

This is the obvious one but it needs saying. All else being equal, marking is usually linear in the number of students. All else is never equal, however. Only slightly less obviously....

Amount and manner of assessment, marked by whom

If there is less coursework, there is less marking. If exams are done electronically and can be marked wholly or partly automatically, there is less marking (but this comes with downsides). If teaching assistants (TAs) can mark coursework, there is less marking for the lecturer, provided that directing the TAs and moderating the results is not too much work... this can be significant. All this varies hugely between institutions (and indeed departments).

Use of coursework

Although this is a special case of the previous one, it bears repeating that departments and programmes vary hugely in both how and how much they use coursework for assessment. The weighting of coursework can be a red herring load-wise, since just because something has a low weighting doesn't mean it's a small piece of work. Bigger coursework means more marking, and often disproportionately since coursework is usually harder to mark than exams. Also, whereas exam-marking happens in a consolidated, frenzied period during which most other business is paused and external commitments can easily be blocked out, coursework comes in amid other stuff going on.

Complex rubrics and submission schedules can also ramp up the work. For example, at Kent, when marking an undergraduate or master's project, the submission consists of not only a dissertation, but a corpus (code and/or data), one or more individual reports (many in the case of a group project), and various progress reports along the way. As someone with very limited desk- and head-space, I found that this plethora of materials made the process more time-consuming than it needed to be.

Student level

Again this is “obvious”, but a preponderance of relatively weak students, who therefore need a lot of help, can become a major ingredient of load. At the other extreme, the most able students can be needy in their way, though it's a nice problem to have. In theory, the course should adapt to stretch students “just enough” whatever their level, but this falls down when the spread of abilities is large.

Absolute class size matters too, because when the number of students is large enough, the spread of abilities is always large, almost independent of entry requirements. Put differently: if the invariant is that 10% of admitted students slip through the “equippedness net” that entry or progression requirements were supposed to take care of, it matters whether that 10% adds up to 4 students or 40. Critical mass effects mean the latter can draw much more attention, not just in absolute terms but also per capita.

The amount of direct contact with students

If you have few or no TAs, you as lecturer are the TA. You are the only human interface the student has with the course. This has consequences far beyond marking. There were few or no TAs on all my courses at Kent, although some other courses did have decent provision; it often reflected the age of the module spec, with higher levels of provision a vestige of bygone, less penny-pinching days. At King's there generally is decent TA provision, but scale can make this hard to manage, the TA bag is very mixed (many are themselves undergraduates; some very good, in fairness, but not all), and there are also rumours that the money might get taken away. Some days at Kent I spent hours answering student e-mails or dealing with spontaneous drop-ins. This brings me to....

Cultural distantness of the lecturer

Institutional culture affects the threshold that students feel they must cross before approaching a lecturer. It is related to the TA issue, in that students might approach TAs first (only if they have them!). But it varies even aside from that. If students are told there's an open-door policy and they can drop in to their lecturers' offices whenever (as was the case at Kent), they will do so! Conversely, if they absorb a sense that lecturers are busy and distinguished people who have limited time for them, they will be more conservative. (It also helps if, as at Cambridge, lecturers are kept behind a locked door....) I'm not saying which of these is “better”—obviously, access to lecturers can be a boon to students, but it comes at a price.

Student dependence and learned helplessness

Even in addition to the combination of student level, TA provision and lecturer distantness-or-not, a department's culture can make students more or less self-reliant. At Kent there was an oft-remarked vicious circle of expectations around hand-holding and “being nice”, which amplified both the office drop-in tendency and frequent generalised demands for “help” of unspecified nature. At King's there are similar effects in evidence. It continues to be a bit mysterious to me how exactly these expectations came about and what might lessen them... I suspect the style of first-year teaching has an outsized influence, but it can't be just that.

Late submission, resubmission, resitting

A disproportionate marking-related load comes not from exam scripts or coursework submissions per se, but from handling those scripts and submissions that do not arrive or do not pass. When departments have more generous policies around late submissions and extensions, this causes greater fallout onto academics, who are forced either to mark in multiple batches, or to work with a narrower marking window overall. Reassessment multiplies the number of assessments to be written, and adds to the number of context switches (getting into marking mode and out of it).

In Cambridge—the most pared-down, research-focused regime I know—these simply don't happen, at least for computer science students (and, er, with the caveat that my knowledge is possibly out-of-date). If you miss a deadline, the late-submission penalty is severe and you don't get another chance. There is only one exam period a year, and if you're not fit to sit your exams, you either get a special result that year (“deemed to deserve honours” a.k.a. DDH) or you fail, depending on whether your term-time work has been up to scratch. At other universities I've worked at, any module's assessment is done at least twice per year, with all the doubling that entails.

Per-student casework, as it falls on the academic

Things like submission, marking, extensions, mitigating circumstances, release of marks, appeals and so on can generate a surprising amount of student-by-student “casework”. Mainly this is when students end up in odd or difficult situations. In addition to the inherent task of marking resit papers or what-have-you, there is the added task of regularising the student's situation, for example by obtaining exemptions (or advising the student how they can do this), or simply by forwarding information that has got lost owing to the student's unusual path. Academics invariably do get pulled into this, to a varying extent; the lurking variables, aside from the number of students, are about processes themselves (which can be heavy or light) but also about how they are administered and communicated.

If support staff are adequately resourced, academically literate, communicative and procedurally empowered to solve problems by themselves, they can take the casework off the academic. If information such as internal web pages communicates effectively with students how they can help themselves, demand can be dissipated at source. However, when these are not the case, academics get sucked in, under either our lecturer or tutorial hats. We can end up as a hapless e-mail go-between, forwarding student queries to the admin office, unable to help but also unable to extricate ourselves. If the bureaucratic machine appears to be stonewalling a student, perhaps from underresourcing and/or lacking the right systems (e.g. a ticketing system for student requests), often they will re-approach the academic rather than wait for their case to reach the front of the invisible queue. This generates more requests on the system in general and more tasks for the academic; last summer it was a major problem for me at King's.

(I can't fault the student's tactics here: an academic advocate can help cut through bureaucracy and sometimes prevent unfair outcomes. Nevertheless, this is a high-cost way for the institution to enable this outcome; a more empowered and better-trained administration is a better option.)

Misconduct levels and processes

Misconduct cases are a particular kind of casework, but worth covering specially. A Kent colleague used to bemoan that “we incentivise cheating” because the penalty is a mark of zero if caught. So the expected return on a plagiarised submission is positive academic credit. At King's, the incentives are not too different, although perhaps slightly less bonkers because repeat offences are (I'm told) treated especially seriously. I'll believe that when I see it. In any case, reporting and seeing through misconduct cases places a large burden on the academic; the incentives are either to let it slide or to design assessments so that cheating is undetectable (note: not impossible! the former is feasible but the latter usually isn't).

It seems universal across (at least) CS departments that the cost of dealing with increased misconduct levels has not been adequately “priced in” as student numbers have been ramped up. On top of that, the greater use of automated marking, to deal with greater numbers of students, has I suspect allowed a lot of undetected collusion (and worse). Pandemic-related changes have also increased these tendencies, through greater use of online assessments (open-book exams are an especially terrible idea) and through general depersonalisation. I hypothesise that the tendency to depersonalisation—particularly strong at King's since in Informatics we no longer give in-person lectures—has led some students to fall more into a “system-gaming” mindset, rather than seeing their degree as the matter of personal development that it is supposed to be.

Institutional resistance to academic-as-admin

Beyond student casework, various non-academic tasks can be intentionally pressed upon academics. These include submitting attendance data, mechanically diddling marksheets, physically sorting exam papers, signing paperwork for this or that, and so on. How much this happens is a cultural variable of the institution. During my time at Kent the institution attached considerable virtue to form-filling and other paperwork, to the extent that a colleague was rebuked for suggesting (albeit tactlessly) that academics might not be the right people to carry out sorting of exam scripts. At King's, processes and admin work are much less feted, but there is a flip side: greater prevalence of cases where “nobody knows what to do” where obtaining a sensible outcome can be a stressful process requiring much ad-hoc effort, possibly dragging out over a long period.

Churn in lecturing assignments

It's well-known that teaching something for the first time is hugely more work than subsequent iterations. Therefore, churn in teaching assignments is a hidden source of load. In my three academic years at Kent I lectured on five distinct modules. In each of those years I was lecturing something for the first time. In fairness, if I'd stuck around one more year I probably would have kept a stable assignment at last. But some churn was built in: the department had a very fine-grained workload allocation model which attempted to treat fairly a large variation in research-activeness in the academic body. Although most lecturers were nominally research-active, practically there were very differing extents of that, so if you got a strong run of papers or a grant success, you would expect to shed a chunk of teaching in the next year, to be picked up by someone else. At King's, there is no such fine granularity. Instead there is a vaguely standard load and two exceptions: “workload reduction” for those with major buy-out (50% or over), and a separate, explicitly teaching-focused career path for some academic staff. Although the Kent system is “fairer”, the King's system seems to generate a lot less churn.

Vulnerability to market conditions

In the UK's current marketised approach to higher education, there is much pressure to constantly revise the teaching offering, for example to chase new segments of the student market. I've written before about how foolish and counterproductive a trend-chasing approach can be. But this pressure is greater in some institutions than others, usually because of market position and a lack of security therein.

When in 2020, calamitous university finances at Kent led to a purge of modules, the knee-jerk measure was to discard “unpopular” modules. Of course this inevitably also discards student opportunity and staff investment; I found this exasperating, because in my view duplication of teaching was a far bigger problem, yet consolidation of duplicates was not being considered. The C++ module I taught was cancelled even though clearly there was enough demand for C++—just not among the needlessly narrow sliver of the student body to which the course was being offered. (In fact I had been asked to teach extra C++ content to a different cohort of students in another department. I said no, because there was no way of getting any workload credit for doing it, and because my teaching load was already too high.) The module went away and, of course, I was given something new to lecture the next year.

While it's true that smaller-cohort modules are higher-cost, they often provide higher value for the students that choose them. The compilers module that I taught during 2019–21 was one which, both years, some students declared as the highlight of their degree. But in 2020, this module would also by rights have been cancelled if the standard numbers test had been applied; rumour was that it was “safe” only because cancelling it would have caused embarrassment in a department where “Programming Languages and Systems” was arguably the most successful research group.

Duplication of teaching effort

Although I just covered the “what” of this, it's worth considering how a department can end up with internal duplication in its teaching offering. One ingredient is what we just discussed: courting many market segments, leading to many formally distinct programmes, each with formally distinct modules. The mistake is letting formal distinctions become actual distinctions. It takes wise management to stop this from happening. Sadly the unwisdom tends to be on the academics' part in this case.

At Kent, as a department we used to teach introductory computer science four times over: to Bachelor's students of “Computer Science” at Canterbury, to Bachelor's students of “Computing” at Medway, to “conversion Master's” students and to “Year in Computing” students (Bachelor's students studying an extra year in an otherwise non-CS degree). This could have been fine, but there was some but very little sharing between the four programmes. Sometimes there were good reasons for distinct provision, but at other times there definitely was not. (Once I was told we can't merge two near-identical courses “because that module code starts with a 3 and that with an 8”.)

Sometimes the reasons were more logistical, such as the programmes' teaching being timetabled in incompatible ways. However, even timetabling barriers don't excuse fully separate offering: the same people delivering the same course in two places at two times is much less work than different people delivering two formally unrelated courses, each with their own materials (and no guarantee of mutual awareness!).

At King's there is less duplication, and proposals to introduce new modules seem to be treated more circumspectly. However, we are just starting up a new “Artificial Intelligence” programme that certainly has the capacity for adding duplication (allow me to report back). I'd expect departments' care in avoiding duplication to be roughly inversely proportional to their teaching-orientedness: if spawning a new course is viewed as an expensive undertaking because of the other things the academics could be doing, more effort is made to avoid it, while if the attitude is “teaching is what we pay academics to do” then of course they can always be made to do more! At Kent, prevalence of the latter attitude was one of the factors pushing me away from the institution.

Over-use of crude metrics like staff/student ratio don't help things, because this phenomenon literally doesn't show up in them: same staff numbers, same student numbers, but many ways in which the teaching work can be structured. Although they measure equal, they are far from equal in reality: with many modules on the go, I found that the “divided attention” factor degraded how effective and conscientious I could be in any one part of my teaching.

Academic atomisation

Continuing the broad issue of duplication, a further variable is the extent to which academics talk to each other. Departments differ a lot on how readily they build community. Even pre-pandemic, King's was reportedly not great on this, because central London is expensive and time-consuming to get to, so working from home is more common. (Also, academics not getting their own offices doesn't incentivise working from the department.) At Kent, we were pretty good at talking to each other, with a supportive day-to-day culture, although I don't know how well this has now been revived since the pandemic.

Academics talking to each other, forming a support network, is an important way in which the academic body finds out the good ways to deal with a task or to minimise the work generated by a given issue. One hazard is that with many meetings now continuing to be virtual, there is less opportunity for informal chat, such as in the spaces before and after physical meetings, yet it's largely in these spaces that support networks are sustained.

Bad technology

Ill-fitting technology pushed down from above can easily make simple teaching-related tasks complex to perform. The converse is also true—good technology does exist—although for some reason this phenomenon mostly seems to be going one way. To pick an example continuing the duplication and atomisation themes: nowadays Moodle and similar systems have brought new levels of secrecy and obscurity to teaching within a department. Whereas not long ago, course materials went on a public or internal web page, or could be picked up in hard copy (gasp!) from an office, nowadays you can't even look at your colleagues' materials without requesting access one-by-one, unless they have had the foresight to make their page public—which is not the default. This inappropriately corporate “need-to-know” access control policy is an imposition from above, though the imposition of poorly chosen technology.

As another example, at Kent we had a very nice shared teaching server into which all students had ssh access. The students quickly learned the basics of how to use this system (it was used by many modules) and it proved a very flexible environment. For example, within it I wrote a gated submission system that could give students pre-submission feedback (helping them) and stop them submitting ill-formed assignments that would foil my automatic marking (helping me). It also ran an “anonymous Q&A” CGI script that had been used by several modules over many years. At King's there is no such server and for modules wishing to have a shared machine, a bespoke set-up needs to be provisioned each time, at much academic cost. It turns out that this can be done, at a cost of explaining the requirement from scratch several times, and continuing to fight for admin attention when (for example) the initial provisioning level is wrong or the well-running of the service is otherwise overlooked. These problems would not happen with a single widely-used service of recognised importance to the department's teaching, as was the case at Kent.

Correspondingly, here at King's even more faith is put in bought-in solutions like Padlet, Gradescope and the like (though these were by no means absent at Kent). My view is that in a computer science department these tend to be relatively high-cost and low-value.

Academic autonomy in respect of teaching

The IT problems I mentioned aren't really problems if IT systems are provided for discretionary use, if plenty of alternatives are available, and if academics are free to choose. It's when their use is mandated that the problems start to bite. Reading David Evans's excellent (but understandably US-centric) “How to live in paradise” left me suspecting that many top-level US public research universities leave quite a lot of control in the hands of academics about how to teach, compared to ostensibly “peer” institutions over here (if one includes King's in that, say; one can certainly question this!). If this is true, perhaps the US tenure system and some kind of fractionally more academic-led culture can explain this. (Or I could be completely wrong; this may be a greener-grass effect.)

My experience of UK departments is that there is more pressure, and sometimes a hard requirement, to teach in centrally agreed ways. Correspondingly, this could perhaps be explained by the tight grip here of UK government policies and their knock-on centralising influence on institutions. For example, at King's we have a default formula for teaching delivery, adopted at department level. One can ask to follow an alternative model, but that's it; the menu has exactly two options. Even worse, the alternative model comes pre-torpedoed for undesirability; it seems to be designed so that nobody will choose it. I'm still not sure why this is considered a good idea.

In fairness, institutions do vary on this even within the UK. To pick my favourite outlier again, in Cambridge a few years back, a colleague decided to teach C programming using a flipped-classroom approach, recording lecture videos and then using the timetabled lecture slots to help students with examples. (This in turn was possible only because the class size was below 100.) No other course in the department was taught this way at the time. I'm not sure whether he did this unilaterally or following some permission-seeking, but I'm fairly sure it wasn't an arduous approval process. This kind of autonomy is rare in the other UK departments I know. Compounding this is how although deviation might (or might not) be tolerated on paper, at many institutions it would likely cause the academic many admin-headaches if even the first round of student feedback was less than glowing.

Attitudes to student feedback

This is my last point and is one that has taken me a bit longer to tune in to than the others. My experience has been that departments vary a lot in their attitudes to student feedback. This can have significant consequences for workload: a department that takes every student comment as a knee-jerk “action point”, which the academic must resolve, will create a lot more work than one that simply uses student feedback as a crude aggregate barometer. This can be thought of as an internal dual to the “external” churn of one's assigned courses or modules; it is churn occurring internally to each of those courses.

Pressure to improve can be helpful, but not all the pressure actually does improve. For example, tacit “disengagement compacts” between students and lecturers (and I thank a nameless colleague for bringing that phrase to my attention recently) create pressure instead to lower standards. I wager that most of my colleagues at King's, at least those who've been around a while, would privately agree that this effect has become embedded in our department. Some have already privately said so; as one put it (I paraphrase slightly), “the approach here is to give them a gold star just for turning up”.

Workload-wise, there are other problems arising from a high internal pressure to change. The over-use of video-based delivery interacts especially badly with it, because a unit change is much more expensive when made to recorded video than to in-the-room delivery. My department currently has a policy I regard as a nonsense, which is that lecturers should “re-do 30% of videos each year”. Aside from the madness of video lecturing, it's a silly notion to enforce yearly incremental change, since changes in one place invariably have ripple effects elsewhere. This is why academics prefer to improve their courses every few years in a “big bang”, rather than try to focus changes to a small portion every year. Changes simply don't focus nicely like that! And video delivery amplifies this effect: something as simple as slide numbers visible in a video can made it a Herculean task to achieve a given change to an acceptable standard.

What that standard is also varies. Video-based delivery tends to coincide with an overemphasis on production value. As someone who was often lectured from hand-drawn OHP slides, my view is that any big emphasis on production values means the time isn't being spent on the important stuff. Yet now, if your materials look a bit raggedy, students are sure to complain, while lecturers are given gold stars for producing something slick but superficial. Different departments can be infected by this madness to greater or lesser extents.

The end

I think that's all I have. I hope it was useful to someone! I've not been particularly guarded about what I see as structural problems in how teaching is done, both in my current department and at others I know. However, no department gets away unscathed... I think that says a lot about the general state of UK higher education right now, and no doubt the same problems exist elsewhere in the world too. Reflecting more widely on “the job” (which isn't the same job everywhere), above I made a reference to David Evans's “How to live in paradise”. This is a piece I recommend and that I intend to revisit more broadly in a future post.

[/highered] permanent link contact

Mon, 12 Dec 2022

Interoperability: what's rich is still poor

Is there good interoperability with C from C++? I could have named any two languages, and for most pairs the answer would be a clear “no”. But in the case of working with C code from C++, the situation is a lot better than normal. In this post I'll observe why it's still not good, via the lens of a particular programming task I carried out recently. I'll end with some notes on what I think good interoperability ought to mean.

My task was writing some code to target a C-style plugin interface. (It's the plugin interface of the GNU linkers, as it happens.) It follows a common C style, which is to write a bunch of callbacks and ‘register’ them.

enum ld_plugin_status new_input_handler(const struct ld_plugin_input_file *file)
    fprintf(stderr, "new input handler called ()");
    // ...
    return LDPS_OK;
enum ld_plugin_status cleanup_handler(void)
    fprintf(stderr, "cleanup handler called ()");
    // ...
    return LDPS_OK;
enum ld_plugin_status onload(struct ld_plugin_tv *tv)

If these operations need to keep any private state, we have to keep it in globals or at least file-level variables—they can be static, a.k.a. private to the file. (A slightly more sophisticated C callback idiom threads an extra void* argument to each callback, with the pointer supplied at callback registration time or at invocation time. This allows the private state to be dynamically allocated. But in this task that's not the case.)

Now let's imagine I have a family of plugins with some commonality. Maybe their handle_foo() is shared. Or maybe one handle_foo() needs to call the basic handle_foo() and then do some additional stuff before or after. The extended plugin should be defined in a separate file, but it can easily refer back to the base one. You can do this all with straightforward link-time plumbing: define a new extended_handle_foo() calling handle_foo(), say.

enum ld_plugin_status extended_new_input_handler(const struct ld_plugin_input_file *file)
    fprintf(stderr, "extended new input handler called ()\n");
    // ...
    // now just do the basic version
    return new_input_handler(file);

If we need some extended state—a new bunch of variables—then again, our ‘extended’ handler can just be accompanied with some additional file-level variables; and we use link-time reference to get at the basic ones if we need them.

What I've just enumerated are the “static-state” (link-time) analogues of some standard object-oriented patterns: method overriding (defining extended...handler()), delegation to a superclass or similar (linking back to the basic handler), holding state in fields a.k.a. member variables (the file-level variables) and extending that state in a derived object (new file-level variables).

But one quirk is that in this scenario there is no notion of class... each instantiated plugin “object” is implicitly one of a kind. It has its own code and its own data; no way is provided, or needed, to share the same code among many distinct instances of the state. Each piece of code binds directly, at link time, to the state it uses, and what we are building is a single object—albeit perhaps by composition with a base definition.

Since our C extended-plugin code is conceptually extending an object, not a class, there's still one set of state, albeit now split between the file-level variables in the 'base' and 'extension' parts. So when we want an actual plugin that we can run, we instantiate either the base or extended version, by choosing what code we link together. Typically this logic exists in makefiles, say for building plugin-base.so or plugin-extended.so by invoking the linker with different combinations of object files.

C++ classes provide a nicer notation for this “base” and “extension” idea. Maybe we should use C++ to write our plugins? If we did this, our extended plugins could explicitly declare their relationship with the code being extended; there would be no need to invent a naming convention like extended_blah.

struct extended_plugin : basic_plugin
    new_input(const struct ld_plugin_input_file *file)
        fprintf(stderr, "extended new input handler called ()\n");
        // ...
        // now just do the basic version
        return this->basic_plugin::new_input_handler(file);

We could also explicitly declare and group the state that is basic versus the state that is extended, in the member variables of the different classes. And instead of co-opting the linkage namespace and using “static” (local symbols) for the private stuff, there are language features tailored to the specific use cases (protected or private or public).

The semantics of the whole thing are a bit different though. Rather than resolving references at link time, in C++ the binding to plugin state must be dynamic if we're to use these nicer notations. Additional pointers, of course named “this”, are threaded through each member function, pointing at the state to be used. This is to allow multiple instances of a given class. But it means a member function cannot have a signature that is compatible with the plain C function that our callback registration API expects; the extra argument gets in the way. More generally, the fallout of C++'s dynamic binding assumption is that the “static objects” style of interface cannot be modelled using objects in C++. All objectwise interactions need to involve an explicit “this” pointer.

This hinders stylistic interoperability with our static-state plugin interface. We can't simply code against this interface using these C++ features! To use C++ as intended, we would have to change the interface profoundly. For example, in the C code, although a notion of “object state” does still exist conceptually, it is not necessarily localised in memory, so there is no single pointer to it—no natural this pointer. Rather, the linker takes care of placing the various parts of it and wiring it all together; the state might be spread around in memory.

What if we really really want to use these C++ language features to write a plugin usable by the provider of this C-style plugin API? It's not easy, and most people would just fall back on writing C-style code in C++—or maybe just forget C++ entirely.

I'm not most people, so I was not so easily dissuaded. I can see at least two ways to do it: we can package the C++ code (client) in the native style of C (API), or we can package the C code in the native style of C++. Either way, I want to automate the process so that the adaptation burden largely doesn't fall on the programmer. Let's talk about the first approach here: packaging a C++ plugin to fit the C-flavoured style of the API.

To get a C-style “static” “direct binding” plugin, from a C++ class-based implementation which necessarily indirects through a “this” pointer, we need to get plain function pointers that privately access statically allocated state. We can use libffi's closure API for this. For each method, let's generate a closure that can be passed as a plain function pointer. We want to be able to write code something like this.

    MyClass obj;
    auto foo_p = generate_closure(&MyClass::handle_foo, &obj);

Notice the use of a C++ pointer-to-member function. A pointer to member function isn't a closure because it doesn't come with an object. To call it, we have to supply the object specially, via something like this.


In our case, once we've generated our closure, we can just do


because we have bound the handle_foo function together with its object obj. This works by generating a trampoline that bakes in obj as the this argument to the member function, pointing to the object that we identified when we generated the closure. In our case this is a statically allocated instance of MyClass. This does mean our plugin state now resides in a localised object, rather than a rag-bag of global variables; we have had to group our state into a single link-time object definition, but we can still use class derivation to split the declaration of that state across multiple (header) files.

If you can track down the mailing list posts (1, 2) that document libffi's closure API, it's not too hard to get closure generation working. It does, however, involve enough boilerplate that you wouldn't want to write it once per member function of even a smallish plugin. The good news is that in C++ it is possible to use template metaprogramming to generate the libffi boilerplate needed for this. My solution is now in libsrk31c++, my rag-bag library of useful its of C++ code. (I found a page by Murray Cumming especially useful to get my head around tuples and “argument packs”.) I'll walk through it briefly.

We use a class template to group together all the various stepping-stone definitions we'll need, all parameterised by the class whose method we're turning into a closure, and the signature of that method.

/* Closure glue code specific to a particular class type and member function
 * signature is generated by this template. */
template <typename ClassType, typename RetType, typename... MemberFunArgs>
struct ffi_closure_s
    /* This is the type of (a pointer to) the member function that we
     * want to generate a closure for. It's any member function. */
    typedef RetType (ClassType::*MemberFunPtrType)(MemberFunArgs...);

The main piece of code we want to generate is a function that conforms to libffi's expectations. All generated closures do a similar thing: they slurp their arguments from the calling context's stack/registers, pack them into a block of memory, create an array of pointers to that memory, and dispatch to a function that looks like the below. That function then must unpack them and do the actual thing the closure is intended to do, e.g. our member function call, then write the return value on the end of another pointer. It's a lot of packing/unpacking.

    template <MemberFunPtrType member_fun_ptr>
    static void
    /* FN will receive the arguments
     * CIF (the original cif),
     * RVALUE (a pointer to the location into which to store the result),
     * AVALUE (an array of pointers to  locations holding the arguments) and
     * DATA (some user defined data that the callee can use to determine what
     * do do about this call.)
     * It might look like this, to dispatch to a concrete C++ member function.
        int ret = reinterpret_cast<myclass *>(data)->myfunction(
        memcpy(rvalue, &ret, sizeof ret);
    the_fun(ffi_cif *cif, void *rvalue, void **avalue, void *data);

To unpack arguments from the array of pointers into our actual bona fide C++ member function call, we use some template magic. First we generate a std::tuple from the array. Doing so is deceptively simple thanks to std::tie and the miracle of the ‘...’ pattern expansion operator. Our caller will instantiate ‘Is’ with an ascending sequence of integers from zero. (Of course, there'll be a way to generate that too.)

    template <std::size_t... Is>
    get_tuple(void **avalue, std::index_sequence<Is...> seq)
        return std::tie<MemberFunArgs...>(
            *reinterpret_cast<MemberFunArgs *>(avalue[ Is ])...

In the above, at first it's a bit mysterious how seq gets expanded in unison with MemberFunArgs. I think this is just the semantics of ‘...’-expansion of expressions with two or more template argument packs (here Is and MemberFunArgs)—they get expanded in lock-step.

Now we need a helper that can call a function given a tuple, i.e. make it into a bona fide argument pack. Again the ascending sequence of integers helps us, this time getting the nth item from the tuple. Notice the use of std::index_sequence to generate the needed ascending sequence of integers.

    template <std::size_t... Is>
    RetType call_member_fun_with_tuple(
        ClassType *obj,
        MemberFunPtrType member_fun,
        const std::tuple < MemberFunArgs ... >& tuple,
        std::index_sequence<Is...> seq
        return (obj->*member_fun)(std::get<Is>(tuple)...);

Now we plumb these together into the actual function we promised in the forward declaration above. (Wart: the code we saw above doesn't actually work as a forward declaration! Forward-declaring templates is clearly beyond my C++ pay grade right now, so in the real code I skip the forward declaration... but I hope it was useful for exposition.)

    template <MemberFunPtrType member_fun_ptr>
    static void
    the_fun(ffi_cif *cif, void *rvalue, void **avalue, void *data)
        ClassType *obj =  reinterpret_cast<ClassType *>(data);
        *reinterpret_cast<RetType*>(rvalue) = call_member_fun_with_tuple(
            get_tuple(avalue, std::index_sequence_for<MemberFunArgs...>()),

If we're dynamically creating closures, how do they get destroyed? It seems reasonable to use std::unique_ptr. One quirk here is that in the libffi API, the closure pointer (the start of the allocated blob) is distinct from the code pointer (the actual callable instructions). It's the closure pointer we need to free, but it's the code pointer that is most useful to client code. For now, we abuse the deleter... we use the deleter object to remember the closure pointer and any other state we need. Then the unique_ptr we give to clients can point directly to the function, and we can give it a pointer-to-function type.

    struct closure_deleter {
        ffi_closure *closure;
        std::unique_ptr< vector<ffi_type *> > typevec;
        std::unique_ptr< ffi_cif > p_cif;
        closure_deleter() : closure(nullptr), typevec(nullptr), p_cif(nullptr) {}
        closure_deleter(ffi_closure *closure,
            std::unique_ptr< vector<ffi_type *> > &&typevec,
            std::unique_ptr<ffi_cif>&& p_cif)
         : closure(closure), typevec(std::move(typevec)), p_cif(std::move(p_cif)) {}
        void operator()( RetType(*fp)(MemberFunArgs...) ) const
        { if (closure) ffi_closure_free(closure); }

Now we actually have the ingredients necessary to call libffi and set up a brand new closure, passing the relevant template instance that generates our the_fun.

    template <MemberFunPtrType member_fun>
    static unique_ptr< RetType(MemberFunArgs...), closure_deleter > make_closure(ClassType *obj)
        ffi_closure *closure = reinterpret_cast<ffi_closure*>(
            ffi_closure_alloc(sizeof (ffi_closure), (void**) &fp)
        ffi_status status;
        auto p_atypes = make_unique<std::vector<ffi_type *> >(
            (std::vector<ffi_type *>) {ffi_type_s<MemberFunArgs>::t()...}
        auto p_cif = make_unique<ffi_cif>();
        status = ffi_prep_cif(&*p_cif,
            /* ffi_abi abi */ FFI_DEFAULT_ABI,
            /*unsigned int nargs */ p_atypes->size(),
            /* ffi_type *rtype */ ffi_type_s<RetType>::t(),
            /* ffi_type **atypes */ &(*p_atypes)[0]
        if (status != FFI_OK) return nullptr;
        status = ffi_prep_closure(closure, &*p_cif, &the_fun<member_fun>, obj);
        if (status != FFI_OK) return nullptr;
        return std::unique_ptr< RetType(MemberFunArgs...), closure_deleter >
            (fp, closure_deleter(closure, std::move(p_atypes), std::move(p_cif)));

The code is a bit rough and ready. I'm not sure it's free of resource leaks, and in our rag-bag of state on the deleter, since we need the cif and type info to live as long as the closure itself lives, there are rather extravagantly two separate heap allocations. Anyway, it'll do for now and is much nicer than using libffi directly!

What have we really learned from the above? Even though on paper, C++ and C have “good” “interoperability”, the facilities one might expect for taking an interface defined in one language and using it from the other are not there “out of the box”. We just had to build one for ourselves, and the result looks quite outlandish. Even I would admit that dynamically generating trampolines using libffi is pretty hairy and not very efficient.

I mentioned a second, dual way to do it. This would work instead by fabricating a C++ view of the plain-C state, by generating a class definition that includes the relevant state and functions, and using a linker script to group the state contiguously so that its starting address can be the this pointer and its linked layout matches the struct-level layout of the class definition. That's even more hairy and I haven't attempted to make it work yet, although it's the sort of thing my dwarifdl project can help with, specifically the dwarfhpp tool that can generate C++ class or struct definitions from binary interfaces (albeit not covering this particular case yet).

I also mentioned that calling the original API “C-style” is an abuse—it's valid C++, and other styles are common in C. I'd go so far as to say that languages are not the most illuminating way to look at interoperability problems; we should think about style instead. Style is about choices; languages are about hard rules. I'm caricaturing slightly, but a PL-minded person would observe the mess we saw above, with static versus dynamic “object” notions, and say that the answer is to define a new language in which we eliminate the in-hindsight-foolhardy distinction between statically and dynamically created state. Gilad Bracha—there he is again!—has done this in Newspeak, for example. One can quibble about how best to eliminate that distinction (and a long time ago, before I had met Gilad and also before I had learned to write even somewhat graciously, I did). The language creator's view is one of identifying the bad pattern and then forbidding it! This is natural because doing such things is within a language designer's power.

My take today sidesteps most of that. Rather than eliminating distinctions, I want to embrace them. I am fine with distinctions existing as long as I can easily bridge between them. That is almost never the case today! Yet this change of viewpoint is necessary if we're to have any hope of genuinely good interoperability. Firstly, we need to stop telling everyone to use a different language! Secondly we need to sanction actively the need for bridging, or integration, and for making it easy, rather than eliminating it. We need the maturity to accept that yes, different languages will come along, and they will embed distinctions as their creators see fit, wise or otherwise. In general, even when two languages are concerned with “the same” conceptual distinction, they are still likely pick two somewhat different lines to split it down. For example, many languages have a notion of “constant” or “non-mutable” data, but no two are ever exactly alike in what you can do with them. Trying to define diversity out of existence by creating a perfect language is an unwinnable game of whack-a-mole. We need to embrace diversity.

If embracing “found” diversity is step one, what is step two? It's about asserting your right to be different! In the main part of this post I hope I convinced you that the apparent “good” “interoperability” between C and C++ isn't really good at all. You can call any C from C++, but not vice-versa... and even when you do it, you do it by writing C! Good interoperability would instead mean writing unapologetic, idiomatic C++. Then there needs to be a way for the system to map it to the other interface, here the C-style one. This is basically never possible at present, for any pair of languages. For example, many FFI approaches, such as Python CFFI, effectively do the same thing that C++ did with C: they embed C in the higher-level language. Although it may be fractionally less painful to write the resulting Python-style C than writing the C code directly, this still isn't interoperability! It's just C in disguise. “Including one language in the other” is always is a poor person's interoperability. It has proven successful for C++ because the alternative is even worse. Step two means going beyond that. When a coding task involves using code written in another language, we should be able to code in the native style, not just its surface syntax.

For a preview of what this “stylistic interop” might look like for Python, you could have a read of the VMIL workshop paper Guillaume Bertholon and I wrote a few years back, about Guillaume's very impressive internship project with me at Kent. The concept is a CPython extension module that makes native libraries accessible in a natural Pythonic style, with no FFI code required. Among many smaller problems, the main problem right now is that to maintain reference counts, we have to instrument the native code with a pointer write barrier; that is neither convenient nor fully reliable, and it slows things down more than necessary. However, I have some ideas on how to do better, and am very interested in reviving this strand of work—please get in touch if interested. Prospective PhD students might like to know that I do have a funded opportunity (caveat caveat! it's harder to fund non-UK students with these but it can sometimes be done).

[/devel] permanent link contact

Mon, 10 Oct 2022

Understanding C99 inlines

For a long time I struggled to remember the rules for using inline in C. I think I've cracked it now, though. As often with C, the trick is to think about how it's implemented.

It doesn't help that in C++, the programmer-facing semantics of inline are much more straightforward. If you want to hint that a function should be inlined, put inline (or, for a member function, define it within its class).

inline int foo() { return 42; } // this is C++, not C!

The toolchain takes care of the rest. That's because C++ implementations demand a smarter linker that can provide “link once” semantics, sometimes called COMDAT, for inline functions. Each compilation output includes its own out-of-line copy of the function, and all but one are later thrown away by the linker.

In C99 and later, inline has been specified, but in such a way that the linker need not support COMDAT. The resulting rules are a bit weird—they care about cases where we have multiple declarations of the function, and whether we use inline and extern consistently across each of them. We can still write things like this....

inline int foo() { return 42; } // also C! but semantics different from C++...

... but what they mean is different than in C++. In what follows, I'll dig into this in rather too much detail.

C inlining

The meaning of our snippet above is valid C, but its meaning varies depending on what else appears in the same translation unit. To quote from the C11 spec....

If a function is declared with an inline function specifier, then it shall also be defined in the same translation unit. If all of the file scope declarations for a function in a translation unit include the inline function specifier without extern, then the definition in that translation unit is an inline definition. An inline definition does not provide an external definition for the function, and does not forbid an external definition in another translation unit. An inline definition provides an alternative to an external definition, which a translator may use to implement any call to the function in the same translation unit. It is unspecified whether a call to the function uses the inline definition or the external definition.

The above is telling us that if I write

inline int foo();
inline int foo() { return 42; } // also C! but semantics different from C++...

... it's different from

int foo();
inline int foo() { return 42; } // also C! but semantics different from C++...

... and perhaps also different from the following.

extern inline int foo();
inline int foo() { return 42; } // also C! but semantics different from C++...

There is a logic to all this. If we write inline in a function's declaration in a given compilation unit, it's reasonable that we have to provide a definition for it too. We can't inline something if there's no definition. The question is whether that definition is also emitted for out-of-line use. In C++ the answer is “always, although it may get thrown away later”. In C the answer is “it depends on what is declared”.

We can have one or more declarations for the function in the translation unit. Let's consider two cases. First, there's the case where we always put inline on the prototype. That means none of the declarations can be mistaken for a normal externally-visible function. And hey presto, this does indeed mean that such a compilation unit “does not provide an external definition for the function”. The definition we provide is only for inlining.

However, the standard also tells us that this “does not forbid an external definition in another translation unit”. And in fact, the compiler is allowed to assume one exists! As usual, it doesn't have to inline the function in the generated code; it could generate an out-of-line call. So you'd better have another compilation unit that does things differently....

This is where it helps that standard also tells us that “a file-scope declaration with extern creates an external definition”. In other words, by including a prototype declared extern, we instantiate the inline in that translation unit. Of course, for any particular function we should only do this one in one translation unit, otherwise we'll get “multiple definition” link errors.

How do we normally use all this? Typically, we put the inline definition in a header file, and then in a single .c file we include an extern prototype to instantiate the out-of-line copy in that file. Hey presto, we have avoided COMDAT linking, by forcing the programmer to decide which compilation unit gets the out-of-line copy. That's the main idea.

It's pretty confusing though. Normally extern on a declaration means “not defined in this compilation unit”. By contrast, on an inline function it means the opposite. Even aside from that, the whole set-up is a bit wacky in that it's relying on the inconsistent use of decorators to carry information. Normally we'd perceive an inconsistency like this as a code smell and expect the semantics to be, if not an error, then as if the many declarations were somehow merged into a single canonical signature that is “what you could have written”. But here we are required to use multiple signatures just to convey the intended semantics.

One final quirk is that there is also static inline. This is much more straightforward because although it might generate an an out-of-line copy, it would be a local symbol specific to the file in question, so does not cause link-time complexity about missing or multiple definitions. It's a good option if you just want some quick utility functions, but risks code bloat if the functions are large and/or often-used.

Adding GNU C

Of course, things get even more complicated because before C99, GNU C introduced inline with slightly different semantics. These can be seen in a third case where we mark a definition as extern inline (above it was just a declarationt where we put extern). In GNU C, this means the definition is only available for inlining, never for emission as a visible symbol. It's in fact just like prototyping it as inline in C99 and never using extern.

These old-style GNU semantics can still be requested of GCC by a command-line option (-fgnu89-inline), even when ISO dialects are requested otherwise. They can also be chosen on a per-function basis using the gnu_inline attribute. If __GNUC_STDC_INLINE__ is defined, the file-level default is the “standard” semantics, otherwise we get the GNU-style one.

It's common to see codebases use extern inline combined with gnu_inline and always_inline to get the effect of a macro: the inline function is always inlined and never emitted as an out-of-line function. (However, it turns out that even with these GNU semantics, if we have an “extern inline” definition that we elsewhere also declare as “[regular non-extern] inline”, this “only for inlining” property goes away, and an out-of-line instance is generated.)

A pattern that doesn't work

All this complicates how we deal with our header files. Ideally we'd have one set of header files, which we can freely include where we need them, and the issue of where to instantiate out-of-lines is a matter for .c files only. That ideal mostly works: we can use extern inline in a single .c file to do the instantiation.

However, some more complex scenarios turn out not to work. What if we want a function to be “opportunistically inlinable”: some compilation units will have an inlinable definition to use, but others won't and should emit an out-of-line call. This might be because the definition itself is change-prone, so is better left in a private header but not pushed to clients, who should see only the out-of-line prototype in a shared public header. Makes sense? Unfortunately this shared public header sets up an unwinnable situation: if its declaration is marked inline then this will be wrong for the external compilation units, at best generating a spurious compiler warning (inline declaration but no definition). If the public declaration is not marked inline, then all the “private” compilation units, which do include an inline definition, will instantiate the out-of-line copy, causing “multiple definition” link errors later. The only solution I know to this is conditional compilation: define a macro IN_MYPROJ in the private codebase, and prototype the function like so.

#ifdef IN_MYPROJ
int myfunc(int arg);

External clients then won't know it's inlinable, but a private header can define the function inlineably and a (unique) internal compilation unit can instantiate it in the usual way.

A test suite

I wrote a single-file “test suite” that elaborates the space of inline and extern inline usage and tests what a given compiler does with all of them.

Consider a function with three declarations: the one on the definition, and two standalone ones. Each of these can be independently set to inline, extern inline, or unqualified. This creates 27 distinct cases. In each case, we test at run time whether the function really was inlined, at a test call site (using some funky introspection from within the call). In cases where we expect an out-of-line copy not to be generated, we generate one in assembly, so that any additional copy would cause a link error. (This applies only to some of the GNU cases and the ISO “inline, inline, inline” case.) In cases where this copy is not expected to be called, because inlining should happen at our call site, we give it a body that will abort with an error.

In the actual test file, which you should take a look at if you've got this far, the cases are heavily macroised so they really just look like counting up in a weird ternary number system where the digits are inline,   (empty) or extern inline.

CASEn(1, inline, inline, inline, assert(!OOL_EXISTS); assert(!AM_INLINED); /* UND ref created! */ )
CASEo(1, inline, inline, inline, assert(!OOL_EXISTS); assert(AM_INLINED);)
CASEn(2, , inline, inline, assert(OOL_EXISTS); assert(!AM_INLINED); )
CASEo(2, , inline, inline, assert(OOL_EXISTS); assert(AM_INLINED);)
CASEn(3, extern inline, inline, inline, assert(OOL_EXISTS); assert(!AM_INLINED); )
CASEo(3, extern inline, inline, inline, assert(OOL_EXISTS); assert(AM_INLINED);)

Optimization affects the expected semantics, so there are actually 54 cases: each of these 27 tests is instantiated once with optimization turned on at the call site (the “o” case) and once without (the “n” case). We make the simplifying assumption that, since our test inline functions are small, optimization implies that it will be inlined. (I did not attempt to test the GNU always_inline attribute, but its behaviour is basically to swap the expected behaviour from “unoptimized” to “optimized”.) In eleven of the 27 pairs, the expected behaviour is different with old GNU-style semantics (e.g. from-fgnu89-inline) than with ISO semantics, so we use conditional compilation to assert the right things.

If 54 cases sounds like a lot, then arguably they are a bit redundant. Of the 27 pairs, one is degenerate (no inlining at all), and of the others, arguably those that duplicate a set of qualifiers between two declarations are redundant. For simplicity I chose to expand them all out anyway.

The test code itself is pretty unportable, so do read the caveats at the top of the file, and take care that you build it with the necessary link options to allow the funky introspection to work (principally -Wl,--export-dynamic).

I should add that my motivation for fully getting my head around all this when I found that inline semantics in CIL were subtly broken in some cases. In short, CIL's output was not reproducing function declarations faithfully with respect to the input, so it could unwittingly change the inlining behaviour of code as it passed through. I've fixed this in my personal branch, to which I have also added my test case. It's on my list to contribute a lot of my CIL improvements back to mainline. I have a lot of this to do....

[/devel] permanent link contact

Thu, 06 Oct 2022

How to do link-time symbol wrapping... as a plugin

I wrote recently about the problems with link-time symbol wrapping using the --wrap option to GNU linkers and similar. In short, references don't get wrapped in some cases when they arguably should, if definition and use are in the same file. But there is another way, using symbol replacement, that can do such wrapping if we're careful about self-references and the like.

I've now created a plugin for the GNU BFD or gold linkers that implements, in effect, a command-line option for wrapping that is mostly interchangeable with --wrap except that it handles those cases as described. I've tested it with the test cases I documented last time, and it gives the right behaviour. (But that's almost as far as testing has gone!)

It's called xwrap-ldplugin (for “extended wrap”) and is part of my elftin repository. It should be moderately easy to build. Apart from standard libraries it depends only on librunt (build-time dependency only), on the normrelocs tool in the same repository (build that first!), and on the binutils plugin-api.h file that defines the linker plugin interface. Once built, you can load the plugin with -plugin and give it a sequence of -plugin-opt argument strings that are the symbol names to be wrapped, much as if each had been prefixed with --wrap and included on a standard command line.

Caveat emptor! The semantics in respect of shared libraries may be a bit different to --wrap. I haven't tested these cases very well, but I did find one important case where my new approach requires modification. What if the wrapped symbol is defined only in an “input” .so file, so is not going to be defined locally in the output? The muldefs approach works by first adding a __real_ alias of the wrapped symbol, then replacing that symbol with the wrapper. But we can't add this alias to a shared library on the system, only to a relocatable .o file. The solution is just to fall back on --wrap for these symbols, since they are always ones that the standard wrapping behaviour handles properly.

I'm not a fan of plugin architectures generally. They tend to work only for anticipated use cases, and their APIs tend to suit the creators much more than their users. They tend to embed a lot of incidental detail while often forcing clients to reinvent wheels. In this case, much about the linker invocation is not exposed through the plugin API. Since it's a C API not a string API, information from the command line needs to be “forwarded”, or “bound” into it—always a bad pattern. One of the things my plugin cares about is the list of --wrap-treated symbols; we want to inspect this and maybe add to it (as covered above). Only the most bleeding-edge versions of the API even let me get the list of symbols that the command line is asking to --wrap, and there's no way to add to the list. There's also no way to enable the magic -z muldefs option that the new approach relies on. Even bread-and-butter stuff is missing: you can add an object to the link, using ld_plugin_add_input_file, but you don't get to say anything about its position in the command-line link order. This really matters when there are archives involved. I'm not criticising the authors; adding all these features is really work. It's the concept that is the problem.

Sequencing of actions is also a problem with callback-based plugins. Although the current plugin API can tell me what symbols are defined where, at the all_symbols_read event, this happens relatively late on in the link; it might well be too late to add new --wrap options (if that were even possible). In general, it's just the nature of data dependency that one application of the same code might need to know some things earlier than another one does; a fixed sequence of callbacks is asking for trouble. I need to be able to get the contents of the input objects at a time that works for my application, meaning before the set of --wrap options is decided, whereas for the vanilla linker, that ordering constraint just doesn't apply.

(If adding new wrapped symbols after resolution sounds a bit circular—wrapping will change what named symbol is defined where!—that's because it is. Linkers contain some quite subtle phase-ordering problems. I wrote about this in section 5.3 of the REMS project's ELF linking paper.)

Let's think about an alternative. I can think of two. The first is a library API not a plugin API. What's the difference? In short, a library is reactive not proactive. It gives a blank canvas to the client programmer and provides stuff for them to use, in whatever order it likes. By contrast, the plugin gives the client a temporal structure it might not want: specific callback points spliced into an otherwise hard-shell tool. If the linker were itself architected mostly as a library—including routines for, say, parsing command lines—plus merely a one-page main function that glues together the library's primitives in the vanilla way, we would have two useful degrees of freedom. We could write a new one-pager if the vanilla sequence doesn't work for us. Or if it does, we could make do with overriding specific operations that it calls. If these sound suspiciously like common workflows in object-oriented programming, there's a reason for that. Callbacks that add logic let us override in an “around” sense only. This is a subset of the possibilities in a classic O-O inheritance or delegation set-up.

The concept of the GNU BFD is not too far from this library-plus-client-glue idea, but the glue part is more than just glue: all the stuff to do with linker scripts and command lines and so on is still inside the linker client, not a library. One could imagine a library-style refactoring of the linker which does realise this sort of extensibility.

Instead of libraries, another alternative would be to say that the command-line interface is the only interface. Any extension then works as a rewriter of linker invocations. This is a much more black-box approach; it is what an adapter or wrapper script does, and is an annoyingly good fit for the problem at hand here. It satisfies a nice Occam-like property: interfaces should not be multiplied beyond necessity! We're spared from working with a brand new interface consisting of weirdly specific, weirdly limited callback handlers that don't give us the access we need. On the other hand, strings are not as rich as C data structures, and we forgo the potential benefits of sharing data structures with the linker or of fine-grainedly borrowing its working state (such as might be passed to callbacks!). These can all allow us to achieve our ends more efficiently—whether with less repeated work at run time or less repeated code at development time.

Since I was committed to writing a plugin, even if just for fun, I used some hacks to get the “two out of three” solution: using the callbacks as best I could, but also allowing rewriting of the command line. The first hack was to write my own parser of the linker command line, (for want of a library version!), and snarfing the original command line from the auxiliary vector. My command-line code is pretty fiddly, relatively unportable, incorrect in some cases, and will need to be maintained (somewhat) as new options are added. If the existing parser could be isolated behind a library interface, this could be avoided. For now, at least I can get at any information in the command line.

The second big hack, and my biggest hammer against these plugin API shortcomings, is a self-restarting mechanism. This is really nasty, but also pleasing if you like nasty things. Essentially it's doing command-line rewriting from within the linker, to avoid the need for a separate wrapper script. If I detect a condition on which I need to change the command line, such as needing to add a --wrap option, I simply restart the link using execve(), using a command line that is modified accordingly. Care is needed to avoid an infinite loop of restarts.

As an approach to extending an arbitrary program, this is a terrible idea: it might leave behind all kinds of weird filesystem state and partial or repeated outputs. Or maybe we've consumed some inputs and so they're just not there any more on restart. But since a linker is basically a pure function over stored inputs, we can restart at any time reasonably harmlessly. Even if it has already created some or all of the output file (unlikely in this case), it will just be overwritten. Restarting does, however, create a hazard around leaking temporary files: if any part of the linker happened to create a temporary file it expected to clean up later, this may instead be orphaned by the execve(). It is possible to create temporaries defensively so that they will be unlinked by the time any execve() happens; I take that approach in my code, but of course, other plugins' authors, or the linker's own authors, won't know that they need to. So the restarting trick is definitely a workaround rather than the good engineering solution, whic would be to write and contribute the necessary extensions to the plugin API.

So, do feel free to check out the plugin if you'd like to do symbol wrapping, while being aware that it's definitely not production-ready code. Aside from the risk of leaking temporary files (for debugging, it currently even omits to clean up its own temporaries) it completely fails to report various errors. It's still useful though. Contributions are welcome and I will keep making small improvements as time allows.

[/devel] permanent link contact

Wed, 03 Aug 2022

How and why to do link-time symbol wrapping (or not?)

It's sometimes useful to hook functions at link time. On ELF platforms (among others) a GNU-style linker offers the --wrap option which seems to be exactly what we want.

           Use a wrapper function for symbol.  Any undefined reference to symbol will be
           resolved to "__wrap_symbol".  Any undefined reference to "__real_symbol" will be
           resolved to symbol.

Unfortunately there are several problems that arise when using this. In this post I'll cover some ways to solve these problems. Some of the ways I've found, others I've created. At the end, I'll discuss whether we should be doing any of this.

Problem 1: wrapping file-internal references

As the helpful manual page states, --wrap is not a complete solution.

Only undefined references are replaced by the linker.  So, translation unit
internal references to symbol are not resolved to "__wrap_symbol".  In the next
example, the call to "f" in "g" is not resolved to "__wrap_f".

       f (void)
         return 123;

       g (void)
         return f();

Why not? The text implies that it's because there's no undefined reference to f. This means a relocation record referring to a symbol that is undefined (UND) in the referring file. So, if we move either f or g into a separate file, we can ask objdump -rd to show us the two files.

$ objdump -rd f.o g.o
f.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 7b 00 00 00          mov    $0x7b,%eax
   9:   5d                      pop    %rbp
   a:   c3                      retq 

g.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   e8 00 00 00 00          callq  9 
                        5: R_X86_64_PLT32       f-0x4
   9:   5d                      pop    %rbp
   a:   c3                      retq 

As expected, we see a reference to f from g, marked by a R_X86_64_PLT32 relocation record. In this case, --wrap will work perfectly, by binding instead to __wrap_f (assuming we supply one). This works because f is an undefined symbol in g.o.

$ objdump -t g.o | grep 'f$'
0000000000000000         *UND*  0000000000000000 f

By contrast, if we keep f and g in the same file, we get something that looks similar but is actually quite different.

$ objdump -rd testwrap.o
testwrap.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 7b 00 00 00          mov    $0x7b,%eax
   9:   5d                      pop    %rbp
   a:   c3                      retq 

000000000000000b :
   b:   55                      push   %rbp
   c:   48 89 e5                mov    %rsp,%rbp
   f:   e8 00 00 00 00          callq  14 
                        10: R_X86_64_PLT32      f-0x4
  14:   5d                      pop    %rbp
  15:   c3                      retq 

To see the difference we have to look at the symbol for f that the relocation record is referencing.

$ objdump -t testwrap.o | grep 'f$'
0000000000000000 g     F .text  000000000000000b f

This time it's a defined symbol. That means g's reference to f is already bound to the definition of f. There is no separate identity for “the thing, called f within the code of g, that g calls”. There is just “the definition called f”. Worse, that's directly how things came out of the assembler. Since no link-time action binds this reference, link time is too late to interfere.

This is an awkward non-compositionality when assembling and linking ELF or most other object code: if we group multiple parts into a single file, such as by refactoring what was once many source files into one, at link time we get a less descriptive model of the referential structure of the whole. Binding has occurred eagerly, and we have no way to talk about “references requesting a thing called f” as distinct from “references to a definite thing, namely f”. Since --wrap relies on the distinction, it can't have any effect.

From now on I'll refer to this pre-binding issue as “problem 1a”.

In fact there is another way for --wrap to go wrong, which is if we want to wrap a static (file-local) function. Although wrapping a static function is less common, it might still be desirable. Reliably wrapping local calls to statics is intractable without going back to the source, because they will often have been inlined. But when a static function is address-taken, it might end up being called from far and wide within the program; we might reasonably want to instrument this so that actually, a wrapper is the thing that gets address-taken. Problem 1a guarantees that this cannot possibly work: references to static functions are always intra-file, and therefore to a defined symbol. However, even if problem 1a were solved, we'd have another problem: at the point where we reference static function f, on many architectures, the compiler can use a relative jump or call. Within the same text section, such jumps don't even need a relocation record, so the linker has no clue that reference is even happening. This is problem 1b. Later on I'll discuss ways to solve it, but let's worry about the basic non-static case initially.

Naive solution 1a: symbol unbinding

To solve problem 1a, intuitively we want some way to “unbind” these assembler-bound references. There is no off-the-shelf way to do this (trust me! but I discuss this further below). Way back in 2008 I wrote a patch (unfortunately buggy graduate-student code) to GNU objcopy that extends it to perform unbinding, adding an option --unbind-sym SYM.

$ ./objcopy --help
     --unbind-sym             Separate <sym> into  __def_<sym> and __ref_<sym>

For wrapping, I would use this followed by a second objcopy to change __def_f back to f and __ref_f to __wrap_f, so that references will go to the wrapper, just as we wanted.

I submitted my patch on the Bugzilla in July 2008. (Intriguingly, to this day it still shows as “NEW”. Clearly the Bugzilla was the wrong place to post it.) Some years later I even fixed the code, integrated it into a forked binutils tree which I then kept vaguely up-to-date for a few years. The story doesn't end there, but it's a viable start. So let's return to problem 1b.

Naive solution 1b: -ffunction-sections and objcopy --globalize-symbol

I mentioned that wrapping a static function is sometimes desired, but that a call to a static function might not even have a relocation record. There's no hope of diverting to a wrapper function at link time if the linker doesn't have any relocation to apply. Consider this code.

static void *myalloc(size_t size)
        return malloc(size);
void *(*get_allocator(void))(size_t)
        return myalloc;

If we use objdump -rd on the resulting object code, we see this.

Disassembly of section .text:

0000000000000000 :
  (snip code of myalloc)

000000000000001a :
  1a:   55                      push   %rbp
  1b:   48 89 e5                mov    %rsp,%rbp
  1e:   48 8d 05 db ff ff ff    lea    -0x25(%rip),%rax        # 0 
  25:   5d                      pop    %rbp
  26:   c3                      retq 

Notice the lea whose argument points back to the definition of myalloc. It is just referring to a point earlier in the same text section; no relocation is needed.

The fix is to compile with -ffunction-sections. Since each function gets its own section, calls to a static function are nevertheless exiting the section, so require relocation (sections are the unit of placement in memory). Even references to static functions will come with a relocation record.

(In contexts where you don't control compilation, -ffunction-sections might not be an option, but you could in principle write a tool that walks over the binary, decodes its instructions and re-adds relocs at instructions that address in-section memory locations.)

If we do this, we get the following.

Disassembly of section .text.myalloc:

0000000000000000 :
  (snip code for myalloc)

Disassembly of section .text.get_allocator:

0000000000000000 :
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # b 
                        7: R_X86_64_PC32        .text.myalloc-0x4
   b:   5d                      pop    %rbp
   c:   c3                      retq 

In other words we now have two different sections and a relocated reference between them. If we compile like this, then promote the local symbol to a global, using objcopy --globalize-symbol we should be able to wrap it like any other... job done? Not quite!

Real solution 1b: preferring non-section relocs

Notice that the relocation record above isn't relative to a symbol called myalloc; it's relative to the section symbol .text.myalloc. So the reference won't be unbound if we ask to unbind myalloc! We also can't just name .text.myalloc as the symbol to unbind since (among other problems) the linker doesn't perform wrapping on section symbols anyway.

As a quick fix, I provided a way to eliminate use of section symbols in relocs where a named function or object symbol was available and could be used instead.

$ ./objcopy --help
     --prefer-non-section-relocs   Avoid section symbols in relocations, where possible

This is progress... although it seems nasty. The nastiness does not end there. Let's keep track of what we've learned about how to handle these tricky questions.

Uses of a definition should be wrapped, so force them to use an undefined symbol, unbinding reference from definition if both are present in the same file.
Uses of a definition should be wrapped, and only ordinary symbol-based relocs can be wrapped, so force these references to use ordinary-symbol-based relocs, not section-symbol-based relocs.

Real solution 1a: refinements to unbinding

Even without worrying about static functions, some similar issues emerged in the course of using my naive unbinding tool. Just as above we worried about “section references”, there are also “self-reference”, and “meta-reference” cases.

Meta-references are (my name here for) references from debugging information, or similar metadata, to “base-level” code or data. This doesn't just affect debugging per se; unwinding information also includes these meta-references, and is critical to C++-style exception handling. Naturally, metadata about compiled code wants to refer to specific places in that code. And naturally, to do so it uses relocation records bound to defined symbols. For example, the .eh_frame unwind information section for a hello.o, when dumped by readelf, looks like this:

00000018 000000000000001c 0000001c FDE cie=00000000 pc=0000000000000000..0000000000000017
  DW_CFA_advance_loc: 1 to 0000000000000001
  DW_CFA_def_cfa_offset: 16
  DW_CFA_offset: r6 (rbp) at cfa-16
  DW_CFA_advance_loc: 3 to 0000000000000004
  DW_CFA_def_cfa_register: r6 (rbp)
  DW_CFA_advance_loc: 18 to 0000000000000016
  DW_CFA_def_cfa: r7 (rsp) ofs 8

... but actually the binary also includes relocation records for the section, to refer to the actual address of its text section.

Raw data contents of section .eh_frame:
  0000 14000000 00000000 017a5200 01781001  .........zR..x..
  0010 1b0c0708 90010000 1c000000 1c000000  ................
  0020 00000000 17000000 00410e10 8602430d  .........A....C.
                        20: R_X86_64_PC32       .text
  0030 06520c07 08000000                    .R......      

This is what ensures that if you dump a fully linked hello binary, the line showing pc=0000000000000000 above will instead actually mention the real address of the main function. (If you're wondering: stock objdump can't dump data sections with relocation records interspersed like I've shown above. But my objdump-all script can.)

The whole idea of “unbinding” is to wrench apart bindings to defined symbols. But clearly we don't want to unbind debugging information from the code is describes! We just want to unbind calls and address-takes. Unfortunately, at the object code level, it's not trivial to distinguish these from meta-references. In my patch, the best I could do was the following.

+  /* What is a self reloc? It's either
+   * - in a debug section;
+   * - in .eh_frame (HACK);
+   * - within the dynamic extent of the symbol to which it refers. */
+  bfd_boolean is_debug_or_eh_sect =
+    (bfd_get_section_flags (ibfd, isection) & SEC_DEBUGGING)
+         || 0 == strcmp(bfd_get_section_name (ibfd, isection), ".eh_frame");

We then avoid unbinding references coming from a debug or unwind section. For better or worse, BFD doesn't count .eh_frame as a debugging section (certainly one wouldn't want strip -g to strip it) so I had to recognise .eh_frame by name.

Another way one might think we could have got the same effect is simply by exploiting the behaviour that gave us problems with statics earlier: we don't unbind relocations going via section symbols. We saw above that the relocation used the .text symbol. Maybe debugging relocs always use section symbols? If so, we'd be all clear. Sadly, this is mostly true but it is not always true. If you grep the objdump of a large build tree, you'll find some debugging sections using ordinary symbol relocs. We could imagine handling these with a converse of --prefer-non-section-relocs, which for debugging sections prefers the section-symbol-based relocs. I haven't coded this up in binutils, but it would be worth having; let's add it to our rules.

Meta-references should not be wrapped. They are always local within a file, so force them to use section-symbol-based relocs, which won't be wrapped.

One reason I haven't implemented it yet is that we'll soon see some cases it can't help with. (However, actually I have coded it up in a standalone tool; more below.)

As well as meta-references, there are self-references. Many cases of self-reference are actually, like meta-reference, referencing the internals of a function. For example, on architectures with only absolute branches, a goto would generate a relocation record. Again, we don't want to unbind it so that it gets redirected to the same offset within a wrapper function; that would be meaningless! Modern architectures use relative not absolute branches in the direct case. But once we start doing indirect branches we usually do start needing relocations. Here is some GNU C code that generates a self-reference on many architectures (albeit still not x86-64) by taking an internal address.

#include <stdio.h>
int main(void)
        printf("A function-internal address: %p\n", &&resume);
        return 0;

On 32-bit x86 we get the following.

Disassembly of section .text:

00000000 <main>
   0:   8d 4c 24 04             lea    0x4(%esp),%ecx
   4:   83 e4 f0                and    $0xfffffff0,%esp
   7:   ff 71 fc                pushl  -0x4(%ecx)
   a:   55                      push   %ebp
   b:   89 e5                   mov    %esp,%ebp
   d:   51                      push   %ecx
   e:   83 ec 04                sub    $0x4,%esp
  11:   83 ec 08                sub    $0x8,%esp
  14:   68 26 00 00 00          push   $0x26
                        15: R_386_32    .text

At the end of the listing we see the rightmost argument to printf, which is pushed first, being formed by a self-referencing reloc against the .text section.

We clearly don't want to unbind this sort of thing. It's tempting to think that we should just let all self-references go undisturbed. But what about this?

int call_me(void *arg)
    if (...) return 0; else return call_me(...);

Or this?

int is_me(void *arg)
    return (arg == is_me);

What should wrapping mean in the case of self-reference via an ordinary symbol? Both of the above are compiled as referencing an actual public definition, via its function symbol, rather than referencing internals via a section symbol. The compiled code for is_me looks like this.

0000000000000000 <is_me>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
   8:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # is_me <is_me+0xf>
                        b: R_X86_64_PC32        is_me-0x4
   f:   48 39 45 f8             cmp    %rax,-0x8(%rbp)
  13:   0f 94 c0                sete   %al
  16:   0f b6 c0                movzbl %al,%eax
  19:   5d                      pop    %rbp
  1a:   c3                      retq 

Intuition suggests that probably the recursive call should be wrapped, but the address take not. But why?

To explore just how tricky the semantics of identity can get, let's divert ourselves to ask: can the is_me function above ever return false when passed itself? On dynamically linked ELF platforms (or in fact any SunOS-style dynamic linking system), the answer is yes, because is_me can be overridden by another object (e.g. the executable or a preloaded library). In such a context, the identity comparison is then done against this other definition, i.e. the function isn't really checking for itself at all. That's a bug: the code really doesn't want to refer to “the active definition of a thing called ‘is_me’”. It wants to really refer to itself, really! To make it reliable under dynamic linking, one would change the code above to instead test against a hidden alias, which can not be overridden.

extern int really_me(void *arg) __attribute__((alias("is_me"),visibility("hidden")));

What about the analogous problem with wrapping? If we really want to reference ourselves, there should be some wrap-proof way to do so. Luckily, in my current objcopy unbind logic, I don't unbind self-references at all.

+              /* Okay, the reloc refers to the __ref_ symbol. Is the relocation site
+               * in the same section as the __def_ symbol? */
+              bfd_boolean is_self_section = nu->def_sym && nu->def_sym->section == isection;
+              bfd_boolean is_self = is_debug_or_eh_sect || is_self_section;
+              if (!is_self) continue;

Here the code treats specially any intra-section references. If we compile with -ffunction-sections, only true self-references will be intra-section. In more general contexts, the code above it will leave alone any calls that are bound to a target within the same section, whether to the same function or not. (Note that in relocatable code, even ordinary symbols are defined relative to a section. The section of the defined symbol is what matters in the test above; the symbol itself could be a section symbol or an ordinary symbol.)

Since we currently exclude self-calls from unbinding, then by analogy with our dynamic linking case, the effect is equivalent to “all self-reference is as if to a hidden alias”. In short, self-reference is non-interposable by --wrap.

This, however, is the wrong thing. In the earlier example of a recursive call, we said that intuitively, we might well want the self-reference to be unbound and (hence) wrapped, at least in some cases. The function is simply using itself; it can happily do so indirectly via the wrapper. I went with the simpler no-unbind behaviour only because I'm lazy and I didn't face such cases in my application, which is wrapping allocation functions. These are non-recursive by nature. However, my application does have code similar to is_me: a function that wants to be able to recognise its own address. Since unbinding that self-reference broke my code—the function ended up erroneously looking for its wrapper's address—I just stopped unbinding self-reference, by adding the is_self_section predicate above.

If that's wrong, what is right? We need a more nuanced understanding of reference. It's the nature of wrapping that a reference to plain f has become ambiguous; we have to resolve the ambiguity one way or another, either to the wrapper or to the real thing. The old linguist's distinction of “use” versus “mention” is a good fit here. Meta-references, which we've already dealt with, are pretty clearly mentions, not uses. Self-knowledge, like with is_me, is also a mention. But there is also self-use; a recursive call would count. Correct wrapping requires telling these apart, so that self-use can go via a wrapper but self-mention can be direct. This is important.

Self-references may be self-mentions or self-uses.

How can we distinguish self-reference from self-use at the object code level? Unlike with interposition in dynamic linking, an ELF-level protected or hidden-visibility self-alias would not help, because --wrap does not pay any attention to symbol visibilities. It happily wraps a reference to a hidden definition, and/or from a hidden UND. We could layer a rule on top, particular to our unbinding tool, say skipping self-unbinding in the case of hidden symbols. However, this is an abuse of hidden and would stop us ever wrapping a hidden symbol, which we might still want to do.

Instead of a protected or hidden alias, there's another alias which is tailor-made for this ambiguity-resolving purpose: the “__real_” alias! It's an alias that is magically defined only when wrapping, and it's exactly what we want. To do self-reference that is not wrapped, this wrapped function f needs to refer to to __real_f instead of just f. It does mean that the function needs to be written in the knowledge that it's going to be wrapped (or might be). If it wasn't written with this knowledge, a custom fixup pass could be done on its assembly or object code.

The program must disambiguate self-mention from self-use, by using __real_ for self-mention, and leaving self-use the default.

Finally, self-reference has yet another case to worry about, which I'll call “self-reference at a distance”. and arises if inlining is involved. It poses a real problem for correct wrapping, even without unbinding. Imagine an is_me-style function that gets inlined into its caller. This will generate an outgoing reference to is_me, a symbol defined elsewhere, maybe in the same file or maybe further afield. Now suppose we decide to wrap is_me. If it is defined in the same file, we will unbind this outgoing reference (it's not a self-reference) and wrap it. If it isn't—say if is_me is an inline and its out-of-line copy is instantiated in a different file—we don't need unbinding; plain old wrapping will divert the reference into the wrapper. Both of these are wrong! It's still a mention, not a use. The function really wants to get hold of its own identity, not its wrapper's identity. The only change is that the reference now appears to be non-self; it is coming from somewhere distant, thanks to the inlining. As before, the solution is that it must refer to __real_f. In short: in the presence of inlining, any self-mentioning function must at source level use __real_ if it is possibly going to be wrapped. The compiler will know how to emit that reference whatever the inlining context.

Here we can further critique our earlier not-quite-right idea of leaving section-symbol-relative relocs undisturbed but always wrapping ordinary-symbol-relative ones. I mentioned that debugging information sometimes makes reference via ordinary symbols. The cases we just considered must use an ordinary symbol reference, both with or without the use of __real_, because it may refer to a definition in another file. In non-inline cases, instead of using __real_ we could imagine asm hackery inside a function to get hold of a section-relative reference to its own start address. In fact, I tried it. But if inlining might happen, this won't work. The moment this asm trick gets inlined into a caller, it will at best generate a reference to the entry point of the enclosing non-inlined caller, which is not right. My attempt generated a completely spurious offset in the middle of the enclosing function's code, which was not fun to debug. (The intersection of self-referencing functions and inline functions is small, so this issue will arise rarely. But I managed to find it.)

What's to say this use-versus-mention distinction is only an issue for self-reference? There might be mentions that are non-self, i.e. one function mentioning another, but not using it, meaning that semantically it should not be wrapped. An example might be a function that inspects the on-stack return address in order to behave differently depending on which function is calling it. If the caller happens to be wrapped, it still wouldn't be correct to test for the wrapper's address range, since the wrapper is never going to be the immediate caller. Since introspecting on one's caller is a kind of metaprogramming, we could bucket this as another kind of meta-reference; either way, it's clearly a mention not a use. This practice is rather perverse and I haven't run into such cases. But the same solution would apply: mentions of a possibly-wrapped function need to reference __real_.

Another issue with meta-levels is that there can be more than two. I have in the past had cause to write functions of the form __wrap___real_f, i.e. linking with both --wrap f and --wrap __real_f. Perhaps surprisingly, this works to an extent—the first wrapper, which appears to call __real_f, does indeed call into the second, __wrap___real_f. Annoyingly though, referencing __real___real_f doesn't work. This is because there never was a genuine __real_f symbol definition; binding of a __real_-prefixed reference relies on a fiction in the linker, and that is not applied recursively to generate a fictional __real___real_f and so on. Instead I resort to finding the real definition through an alias I created specially earlier.

This is suspiciously starting to resemble an alternative way of doing wrapping without --wrap, as I'll cover next.

Solutions 1a′ and 1b′: doing it in standalone tools

At some point it became clear that my objcopy patches weren't going to be mergeable into GNU binutils. The treatment of debugging sections alone is a giant hack—especially the unwind sections, which I could recognise only by name. I did persist with maintaining my patch for a while (or even letting someone else do it) but maintaining a fork of binutils is no fun. Its long build time was a burden for people wanting to try out my project.

I spent ages trying to find a way to get the desired unbinding effect using only stock objcopy and ld, but concluded that it's not possible. While objcopy lets you add and remove and rename symbols, none of these is right. Adding a new symbol won't make relocs use it, and in any case it won't let us add a new undefined symbol, only a defined one. Renaming a symbol simply leaves all the relocs pointing at this now-renamed symbol. And removing a symbol isn't what we want—in fact objcopy will refuse to do it, on the grounds that there are relocation records depending on it. We want to somehow “make undefined” the existing symbol, so that previously bound relocs will become unbound. Even if we could do this by first adding a new undefined symbol (which we can't), we'd need a way to “switch over” relocation records from one symbol to another. Sadly, objcopy gives us no way to manipulate relocation records as distinct from the symbols they reference.

I therefore wrote a really minimal tool, sym2und, which looks for a symbol of a given name in a given .o file, and changes it to be undefined. With that, we can use the stock tools to complete the job by first adding __def_SYM as an alias of SYM (using ld -r or objcopy --define-sym), then changing SYM to be undefined (using sym2und), then renaming the undefined symbol to __ref_SYM (using objcopy again).

What about the static function case (problem 1b)? For that, as before, we need -ffunction-sections and objcopy --globalize-symbol. And we want a standalone tool in place of my patch's --prefer-non-section-relocs option, to ensure an address-taken function is referenced using the symbol name we expect, rather than a section symbol that we don't. This is again quite simple to write as a standalone tool over ELF files; my version is called normrelocs. Just like my patched objcopy, it needs to skip references in debug sections, to avoid hooking meta-references that happen to point to the entry point of the static function.

What about meta-references, self-references and self reference at a distance?

Meta-references via ordinary (non-section) symbols will be disrupted if the symbol is wrapped, because they will keep referencing the old symbol, which becomes undefined and is later resolved elsewhere. We can use the same debugging-specific hack to extend normrelocs with a converse case which flips from-debug references to use the section symbol.

Self-references that are “mentions” need to be directed at __real_, which will work fine (not wrapped). Self-uses will be diverted to __ref_ which mens they will be wrapped, as we intend.

Self-mentions-at-a-distance simply need to use the __real_ symbol. That is never defined locally in the file, so these will not be wrapped, which is correct.

Solutions 1a′′ and 1b′′: an equivalent?

The approaches we just saw worked by very local surgery to single .o files, in order that a later link-time --wrap option would take fuller effect.

There is a way to reduce (but not eliminate) the need for these custom tools, if we don't mind abandoning --wrap and using another linker feature to do the wrapping. It's a bit fragile, because it is sensitive to the order of .o files on the command line, which normally doesn't matter and which build systems and compiler drivers can make surprisingly difficult to control. It's a big hammer, because it suppresses what would otherwise be link errors. And it doesn't escape the need for an earlier fixup pass, both to handle the statics issue (by normrelocs or similar) and also to create an alias serving the purpose of __real_SYM. It does, however, avoid any need for unbinding, since it works even if SYM is already defined and used in the same .o file. So in some ways it's cleaner and shows that the --wrap option didn't really need to exist.

The feature is the -z muldefs command-line option, which tells the linker to ignore the original symbol definition and use a new one. This does not remove any content from the link—content meaning sections, not symbols. It just affects how that content is labelled, i.e. which symbols end up decorating particular positions or ranges within it, and hence how the content becomes interconnected when relocations are applied.

Instead of --wrap SYM after unbinding, we do the following.

  1. add a fresh name for SYM; we choose __real_SYM by analogy with --wrap
  2. create the wrapper definition called simply SYM (not __wrap_SYM), using __real_SYM if it wants, in a file wrapper.o
  3. include wrapper.o before the original definition on the command line, meaning -z muldefs will give it precedence

We use link order and muldefs to take control of SYM, but the code from its old definition still gets included in the link output; it is just “de-symbol'd” since the earlier definition takes priority. But since earlier we created an alternative symbol name for it, we can still refer to it. Unlike --wrap, this approach happily diverts references that were previously bound to that definition—by providing a new definition of the same name, replacing the old one and acquiring its relocations in the process.

Using this approach is quite a disruptive change, if you're used to using --wrap. You now have to pull out the wrappers into one or more separate objects, so they can be specially placed before the “real” definitions in link order. Previously, your __wrap_ functions might be defined in any of the input .o files. To arrange this “placed before” command-line property in a makefile, my best attempt would be for the link target to set something like CC := $(CC) -Wl,wrapper.o (and hope that no other part of the build logic clobbers CC in a conflicting way). This isn't ideal.

I don't know a way to avoid that problem, but we can at least make things more compatible with --wrap, by changing our wrapper.o to be like the wrappers before: define __wrap_SYM and put it anywhere on the command line. Then we augment it with a linker script wrapper.lds, which does have to go first on the command line, but says just the following.

SYM = __wrap_SYM;

This must be placed first in the command line. This brings us two benefits: we can still call our wrapper __wrap_SYM, rather than SYM, and we can have the wrapper logic originate in any object on our command line. We only require the new wrapper.lds to appear before the other, now-wrapped definition of SYM, whichever object that happens to be in. (This is especially handy if the wrapper is in an archive; usually, moving an archive to the front of the command line is not an option.)

What language is the above snippet in? It's the linker script language. There are two main reasons to write a linker script: to provide additional link inputs or definitions (often also doable via the command line, but unwieldy), and to control the memory layout of the final binary (not doable via the command line). These are more-or-less disjoint. The latter is handled by the “default linker script”. The former can be done by naming additional text files in the linker command, each containing declarations in the linker script language as we saw above. (The boundary between these two use cases is a bit fuzzy; I've found that a dropped-in linker script can also contain SECTIONS rules, which are normally the preserve of the default script. From what I've seen, the semantics seems to be that these are concatenated with those of the default script's SECTIONS rules, although I could be wrong.)

You might ask whether we could avoid our pre-aliasing step that created __real_SYM, by instead making the linker script say something like this.

__real_SYM = SYM;   /* let __real_SYM refer to whatever SYM refers to */
SYM = __wrap_SYM;        /* now let SYM refer to __wrap_SYM? */

This seems like a nice pithy definition of what wrapping should do. Unfortunately it doesn't work. Despite the semicolons and equals signs, this isn't an imperative language. Symbol definitions are treated equationally: the defining expressions are all evaluated at the same time. So what the above really does is set SYM and __real_SYM both to equal __wrap_SYM. This breaks the wrapper, making it recurse infinitely.

(In our use case, therefore, we might like have a “predefine symbol” feature, whose defining expressions are evaluated in an “old” context that is then updated, giving a two-stage semantics. Of course that is still limiting; a fuller solution, with arbitrary dependencies between definitions, would require the linker's algorithm to change from a single “build symbol table” pass into something more complicated and with more failure cases—effectively embedding a “proper” programming language, whether imperative or declarative.)

How does our new approach fare with meta-references, self-reference and self-reference at a distance?

As before, it will screw with debug info in the occasional cases where that's relocated via an ordinary (non-section) symbol. So, we really need our version of normrelocs that does the converse rewriting in the case of debug info sections, making them always use section-symbol-based relocations.

With self-reference, again it will wrap unless you reference the __real_ symbol, for both apparent and at-a-distance kinds of self-reference. As we decided earlier, this is the right thing.

Recap of the methods and test cases

We accumulated the following principles.

Uses of a definition should be wrapped, so force them to use an undefined symbol, unbinding reference from definition if both are present in the same file.
Uses of a definition should be wrapped, and only ordinary symbol-based relocs can be wrapped, so force these references to use ordinary-symbol-based relocs, not section-relative relocs.
Meta-references are mentions and should not be wrapped. They are always local within a file, so force them to use section-symbol-based relocs, which won't be wrapped.
Self-references may be self-mentions or self-uses. The program must disambiguate these, by using __real_ for self-mention, and leaving self-use the default.

Meanwhile, we saw various approaches to realising wrapping, which we developed in several stages.

Can we systematically test and compare these against each other? I'll happily rule out testing those stages with asterisks, since they need some kind of manual intervention (and I never wrote a patch to objcopy that will prefer section-based relocs for meta-references). In our tests we simply assume that -ffunction-sections is already in use if desired and any symbol that's to be wrapped is already a global.

That still leaves us many cases! In total there are eight.

a reference is made to code, ...e.g.?
1inter-sectionfrom codeby named defnormal call or address-take or self-mention/use at a distance
2inter-sectionfrom codeby section(nothing realistic I can think of)
3intra-sectionfrom codeby named defself-call
4intra-sectionfrom codeby sectionself-mention; address-taken label
5inter-sectionfrom databy named deffunction pointer initializer
6inter-sectionfrom databy section(almost nothing realistic I can think of)
7inter-sectionfrom debuginfoby named defsome debuginfo references
8inter-sectionfrom debuginfoby sectionmost debuginfo references

We can write a test that is a single assembly file. Semantically, the executable it builds is nonsense and would crash immediately if run. But we just care about what it looks like after linking. The assembly code includes eight references. For each one, we want to see whether it gets wrapped or not.

.globl callee
.globl caller
.section .text.caller
# (1) inter-section, from code to code, by named def
    .quad callee
# (2) inter-section, from code to code, by section
    .quad .text.99callee
.section .text.99callee, "ax"
# (3) intra-section, from code to code, by named def
    .quad callee
# (4) intra-section, from code to code, by section
    .quad .text.99callee
.section .data, "aw"
# (5) inter-section, from data to code, by named def
    .quad callee
# (6) inter-section, from data to code, by section
    .quad .text.99callee
.section .debug_info.text.99callee, "",@progbits
# (7) inter-section, from debuginfo to code, by named def
    .quad callee
# (8) inter-section, from debuginfo to code, by section
    .quad .text.99callee

The correct outcome, according to our principles, is that the all the odd number cases (named definitions) except 7 (debug) get wrapped, as do 2 and 6 (inter-section but non-debug) but not 8 (inter-section but debug) or 4 (intra-section by section symbol, so assumed to be a self-mention). So, the wrapped should be 1, 2, 3, 5 and 6.

We can test it for each of our six approaches. To do a test run, we will assemble it and link it, generating a nonsense executable which we interrogate using gdb.

as -L -o test.o test.s
echo 'SECTIONS { .text : { *(*) } }' > allsects.lds # script to link into one big blob
ld  -Ttext=0x0 -e 0x0 -Bstatic -o test test.o allsects.lds

If our file is linked correctly, we should find our test cases labelled in ascending address order, contiguously from 0x8.

$ objdump -t test | grep '^0.*L' | sort -n -k1,1
0000000000000008 l       .text  0000000000000000 L1
0000000000000010 l       .text  0000000000000000 L2
0000000000000018 l       .text  0000000000000000 L3
0000000000000020 l       .text  0000000000000000 L4
0000000000000028 l       .text  0000000000000000 L5
0000000000000030 l       .text  0000000000000000 L6
0000000000000038 l       .text  0000000000000000 L7
0000000000000040 l       .text  0000000000000000 L8

Now we can script gdb to print out the values of all eight references after linking.

$ echo >/tmp/gdbscript <<EOF
file test
set $n=1
while $n<=8
 eval "x/1ga L%d", $n
 set $n=$n+1
$ gdb -q test -x /tmp/gdbscript | tail -n+2 | column -t

On a null run, i.e. of method Z, we get the following.

0x8   <caller>:     0x18  <callee>
0x10  <caller+8>:   0x18  <callee>
0x18  <callee>:     0x18  <callee>
0x20  <callee+8>:   0x18  <callee>
0x28  <callee+16>:  0x18  <callee>
0x30  <callee+24>:  0x18  <callee>
0x38  <callee+32>:  0x18  <callee>
0x40  <callee+40>:  0x18  <callee>

In other words, all references go to the callee and nothing is wrapped. If we test each of our other approaches in turn, we get the results we would expect. Approach A also does no wrapping because without unbinding, intra-file references aren't wrapped. Approach B starts to get it but covers only the easy cases. Approach C gets all except number 3 correct, since it ignores all self-reference whereas case 3 (named symbol) is a self-use that should be unbound. Approaches D and E both wrap exactly the cases our principles agree should be wrapped: 1, 2, 3, 5 and 6. E does it using the muldefs trick, avoid --wrap entirely.

testD  0x0   <caller>:         0x20  <__wrap_callee>
testD  0x8   <L2>:             0x20  <__wrap_callee>
testD  0x10  <__def_callee>:   0x20  <__wrap_callee>
testD  0x18  <L4>:             0x10  <__def_callee>
testD  0x28  <L5>:             0x20  <__wrap_callee>
testD  0x30  <L6>:             0x20  <__wrap_callee>
testD  0x38  <L7>:             0x10  <__def_callee>
testD  0x40  <L8>:             0x10  <__def_callee>
testE  0x8   <caller>:         0x0   <__wrap_callee>
testE  0x10  <L2>:             0x0   <__wrap_callee>
testE  0x18  <__real_callee>:  0x0   <__wrap_callee>
testE  0x20  <L4>:             0x18  <__real_callee>
testE  0x28  <L5>:             0x0   <__wrap_callee>
testE  0x30  <L6>:             0x0   <__wrap_callee>
testE  0x38  <L7>:             0x18  <__real_callee>
testE  0x40  <L8>:             0x18  <__real_callee>

The test is packaged as a single makefile that you can get here, for which you will want my elftin (that contains normrelocs and sym2und) and patched binutils repositories.

Problem 2 (next time, mostly): wrapping cross-DSO references

Comparing muldefs with --wrap, another difference is that with --wrap, the __real_ symbol was never actually defined; it was just a magic alias that could be used to refer to the original unqualified symbol. The __wrap_ symbol did exist, so, in the final executable we have symbols __wrap_SYM and SYM. In contrast, using the muldefs symbol replacement approach, the final executable gives us SYM (our wrapper) and __real_SYM (which we created as a bona fide symbol definition, aliasing the original function). In our “more compatible” approach above we also get __wrap_SYM, but only because we made it an alias of SYM, which isn't essential.

Of these two outcomes, the latter symbol replacement case plays much more nicely with dynamic linking, since it gives the wrapper the identity of the original function SYM. In other words, the wrapped SYM, if exported, will naturally be what an external DSO binds to when it makes reference to SYM. That's always what we want. (Meta-references never cross DSOs, and self-mentions need to use __real_ as before.) By contrast, with ordinary wrapping, SYM would be left referring to the unwrapped symbol, and that will be what gets exported—so cross-DSO calls won't get wrapped.

As hinted at above, it's also useful to do wrapping only at dynamic link time, without using --wrap at all but relying on LD_PRELOAD or, more generally, link order. Of course this only suffices when wrapper and wrappee are in different DSOs. Even then, it is hard to arrange and even harder to do reliably... a topic for another time.

Is this sensible?

We've taken it as a given that messing with symbols at link time is a reasonable thing to want to do. Yet also, we saw pretty early on with static functions that without going back to source, we were limited in what we could do. Even worse, we couldn't be sure that what were doing is even meaningful in a source-level sense. So maybe all this is just a terrible idea?

The issue comes down more generally to whether link-time artifacts—relocatable object code, essentially—have any meaning to the programmer. Are they a notation that can legitimately be manipulated?

Traditionally, one could feel reasonably comfortable doing this because compilers were less clever than they are today. Long ago, most programmers worked close enough to the metal that object code had a place in their mental model. To systems programmers today, this largely remains the case, but to application programmers, it doesn't. Meanwhile, to compiler developers, it is very inconvenient to have this extra intermediate level of abstraction hanging around, at which programmers might want to intervene and mess with the source-level semantics, which to a compiler author are all that matters.

However, since we as “end programmers” often want to plug together code coming from different languages or different toolchains, it should be clear that the compiler author's take isn't really all that matters. Plugging stuff together is a case of integration, and by its nature, integration often involves stepping beyond the view of any single compiler or language implementation.

I've written before, with colleagues, about how linking is used in semantically significant ways, not just as a mechanism for separate compilation. However, POSIX did not standardise linker interfaces and they continue to be unportable, obscure and unreliable. (Witness my colleague Laurie Tratt's recent travails embedding a binary blob into a built binary, which is still only portably doable via a compiler or, slightly less portably, an assembler.) Meanwhile, link-time optimizers (LTO) routinely break link-time interventions of the kind this post has been describing, including symbol wrapping. These optimizers assume that they control the entire path from source down to the final binary. The approach of the LTO systems I've seen is to issue fake intermediate object files containing opaque compiler-specific blobs (serialized IR). At link time, the compiler is re-run in disguise (as a linker plug-in) to reprocess these blobs, discarding the fake contents (which could be used by a non-LTO-aware linker, but normally get ignored). This mirrors the compiler author's perspective: only source semantics matter, and “the usual” intermediates can validly be faked out as a compatibility hack. Of course, if you do funky stuff like symbol wrapping on these intermediates, it will not affect the compiler blobs and so will be discarded.

I believe that object code has value as an integration layer. It has deliberately moved away from the details of source-level semantics, but exposes a “wiring and plumbing” abstraction. This is a convenient level for humans to express compositions—including wrapping, but definitely not limited to that. (My PhD thesis did rather more elaborate stuff at link time, to adapt between binary interfaces. It was influenced by a nice earlier seam of work on linking and configuration languages such as Alastair Reid's Knit.)

Even better, object code actually doesn't lack semantics at all. There is invariably an established, standard and elaborate system for describing its structure and meaning, relating it not just downwards to the machine (a fairly easy step) but upwards to source language features. This system is called debugging information. This is a misnomer since it has many applications besides debugging. All this could certainly be cleaner and tidier—for example, our latent criteria for distinguishing “meta-references” above could and should be a lot more explicit. Also, renaming a symbol in an object file usually won't rename a definition in the corresponding debug info; again, it should. And there is a lot more semantic information that could usefully be included. Nevertheless, the practice of bundling some such information is well established. (So is the practice of unbundling it and throwing it away, unfortunately.) Although many language implementers no doubt see this as an inconvenience that gets in the way of fancy optimisations. a world with good tools for working with object code is a world closer to realising the goal of software as workable ‘stuff’. This is precisely because, counterintuitively, that world is necessarily one where no single tool is so over-powerful as to exclude others. (Readers of Ivan Illich will recognise this idea, of course.)

[/devel] permanent link contact

Fri, 01 Apr 2022

Anti-prosperity in the UK, part one

(I thought about calling this “Never Mind the Bullshit Jobs”... but that would apply better to a future part.)

Various rhetoric about government-funded academic research here in the UK seems increasingly to use the word “prosperity”. For me this has become a major red flag. Despite the positive-sounding line that “prosperity” exists as an easily recognised and obviously good thing, it is often invoked by interested parties attempting to whitewash issues that in fact are subtle and, usually, deeply political. In many cases, claims of “prosperity” are used as cover for outright harm.

(This post will be overtly political. However, I don't believe very much in it is controversial.)

For examples of anti-prosperity, we only have to look at ad-funded social media “prospering” from monetised hatred and insecurity, or ad-funded cloud computing “prospering” from mass surveillance. Alternatively we can look at the countless corporations “prospering” from the constant private and public investment in unsustainable high-energy technologies, from electric cars to data centres to cryptocurrency, deliberately disregarding both the carbon cost and the other emitted pollutants (whether particulates, carbon or coinbros) while starving out the adoption of obvious low-energy alternatives. We could look more generally at the epidemic of “crap automation” afflicting our society—lower-cost but fundamentally less effective “solutions” to all manner of formerly human service roles, most obviously in customer services and retail, but nowadays increasingly in knowledge work (transcription, translation, tuition) and even in healthcare. All of these bring opportunities to make profit for someone, but also to lose value for people at large. Nobody in their right mind would call that “prosperity”, yet that is what is too often meant. As Illich argued, our very definitions of progress are controlled by a select few—whom he called “self-certifying professional elites”—who are very often engaged or invested in programmes of work that once arguably did good, but have shifted towards doing outright harm.

Public and private harm

When public harm is profitable to private interests, we should expect those private interests to pursue it. What we should also expect is that academics and government—those who have the public interest in mind—to act against that harm. Instead, today's private harmdoers are helped on their way, by complicit academics and wrong-headed government initiatives that have created an extreme dependency on external funding and extreme contingency of that funding on indicators of “success” defined as benefit to private interests.

Here in the UK, recent governments have worked hard to more closely align public fundability of university research (whether by project grants or REF-dependent block grants) with benefit to private-sector industry. This is called “partnership” and “impact”. In the other direction, private harmdoers have worked hard to embed their interests and attitudes within universities. This is called “engagement” and “collaboration”. In the middle of all this we find the hapless individual academic. The issue is both institutional and individual. It's easy to criticise institutions, who certainly try very hard to portray this state of affairs as “the system” and therefore “the job”, in which academics are duty-bound to do whatever it takes to bring in money. Our institutions are so desperately, increasingly hungry for outside money. Promotion and other rewards are offered for “impact” and “engagement” but, for some reason, never for public good as an end in itself. Instead, “good for some business” and “good for the public” are assumed equivalent.

While the incentives on academics have never been more toxic or less conducive to good societal outcomes, academic pushback on these bad incentives has been about as effective as a hard-boiled frog. Once upon a time, one might have expected academics to speak out en masse. Somehow, individuals have become rotten too. To borrow two phrases from Jeff Schmidt's sizzling polemic “Disciplined Minds”, increasingly it is the individuals who have “assignable curiosity” and a capacity for “alienated labour” who can succeed in the academic system. If you actually care about something, the system is destined to frustrate and contain you. The changes of recent decades—to a more bureaucratic, managerial university system in general, and a more short-term impact-centric research system specifically—could hardly have been tailor-made any better to further this effect.

Cosiness and carbon

Let's consider perhaps the most glaring example, of how cosiness with the fossil fuel industry continues to be rife within our universities. While I was a PhD student at Christ's College, Cambridge, a cheerful man named Reg introduced himself to me as the new Bursar, explaining that he had recently retired from British Petroleum (BP). Here was a company that for decades had been roundly thought of as comprising jolly good trustworthy establishment sorts who were bringing “prosperity” to the great British nation. More recently it is better known for its core business of bringing about the climate disaster, as well as its criminal liability for the largest marine oil spill in history. It does so while frantically trying to greenwash its image by touting a tiny fraction of its subsidiary activities. In 2018 the Christ's College Council voted against terminating the College's investments in fossil fuel companies, despite what was at the time conspicuous lack of coherent argument supporting that decision. A little earlier, still in Cambridge, in a debate about university investments, an academic funded by the fossil fuels industry stood up and pronounced these industry players as “the good guys” of decarbonisation. That claim was not plausible at the time, and even in a few short years, it hasn't aged well.

Although implausible, I don't doubt that these beliefs were well-meant. Let's say your research is about combustion efficiency. Great! If it succeeds, we can burn less fuel per unit useful energy. Isn't that helping? And if BP (or whoever) is funding this, maybe they're doing good? It's easy to rationalise yourself into believing this sort of argument, particularly if your career, wealth and standing all benefit from it. Unfortunately it doesn't stand up to systems thinking. It's well accepted that humanity's goal now must be to keep as much carbon in the ground as possible. Increasing the efficiency of a destructive technology has many indirect effects that work against that. It is likely to prolong these technologies' lifespans, maintain for longer their positions in the market, give fodder to lobbyists' efforts to stall government regulation, bolster the fortunes of companies invested in pursuing them further, excuse and enable superficial non-fixes (hydrogen to replace natural gas, anyone? how much of that is really going to come from green electrolysis?), slow the writedown of fossil-fuel plant investments (anyone heard of stranded assets?) hence lengthen the operating life of these plants and practices, and ultimately lead to more of fossil fuel being burnt. And that is no accident; it is what these companies want because it ekes out more returns from the money they have sunk into existing activities, plant and expertise. It benefits their shareholders, to humanity's detriment.

Academics less reliant on funding from or “partnership with” giant corporates might be more awake to all this. Put differently, externally dependent research programmes are inherently corruptible. Thanks to prevailing bastardised notions of “impact” and “engagement”, even publicly-funded “academic” programmes are now largely in that category. If what we call “industry” were on balance doing good in the world, this might be not so bad. But the consumption-based industries underpinning modern society have mostly long since crossed Illich's second threshold: they are doing active harm, while passing it off as progress. Incumbent corporates' fingers reach into most or all influential institutions of the state. They spend good money on appearing like the good guys, and for them it is well worth it.

Capture and complicity

Quite often I can ascribe academic complicity to benign naivety... but not always. One non-executive director of BP is a certain Ann Dowling, a professor of considerable standing at Cambridge, who has also happened to wield much influence over funding nationally, as chair of one of the four REF panels in 2014. Besides her, many other academics in Cambridge have deep relationships with BP and similar companies. (The same is likely true in many leading institutions; I'm picking on Cambridge because I know it the best.) Not so coincidentally, divestment from fossil fuels was vehemently resisted by much of that university's establishment in the face of increasing grass-roots pressure during (especially) 2016–18. The idea that “fossil fuels are our trusty financial friend” was heavily ingrained, not just as an ambient belief but by concrete financial interests, both institutional and personal. These were enough to bring out dirty tricks and “arguments” that would shame any free-thinking academic. As these companies know, thinking is more easily captured than carbon.

This was never more blindingly apparent than in 2018 at the issue of a non-report by a so-called “Divestment Working Group”. The group was ostensibly answering a grass-roots call for the university to assess the benefits of divestment. However, it was actually given vague and irrelevant terms of reference and delivered a similarly vague and irrelevant report. Later, in 2019 it was reported in The Guardian that professors Simon Redfern and John Shakeshaft, both members of the group, failed to declare an apparent conflict of interest from their role in processes around accepting funding from BP. The linked “conflict of interest register” (not public) indeed does not mention such interests. Perhaps dealing with BP money was so routine for them that it slipped their minds. Perhaps there is another explanation. But either way, this situation represents a problem. Even crediting those involved the best will in the world, it does not seem plausible that these financial relationships were fully isolated from other decision-making.

The same document also reveals that the chair of that working group, Athene Donald, was decent enough to declare that she personally held stock in three well-known fossil fuel companies. (The holdings are described as “small”... but as any good physicist knows, small is relative.) It says a lot that this wasn't enough to red-flag her as chair; one doesn't have to be too cynical to suspect it may even have green-flagged her, given the evidently low risk of her putting her mouth where her money wasn't. The Vice-Chancellor at the time the working group was formed, Leszek Borysiewicz, had told me in person that he opposed divestment because his overriding concern was the apparent hypocrisy of holding research collaborations with companies like BP in whom the university would meanwhile be declining to invest. The same Guardian article quotes a university spokesperson stating falsely to the press that the working group's recommendation of (essentially) inaction was “backed by the full Council”. In fact, a note of dissent had been submitted by two academic Council members in addition to the two student members. The relevant press release shamelessly attempted to bury the academic dissent under an obscure heading that appeared to conflate it with the student voice. Perhaps, again being charitable, the spokesperson simply absent-mindedly overlooked the dissent; “of course” there was consensus that supporting our fossil fuels multinationals, the good guys who so generously fund our research, is the path to prosperity. Again, that's charitable in the extreme, but it's a problem even if accurate.

Cash and (career) currency

It's true that some progress has since been made on the fossil fuel issue. Both Christ's and the University of Cambridge have since decided to divest from fossil fuels, at least ostensibly and perhaps also for real. But the rot still goes deep. From where I've been standing, this change does not appear driven by any sudden re-evaluation of the nature of prosperity. It was pursued only once not doing so became toxic for PR in general and for alumni relations in particular—thanks in part to press coverage, as seen above. In short, the university acted to safeguard its own income stream, rather than out of public-spiritedness.

This isn't about fossil fuels per se. There is nothing to stop complacent institutions blundering into blinkered support for the next industries of anti-prosperity, despite their stated do-gooder aspirations. In fact this is already happening. My own field is not short of examples: where once Google or even Facebook might have seemed non-evil dynamic young collaborators, now their business models and operating practices are obviously beyond redemption, yet in the intervening time, their hooks have become deeply embedded in our universities. As more and more of academics' jobs involve fundraising for those self-same jobs, universities' resistance to blunders and infiltration can only decline. Academics' eyes light up the moment corporates dangle what, for them, are quite puny sums but to an academic are career currency. Career success selects for those who have the least compunction.


What can reverse this dynamic? Currently I don't see any tendency for self-correction—neither in general nor even (unbelievably) in the specific case of fossil fuels. In a September 2021 departmental away afternoon here at King's, we were told that the best kind of impact is financially quantifiable impact, and (with a straight face) a really great success story at King's recently was enabling Schlumberger to drill for oil and gas in quantifiably more affordable ways. This is the core activity of the fossil fuel industry—not even its greenwashed fringe. We were so proud to be directly helping with the out-and-out dirty work.

Here in Cambridge, where I live, Schlumberger is also very present, and there is a nascent campaign to get the university (here) to dissociate from it. However, reactive single-issue campaigning is not the answer. For a few headline issues, it might bear some fruit, as it did with the student campaign groups who tirelessly pushed the debate on the fossil fuel issue. (Many staff members did so too, of whom I would single out Jeremy Caddick for his heroic tenacity.) However, such campaigning takes an awfully long time to get results even in the best case. For many issues, campaigning just never reaches critical mass.

Since so many individual academics evidently can't be relied on to shun anti-prosperity as a matter of course, the only hope is if we can somehow change the incentives in a more blanket fashion. If we are to talk about real prosperity, and public research funds' ability to deliver it, we need to ask: where's our incentive for not pursuing anti-prosperity? Hatred, surveillance, fossil fuels, high-energy technologies in general (once again following Illich): how can we ramp all of these down to zero? Where are the institutional and governmental policies offering reward for work on doing so? Where are the penalties for work that goes the other way? Given the corrupt state of both government and universities, I'd even settle for a weaker version: how can we not be penalised for working on real prosperity? In a regime where big-number “financially quantifiable impact” and funding contributions from “industry partnerships” are given so much weight, there is a de facto penalty against work that isn't assisting the big-money corporates. Yet such work is exactly what real prosperity must involve. If any institution should be incentivised against anti-prosperity, surely universities should. Yet all the incentives point completely the wrong way.

Relatively small tweaks could easily help with this. For example, benefit to charities or other not-for-profit partners could be favoured over private-sector “impact”, and such partnerships could explicitly be promoted. Currently, they are latently discouraged since the usual word “industry” does not even bring these options to mind, and because financially quantified “impact” numbers are inevitably either smaller or unavailable. Conservative governments are unlikely to deliver funding schemes that reward these, but it doesn't stop local efforts to prioritise them. I suggested above that public good—as distinct from “engagement” or “impact”—could easily be an explicit criterion in promotion, for example. Local funding pots could also prioritise public good. Small financial incentives can exert a large sway on academics. Unfortunately, as we saw above, the higher strata of universities have been captured by a particularly dangerous gaggle who do harm while earnestly believing they are doing good. Combined with institutional inertia—fossil fuels are still seen as a trusty financial friend—and the chance of reform in local institutional systems and processes seems negligible.

So ends my first rant about anti-prosperity. By picking on fossil fuels, I've chosen a headline issue that is an easy target. With its being rather outside my area, I could perhaps be accused of missing nuance. In one or more future posts I'll cover two further anti-prosperity issues rather more local to computer science. One is how the popular notion of “automation” (sorry, I mean “AI”) is being missold based on industrial notions of automation that ignore issues of human value and quality. The other is how aggregation has acquired an overwhelming mindshare, to the point that even bright people forget that or “scale-up” is not the only and not even the only technical means of creating human value in industry (indeed, it readily destroys value). All three issues are related, in that the same small minorities benefit from all three, and each reinforces the others. It is a vicious cycle which universities should be countering, not amplifying.

[/highered] permanent link contact

Mon, 17 Jan 2022

De-escalating the submission backlog

About two and a half years ago, I submitted a two-page abstract entitled “De-escalating software: counterproductivity versus conviviality” to the conference on History and Philosophy of Computation (HaPoC), proposing a talk the philosophy of Ivan Illich interpreted in the domain of software. I had already given a couple of experimental talks about it that appeared to be well-received. The idea was to push myself into developing these ideas a bit more, as a waypoint to eventually publishing them. The conference, in late October 2019 in Bergamo, would have been a nice stopover on my return (by surface travel) from SPLASH in Athens.

A couple of months later, the day the notification was due, I woke up very early in the morning in a panic, having slept very poorly. I was feeling overwhelmed by my workload and I knew that I could not deal with the extra burden of my talk being accepted. I ran downstairs and sent an apologetic e-mail asking to withdraw the submission. I've no idea whether my talk was to be accepted or not, but either way I am very glad for the chair's understanding response.

You can read the abstract here. I would still like to write more about software and Illich. As well as the two earlier talks I gave in 2019 (neither was recorded, sadly, but you can view the slides on my talks page), I was also a guest on the Hope in Source podcast by Henry Zhu and Nadia Eghbal where we shared some thoughts on Illich. It is proving very difficult to get time to devote to this. Perhaps it is something best done intensively at a retreat.

(I have at least three other pieces of writing that are in a fairly similar state. And I have dozens of blog posts that are “90% finished”—where the remaining 90% is left to do, of course.)

[/research] permanent link contact

Tue, 11 Jan 2022

(Tell me why) I don't like funding {applications, bodies, policies}

I was recently successful in my first non-trivial grant application, for a New Investigator Award from EPSRC. Great! Well, sort of. It's great because it will hopefully enable a talented person (who isn't me) to spend a few years doing some interesting research. In other ways, it's not so great. I have been grimacing at every well-meaning “congratulation” I've received. The process is a horrible, contorted mess that makes me sad, angry and anxious. I hate it and it is literally making me ill. I have one more proposal in the pipeline and another planned for just over the horizon. It is already too much for my blood pressure to handle. I am hoping this blog post will unburden me somewhat; if you read it, please read it in that spirit.

A small factor is that proposals trigger a social anxiety I have about “asking people for stuff”. I spend my life thinking differently from most other people I meet. So, psychologically, I much prefer to be independent and not need approval from others in what I set out to do. Writing papers is the culmination of this process: I get to elaborate precisely why my “different belief” is justified. Writing a funding proposal, on the other hand, is inherently speculative and so is a much less secure enterprise. I find myself having to “make stuff up”, or write things that “sound good” but don't stand up to scrutiny. How can they, when the research isn't done yet? It's an exercise in double-talk, appealing to some nebulous reader of whom I have little picture and no reason to respect. It literally stresses me out, mentally and physically. (I suspect this may be partly an autism-spectrum reaction, of being forced to meet expectations of a perceived “normal” reader.)

But it gets worse. A further aspect is that whenever I talk to funders, or to institutional support staff trained in funders' ways, I quickly get the sense I'm talking to someone with a wacky, bureaucratised and dangerously inaccurate model of how research works. My distaste and/or rage at this discourse has been stoked by many years as a ‘postdoc’—a concept fully deserving scorn-quotes, and one whose ever-growing prevalence itself owes to this inaccurate model. But now, as a nominally grown-up academic citizen, I am forced nose-holdingly to claim allegiance to all the nonsense. I feel complicit in the perpetuation of this idiot system and the resultingly horrible life situations and dire value-for-money that arise from today's publicly funded research.

All that would be bad enough even if the system were otherwise well-conceived and well-executed. Of course, it isn't. Its practices are founded on many wrong beliefs and flawed agendas. Here is a run-down of those as I see them.

The “project” agenda. Funders love the word “project”. They love to say that they fund projects. But in reality, almost all meaningful research projects both predate and outlive the funding vehicles that support them. A grant can be a vehicle for making progress with a worthwhile project, but it seldom delimits it.

The “impact” agenda. This needs no introduction. It suits my work well, but it's pushed to an inappropriate degree. To espouse it is to support misguided government beliefs that the only research worth doing is the immediately impactful. It's impossible to overlook the parallel with another wrong message that is rife in UK universities today, that the only degree-level education worth having is one that teaches vocational “skills”, preferably in “STEM”.

The “investigator vs researcher” agenda. Any project grant I might be awarded is partly or largely a vehicle for giving others time to explore what they want to explore. It may be that that is not specifically advancing “my” work, i.e. the work hypothesised in the proposal. Yet the proposal process requires me to ignore the former and pretend only the latter will happen. We can call the former “covert independent working”. It is regrettably necessary, because public policy allocates smaller and smaller shares of funding for overtly independent work (basically personal fellowships and a small fraction of PhD studentships). I have personally relied on this pattern of “covert independence” earlier in my career (albeit not without large doses of compromise and frustration). I've disbenefited from its absence (“one out of three postdoc positions ain't bad”). In disciplines without huge lab overheads, any researcher worth that name is also an independent investigator to an extent. That extent does vary. Although it might be workable to leave it down to each PI/researcher pairing to negotiate a mutually satisfying mix, regrettably, power imbalances skew this process and it fails a fair fraction of the time. It happens increasingly often when “successful” academics style themselves as “managers”. Negotiation is necessary only because funders' offerings have not adopted anything like the right mix of funding people (or institutions) versus funding projects, as I've ranted in the past.

The “national importance” agenda. National Importance should be a non-issue for any research worth funding. Inventing the future is always in the national interest. Feeding the business interests of Locally Invested Corporations may or may not be. If the work in question is not also in the global interest, it's simply advancing a race to the bottom, which shouldn't be rewarded by public funds. Examples of such races include financialisation, much work under “defence”, blockchain, fossil fuels, and so on. All this is not to be confused with the fact that that some national environments might make a good (or poor) place to do a given piece of research. If the exercise were simply about this, I could get on board.

The “training” agenda. Funders' calls and particulars are shot through with the belief that a bottleneck to academics' research success is their lack of “training” and “personal development”. In reality, the main bottleneck—besides raw talent and willingness to work self-harmingly hard, of course—is almost always firstly time and secondly consolidated time. That is not to say that all training is a waste of effort. But somehow, the present machine yields actually-useful training in at most 10% of cases. The greater part of that machine's function is much more sinister: to deflect responsibility, in the process gaslighting people into thinking that when their high-quality work or skills meet with limited success or limited opportunity, the problem is them—and certainly not the underresourced, over-competitive, incentive-broken system they inhabit. “The system works great for people who are better than you! Look at all the training on offer! If you don't succeed, you've only yourself to blame.” Meanwhile, public and institutional policy continues to ensure that no matter how strong the field, only a slim few garner the markers of “success”. The training agenda is about denying this political reality. It is most strongly in evidence in the situation facing postdocs who are searching for a “proper” academic position: funders and governments constantly dream up “training” or “career development” initiatives that they claim will “solve” this problem. It obviously cannot, because the problem is structurally embedded in (current) public policy. Training can at best move pain around and at worst just waste everyone's time, The added gaslighting just inflicts more pain.

The “token innovation” agenda. Faced with an incentive to “stand out” in a competitive field, or to appear “relevant” amid the initiatives and priorities du jour, the proposal-writing exercise gains an ugly new side: hoop-jumping, box-ticking tokenism. Thankfully I did not have to engage in this, but I've definitely seen it in action. As one very senior and successful colleague once put it to a visiting EPSRC bod: “you are just giving us incentives to lie”.

The planning agenda. Writing a “case for support” is a festival of bizarre over-planning. One has to tacitly claim to anticipate the future to an absurd degree, detailing “deliverables”, timelines on Gantt charts and the like. I try to make peace with this on the grounds that it is “a plausible story” and about “planning, not plans”. But one needs to keep up the mask.

The ‘excellence’ agenda. An open call from our esteemed European colleagues states proudly that “excellence is the sole criterion”. But excellence is not a criterion at all. Excellence with respect to what? The word has been abused to a point that is almost beyond satire, although this tour de force by Matthias Binswanger makes a compelling go of it.

The (metaphorical) suit-wearing agenda. Reviewers do their best to assess proposals' technical merit. But given the inherently speculative nature of any proposal, even with the best will in the world a substantial component of the “criticism” a proposal receives at review will be founded on superstition, impression and “feelings”. This is only natural. Proposals just don't have space to explain anything serious in scrutiny-supporting detail. Human beings faced with this underconstrained task fall back on a latent sense of “what a good proposal looks like”, involve large components of folklore and personal taste. This is witnessed by the huge amount of contradictory feedback I've received on draft proposals to date. “You have too few citations” but also “you have too many citations”. “Only include ‘management and risk’ in large grants,” but also “include more about ‘management and risk’”. “More verbiage” but also “less verbiage”. The whole process resembles the blind leading the blind. This is not because reviewers don't know about good research (many do), but because there is no reliable shared understanding of two mutually interdependent things: (1) how to detect a proposal that is likely to lead to successful research, and (2) how to write a proposal in a way that signals that likely success convincingly. So at last we reach the suit metaphor. The art of writing a proposal is like wearing a suit. It's an exercise in “looking the part” far more than demonstrating substance. The associated practices persist not on their established merit, but because there is a self-perpetuating belief that engaging in the ceremony, and doing so with an air of conviction, is a predictor of better outcomes. As a staunch suit non-wearer, I also don't believe in this ceremony. There are plenty of studies suggesting that far less effortful and more equitable ways to distribute money would deliver equal or better outcomes. Unfortunately, suit-wearers tend to become gatekeepers.

The funder-in-control agenda. This is an interesting one and is possibly the reason why academics find the system tolerable. I levelled up slightly when I realised that the whole process is set up to deceive. Government funders act like they are in control. The application process is fronted by bureaucrats who know little about research or its culture, but are instead trained up in the image of the funder's management (civil servants) and paymasters (politicians). Thankfully, these people are largely not in control. Researchers on the ground have a very different culture from all this. It is these researchers who actually review proposals and take part in panels (the meta-review that makes funding decisions). So, the funder's communications are at best in need of a pinch of salt, and at worst form a layer of active misdirection. Funders and their agents often advise in certain directions, such as “say lots about training and personal development”, “say why this should be funded this year and not next year” (I kid you not) and so on. However, the success of the application lies overwhelmingly in the hands of academic reviewers, who mostly don't care for these perspectives any more than the applicant does. (I say “mostly” because some do get institutionalised.) The same is often true of institutional “advice” from support staff—they tend to echo the funder's line rather than reality, since they are trained and briefed by funders. Many do have some experience on the academic's side of the table, but it tends to be up to the ‘postdoc’-ish level at most, so typically includes only a limited taste of writing or reviewing proposals.

The “prosperity” agenda. Faced with funders' glossy documents such as EPSRC's current Delivery Plan (I almost said “Deliverance”) repeating the word “prosperity” loud and often, I can't help feeling that an Orwellian reading of the word is sadly necessary. We know that countless technological and political changes that claim to be “delivering prosperity” are actually delivering poverty, alienation, dehumanisation and despair for much of the population. “Prosperity” has been conveniently redefined as consolidating the capital of the 1%. It saddens me to see my academic peers suckered by this, and sadly I do believe many have been (as voiced aspirationally in a departmental meeting not long ago: “wouldn't it be great if our department had an Amazon chair? a DeepMind chair?”). I have a whole other rant on this subject, coming soon.

[/highered] permanent link contact

Mon, 18 Oct 2021

ELF dynamic linking: a brief introduction

[I wrote this to help current or future students who might want to do projects involving dynamic linking. But it may be of interest more widely. There may well be an equivalent or better article out there; if anyone knows one, please let me know.]

In old-fashioned static linking, the executable file captures a complete starting “memory image” of the program being loaded. The kernel simply maps the binary into memory, creates an initial thread stack, dumps some useful information onto it (the auxiliary vector) followed by the program name and argument strings, then transfers control to the entry point identified in the ELF binary's header. So far, so straightforward. I'll assume readers familiar with this. (If not, among other resources, you could read section 3 of the OOPSLA 2016 paper I co-authored.)

In dynamic linking, the executable no longer provides a complete image; rather, it is one of a collection of “dynamic shared objects”, or DSOs, that provide the starting memory image. (Normally, the executable itself is counted as a DSO, even though it may not be “shared”.)

The dynamic linker composes this image at load time, by loading both the executable and the (transitive closure of) depended-on library DSOs.

Dynamic linking is delegated by the kernel to user space: the kernel still only knows how to load static binaries, and the dynamic linker is a special static binary that knows how to load (and link) dynamic binaries. The kernel is extended just enough to split this case off: it must figure out, after loading an executable, that actually the dynamic linker is needed to complete the loading. The dynamic linker can be any binary nominated by the executable file, but usually the standard one (for me: /lib64/ld-linux-x86-64.so.2) is the only one on the system. In practice, the kernel will map both the executable and the dynamic linker into memory, so the dynamic linker only normally needs to load the libraries, not the executable. (Sometimes the dynamic linker can also be run as a program, in which case it has to load the executable too.)

DSOs have dependencies on other DSOs, in the form of NEEDED records. You can see these in the dynamic linking information dumped by readelf -d. For example, here is what I get from a simple hello C program, which needs only one library.

Dynamic section at offset 0x2df8 contains 26 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x1000
 0x000000000000000d (FINI)               0x11b4
 0x0000000000000019 (INIT_ARRAY)         0x3de8
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x3df0
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x308
 0x0000000000000005 (STRTAB)             0x3d8
 0x0000000000000006 (SYMTAB)             0x330
 0x000000000000000a (STRSZ)              130 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x4000
 0x0000000000000002 (PLTRELSZ)           24 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x548
 0x0000000000000007 (RELA)               0x488
 0x0000000000000008 (RELASZ)             192 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffb (FLAGS_1)            Flags: PIE
 0x000000006ffffffe (VERNEED)            0x468
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x45a
 0x000000006ffffff9 (RELACOUNT)          3
 0x0000000000000000 (NULL)               0x0

These records give just the names of the depended-on DSOs (actually their soname—more below). Where in the filesystem these binaries are to be found depends on the linker search path, which is controlled by your LD_LIBRARY_PATH environment variable, by a configuration file, and by any RUNPATH records in the loaded file (also visible using the command above, although our hello does not contain any) and, transitively, those in any files loaded to satisfy a NEEDED dependency. The ldd command asks the dynamic linker to do a trial run and show where it located the dependencies (or failed to!).

In memory, each DSO gets a range of the virtual addess space, starting at some load address. So in the DSO's ELF binary on disk, all “addresses” are actually offsets relative to the load address. It used to be that executables always had a load address of 0, so the executable file's addresses were bona fide virtual addresses. However, nowadays executables are typically position-independent and get loaded at a non-zero address too.

(DSOs are mostly contiguous in memory. However, nothing prevents them from containing holes in between their mapped segments. In edge cases it is possible for holey DSOs to be loaded such that they overlap. This is rare, but I did have to go to considerable lengths to support it, or rather prevent it, in one of my runtime projects.)

After loading, there may be some load-time relocation required. This is performed by the dynamic linker and is very much like the relocation that happens during static linking, but is used to resolve inter-DSO references. (In some degenerate cases, it also resolves intra-DSO references that have been deferred until load time.) Many of the entries in the output shown above are telling the dynamic linker where to find the relocation and symbol information for this.

A “shared library” is a library DSO whose text segment does not need load-time relocation. In other words, it works fine as-is when loaded at any address. This flexibility to map the file simultaneously at many addresses, without the need to fix it up in memory after loading, is what allows memory to be shared between many processes under ELF dynamic linking. (Some other linking systems systems, including Windows DLLs, instead force the library to be placed at the same virtual address in all address spaces, hence the infamous process of “registering” DLLs.) Elimination of any need for load-time relocation is achieved by indirecting inter-DSO references that would otherwise be (say) direct call instructions, so that they occur instead via a couple of (non-shared) tables in the data segment: the Global Offset Table (GOT) and Procedure Linkage Table (PLT). Roughly, the latter is used for calls, the former for data access. Others have written about those (see Eli Bendersky's very nice in-depth article), so let me just say that the GOT is exceptionally poorly named: it is not global (there is one per DSO), and does not contain offsets (it contains pointers, at least after relocation). They should both arguably be called “vectors” rather than tables, because they don't have multiple columns.

Despite the “S” in “DSO”, there is no requirement that library DSOs satisfy this shareability property. Instead, ELF linkers usually complain if you try to create a DSO that does not (often via an inscrutable message ending with “recompile with -fPIC”). The dynamic linker doesn't care, however. If it has to fix up your text, it can do so, relying on copy-on-write to allocate memory as needed. Shared libraries are “shared” only because that copying doesn't happen.

Since libraries and executable are now deployed as separate files, they are subject to version skew. Libraries often try to offer backward compatibility, such that a newer library can still support an older executable (or, equally, support some old library declaring it as NEEDED). A vaguely standard symbol versioning mechanism is defined for ELF, to allow a library to offer older and newer versions of “the same” symbol and have references get bound to the right one, preserving binary compatibility. Alternatively, a binary-incompatible version of a shared library may be issued, in which case it's good practice to reflect that in the library name. There is a convention for recording version bumps in a number that forms part of the “soname”, or shared object name, hence “libc.so.6” in the above listing. This suggests six version-bumps have occurred in the GNU C library. In this way, many distinct incompatible verisons of a library may be installed on the same system. Symbol versioning takes a bit more discipline and more build-time ceremony to set up, but it tends to be preferred over sonames, especially for libraries that are widely deployed. The GNU C library has not bumped its soname for many years.

DSOs have identity at run time. The dynamic linker maintains a doubly linked list called the “link map”. It is possible to load new DSOs, such as plug-ins, during execution, via the dlopen() library function. This adds them to the link map, alongside all the libraries loaded earlier. Indeed the dynamic linker itself shows up in the link map as just another library, providing (among other things) the dlopen call (or at least its back end; on GNU systems there is a separate libdl as a front-end). It also provides a symbol that gives access to the link map itself (_r_debug). Over this, a breakpoint-based protocol allows debuggers to walk the link map, so they can find out about the libraries that are loaded. DSOs can also designate initialization and finalization routines to be run when they are loaded and unloaded.

That's enough for a brief introduction. To get more details about what's going on at run time, you could read my older post about ELF introspection.

A few years back, Simon Ser did some great work on an executable semantics for dynamic linking, as a summer internship of the REMS project. Since then I have been struggling to get time to resurrect and publish this work, and it no longer falls directly within my prioritised interests. But if anyone is interested in taking it up, do contact me... it is begging to be polished off and written up.

[/devel] permanent link contact

Thu, 14 Oct 2021

Tracing system calls in-process, using a chain loader

Some months back I wrote about a possible better alternative to LD_PRELOAD using chain loading. And I promised a follow-up with more details... so here it is. We'll walk through a little system call tracer built using this technique. Unlike strace and similar, it will do its tracing in-process, without using ptrace() or equivalents. I should add that it's a proof-of-concept only so isn't nearly as good as strace in practice; in particular it can't yet accurately print the arguments to most syscalls. (I have some work towards doing that generatively, using kernel DWARF; more later.)

System call tracing is a good demo of chain loading, for three reasons.

Firstly, it's basically impossible to do with LD_PRELOAD, which works at the higher level of the C library and dynamic linker, so doesn't have sight of system calls.

Secondly, it's close to what people very often want to do with LD_PRELOAD, which is to change the behaviour of particular system calls. No longer do we have to make do with overriding the C library wrappers; we can intercept the system calls themselves, catching certain cases that would not be visible at the C library level.

Thirdly, intercepting system calls specifically provides the foundation for a more general bootstrapping approach to arbitrary binary instrumentation of programs, as I hinted last time. If we can catch all the system calls which introduce new instructions to the address space, and prevent instructions from being introduced otherwise, we can ensure that literally all code is instrumented.

Some years ago, Patrick Lambein and I wrote a library that can patch and emulate most system calls on x86_64 Linux, by a really simple double-trap technique: overwrite the system call with ud2, x86's “always-illegal” instruction, then handle the resulting SIGILL. The handler can do the real syscall or whatever else it wants, before returning. This is not the fastest, but it's reliable and easy(-ish) to implement—at least to 90% of the way. So, for example, if we disassemble the C library's _exit function which once contained the following instructions

44 89 c0                mov    %r8d,%eax
0f 05                   syscall  
48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
76 e2                   jbe    0x7f173bf269c0 <_exit+32>

under our library this gets clobbered to instead be

44 89 c0                mov    %r8d,%eax
0f 0b                   ud2  
48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
76 e2                   jbe    0x7f173bf269c0 <_exit+32>

which will always cause a SIGILL. Our SIGILL handler scrapes up the syscall arguments into a structure and then passes this to whatever registered handler is installed. For example, a tracing handler, which prints some stuff and then resumes the system call, looks like this.

void print_pre_syscall(void *stream, struct generic_syscall *gsp, void *calling_addr, struct link_map *calling_object, void *ret) {
        char namebuf[5];
        snprintf(namebuf, sizeof namebuf, "%d", gsp->syscall_number);
        fprintf(stream, "== %d == > %p (%s+0x%x) %s(%p, %p, %p, %p, %p, %p)\n",
                calling_object ? calling_object->l_name : "(unknown)",
                calling_object ? (char*) calling_addr - (char*) calling_object->l_addr : (ptrdiff_t) 0,
// what actually gets called from the SIGILL handler
void systrap_pre_handling(struct generic_syscall *gsp)
        void *calling_addr = generic_syscall_get_ip(gsp);
        struct link_map *calling_object = get_highest_loaded_object_below(calling_addr);
        print_pre_syscall(traces_out, gsp, calling_addr, calling_object, NULL);

To ensure that all loaded code gets instrumented appropriately with the ud2s, I've recently expanded this to use a bootstrapping technique: after initially applying our trapping technique to hook mmap(), mremap() and mprotect() calls, our hooks ensure that if the guest application is asking to create any further executable memory, we will instrument its contents before we make it executable.

But how can we get all this code into the process in the first place?. What I did initially was use our old familiar LD_PRELOAD. We preload a library whose initializer stomps over the loaded text, clobbering syscalls into ud2 and installing the SIGILL handler. This is fine as far as it goes, but it misses the very beginning of execution. A tracer doing this cannot trace anything happening before the initializer runs, for example in other libraries' initializers or inside the dynamic linker. What we get is the following.

$ LD_PRELOAD=`pwd`/trace-syscalls.so /bin/true
== 26439 == > 0x7f6fb2a4f9d4 (/lib/x86_64-linux-gnu/libc.so.6+0xc99d4) exit_group(0, 0x3c, 0, 0x7fff766064b0, 0xe7, 0xffffffffffffff80)

It's neat that it looks like true only does one syscall, to exit_group(), as we might think it should. But if we try strace we find that in truth the process does a lot more than that one system call.

$ strace /bin/true
brk(0)                                  = 0x2447000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6ba773c000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libfakeroot/tls/x86_64/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/usr/lib/x86_64-linux-gnu/libfakeroot/tls/x86_64/x86_64", 0x7ffd49ca38e0) = -1 ENOENT (No such file or directory)
... snipped a *lot*! ...
arch_prctl(ARCH_SET_FS, 0x7f6ba7578740) = 0
mprotect(0x7f6ba7732000, 12288, PROT_READ) = 0
mprotect(0x605000, 4096, PROT_READ)     = 0
mprotect(0x7f6ba7765000, 4096, PROT_READ) = 0
exit_group(0)                           = ?
+++ exited with 0 +++

All these other, earlier system calls are happening during dynamic linking, or in general, before our constructor that clobbers the syscalls into ud2. (The strace manual page laments: “it is a pity that so much tracing clutter is produced by systems employing shared libraries”. My take is that it's better to know the truth!)

Let's repackage our code as a chain loader, as described last time. That way it will always run first, and because it's responsible for loading the dynamic linker, we can ensure that all its syscalls are trapped from the very start. This is a non-trivial delta to the donald-based code I mentioned in the last post. After we map the “real” dynamic linker, we also need to instrument its code. We do this by wrapping the enter() function from donald.

void __wrap_enter(void *entry_point)
        /* ... */
        trap_all_mappings(); // only ld.so is loaded right now
        /* ... */

What about code that the dynamic linker loads? Well, we use our bootstrapping approach to catch that. Whenever new code is loaded into the process, some system call is responsible for doing that. So as long as we catch that call, we can trap the system calls at load time. A bit of macro magic helps generate the wrappers, and some elided hackery is required to divine the ELF-level structure of the file being mapped.

#define mmap_args(v, sep) \
    v(void *, addr, 0) sep   v(size_t, length, 1) sep \
    v(int, prot, 2) sep      v(int, flags, 3) sep \
    v(int, fd, 4) sep        v(unsigned long, offset, 5)
    if (!(prot & PROT_EXEC)) return mmap(addr, length, prot, flags, fd, offset);
    // else... always map writable, non-executable initially
    int temporary_prot = (prot | PROT_WRITE | PROT_READ) & ~PROT_EXEC;
    void *mapping = (void*) mmap(addr, length, temporary_prot, flags, fd, offset);
    if (mapping != MAP_FAILED)
        /* ... */
        if (/* needs trapping*/)
            inferred_vaddr = (uintptr_t) mapping - matched->p_vaddr;
            trap_one_executable_region_given_shdrs(mapping, (void*) mapping + length,
                    filename, /* is_writable */ 1, /* is_readable */ 1,
                    shdrs, ehdr.e_shnum,
            // OK, done the trap so let's fix up the protection (always different)
            ret = mprotect(mapping, length, prot);
            if (ret != 0) goto fail;
            /* ... */
        /* ... */
    /* ... */

Making this work on Linux/x86 (32- and 64-bit) required some “interesting” code in a few cases, over and above the chain loader trick. The current version still has some unsupported features (sigaltstack() among them) but can support a lot of stuff thanks to the following gyrations.

The good news is that things mostly work. In particular, I can do:

$ ./trace-syscalls-ld.so /bin/true

and get “full” output analogous to that of strace, instead of the old output that was missing many system calls.

What about correctness of all this? This bootstrapping approach only gets us 100% coverage of system calls on some assumptions.

Assumption 1 isn't true for gettimeofday() in Linux's vdso, which implements it as a read from kernel memory. No “interesting” system calls are implemented that way. Do we really just mean “system calls”, though? One could argue that fault-handling paths are in some ways like system calls; we can't trace entries and exits to/from these, though neither does strace. Signal delivery is also a bit like a return from a system call, and we don't trace that, in contrast to strace (which gets any event ptrace exposes, and that includes signals). We could trace some of it by installing our own handlers for all signals, though that would require a lot of care to preserve the program's original semantics... it starts to become a full-on “transparency” problem, of the kind DynamioRIO's authors have articulated.

For assumption 2a to be true, we must forbid mappings that are simultaneously writable and executable. Some programs, such as the just-in-time compilers of V8 or PyPy, still likes to create executable memory and write into it. If we allow such mappings, we have no synchronous opportunity to modify code before it becomes executable. My best guess is to allow these programs to carry on with an emulation of writable-and-executable mappings, adaptively switching between really-writable and really-executable and handling the segmentation faults in user space. The overhead would be great if done naively, but some kind of adaptive hysteresis might handle it well enough (even maybe as simple as “flip after N faults”).

A more subtle version of the same problem is that we need to avoid distinct simultaneous W and X mappings of the same underlying object, such as a file that is MAP_SHARED. That would have the same effect as a writable-and-executable mapping. Such a check is easy in principle, but requires tracking the inode and device number behind every existing file mapping. (File descriptors are OK because we can fstat() them on demand.) Attempts to remap the same object should probably return -ETXTBSY. I've left that as an exercise for someone else, for the moment.

Assumption 2b brings the issue of overt instructions versus punny instructions. On x86 there is no alignment requirement for instructions, so we can jump into the middle of instructions and thereby do something quite different than intended, perhaps including system calls. For example, the innocuous instruction

 mov    $0x9090050f,%eax

assembles to

 b8 0f 05 90 90

but if we jump one byte into this byte sequence, we do a syscall followed by two nops. Preventing this is tricky. Perhaps we trust code not to do this. Perhaps we built the program using some heavyweight control-flow integrity mechanism (and can also check that dynamically loaded code satisfies its requirements). Perhaps we are on an alignment-preserving architecture (so, not x86). For x86, once we have a sufficiently general jump-based trampoline mechanism, perhaps itself making use of instruction punning, we can forcibly stub out any instruction or instruction pair that internally contains a system call byte sequence. Doing this is sufficient only if our trampoline itself is guaranteed not to include such instructions. Such a guarantee is quite hard when immediates are involved... we would have to forbid them to contain the bytes 0f 05, for example. The same is true for addresses or offsets, but these are easier under PC-relative addressing (we get to choose where to place the trampoline). Most likely we can rely on such immediates being so rare that if we simply clobber the instruction with a ud2 and emulate it with something like libx86emulate, nobody will notice any slowdown.

Erk. That was a deeper dive than I planned. But the take-home is that early experience suggests this seems to be possible. We can intercept system cals using a chain loader, bootstrappingly, and so get control of the program's interactions across the system call interface. This lets us build all manner of tools rather like LD_PRELOAD, but directly at the system call level and with stronger coverage properties.

[/devel] permanent link contact

Wed, 13 Oct 2021

No more Dr Nice Guy

[I wrote this back in April, at a point when my time at Kent was drawing to a close and the overload factor had been high for a long time. My current situation at King's is radically different! Whether that will last is less clear.]

As a graph theorist might put it, my in-degree is too high. My time and (especially) head-space are scarce resources. Access to me needs to be limited. Ironically, it requires a certain amount of big-headedness to say this.

Big-headed it may be, but failing to attend to this has proven bad for my health. Every incoming request contributes to a sense of being under attack. Avoiding this is difficult when you're a nice person, working within a culture of niceness and an institution with financial incentives to be nice to (especially) students.

From now on, I have to make an effort to put up a protective veneer of non-niceness. Here I am writing some ground rules to and for myself. It will be tough to keep to them, but as I now know, it will be impossible to do my job if I don't.

[/highered] permanent link contact

Fri, 28 May 2021

Role again

Sorry for the pun. Yes, it's supposed to be about rolling the dice. I've recently(-ish) handed in my notice here at the University of Kent. In July I'll be starting a new job (the same job, but) at King's College London.

The reasons for the move are primarily personal. Of course, that doesn't mean they are unrelated to work... in academic jobs especially, work is personal.

The move is not a promotion. In terms of accumulating money and status, if anything it will probably slow me down. So, it's just as well that those are not what motivate me.

What does motivate me? In work terms, my strongest motivation is to make progress with my research. In career terms, it's to find a home, metaphorically speaking. I will leave behind at Kent many great colleagues, and an institution that I'll remain fond of on many levels. Still, ultimately I had to admit that Canterbury didn't feel like home, and neither did the institution.

It would be wrong not to regard that as a failure. Both institution and city have many home-like qualities. I have failed to turn them into an actual home. I've also failed to repay the confidence of the institution and the senior colleagues, who invested in me on the promise that I would do so. I don't feel great about all this. Being an engineer, occasionally a scientist, I suppose I shouldn't mind failing when it teaches me something. But I prefer to succeed... and although I've learned a lot, I'm not sure it amounts to enough for me to succeed next time.

What I do know is that for most of my time in the job I've been putting a cheery face on an unhappy existence. Even before the pandemic, my move to Kent had coincided with both work and life generally becoming a lot less fun and a lot more stressful. My research usually felt like it was barely moving forwards. That does likely owe partly to my incompetence, sloth and/or feebleness of constitution. However, although I was once slapped down for saying it, I'll say it again: the teaching load has been unquestionably high. That's not a function of the number of lectures per year so much as of less quantifiable factors: the huge amount of direct contact with students (often with few TAs to absorb this), the poor design of admin processes (academics are not removed from the loop to anything like the extent they could be), and a heavy reliance on coursework and the like. (There's more; I'll save the complete list for another time.) The kinds of work that give me satisfaction are not just research—they do also include giving good lectures and explaining stuff to students. But in all cases, the satisfying kinds of work have been a small part of what I do, and the ones that feel most squeezed by other things. In all this, the institution has more often felt like an adversary than an ally. I really hadn't been expecting to apply for jobs elsewhere—my tentatively planned “drastic life change” was something like “move to Whitstable”—but when the pandemic hit, my ties to the institution felt weak. The noises coming out of the central university ‘management’ left me feeling more alienated than ever. Suddenly, moving seemed like the right thing.

Will I enjoy things better at King's? Obviously I think there's a chance. It's difficult to feel certain though. While the University of Kent has been going through an especially rough patch lately, no UK university is in great shape. My impression is that the marketisation of HE has been particularly unkind to Kent—an institution born in the optimistic sixties, and firmly of a plucky-yet-small character. It's located in the corner of a wider region not short of powerful “competitors”. Historically, public policy gave institutions room to carve out their own niche, amid an environment of only gradual change. Since the nineties but especially in the last ten years, politicians have cast aside stability and artificially induced competition. To understand the present, you can do worse than look back twenty years and recognise the glacial forces in action. Kent's Research Strategy from 1999 tells the story of those times: despite working harder and doing better, government policy had already placed it on the side of the curve that is deemed to deserve less, in a misguided belief that induced struggle would lead to a better-functioning institution. It hasn't.

Staying with the history theme, I enjoyed reading Graham Martin's account of the University of Kent's first twenty-five years, “From Vision to Reality”. Those years included an especially high-minded first fifteen followed by an apparently strong showing in the next ten, even when university budgets were coming under a big squeeze under Thatcher. Someone still needs to write the next book, covering 1990 onwards, which I would love to read. Somehow, those thirty years somehow went from pragmatic getting-by to a battle for survival. A (very) senior colleague from another department mentioned that when he had started as a lecturer, for undergraduate applicants Kent was “a viable second choice to Oxbridge”. Some of its sibling “plate-glass” institutions, such as York and Warwick, have consolidated that status... Kent definitely has not. I don't know when or whether a change occurred, and perhaps the memory is a little rose-tinted. (I'm curious about what happened around 1994 in the graph on page 11 of a little report I wrote, where the research indicators started going down... not clear it can be explained simply by sudden competition from ex-polytechnics. In any case the data is noisy.)

Another major factor for me has been that Canterbury, although charming, is a small place. I tend to like smallish places and was feeling pretty optimistic when I moved. In hindsight I should have foreseen that there is a world of difference between Cambridge and Canterbury. Not only is Canterbury half the size population-wise, it has less than half of the cultural activity. In fairness, that is mainly because Cambridge punches far above its population-weight on that. Canterbury is a fine place if you're a family type, and it's great for countryside, coastline and lazy weekend charm. For a thirtysomething single person who wants culture and social life, it's not the best. London is just far enough away to be an exertion. Back when concerts were a thing that happened, I lost count of the times I pencilled in my diary “maybe go to London to see X” on some weeknight, but just wasn't up for the trip. Reverse-commuting from London might be a better option, but that entails the double-whammy of London housing costs on a non-London salary. Big-city living doesn't appeal much to me in any case. (I've already moved back to Cambridge, and will become some kind of “part-time commuter” for my King's job.)

Ironically, the university itself is surely a place where its educated thirtysomethings could find community. But the idea that academic staff can expect to find not just “work” but also community in their jobs is increasingly alien. At the departmental level, we in “Computing” (I still hate that name) do very well at being a friendly and sociable bunch. But more widely in the university there are few footholds available. (To an extent, students at Kent also seem to get siloed by subject, which can't be good.) The relatively small numbers of PhD students and research staff at Kent also means there isn't a “long tail” to the age demographic, which might organically mix together lecturer with postdoc with graduate student with undergraduate. Such an organic long tail is of course something Cambridge thrives on... in addition to a considerable institutional culture of community-building. (In Canterbury, as I guess at most “normal” institutions, institutionally provided booze does nor pour forth to anywhere near the same extent.) In former times at Kent, I imagine that the Colleges would have provided something of these kinds of community, though they appear not to done that for some decades now. In fact rumour has it that “they never worked”... but I've also seen evidence that they once worked a lot more than they do now. The closest I get to feeling part of a wider university community is in union meetings, where I am reminded that I do have peers in the wider university, but we're all exhausted by our jobs. Indeed I'm sure I could have made a better go of things socially in Canterbury if I'd had more energy to spare.

Despite all that, I am seeing a few green shoots at Kent. In the new School-in-Division structure, we have better availability of certain kinds of support staff. At the very top of the institution there is a potentially leaner and more academic-focused structure including several fresh faces. The Senate is once again (just about) a majority-academic body that is starting to rediscover its voice. The university is unfailingly keen to reassert itself as a still-“excellent” research university, even if ‘management’ knowledge of how to do so seems patchy. In Computing I am seeing (unscientifically) a small increase in the average quality of our undergraduate applicants. We may even have made headway against the chronic understaffing and duplication of teaching work that have overloaded us during the few years I've been around. That's all good.

However, in common with many institutions, I also see signs of the opposite: the still-expanding reach of an entrenched managerialism, poor judgement in treatment of staff and finances, an ever-increasing culture of top-down decision-making, the creep of corporate culture at the expense of the academic, and an increasing disconnect between ‘management’ and the academic staff who carry out the mission of the institution. Just as some ‘management’ want to get more serious about research, so the same people seem to view this simply as a way to make academics raise an ever-greater share of the funds behind their own jobs. All UK universities are subject to these or similar forces right now. My pessimistic take is that our universities are managing to preserve academic values only to the extent that they preserve academic self-governance and self-administration—that is to say, hardly at all. Even in Cambridge, where academics have both a culture of pushing back and some mechanisms for doing so, it felt to me much like the option take up a cudgel while the artillery rains down. It's unlikely it does more than fractionally slowing the institution's progress in the sector-wide race to the bottom. That helps to preserve its advantage, but doesn't make for a well-functioning institution.

So, in my new job at King's, the most reasonable expectation is an institution with a somewhat different selection of faults. I have to roll the dice and hope that what turns up is a better fit for me personally. It's also a chance to think more strategically about what I want to do with my limited time on this planet. The past few years have been a blur. I've tried to keep too many plates spinning, with the result that all of them are teetering if they're not already on the floor. My health and self-confidence have suffered too. I'd been considering Kent the only roll of the academic dice that I was likely to get, so I'm feeling fortunate to get another one. I mustn't waste it by plodding on with “more of the same”.

I will have to let go of some of my research avenues, so that the others might actually lead somewhere. I need to focus on the ones where I'm most likely to make a difference to the world over the course of a longer stretch of my career. It takes courage to adopt a longer-term view. I will need to care less about covering my backside on short-term “impact” or “track record” issues. In caring less, I will likely trade away short-term productivity for the potential longer-term... so will probably close myself off from future rolls of the dice. I therefore need to make sure that my long-term plan can actually deliver.

I will need to be more ruthless in protecting time for the things that are important, and organising my so-called (work) life a bit more selfishly around what works for me. If I do this right, I will sadly be a less generous colleague who more often does a poor job of certain less important things. This brings an emotional cost; I enjoy doing a good job of things, even the unimportant ones, and am averse to disappointing people.

Another way to see that is that I need to choose the strategically better flavour of disappointment, and get used to it. I have repeatedly let my mind and schedule become overwhelmed by admin, teaching, marking, service, and making myself available to others. That's not a statement of how generous I am, but of how limited my capacity for those things really is. Recently it has become self-defeating, and I've been disappointing myself and others in what really matters—research very much included. It's been a downward spiral that has limited my ability to function in general. The lesson seems to be that in this job you need to declare bankruptcy early. Bad feeling is guaranteed in some form or other. I need to do what is in my own interests, and be hard-nosed about that.

“Nose of steel” is not one my attributes. And more generally, although I consider myself a natural “classic” academic, I'm still not entirely convinced that modern academia really is where my vocation lies, or should lie. I am especially resistant to shaping myself into the mould of “fundraiser–manager”—increasingly the presumed template for “science” academics, in the eyes of universities and funders and governments. Although it might make sense in bench sciences, it is a poor fit both for the work I do and for me personally.

Despite all that, the academic role is emphatically not one thing, to the point that generalisations quickly get annoying. The regrettable “winners and losers” political set-up amplifies even more the extent to which the “same job” can be radically different for different people and situations. It also means that change is possible. It seems worth one more roll.

[/highered] permanent link contact

Wed, 03 Mar 2021

Career thoughts on academia, industry and points in between

In early 2018 I was at a crossroads in my career: academia, industrial research, or somehow go it alone? I was a postdoctoral researcher at the University of Cambridge, but feeling some pressures to move on. This post is a (somewhat edited) time capsule containing some notes-to-self I made about that decision.

What happened is “history”, of course: I became an academic at the University of Kent. However, you wouldn't necessarily have predicted that from reading these notes. I'm still not wedded to the academic path; most of the questions here are still live in my mind. What I am wedded to is my personal line of research; I'm not just going to “go and work for Google” (shudder). Being a computer scientist, the options and trade-offs I face are somewhat different from those in various other disciplines.

(I should qualify: here academia means mostly “UK academia”, which has some particular problems right now... but is, sadly, not too divergent from the rest of the world.)

I have not made that much effort to edit the notes, so they are sometimes staccato, sometimes repetitive, and don't reflect where my thinking has moved on (especially from experience at Kent; in hindsight they contain a lot of Cambridge privilege). They start here.

Industrial research obviously forms a spectrum: some has more of a development flavour, whereas in some extreme cases it resembles academic research. I would want to fall near the latter end, although not necessarily maximally so.

Problems with academia. There are many. Let me start somewhere, with the following.

Government interference, managerialism. I have become jaded and frustrated by these aspects of the academic sector. They are arguably worse than corporate nonsense in industrial research, because they are so needless. Endless political “initiatives” and fad-chasing, in academia, seem less excusable than when arising from a profit motive. See also “overheads” below.

Research councils are unreliable; it's better not to depend on them. However, as an academic one has the option of using them or not; industry funding is also an option, modulo perhaps high overheads.

Unlike most sciences, I don't need big funding from research councils. My skills are in demand; perhaps I could make enough money from consulting work? My corner of CS is one where a small group, or even a focused individual, can do great work; I don't need to pay for a huge lab or huge group.

I have always felt like an outsider in the academic world. I have a habit of rejecting established research programmes, and another habit of attacking cold problems. My work meets with bewilderment fairly often. Many of the people who seem to “get it” come from industry—thinking of my blog, Twitter and industry-talk audiences. Industry conferences are interesting—they make me feel sometimes a peer, sometimes a learner, sometimes an educator. It is an educational environment, but worlds away from undergraduate computer science; food for thought.

Skill set, part 1. I see myself as both a creative individual and an individual creator. Unlike many [natural] “scientists” I do not see myself as a collaborative agent in the service of some a grand research programme. To stick with academia seems to mean sticking with a system where this is frowned upon or misunderstood, partly from how my kind of CS research is different from “science” “research” in the main. (Of course creative direction doesn't preclude “collaborating”, but it has to be a collaboration on the basis of shared vision, not just nuts and bolts.)

Skill set, part two. Context-switching is not my forte. I don't know for sure, but I feel I could easily be crushed by even a moderate teaching or admin load.

Textbook teaching is a turn-off. In Cambridge I already spend a lot of teaching effort on material that I don't even know, that I've never needed to know, that is not where my expertise lies, and that has dubious educational value (never mind training value). In this instance I'm thinking mostly about the tedious way in which compilers are taught here [in Cambridge] and elsewhere. That perhaps comes down to the fact that my strength is not mathematics, yet in CS, at least in Cambridge, the educational material is desperate to make things mathematical wherever it can. I care first for education rather than training. But this is neither good training nor good education for many of the people with relevant skills/interests. (It is interesting that education may be tailored to a skill/interest profile without becoming “training”—or so I contend.)

Textbook rejoinder: I should take care to remember that the textbook stuff did teach me some things. As a beginning programmer, the mechanics of programming were fine, but I couldn't come up with how to solve certain problems: how to write a game or other reactive system (using an event loop), or how to do heuristic rather than brute-force search (I struggled with my “Countdown numbers game” program; I remember marvelling at the pathfinding abilities of later Sierra SCI games); or how to write a parser that worked systematically rather than by ad-hoc rewritings (I remember wondering how Sierra AGI games did this... noticing a theme here). Only by imbibing some textbook-ish wisdom (sometimes from lectures) could I solve these problems.

Academic, non-practical teaching of practical topics: I've wasted too much of my life staring at students' handwritten code on paper not being sure whether it works or not. A very innovative academic course might avoid this using automated testing. But the cost of redeveloping these from scratch is prohibitive if done on a per-institution or per-academic basis... once again, justifying MOOCs (more on MOOCs below). One could argue that a lectureship would allow me to fix that, by designing the courses differently. However, no doubt my room for variation would be limited, and it all comes out of the time budget.

Teaching repetition, teaching overstretch. Lecturers produce hastily-written slide decks over and over again. They throw away their predecessors' and start over, but throw in their own new fresh set of mistakes when they do so (thanks, Philip). As a supervisor I'm tired of reading hasty, sloppy, unclear lecture materials and of seeing older materials thrown away once a new lecturer starts. The set-up of departments asking lecturers to teach courses, which the lecturer then “owns”, encourages this. Might it be better if the institution owns the course? It's hard to say. Might it be better if lecturers own courses but maintain them whether or not the institution asks them to keep giving it? Seems optimistic. Continuity seems to be the important thing: not just of running the course, but in the creative vision behind it. The current system takes liberal opportunity to break continuity.

High overheads. The academic sector has high and increasing overheads on the funding it receives, not unrelated (call me cynical) to its growing managerialism. If someone in industry has an interest in funding some exploratory research with a university, paying 130\% overheads (or whatever) can quickly erase their perceived value-for-money. It doesn't help that (Cambridge) institutional policies on overheads fail to distinguish big/rich companies from small/startup ones.

Availability of eager young helpers. This is where the academic world does well. You get access to some bright young things who can contribute prodigious efforts towards your cause for little or no money, because (sad but true) they're still in a mode of paying money for the privilege of learning. Still, to date I've yet to really make student projects work for my research; I've seen others do it (e.g. Hrutvik's CakeML project), but it seems better in Cambridge than elsewhere simply because strong students are in greater supply. Industrial research does well too in this regard, usually via established programmes of internships, but that works mostly with more advanced students and requires big money.

Libraries, journal subscriptions etc. Any form of going it alone, as some mix of consultant and “gentleman scientist”, would suffer from lacking these, unless I could also keep (some fraction of) a foot in an academic institution. Or maybe physical “local resident” access to an enlightened university's library is enough.

Future of academic CS, part one: does traditional academic learning have a healthy future in general? Does it have a healthy future in CS-style learning in particular? I'm being vague because I start to doubt the value of much of CS teaching—although I should preface that “in Cambridge”. In certain other institutions, it looks more vocational, which is potentially fine, although it's not really my calling and I don't like to see it masquerade as “education”. My working hypothesis is that textbook academic-style CS content can be delivered via MOOCs fairly effectively, so there's no point investing myself in this style of teaching unless I have the appetite to innovate in that space (I don't, yet).

Future of academic CS, part two: two other aspects of “CS education” are less easily dispensed with. Firstly, practical skills; they take practice. Secondly, deeper and longer-term perspectives, which in theory are what academics are good at. These are the sorts of things that (I hypothesise) make my writings and talks appealing to audiences like Strange Loop, Hacker News, etc.. These people mostly have experience under their belt and enjoy material that helps widen their viewpoint. It would be hard to teach undergraduates, meaningfully, the sort of content that I write in those articles. This could all be an artifact of how we have to teach CS “basics” at degree level. Perhaps (I'm guessing) in other subjects one can deliver a more educational degree because students know the basics and are ready to think about the deeper things.

Many things I wouldn't miss about leaving academia. Two are constant pressure to overextend myself, and constant dissatisfaction from doing a mediocre job. (These have “Matt Welsh resonance”.) Another is the expectation of poor work/life balance. I'm okay with the Collegiate Cambridge model of building your life around your work, i.e. deliberately eroding any separation. In fact I think that dedicated creative work benefits from this. But it doesn't mean that one should be trying to work unhealthy hours. Creativity and clear thought don't benefit from that. Modern academic careers are so pressured that it seems hard to avoid this.

The PhD mill, and my moral objection to it: there's a sort of expectation that as an academic I would acquire funding and “acquire” people, typically PhD students (and worse, RAs) to do “my work” (as opposed to just “their work, that I supervise”). Overall I'm not sure I can get behind this modus operandi. As a PhD student I did my own work. That only created trouble for me later, since it's not the normal thing, but still I feel no need to apologise for doing it. Meanwhile, fetishising lineage in research is part of what turns mostly-spent research programmes into self-perpetuating monsters, and promotes monoculture. Even the word “mentorship” makes me feel cynical.

There are some good things about academia. One reason for wanting a steady academic job has been stated to me as a “cushion for the lean years”. But there are other ways to build a cushion; how much cushion is really necessary? Do I expect any lean years? One should expect some periods that area lean in terms of grants/funding. But if I were a consultant, should I expect years that are lean in terms of clients? One would just have to put up with less interesting work, presumably.

The state within a state. Academia offers institutional support. Sometimes, purely public services (thinking public libraries) and/or wider enterprise (thinking co-working spaces etc) provide similar support. Is that enough? An intellectually stimulating environment is something academia can provide, but probably only the top institutions do really well.

R-e-s-p-e-c-t. Respectability of the academic path is another factor; but I reject that, in principle at least (though I'm not completely immune).

Growing a group: an ability to build a group around my agenda is probably a potential good thing about academia, despite the aforementioned distaste. A small group could perhaps be manageable without abstracting too much time away from activity that is nominally research. But is having 2–3 PhD students, plus teaching and admin, really productive and/or enjoyable relative to what I could do by myself (say, on research time derived from consulting income)?

Scaling to more than one person: academia lets one grow small and even medium-sized teams, although not without also becoming something of a manager. A small team is still many times better than a team of one. There is also some built-in direction: a PhD supervisor has some leadership-style influence. Of course I instinctively dislike that the moment it becomes “control”. In other contexts the same dynamic might conceivably fly (or not) for my work... I can think only of open-source as a team-building tactic. Getting an equivalent extent of team-building that way seems hard. Open source has its own problems: even fewer funding avenues, potential for pulling in different directions (but this happens in academia too), project politics, unreliable (and/or physically remote) contributors, unpredictable timeframes, mob rule of online forums, etc..

What do I think about US-style CS degrees and academia? They admit some amount of liberal arts-style breadth, and take longer to teach material [than in Cambridge], including (sometimes) proper practical classes. But their research story is probably still infected by many of the things I don't like: mathematical orthodoxy, more generally the tyranny of many incumbent research programmes that I don't believe in, the gameability of modern research culture, ditto for career incentives, and the faddish government/funding initiatives (is that less bad there than here? unclear).

Some more points for going it alone. I could pursue long-term research in perhaps a more focused, lower-stress working environment—but perhaps lonely and isolating if it's only me there... ideally want a balance. I could be free to tick some personal “living as I choose” boxes in doing so, regarding lifestyle, physical environment and working environment. After years in West Cambridge on a woefully mis-designed or undesigned site, I miss what makes the centre of Cambridge such a good space: space-efficient small-town planning. The University is moving further from this model, partly for understandable reasons, yet its managerial rather than academic leadership means it is failing to develop an acceptable alternative. I like the idea of working somewhere more rural, with easier access to nature. I could vote with my feet in favour of economic localism and other values that large organisations are painfully slow to catch on to.

Reading the page on "rat race" on Wikipedia, it fits academia well. I am reminded that although the academic world attracts a formidable density of very bright people, bright people need not be far-sighted or big-thinking people. In fact I suspect the correlation between these is fairly weak. We shouldn't suppose that adherence to “the system” is a consequence of intelligent analysis and rational optimisation; bright people can be remarkably strongly bound to social norms and mores, even when they have the intellectual capacity to question them.

An institution in mine own image? I am somewhat inspired by the Recurse Center. And I am a believer, in principle, in “if you don't see the institution you want, create it”. Creating “my own institution” really just means doing things in ways that work for me but also in a way that might provide “a home for other people”, i.e. to fulfil the moral duty of an elder. This has some appeal. If I did create a space for others, it would be intentionally small and definitely not mainstream, so it would be fine if it didn't appeal to many people. It would mainly be about enabling people to do research, as well as enabling myself; but maybe it'd have an educational angle too. I am a fan of Cambridge-esque college-style residential establishments, as a way both to build communities and to limit costs (but this may not fly without big/expensive perks, and needs updating anyway). I'd be pleased if such an institution could contribute to a low-wage-economy area (which would keep property costs down; thinking the antithesis of the San Francisco) as long as it was not too far from civilisation. My educational interests are more in substantial life-long learning, than in an undergraduate hothouse or “stamping machine”. Ditto research. How could it work?

[/highered] permanent link contact

Mon, 04 Jan 2021

Chain loading, not preloading: the dynamic linker as a virtualization vector

Suppose you want to run a pre-existing binary program but in some kind of instrumented or modified form. On ELF-based systems, people often do this with LD_PRELOAD: you preload a library that interferes with some C library calls or somehow tweaks the process state at startup (say to install a signal handler). The program itself is (hopefully) oblivious to what you've done, but its behaviour is modified accordingly. Some years back I saw a nice survey of applications in a presentation by Kevin Pulo; there may be other resources out there.

This post is about some alternative ELF-level trickery, that is sometimes a better option for doing this. Here it's shown on GNU/Linux, but any ELF platform should allow it.

If you've tried LD_PRELOAD, you'll know that there are lots of problems and gotchas with it. Your code has to deal with being initialized at an unpredictable time; the C library's binary interface evolves from version to version, so is prone to additions that sidestep your preloads; it doesn't work for programs that do direct system calls (not via the libc) or are statically linked. What I'm about to describe avoids many of these problems, by creating what I'll call an ELF chain loader. This loader must do any program instrumentation at the instruction level, rather than the linkage level as with LD_PRELOAD. So in a sense, chain loading is less powerful than preloading: it's working at a lower level, and all your tweaks must be somehow “installed” at startup, rather than as part of the “dialogue” of library calls. However, there's a pattern we can use to overcome that limitation, so in fact I'd argue the result is more powerful than LD_PRELOAD, although (for now) harder to use.

Suppose you have a command invocation

$ ./myprog

and want your tweaked command to be invocable something like

$ LD_PRELOAD=mymods.so ./myprog

or (nicer)

$ ./mymods ./myprog

In the latter, mymods is a chain loader binary. A chain loader is basically a stub dynamic linker. It doesn't actually do dynamic linking; it's invoked like a dynamic linker, but it just loads another dynamic linker—after doing whatever tweaks and fiddles you'd like it to do.

Dan Williams has written (on a page now sadly disappeared, but WayBack-available) about how to build your own dynamic linker or ld.so. Building a binary of the right form is the easy part. Building one that actually links and runs real programs is a big job, especially if you're using a complex C library (like glibc) or complex run-time features (like thread-local storage). But in outline, a dynamic linker basically looks like this.

  1. do bootstrap relocation
  2. initialize some stuff (link map, TLS, ...)
  3. locate and map libraries, transitively, resolving symbols, applying relocations, etc.
  4. call constructors
  5. jump to the main program's entry point

The chain loader that we'll build is basically just a stub, which will call onward to the real ld.so to handle most of the above stuff. So it is much simpler:

  1. do bootstrap relocation
  2. mess with the process however we like
  3. locate and map the next loader (e.g. the real dynamic linker)
  4. (optional) disguise the fact that we ever existed
  5. jump to the next loader's entry point

If we do it right, the next loader won't even know that the stub loader has run. We can think of a chain loader as a “virtualization vector”. It sets up a (somehow-) virtualized environment, in which the entire original process, from dynamic linking onwards, then proceeds to execute.

(The “disguise” part is interesting. It makes us almost indistinguishable from malware! Virtualization is about transparently subverting a pre-existing process's execution, and there's a large class of malware that also does this.)

A few years ago I wrote a skeleton of a dynamic linker, called donald. It can load only very simple statically linked programs—hardly a real dynamic linker! Nevertheless it's a good vehicle for understanding what goes on during process start-up, and is a basis for building our chain loader. Caveat: it's very quick-and-dirty; it doesn't always use robust coding patterns, and it uses some unportable hacks to keep the code simple. That makes it possible to show almost all of it in a single blog post! Following Dan Williams's recipe, the command we use to link donald is the following.

$ cc -std=gnu99 -g -fPIC -fno-stack-protector \
  -fuse-ld=bfd -Wl,-Bsymbolic -nostdlib -nostartfiles -shared \
  -o "donald.so" premain.o main.o entry.o load.o \
  -Wl,-Bstatic /path/to/musl/lib/libc.a -Wl,-Bsymbolic \
  -T donald.lds -Wl,-soname=ld-linux.so.2 

where we're using a nice simple C library implementation, namely musl, to provide the libc calls that we use internally. The donald.lds file is a lightly tweaked linker script that adds a symbol that'll help us do bootstrap relocation, as per Dan Williams's article. The work of donald is done in the C files premain.c, main.c, entry.c and load.c.

“Bootstrap relocation” refers to the solution to an essential problem: who links the dynamic linker? In old-fashioned static linking scenarios, binaries contain a ready-to-go memory image. The program is loaded at address 0, all address bindings are known, and references have been fixed up when the binary was linked. By contrast, in dynamic linking scenarios, a library might be loaded at any address, meaning it still contains some relocation records describing fixups needed at load time. These are done by the dynamic linker. But what if you are the dynamic linker? The dynamic linker itself is just ld.so, a shared library; it, too, can be loaded anywhere. So who relocates the relocator? The answer: it relocates itself. The entry path of the dynamic linker consists of code specially crafted to run correctly regardless of where it is loaded. One of its first tasks is to locate the table of its own relocation records, relocate itself using them, and then continue execution in a “normal” environment.

On the x86-64 architecture, this is pretty straightforward because most data and code references are PC-relative. (On other architectures it's often much hairier, because it's harder to avoid using any absolute address references before you've completed bootstrap relocation.) Usually the entry path is written in per-arch assembly, precisely to accommodate the less straightforward architectures. In donald, which works only on x86-64, we have the option to write it directly “in C”. It's not really C, because we're bound by a very wacky contract: don't write any code that will cause the compiler to use non-relative addressing modes! The entry point of donald, called directly by the kernel, looks like the following.

/* The function prologue pushes rbp on entry, decrementing the stack
 * pointer by 8. Then it saves rsp into rbp. So by the time we see rbp, 
 * it holds the entry stack pointer *minus 8 bytes*. */
#define BP_TO_SP_FIXUP 0x8
void *rsp_on_entry HIDDEN;
/* This isn't the usual "main"; it's the raw entry point of the application.
 * We link with -nostartfiles. We then define our own main. */
int _start(void)
    /* gcc doesn't let us disable prologue/epilogue, so we have to fudge it.
     * We assume rsp is saved into rbp in the prologue. */
    register unsigned char *bp_after_main_prologue;
    __asm__ ("movq %%rbp, %0\n" : "=r"(bp_after_main_prologue));

    int argc;
    char **argv;
    rsp_on_entry = bp_after_main_prologue + BP_TO_SP_FIXUP;
    preinit(rsp_on_entry, &argc, &argv); // get us a sane environment

    printf("Hello from " DONALD_NAME "!\n");

    int ret = main(argc, argv);

    /* We're executing without startfile code, so returning 0 would not make sense. 
     * Calling exit() brings a dependency on fini_array stuff that we need to avoid
     * since it depends on the startup files. So just do the syscall directly. */
    syscall(SYS_exit, ret);


When the kernel jumps to the above code, the stack pointer is pointing at a bunch of useful information: the program arguments and so forth. But the C compiler will generate a prologue that moves it away from that. So the first thing we do is use a hard-coded hack to “undo” that, and get the “stack pointer on entry”, which we pass to our preinit function.

Using this pointer, preinit can navigate the initial stack to find all that data of interest. The data starts with argc; above that is argv (the vector of pointers to argument strings), above that envp (the same but for environment variables), then the auxiliary vector of records containing “stuff the kernel wanted to tell us” and finally the strings these argv and envp vectors point to. You can read more about auxiliary vectors here. Our preinit parses this chunk of memory, and extracts a couple of facts that we'll need to do our work: the page size and our own base address.

Bootstrap relocation uses the latter of these. We find our own dynamic linking information (symbol table and relocation table) using the _DYNAMIC symbol. But remember “this isn't C”: simply referring to _DYNAMIC may not work yet, because it's defined outside the current compilation unit (by the linker, in fact) so the compiler may emit a reference that needs relocation. By definition, the relocation hasn't been applied yet. In donald I used an unorthodox (and unportable) hack: taking the address of _DYNAMIC in our C code before we relocate ourselves gets us the unrelocated (file-relative) address, to which we manually add our own base address as snarfed from the auxiliary vector. (Better code would probably declare _DYNAMIC as file-local, using the ELF “hidden” or “internal” visibility, which would allow the linker to fix this up into rip-relative addressing.) Either way, we can get to the _DYNAMIC vector, which gets us to our symbol and relocation tables; then we walk the relocation table and apply each reloc.

static inline void __attribute__((always_inline)) bootstrap_relocate(unsigned char *at_base)
    /* We scan _DYNAMIC to get our own symbol table.
     * HACK: we manually relocate &_DYNAMIC
     * by our load address to get its actual address. */
    ElfW(Dyn) *p_dyn = (void*)(at_base + (uintptr_t) &_DYNAMIC);
    ElfW(Sym) *dynsym_start = NULL;
    unsigned long dynsym_nsyms = 0;
    ElfW(Rela) *rela_dyn_start = NULL;
    ElfW(Rela) *rela_plt_start = NULL;
    unsigned long rela_dyn_sz = 0;
    unsigned long rela_dyn_entsz = 0;
    unsigned long rela_dyn_nents = 0;
    unsigned long rela_plt_sz = 0;
    while (p_dyn->d_tag != DT_NULL)
        if (p_dyn->d_tag == DT_SYMTAB) dynsym_start = (void*)(at_base + p_dyn->d_un.d_ptr);
        else if (p_dyn->d_tag == DT_SYMENT) dynsym_nsyms = p_dyn->d_un.d_val;
        else if (p_dyn->d_tag == DT_RELA) rela_dyn_start = (void *)(at_base + p_dyn->d_un.d_ptr);
        else if (p_dyn->d_tag == DT_RELASZ) rela_dyn_sz = p_dyn->d_un.d_val;
        else if (p_dyn->d_tag == DT_RELAENT) rela_dyn_entsz = p_dyn->d_un.d_val;
        else if (p_dyn->d_tag == DT_JMPREL) rela_plt_start = (void *)(at_base + p_dyn->d_un.d_ptr);
        else if (p_dyn->d_tag == DT_PLTRELSZ) rela_plt_sz = p_dyn->d_un.d_val;
    if (rela_dyn_entsz > 0) rela_dyn_nents = rela_dyn_sz / rela_dyn_entsz;
    /* We loop over the relocs table and relocate what needs relocating. */
    ElfW(Rela) *p_rela = rela_dyn_start;
    for (int i = 0; i < rela_dyn_nents; ++i)
        do_one_rela(rela_dyn_start + i, at_base, dynsym_start);
    p_rela = rela_plt_start;
    /* Also do .rela.plt */
    for (int i = 0; i < (rela_plt_sz / sizeof (Elf64_Rela)); ++i)
        do_one_rela(rela_plt_start + i, at_base, dynsym_start);

I haven't shown the function that applies a single relocation record, do_one_rela, but it's pretty simple. In practice it only needs to know a few flavours of relocation.

static inline void __attribute__((always_inline)) 
do_one_rela(ElfW(Rela) *p_rela, unsigned char *at_base, ElfW(Sym) *p_dynsym)
#define SYMADDR(r_info) (p_dynsym[ELF64_R_SYM((r_info))].st_value)
    Elf64_Addr *reloc_addr = (Elf64_Addr *)(at_base + p_rela->r_offset);
    switch (ELF64_R_TYPE(p_rela->r_info))
        case R_X86_64_RELATIVE: // no symbol addr, because we're RELATIVE
            *reloc_addr = (Elf64_Addr)(at_base + p_rela->r_addend); 
        case R_X86_64_64: 
            *reloc_addr = (Elf64_Addr)(at_base + SYMADDR(p_rela->r_info) + p_rela->r_addend);
        case R_X86_64_JUMP_SLOT:
        case R_X86_64_GLOB_DAT:
            *reloc_addr = (Elf64_Addr)(at_base + SYMADDR(p_rela->r_info));
            /* We can't report an error in any useful way here. */
#undef SYMADDR

We're now getting ready to take on the main task of being a chain loader, which plain donald is not (at least not intentionally). Ideally we want to be invocable in two ways: as a bona fide dynamic linker requested directly by an executable, as we see here:

$ readelf -Wl /bin/blah

Elf file type is EXEC (Executable file)
Entry point 0x401432
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
  INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /path/to/my/ld.so]

and as a command-line tool

$ /path/to/my/ld.so /bin/blah

These two cases work out slightly differently at run time. The first is the “normal” case, and the kernel has already mapped both the program itself and us the dynamic linker. In the latter, the kernel thinks it's loading us as a statically linked binary (curiously not caring that we're ET_DYN not ET_EXEC), and it's up to us to parse our arguments, identify the program we want to run, and load it. The contents of the auxiliary vector are slightly different in the two cases, and indeed that's how we'll tell the difference and behave accordingly.

int main(int argc, char **argv)
    // were we invoked by name, or as a .interp?
    // use AT_ENTRY to find out: it's _start if we were invoked as a program,
    // otherwise it's the program's _start
    int argv_program_ind;
    uintptr_t entry = (uintptr_t) &_start;
    _Bool we_are_the_program = 1;
    for (ElfW(auxv_t) *p = p_auxv; p->a_type; ++p)
        switch (p->a_type)
            case AT_ENTRY:
                if (p->a_un.a_val != (uintptr_t) &_start) we_are_the_program = 0;
                entry = p->a_un.a_val;
    fprintf(stderr, "We think we are%sthe program\n", we_are_the_program ? " " : " not ");

Since we're a chain loader, most of what we want to do is load the next loader and jump to it. To load the next loader, we'll need to memory-map some chunks of the file, by reading its ELF header and program headers. We can simply do this by file I/O.

    /* We always chain-load the ld.so and let it load the program. Let's read it. */
    const char ldso_path[] = "/lib64/ld-linux-x86-64.so.2"; // HACK: sysdep
    int ldso_fd = open(ldso_path, O_RDONLY);
    if (ldso_fd == -1) { die("could not open %s\n", ldso_path); }
    struct stat ldso_stat;
    int ret = fstat(ldso_fd, &ldso_stat);
    if (ret != 0) { die("could not stat %s\n", ldso_path); }

    // read the ELF header
    ssize_t nread;
    ElfW(Ehdr) ehdr; nread = read(ldso_fd, &ehdr, sizeof (ElfW(Ehdr)));
    if (nread != sizeof (ElfW(Ehdr))) die("could not read ELF header of %s\n", ldso_path);
    // check it's a file we can grok
    if (ehdr.e_ident[EI_MAG0] != 0x7f
        || ehdr.e_ident[EI_MAG1] != 'E'
        || ehdr.e_ident[EI_MAG2] != 'L'
        || ehdr.e_ident[EI_MAG3] != 'F'
        || ehdr.e_ident[EI_CLASS] != ELFCLASS64
        || ehdr.e_ident[EI_DATA] != ELFDATA2LSB
        || ehdr.e_ident[EI_VERSION] != EV_CURRENT
        || (ehdr.e_ident[EI_OSABI] != ELFOSABI_SYSV && ehdr.e_ident[EI_OSABI] != ELFOSABI_GNU)
        // || phdr->e_ident[EI_ABIVERSION] != /* what? */
        || ehdr.e_type != ET_DYN
        || ehdr.e_machine != EM_X86_64
        die("unsupported file: %s\n", ldso_path);
    off_t newloc = lseek(ldso_fd, ehdr.e_phoff, SEEK_SET);

    // process the PT_LOADs
    ElfW(Phdr) phdrs[ehdr.e_phnum];
    for (unsigned i = 0; i < ehdr.e_phnum; ++i)
        off_t off = ehdr.e_phoff + i * ehdr.e_phentsize;
        newloc = lseek(ldso_fd, off, SEEK_SET);
        if (newloc != off) die("could not seek to program header %d in %s\n", i, ldso_path);
        size_t ntoread = MIN(sizeof phdrs[0], ehdr.e_phentsize);
        nread = read(ldso_fd, &phdrs[i], ntoread);
        if (nread != ntoread) die("could not read program header %d in %s\n", i, ldso_path);

This is all fairly common to any dynamic linker. Now we've collected the PT_LOAD program headers, which tell us which parts of the file to map into memory. To rule out partial failures, we reserve enough memory to be sure we can map the loader in a big contiguous chunk, even though that might be split over many PT_LOADs.

    ElfW(Addr) max_vaddr = 0;
    for (unsigned i = 0; i < ehdr.e_phnum; ++i)
        ElfW(Addr) max_vaddr_this_obj = phdrs[i].p_vaddr + phdrs[i].p_memsz;
        if (max_vaddr_this_obj > max_vaddr) max_vaddr = max_vaddr_this_obj;
    uintptr_t base_addr_hint = 0x555555556000;
    void *base = mmap((void*) base_addr_hint, max_vaddr, PROT_NONE, MAP_PRIVATE,
        ldso_fd, 0);
    if (base == MAP_FAILED) die("could not map %s with PROT_NONE\n", ldso_path);
    uintptr_t base_addr = (uintptr_t) base;

Once we've done that, we have reserved space at some load address which could be arbitrary (but we'll use a hint of 0x555555556000)... all we need to do is turn each PT_LOAD header into the (usually) one corresponding mmap() invocation.

    uintptr_t phdrs_addr = 0;
    for (unsigned i = 0; i < ehdr.e_phnum; ++i)
        if (phdrs[i].p_type == PT_LOAD)
            _Bool read = (phdrs[i].p_flags & PF_R);
            _Bool write = (phdrs[i].p_flags & PF_W);
            _Bool exec = (phdrs[i].p_flags & PF_X);

            if (phdrs[i].p_offset < ehdr.e_phoff
                    && phdrs[i].p_filesz >= ehdr.e_phoff + (ehdr.e_phnum + ehdr.e_phentsize))
                phdrs_addr = base_addr + phdrs[i].p_vaddr + (ehdr.e_phoff - phdrs[i].p_offset);
            ret = load_one_phdr(base_addr, ldso_fd, phdrs[i].p_vaddr,
                phdrs[i].p_offset, phdrs[i].p_memsz, phdrs[i].p_filesz, read, write, exec);
            switch (ret)
                case 2: die("file %s has bad PT_LOAD filesz/memsz (phdr index %d)\n", 
                        ldso_path, i);
                case 1: die("could not create mapping for PT_LOAD phdr index %d\n", i);
                case 0: break;
                    die("BUG: mysterious error in load_one_phdr() for PT_LOAD phdr index %d\n", i);

Now we've mapped the program we want to run, i.e. the next loader. In a real dynamic linker, our work would only just be beginning: we've mapped the next program, but still have to walk its depended-on libraries and load those (transitively), apply relocations in all the above, and so on. But in our case, we're most of the way there. All we have to do is do whatever tinkering is demanded by our use case, then transfer control to the next loader, using the entry point in its ELF headers. The next loader is not just any old program, however. To cover our tracks and ensure it can find the auxiliary vector just as we ded, we also return the stack pointer to where it was when we started.

void __attribute__((noreturn)) enter(void *entry_point)
    fprintf(stderr, DONALD_NAME ": jumping to system ld.so entry point %p with rsp %p\n",
        (void*) entry_point, rsp_on_entry);
    __asm__ volatile ("movq %0, %%rsp\n"
          "xorq %%rbp, %%rbp\n" /* clear rbp to avoid confusing stack walkers */
          "jmpq *%1\n" : : "m"(rsp_on_entry), "r"(entry_point));

In fact, we need to cover our tracks a bit better. If we're being invoked as a command, not as a requested dynamic linker, then we want to make the auxiliary vector look as if the next loader is the one that was invoked as a command too. So we do a pass over the vector to fix it up. One such fixup is required in the opposite (“requested”) case too.

for (ElfW(auxv_t) *p = p_auxv; p->a_type; ++p)
        switch (p->a_type)
            case AT_ENTRY:
                if (we_are_the_program) p->a_un.a_val = entry_point;
            case AT_PHDR:
                if (we_are_the_program) p->a_un.a_val = phdrs_addr;
            case AT_PHENT:
                if (we_are_the_program) p->a_un.a_val = ehdr.e_phentsize;
            case AT_PHNUM:
                if (we_are_the_program) p->a_un.a_val = ehdr.e_phnum;
            case AT_BASE:
                if (!we_are_the_program) p->a_un.a_val = base_addr;
            case AT_EXECFN:
                if (we_are_the_program) p->a_un.a_val = (uintptr_t) &argv[0];

Note that in the “requested” case, since the AT_BASE vector entry is set to the loader's load address, it needs to be fixed up to that of the next loader. If we're invoked as a command, it'll be absent or 0 and can stay that way. The other cases are ones that only need fixing up if we're run as a command, as the vector needs to reflect the next loader's structure not our own (and not the program's, which it only reflects if we're run as a requested loader).

There's more bit of track-covering we can do, and in fact need to, if we're “requested” rather than invoked. One interesting quirk of what we have so far is that the next loader, although its name is /lib64/ld-linux-x86_64.so.2, will think it's called /path/to/my/ld.so, and will create a link map entry accordingly. This will confuse debuggers, e.g. because they will look at the wrong file's symbols. It took me a while to figure out where the dynamic linker gets its own name, but it's obvious in hindsight: from the .interp string in the mapped program binary! (If we were invoked, it more obviously comes from argv.) So to avoid confusion, we want to overwrite the .interp string before we call the next loader. That takes some jiggery-pokery at link time, but we're already doing some of that to be an alternative requested linker—i.e. to link with --dynamic-linker=/path/to/my/ld.so. Linking in a small additional file assembled from

#if defined(__linux__) && defined(__ELF__)
.section .note.GNU-stack,"",%progbits
.section .interp, "aw"

is enough to make the .interp writable, and can also be used to add more padding if the stock path (which we'll be writing) might be longer than our requested path.

There's another wrinkle with this. Since the default linker script will place .interp among stuff that is otherwise read-only, this extra linked file will also have the side-effect of making writable the program headers, various notes and other stuff appearing in the segment. I'm not yet decided on the best way to deal with this. We could mprotect() this segment after we're finished with it, if we're sure we're the only reason why it should be writable. We could skip the above and simply use a pair of mprotect() calls to do the write, at least if we don't need to add padding. We could try to move .interp into the RELRO (read-only after relocation) part of the executable. Or perhaps we could arrange that our ld.so has two .interp-like chunks, one for its own name and one for the delegated ld.so make them page-aligned, and do a remap rather than a write. This would waste some disk space in the binary, but avoid wasting memory in systems where many processes are loaded this way—a common trade-off in dynamic linking.

We could go further with track-covering, such as attempting to unmap ourselves as we hand over to the next loader. That would involve more sneaky malware-like tricks, and is overkill for most purposes.

So why is all this better than LD_PRELOAD? I can think of several reasons.

Why's it not so good?

The “but...” refers to a possible “big-hammer” way around the setuid problem: make our loader setuid root, albeit dropping root privileges super-early. That sounds crazy, but I believe it'd be an interesting demo for software verification. It only needs to be root just long enough to open the target binary and test for which UID to switch down to (either the invoking user or the file owner). The test could be done in a very short code path making probably just two system calls: one to open a file descriptor on the loaded file, and another to fstat() it. On many architectures it could be done before bootstrap relocation. Can we prove that code correct? Doing so is feasible if we have trustworthy formal specifications of instruction sets, system calls and (if needed) relocation. Not coincidentally, I've been involved in some research on these; there is more to do.

In a future post I'll demo a very simple syscall tracer built using this combination of chain-loading and bootstrapping instrumentation. It's basically an in-process strace. System call instrumentation is usually what people want when they use LD_PRELOAD, and are forced to settle for libc instrumentation instead.

Code is available (though definitely not out-of-the-box demoable; maybe by the next post) in my donald repository, and a bit more (mostly to be explained next time) in my liballocs (see allocsld) and libsystrap.

[/devel] permanent link contact

Thu, 23 Jul 2020

Building a simple toolchain extension, the subversive way

I recently ran into a nasty problem, arguably a bug, in GCC. I decided to write a little tool to help me detect when I'm triggering this bug. This post is about how I did that, using some simple but handy helper scripts I've been growing for a while.

The bug is to do with “constructor” functions in GNU C (not to be confused with C++ constructors!). I'll quote from my bug report, slightly edited.

I was surprised to find that my __attribute__((constructor(101))) on a function was
not giving it priority over another constructor whose attribute used a higher number.
It turns out this was owing to an additional prototype that did not mention the
function as being a constructor.

This caused a really subtle initialization bug in my code, and was tough to track
down. At the least, I would expect a warning when the information is discarded.

$ cat test.c
// comment these for correct behaviour, i.e. 1 then 2
static void do_init1(void);
static void do_init2(void);

static void do_init2(void) __attribute__((constructor(102)));
static void do_init2(void)
        printf("Init 2!\n");

static void do_init1(void) __attribute__((constructor(101)));
static void do_init1(void)
        printf("Init 1!\n");

int main(void)
        return 0;

$ cc -save-temps    test.c   -o test
$ ./test
Init 2!
Init 1!
$ grep -B2 init_array test.s
        .size   do_init2, .-do_init2
        .section        .init_array,"aw"
        .size   do_init1, .-do_init1
        .section        .init_array

The attribute's priority argument is being discarded. It should be visible in the
section name (".init_array.0101" etc.). If the extra prototypes are removed, the
programmer-intended behaviour is restored.

I can see how this would happen. Normally, constructor functions are not called explicitly, so don't need to be prototyped. In my liballocs project, however, sometimes my code can be called during very early initialization of the program. This is mostly because it preloads a wrapper for malloc(), which is called by the dynamic linker before it even gets around to running constructors. To deal with these incoming calls at such an early time, the library can be initialized on demand, earlier than the usual initialization sequence. To allow this, several constructors are also protoyped functions that can be called explicitly across module boundaries.

Although I could do a one-time pass over my code to fix the problem, this wouldn't prevent the same problem recurring in the future. Getting a diagnostic into GCC would be a long, slow process, and wouldn't help if I switched to another compiler that takes some other questionable interpretation of multiply prototyped constructors. So instead, I thought I'd write a simple and fairly standalone tool that can detect the problem during build. In short, I want to check that if a priority is given in a prototype, then that priority is reflected in the assembly code; if not, an error is raised.

My toolsub repository, although in somewhat rough-and-ready shape, provides a collection of scripts that let us interfere with the compilation and linking toolchain in useful ways. Most of the tools it provides take mechanisms provided by existing interfaces, such as the GCC driver's -wrapper option for supplying a wrapper script, but make them easier to use. It's tricky to use -wrapper because you have to parse various command lines, and some of these are compiler-internal commands like cc1 rather than the public cc, cpp and so on.

Firstly I want to do a pass over the preprocesssed C code, outputting a summary of any constructors that are declared with a priority. This priority information is what the compiler is discarding; I want to catch when this happens. To fish out these priorities, I wrote a short OCaml program using CIL. This post isn't about CIL, but you can see that tool here; it's about 80 lines of commented OCaml. (To build it, you'll need to use my branch of CIL, because it uses one API feature that I haven't yet contributed back to mainline.)

This tool calculates what I'll call a “generous” summary of the constructor priorities declared in the file—if you gave a priority on any prototype, it will always include a priority in its output. (Currently it doesn't help much with the case where multiple distinct priorities are used. There is also a bug about that.) What GCC does is the “mean” version: unless you're consistent, it will silently drop the priority; this is the problem we want to catch.

We could just report an error if we see inconsistent prototypes, which is the case that provokes the bad behaviour in GCC. However, I wanted to be a bit more “ground truth” than that: can we test what the compiler's output actually says about the constructor's priority? To do that, we need to know how the constructor priority mechanism works, and indeed about how the constructor mechanism works at all. It's roughly as follows.

For each constructor, the compiler generates assembly code to output a pointer to that function in the .init_array section. At run time, initialization code will walk this section and call each pointed-to function in order (this is done by the dynamic linker, if you're linking dynamically). To give constructors a priority, the compiler tweaks the name of this section so that it is something like .init_array.0123, for a constructor with priority 123. It is then the linker's job to sort these into order. The linker script rule that generates the final .init_array section is something like this.

  .init_array     :  { *(SORT_BY_NAME(.init_array.*)) *(.init_array) }

The real script is actually a bit messier than that, but the point is that for init-array sections named with a suffix, we link them in sorted order of their name, so lower-numbered sections will go first. Section named without a suffix come last.

To check that the compiler didn't discard the constructor priority, we need to find out what section name is used in the assembly code. That's pretty easy if we can run our tool immediately after the compilation step proper, i.e. the step that inputs preprocessed source and outputs assembly code. Doing that is pretty easy with the wrapper feature of toolsub. The following shell script fragment does the work; dumpconstr is the OCaml tool I mentioned, which gives the “generous” summary.

# Our check works by observing a single cc1 invocation...
# already-preprocessed source goes in, and assembly comes out.
# We cross-check each. We pass the cc1 preprocessed input
# to a CIL tool that dumps the with-priority constructors.
# Then we check that the assembly output contains the expected
# incantations in each case. If it doesn't, we flag an warning.
my_cc1 () {
    # drop the literal "cc1" argument
    # run the compiler
    # Now do our check... our caller has told us what infiles and outfile are
    for f in "${infiles[@]}"; do
        case "$f" in
                # use the CIL tool to find out what priorities are at source level
                constrs="$( "$(dirname "$0")"/dumpconstr "$f" )"
                # for each one, check the assembly agrees
                while read func constr prio; do
                    if  -n "$prio" ; then
                        # the symbol should be referenced in a section
                        # named ".init_array.0*%04d"
                        # where %04d is the priority, zero-padded
                        section="$( sed 's/^[[:blank:]]*\.section[[:blank:]]*/\f/g' "$outfile" | \
                            tr '\n' '\v' | tr '\f' '\n' | \
                            sed -rn "/$regex/ p" )"
                        actual_prio="$( echo "$section" | sed -nr "/${regex}.*/ {s//\2/;p}" )"
                        if ! [ ${actual_prio:-0} -eq $prio ]; then
                            echo "Error: inconsistent use of constructor attributes on function $func" 1>&2
                            exit 1 
                done <<<"$constrs"
                # we process at most one input file
            (*) # skip files that don't look like preprocessed C/C++

# delegate to the generic wrapper -- we've set WRAPPER so it won't re-source the funcs
. ${TOOLSUB}/wrapper/bin/wrapper

It works by defining a shell function, here called my_cc1 which is declared to the main wrapper logic via the CC1WRAP variable. The main wrapper calls this whenever the compiler wants to compile one file to assembly code, after preprocessing. The shell function's arguments are literally the command line that the compiler wants to run, and we run exactly that command. But immediately after that, we also run our OCaml tool on the input file (preprocessed C), then scrape the relevant section names from the output file (assembly code), and check the priorities match.

The tool works! I can drop it into the Makefile of the project I'm using it in, by adding

CFLAGS += -no-integrated-cpp -wrapper /path/to/wrapper

and hey presto, my compilation job complains that my priorities are getting dropped, forcing me to fix the bug. I'm providing the compiler a wrapper script, but most of the wrapper logic is generic stuff provided by toolsub; I only had to write the shell function shown above, and the OCaml program I mentioned, to implement my tool.

So, toolsub's core wrapper logic is pretty useful, but there are at least a couple of things I'd like to fix with it. One is that wrappers should not be dealing in terms of the cc1 command, which is a compiler-internal thing. It'd be better if the core wrapper could translate such commands into “documented” interfaces like the compiler's top-level cc command line. It already does this for preprocessing, translating from a cc1 to a cpp command line (er, mostly). For the assembler it's also handled, trivially since the driver runs this directly. It's not yet done for linking, nor for the compiler proper... the shell function above still gets given a cc1 command, albeit assisted by some other logic that has pre-extracted the input and output filenames. That logic itself remains a bit rough.

The other thing I could fix is that arguably, this wrapper logic should not be using shell scripts. They're neither fast nor pleasant. But actually... it's hard to beat them for simple things... I hate how clunky subprocesses quickly become in Python, for example. In the above example, we call out to an OCaml tool to do the analysis of C source. Scraping the assembler file using regex hackery doesn't seem so bad, since the assembler is line-oriented.

My goal with toolsub is to have it become an umbrella project for small utilities, like the wrapper being used here, that help people modify their toolchain's behaviour in ways that are easy to “drop in” to existing builds. Think “prototyping and personal use”, not production; research prototypes are my particular motivation. The scripts are there to help avoid getting bogged down in compiler-specific things, such as compiler internals or details of the command line. Another of the tools in there is cilpp which again uses CIL, this time packaged to enable CIL passes to be invoked as “preprocessor plugins” (using something like -Wp,-fplugin,/path/to/cil/pass.cmxs). Injecting these as arguments is easier than using an alternative driver like cilly.

A final tool I want to mention is cccppp, unfortunately just a proof-of-concept for now, written in the very fun five days I spent at the Recurse Center in April last year. It was motivated by the lack of anything comparable to CIL for C++. I want to prototype tools that instrument C++, but I don't want my prototypes to get caught up in the fast churn of LLVM and Clang internals. So the idea is to provide a simple, fairly stable tool that can “virtualize” C++ code by lifting all the built-in operators, like pointer dereference or array subscripting, into templates that the tool invoker controls. By default, these templates just do the normal thing. But, for example, to instrument array accesses, the invoker can supply a custom definition for the built-in array access operator (as a template specialization); I'm using this to provide bounds checking. The key idea is to work using only source-level rewrites of the C++ code... a relatively stable feature of the Clang tooling API. And it's a source-to-source pass, so as long as the tool can parse your code, you can run any compiler on what comes out. Expressing these source-to-source rewrites using Clang's API is, however, pretty nasty. It still needs more work before it can work on large programs; I'm hoping it won't need loads more.

What's the lesson here? One could still argue that I should just get this problem fixed in the compilers I care about, rather than writing hacky wrapper scripts. But I tend to think the opposite: during development, every systems codebase should be compiled via a hacky wrapper script! It's a good way to add custom sanity checks for properties that your project cares about, even if most others don't. It suits properties that are quick to check, are reasonably syntactic or local to specific code (at least to specific files or functions), justify a “fail fast” error report (e.g. because it's critical to correctness), and therefore are better expressed as a static check than in a test case. Link-time invariants are another thing that liballocs could use some checking of—though sadly I haven't implemented wrapper support for the linker yet. Some other properties, like naming conventions or checking for ABI stability, arguably fit this category, depending on how strict you want to be. Anyway, we'll see if I end up using this facility more often.

After subverting your toolchain, you may also want to subvert your runtime. I'll write more about that in future posts.

[/devel] permanent link contact

Tue, 26 May 2020

Mission and marketing in computer science degrees

At lunch with colleagues some months ago (remember those days?), I provocatively suggested that our course offering would be improved if we eliminated marketing terms from the names and specifications of our taught programmes and modules. Depending on exactly what counts as a marketing term, this might mean doing away with “cybersecurity”, “Internet of things”, “cloud computing”, “big data” (thankfully not currently used here at Kent) and ideally also “artificial intelligence” (once respectable, but no longer). To replace them, there are perfectly good time-honoured and serious phrases for the same areas of study: “computer security”, “distributed computing”, “embedded software development”, “statistical inference” and even “machine learning” (usually what “AI” means nowadays, and while faintly marketing-y, still more descriptive of that).

Although my suggestion was rather tongue-in-cheek, there is an important issue lurking not far behind it. Serious study on any technical topic, even in primarily “applied” areas like CS, is very different from product development. It's not about “learning the latest technologies”. An academic degree should not be a “workplace simulation” or an apprenticeship; it's about developing skills that are more underlying, more general and more timeless. We need to give students the message that university study is about bulding those skills. We need to give ourselves, as academics, the same message. If we don't, and instead fool ourselves that our responsibility is something about “teaching what will help the students get a job” then as an institution we're not fulfilling the very function of a research university, and as individual academics we're undermining the long-term justification for our own jobs.

The paradox is that here at the School of Computing, even our own industrial panel is reported to emphasise, consistently, that they want us to teach principles rather than specific technologies. Certainly our degrees do contain a decent amount of principled content, even if there is room for debate on what fraction of our students come away really understanding those principles. Despite that, currently I'd say we don't give the “timeless skills” message to our students very much. But we do give out the “get a job” message all the time. We do this by endorsing marketing-speak in what we offer, but also by other messages we send far more subtly, pervasively and latently, in what we teach and how we teach it. We award good marks to relatively insubstantial final-year projects whose core is to implement CRUD functionality mimicking commercial web sites or apps, provided they show due allegiance to the hallowed engineering methods we teach. We teach “software engineering” by attempting to rote-instil good practice, but backed by relatively little critical or analytical content, and taught at a stage where we are inevitably asking the students to take this wisdom on trust. (That is undergoing overhaul for next year, but it's unclear whether the new version will change this. If not, I will be partly to blame, as a co-author of the module specification.) We abolished as superfluous a module on “people and computing”, when at least its title captures what ought to be the single biggest topic in the whole discipline. We teach very little that conveys the historical, cultural or political aspects of technology. We continue to downgrade our “unpopular” theoretical content below even a basic minimum level (and I'm no theory zealot). And we let our pedagogy be compromised by the unwisdom of student “feedback”, such as, in response to claims that a module was not “up-to-date”, de-emphasising principles in favour of technologies—a move which ultimately leaves students uninformed and confused, since it necessarily blurs the distinction between what is essential and what is accident.

Of course, in all this we are pushed along by external pressures. One message that circulates loud and clear among colleagues, at staff meetings and elsewhere, is that the market matters: bums on seats are our financial bottom line. This follows through: we find ourselves specifying new modules according to our perceptions of marketability. Let me say that again: we're guided not by actual marketability, but our perceptions about marketability. Maybe my colleagues are more informed than I am, but I don't think many would disagree that we have limited basis for these perceptions. We allow our fuzzy mental caricatures of “the job market”, “industry”, “applicants” and hence “what will sell” to determine what and how we teach.

These fuzzy beliefs are why, for example, I have to respond periodically to the question of “why teach systems, when hardly any of then will get a job writing operating systems?”. Meanwhile, teaching “AI” left, right and centre is not questioned at all. If we really believed in “principles” and degree skills as “long-term” we would be more interested in breadth and balance. Instead we have collectively imbibed the message “AI is big; AI is the future” and regurgitated it into our curriculum development. I've nothing against the longstanding field of artificial intelligence, and I fully expect us to teach from it. But this eagerness to exploit its current trendiness is a symptom of something far more toxic than pragmatic compromise. Putting trend-chasing at the heart of our endeavour, not just on the surface of our marketing materials, undermines the social purpose of a research-intensive university—supposedly an institution for the long-term.

The prevailing line is that this is a financial necessity. That might even be the case. But to me, as a relative newcomer, that seems to be an assumption rather than an established fact. Nominally, the selection of our curriculum still rests with us, the academics, as it should. But the current pressure to be “market-driven” reduces us to headless chickens: base reflexes fill in for a lack of informed thought. Having absorbed a belief that an unapologetically “long-term” degree course is “unmarketable”, we knee-jerkingly propose to do what we perceive to be marketable. Where's the basis for believing any of this?

In my more optimistic moments I believe that for the good of everyone—our students, our academics, and our fortunes as a credible research university—there are things we can and should do to change this. Although I'm not a marketing expert, it seems that what I propose is not about ignoring “the market”, but rather about more seriously “doing marketing”, as distinct from mere publicity. One message we could plausibly try to get across is something like the following. Yes, we will teach you all about hot topics like AI, cybersecurity or whatever else the current trends favour. But that's not all we'll do, and we're offering an education that will last. What we give you is substantial, durable and transferrable.

Trying harder for an approach that emphasises the transferrable over the short-term vocational would be a huge challenge. It is the exact opposite of following the line of least resistance. It would take creativity, skill and effort. Still, it's what we academics mostly believe to be the right thing. I've no reason to believe these efforts would not be forthcoming. What is really stopping us?

The real problem seems to be risk. Proposing any change in this direction brings worried-sounding replies. Trying such a revamp would be a risk—how else would we discover what is marketable than by trying something different? If our intake takes a hit as a result of such an experiment (which would necessarily take several years to run), our political and financial position within the university would suffer. This is why having a university sector that is market-driven and financially straitened is so toxic. Doing the very job your institution was set up for has been turned into a radical feat of “innovation” whose risk is too great to take on. The very purpose of the institution has been redefined as impossible.

Nevertheless, I believe we can do more than we are doing. More cleanly separating our taught programmes' specifications (what we set for ourselves) from their marketing pitches (for applicants) might be one small, simple but possibly significant measure. It would help us make a mental distinction between our mission and our marketing strategy. One way to re-spin that mission might be explicitly to contrast the flawed presumption of “preparing students for industry”; with a more forward-thinking “preparing students to prepare a better industry”. This matters especially in our field because much of the software industry is, regrettably, built on froth and fads. If we want the software industry to be better, we need to turn out students who know how to look for the substance underneath the froth; who have a questioning mindset; who aspire to be more than just hired guns (to borrow the words of Christopher Alexander). That's a tall order, and perhaps many of our students aren't especially inclined to ask these questions... but if universities don't make it clear that substance exists beneath the froth, what hope do we have?

We also need to remember that boom easily turns to bust. “Computer science” is currently attractive to students because software is perceived as a boom industry. In this sense, courting the market seems to work in our favour. But again, this is happening because our applicants think “job” when they should think “career”. We do nothing to correct this. “Cybersecurity? Artificial intelligence? Internet of things? Walk this way.” Our university is collapsing financially and compulsory redundancies are on the horizon in some subjects. But at the School of Computing, complacency rules because “we're doing well” and “this doesn't affect us [much, directly]”. Maybe not—this time. But one only has to go back to around 2008–09 we there was last a threat of compulsory redundancies among computer science academics here at the University of Kent. Something called a “credit crunch” was happening around then. Only a few years years previously there had been something called the “dot-com bubble” which burst in 2000–01, seeing many institutions' CS applications take a sudden dip, one which did not recover for about ten years.

These market lurches did nothing to diminish the importance of what we teach. Yet somehow they had a keen effect on the capacity of our institutions to support that teaching. That is a sign of the unquestionably appalling stewardship of our institutions in recent times, and sadly, in the ten years since, the grip of marketisation has only got stronger. But we cannot entirely blame external factors for our choice to emphasise froth, nor can we rely on what for us are “favourable market conditions” remaining so—and this ties back to the same problem. When an industry is based on froth, spooked investors are all it takes to bring the finances crashing down. The same goes for a computer science department and spooked applicants. What goes around comes around. If we believe that what we teach does not suddenly become irrelevant the moment “the market” takes its next lurch, we owe it not just to ourselves, or our colleagues, but to society at large, to doubt the wisdom of “the market” in deciding what passes for an education. Can't we at least try to make our case, rather than chasing what we think the consumers think they want?

[/teaching] permanent link contact

Wed, 11 Mar 2020

Fund institutions, not projects

[This post follows a previous post discussing changes to UK government research funding, which was itself a follow-up to my earlier “Postdoc myths” piece.]

In my last post I finished by mentioning Alan Kay's favoured dictum that we should “fund people not projects”, and that this has reached the attention of Dominic Cummings in his plans to create an ARPA-like research agency for the UK. In fact the dictum is itself borrowed from J.C.R. Licklider, an early and influential divisional head at ARPA, widely credited as a progenitor of the Internet. I also noted that the point of this dictum is easily misunderstood. Here I'll discuss how this is so, and what I think is a better way to capture its intention in the case of government-funded university-based research. In short: “fund institutions, not projects”.

Consider this well-meaning article which claims to be advocating such an approach (using the dictum in the title of the piece!). It totally misunderstands the idea. It seems to think it's about the question of which ‘faculty members’ should receive the ‘grant money’. Not coincidentally, the post's ideas are feeble—stuck in the paradigm of seeking relatively centralised ways to assess the merits of individuals. Tweaking these criteria is not what Kay or Licklider were talking about. Rather, they were critiquing the very notion of project grants, and consequently the very idea of nominated “leaders” following pre-approved programmes of work directing a “team” underneath. “Funding people” does not mean “funding the empire of professor X”, via grants naming X as PI! Even the article's mooted “fund everybody” assumes a fixed prior notion of who is eligible—“faculty” in American lingo. This inherently fails to address the postdoc issues I discussed in my previous posts. The very notion of “postdoc on a project” is antithetical to Kay's (or Lick's) suggestion. For them it is simply the wrong basis on which to pay people to do research (and remember that a postdoc is by definition an employed, qualified researcher—not a trainee).

My re-statement of the idea, focused on universities rather than (as Kay tends to) industrial labs, is that we should fund institutions, not projects. In other words, devolve the decision-making: universities can hire people “on merit” as they usually aspire to doing, but without a preordained project in mind. This answers a common rejoinder, of “who decides who ‘gets funded’?”. The short answer is: institutions do. They are used to making such decisions: how do you decide which postdoc to hire, or which lecturer [a.k.a. Assistant Professor]? Even in the postdoc case, we like to think that research merit is a major factor. So our answer remains mostly the same, but project-specific criteria are explicitly removed from hiring, and project-specific direction is explicitly removed from the job that even a relatively junior (but post-PhD) person is hired to do. “Fit to the institution” is still a valid criterion of course. Let the institutions attract people who want to make a career there. If the ongoing projects are any good, they'll contribute to them; otherwise, or additionally, they'll come up with their own, and more generally contribute to the “problem-finding”, whose importance Kay also often speaks of. Problem-finding is ruled out if you employ people on preordained problems.

This brings me to my next point: it is far better to spread funds out among institutions, and devolve selection, than to run relatively centralised selection exercises like fellowship schemes. The “fund people” line often encounters an attempted rejoinder, amounting to “fellowships exist”. Some people ask “isn't that funding people? And we already ‘do it’, so maybe we just need to publicise fellowships more?”. That is privilege talking. Of course, research council-funded fellowships do exist, and yes, they are “funding people”. But they are the exception not the norm, and are set up to be so. They are the “prestige case”, and are highly competitive. (And they are, anyway, awarded on the basis of a project proposal!) The vast majority of money paying for the employment of early-career researchers is not funding them on a fellowship basis; it's on someone else's grant, meaning a project someone else proposed. The extreme competition for fellowships—a phenomenon caused by policy, not nature, as I covered in previous posts—means only fellowship applications that are “fully baked” (to borrow the words of Martin Sadler from the aforementioned NCSC RIs' conference) have a chance of being funded. Only those applicants who have received substantial patronage and/or prior funding are likely to have the resources to produce a fellowship proposal that both is and appears fully baked, and get it through the narrow review funnel. The effect is inherently conservative, and again antithetical to the idea that “funding people” is how research at large is carried out.

(More generally, people are often oblivious to their privilege. The people who speak most loudly in favour of fellowships tend to be the people who've received them. That's good for them, and very often these people are great at what they do. But as is often the case with privilege, many are slow to recognise how structural factors have acted in their favour. Sadler's point was that inevitably, polish and patronage become decisive elements in many cases. The way the money is split conspires to ensure that however good the pool of eligible researchers, only a slim fraction will be funded in this manner.)

A slightly more subtle phenomenon is that under a system of funding institutions, many more people will “get funded” in their own right since it inherently involves spreading the money out more widely, building a much wider and flatter structure rather than a “fat pyramid”. (That is rather assuming institutions don't find new, internal ways to subjugate people to projects; but I don't believe our universities have yet become so unenlightened that they would do so.) The goal is not to fund “a team under lead researcher X”; it's to fund more potentially-lead researchers and fewer subordinate ones. I say “potential” because the choice of whether to lead or become a non-leading collaborative partner rests with the researcher.

Fellowships' extreme selection practices, like long proposals, postal review and panels, are far less useful in such a context. Similarly, once institutions are free to hire people as they usually do, by job application—but with more such jobs!—we eliminate a certain fraction of the (hugely effortful) grant applications made by academics, since more work will be achievable with the institution's (increased) block funding. There is nothing infeasible about this; it is exactly the way UK university research funding worked until the 1970s. The total number of research-active roles may well work out about the same; that's an orthogonal issue, in that supposing we hold the budget fixed, the pay distribution could stay exactly the same or could change, as could the salary distribution. Even if the staffing level goes down (i.e. average pay goes up!), I'm confident that the effective research capacity would be much greater, since any shrinkage would be offset by eliminated costs: grant application effort, but also the wastage induced by postdoc-style person/project “compromises”, projectwise fragmentation personnel churn and personal upheaval (“move to another city”) that I've written about previously.

Note also that funding people and institutions in this way does not mean “make everybody permanent”. That misunderstanding arises from the same myth I wrote about earlier: the opposite of “postdoc” really is not “permanent position”. It's potentially fine for early-career research appointments to be fixed-term—if the term is long enough and if the process for renewal or progression is sufficiently lightweight (i.e. definitely not “9 months' funding left; start applying for fellowships!”). Five years seems a sensible minimum for undertaking serious work while living an episode of one's life... and not coincidentally, is what established early-career researchers used to be offered in Cambridge. Going further, in fact, there is an argument that late-career appointments in research roles should also remain conditional on actually being research-productive. An oft-noted flexibility in the current system is that institutions can move academics “sideways”, into teaching and/or admin, when they're no longer research-productive. Increasing institution-centric funding would not diminish that option; it can only increase it, since greater funds would be pooled at institution level.

One more objection that might arise is: are institutions wise enough to spend this money well? My answer is “yes, for now” and again it's because the decision-making is inevitably devolved from the centre. Although many of our universities are run appallingly badly by central administration, at the departmental level academic merit often is still recognised and does still count for something. Of course our means of assessing this are not perfect, and I get frustrated when colleagues resort to “counting papers” rather than weighing contributions. Patronage is sometimes a factor too. But at least in my limited experience, most colleagues still look for mostly the right things.

Finally, it's interesting that Cummings takes inspiration from high-profile “breakthroughs” such as the moon landings, the Internet, and no doubt other things like human genomics. I'd like to sound a note of scepticism that much of the research we really want is going to take this form. In an age of technological plenty, it is wrong to assume that what we “should” work on, in the sense of research that will improve people's lives, takes the form of identifiable “breakthroughs”—and certainly not ones in “new areas” pre-selected by government, whether they be quantum computing, “AI”, or the next fixation of government technocrats. The disconnect between apparent technical progress and improving ordinary people's lives has long been present. (On the subject of moon landings, Gil Scott-Heron's “Whitey on the Moon” comes to mind.) But this seems destined to become even more pronounced. While in the biomedical technologies, true life-improving “breakthroughs” do seem more plausible, I still have an overarching feeling of scepticism—perhaps traceable to Ivan Illich's critique of late-C.20th medicine as primarily enabling survival in an unhealthy society. In general we can learn much from the writings of Illich, E.F. Schumacher and others who have questioned the axioms of “development” and its economics. I'm not a trained philosopher or economist, so if you know other work either in the spirit of these, or critiquing them, I'd love to hear your recommendations. In my actual area of training, I've already been developing my case that advances in software are not clearly helping humanity... but I'll save that topic for another time.

[/highered] permanent link contact

Wed, 26 Feb 2020

Postdoc follow-ups

[I'm on strike again at the moment, just as when I wrote my last higher-ed piece, to which this is a follow-up.]

My last higher-ed piece, about postdoc myths was read rather more widely than I expected. (Thanks for reading!) That has left me with a few things to clear up, and a few follow-up thoughts which I didn't get on to last time.

Firstly, let me qualify: my take on postdoccing is more than a little UK-centric, and certainly doesn't generalise in all possible directions. However, I do believe it generalises to many places outside the UK, perhaps in non-obvious ways. The most contentious question of generality (at least in the Hacker News discussion) was whether postdocs “formally exist”. I gathered that many US institutions offer roles like “Postdoctoral Scholar”, for example. But my point was more about how the regulations of institutions and of funders haven't adapted. Job titles are at best a weak indicator of this, and to see jobs advertised as “postdoctoral X” is not enough to infer that there is any recognised status of “postdoc” in the institution or the wider academy, beyond “paid lackey”. Even in the UK, we see jobs advertised, including at the University of Cambridge, with titles like “Postdoctoral Research Associate”. That doesn't mean the institution has any official position of “postdoctoral” anything; it doesn't. The word is simply added for illustration; it is formally meaningless. Such employees' academic standing has been more accurately summarised as “people who do not exist” (to borrow a phrase from Anthony Edwards's remarks on the history of such positions at Cambridge). The high-level point is that institutions' and funders' processes are not designed around present career structures—where one might spend an unbounded number of years as a qualified researcher of considerable independent potential but not holding a “full” “academic” “position”, however that might be recognisable locally. Advertised job titles are not a good guide to reality.

For the same reason, it's wrong to suppose what's happening is “higher supply leading to lower price”. I've been talking about a degradation of the offering—early-career research jobs being offered on shorter contracts with fewer rights and less institutional status—and it's appealing to suppose this degradation is the result of “universities extracting more value” from the labour pool. But that is factually wrong. Neither pay nor status is re-negotiated on the basis of changing supply. Pay scales are hard to change; university regulations are even harder. To redefine positions at lower pay or lower status is a political act; someone has to pull the trigger on it. That isn't what has happened. Equally, in those cases where we would expect upward pressure we also don't see upward changes: universities and academics often find it difficult to hire postdocs with certain skills they want, but that rarely creates any action to improve pay and status (beyond a regulation-limited amount of salary-bumping), because the relevant political change is mostly beyond the means of the academics who are hiring. A key example is that many institutions face a chronic difficulty in hiring research software engineers. As far as I know, this hasn't driven many universities to reform their regulations. Instead, they have shown a boundless capacity simply to limp along with the problem uncorrected. For the same reason, there's no reason to believe downward pressure actually has much effect in cases of oversupply.

So if it is not a case of rational decision-making by universities in the face of increased supply, what is causing the body of underpaid under-statused researchers to get larger? In the UK at least, the answer is simple: it's the government, stupid. What we've seen is that the relative occupancy of pre-existing pay and status levels has been changing. That change arises not from the dynamic between universities and the labour market, but from that between universities and government. It's not supply and demand; it's poorly chosen public policy, formulated by ministers and civil servants who (as far as I can tell) don't understand research. What does change far more easily than pay-scales and regulations is budgets—what government controls. Hence the degradation is arising indirectly, not via the labour-market mechanism but by external changes to distribution of money between streams, and hence of people among the distinct scales and regulations that those pots feed. In short: for a given level of spending, we are relatively funding more postdocs and relatively fewer “full” academic staff. Note, as I argued last time, it would be wrong to equate the latter with “permanent” positions (or even with “teaching” positions). Note also, as I'll return to, the problem is emphatically not one of “not enough money”.

Career-wise, what were once stopgap arrangements—“spend a couple of years on this contract before a ‘proper’ academic role comes around”—have, creepingly, become the norm. Longstanding regulations and arrangements for “contract research staff” are applied increasingly far beyond their originally conceived uses, to an ever-larger and more ill-fitting body of staff, and over longer durations for each individual. But from the universities' point of view this is a case of boiling frogs, not rational agents. Meanwhile, I don't believe government is doing this deliberately; they're just asleep at the wheel. In fact, they don't realise they have the wheel. The (limited) evidence I've seen, such as government's response to the damning 2002 report of the House of Commons Science & Technology Committee and more recently then-minister Chris Skidmore's confused remarks on the subject (Hansard; tweet with video), is that government imagines it has no role in this, and it's all the universities' doing. But universities' hands are largely tied by how money is delivered. Of course, top-end institutions including Cambridge are culpable for their complacency in failing to challenge government.

Those two streams are “core funding” and “project funding”, which in the UK are known as the “dual support” system. I have a draft of a working paper on this subject, which I wrote as an assignment for a module in my PGCHE. I am hoping to expand it into something publishable; comments are very welcome, but be aware it is very much a draft at present. It is, necessarily, very UK-specific. It argues that the changes have come about indirectly, as unintended consequences of well-intentioned (but misguided) policies going back at least as far as 1981 and the attempt to “protect science” from wider public spending cuts. Later changes, to do with funding capital costs (“sustainability”) and with fairness and transparency of funding allocation (“selectivity”) have exacerbated the problem. The foul icing on the horrid cake is a lurking confounding variable—how much core funding is de facto being used to match-fund project grants that are under-costed.

This latter effect is subtle, and is the aspect most in need of further research. Although the headline data is clear that the block/project split has flipped from 60:40 to 40:60 between 1978 and 2018, the reality is almost certainly more drastic than that because more of the block grant is used as “match” or “top-up” support for the increasings volume of projects that are funded at below full economic cost. My lone data point so far (detailed in the draft article) is that in Cambridge, nearly all of the block research funding is being spent on subsidising project funding, i.e. on allowing it to continue being costed below the full economic rate. That's something my future research must dig into, along with a cohort-tracking study of the pre-92 universities to separate out the effects of debinarification in the early 1990s. To make clear statements about career progression, it'll also be necessary to make corrections for rank inflation: early indications are that it's now easier to get to [full] Professor, but no easier to get to lecturer [a.k.a. Assistant Professor], with consequences for how spending is distributed. Figuring out how much this generalises beyond Cambridge is another goal; my article does include some study of Kent, but so far it's less conclusive. If anyone knows another UK pre-92 university that publishes (or makes available to researchers) good-quality data about its staffing and spending over the past decades, please let me know.

The final thing to remember is that real-terms government spending on research has gone up considerably. Therefore, it's doubly unforgivable that career structures are in such a mess. When people like Sir Leszek Borysiewicz say “we don't have money to create better positions”, they're either ignorant or lying. The scarcity is entirely artificial, created by how the increased spending has gone disproportionately on project funding. This is both directly harmful (projects in themselves are a poor basis for both research outcomes and for careers), and indirectly harmful (projects, being under-costed, soak up additional block funding).

To sound a note of optimism, there are multiple ongoing shake-ups of UK government research funding. One is the mooted creation of an ARPA-like agency. Another is the “rebalancing to the regions” which suggests a brake on various institutionwise preferential attachment effects (discussed in my previous post) that have harmed career structures under the project-dominated funding regime. Both of these shake-ups are being driven by Dominic Cummings—a dislikeable figure to put it mildly, but one whose influence may yet do good in this space. At the recent Research Institutes' Conference organised by the National Cybersecurity Centre, the panel session involved three videos, one of which featured Cummings quoting Alan Kay's dictum that we should “fund people not projects”. I think Kay is exactly right, but it's interesting how often his words are misunderstood, and unclear whether Cummings has understood them. In a later post I'll continue this discussion with some notes on how this can go wrong.

[/highered] permanent link contact

Mon, 02 Dec 2019

Postdoc myths

[I'm on strike at the moment, largely in solidarity with my more precariously employed colleagues, whether hourly-paid or fixed-term or never-endingly “at risk of redundancy”. So it seemed a good time finally to finish and publish this post. I wrote most of it during the final couple of years of my seven as a postdoc, which ended in 2018.]

Lots of things are said, written and believed about postdoctoral researchers that are simply not true. This matters because real policies, initiatives, attitudes and actions are shaped by what people believe—true or otherwise. In this post, I'll tackle a depressingly long list of such myths. (I'm trying to keep this post snappy, but the flip side is that I have left out examples in many cases. For some, I also have hard data. So let me know if you'd like more specifics on anything.)

Myth: postdocs formally exist. In almost all universities I know, formally there is no such thing as a postdoc. In research councils' view of the world, it's the same: there are no postdocs, only “Research Assistants” and “academic staff (lecturer or equivalent)”. This matters because when practice on the ground no longer matches the ontologies on paper, systems become prone to poor outcomes and to abuse.

Myth: postdocs are homogeneous. Generalisations and stereotypes abound, both in writing about postdocs and in commonly held beliefs. This is unfortunate because postdocs are a highly heterogeneous bunch. Lumping them all together encourages wrong stereotypes. When these stereotypes hold sway with funders, institutions and departments, misguided policies result.

Myth: postdocs are all aspiring academics (lecturers). Clearly, some are. But there are many skill sets required in a healthy research environment. If you agree with “required”, then it follows that there should be a career path for all of them. Although there should be, currently there isn't. Once upon a time, the University of Cambridge did embrace this idea and had a variety of offices which had real status within the university, with titles including Senior Technical Officer and Computer Officer, as well as the research-oriented Senior Assistant in Research and Assistant Director of Research. These practices are mostly forgotten, and these posts replaced with lower-status unestablished positions: on the academic side, “Research Associate” is increasingly a catch-all, while on the other, technical officers are far fewer and computer officers are no longer the “peers of academics” that they once were.

Myth: postdoctoral work is “study” or “training”. It isn't; it's actually doing the actual research. I had to grit my teeth when applying for some funding that mentioned “postdoctoral students” in its particulars. Meanwhile, at a publication venue I plan to submit to, conflict-of-interest rules mentioning “advisor/advisee” seem to think there is a “postdoc” version of that. There isn't. At any career stage, we have people we turn to for advice, and people we work with. But by definition, someone with a PhD is a qualified researcher, not a student.

Myth: postdocs are “intermediate” between graduate students and more senior positions like research fellows and academics. The phrase “postdocs and PhD students” abounds. But in a university with many postdocs, the population of Research Associates is on average older and has more research experience than the holders of many flavours of early-career research fellowship. That's not surprising when the latter positions come with time limits (e.g. years since PhD) whereas the former don't. People can be postdoccing well into their thirties, forties and sometimes beyond. The “overgrown graduate students” caricature is wrong, disrespectful and leads to wrong-headed policies. (For a game of bingo, try this New York Times article from a few years ago.) According to University of Cambridge data current on 30th November 2017, of university-payrolled Research Associates and similar, over 40% had more than three years' service in the role, and around 10% of the total had over ten years of service. These numbers are underestimates of post-PhD research experience because they exclude postdoctoral experience at other institutions, and because the “and similar” positions include some of the aforementioned research fellowships which lower the average.

Myth: postdocs are on a journey to “research independence” (but are not there yet). This line is popular with funders, who don't seem to realise that their cause and effect are backwards. “Independence” is in practice a statement of one's funding status, not one's stage of personal development. As the mix of funding, in the UK and elsewhere, has moved further and further in favour of project-based grants, and away from institutional funding, hey presto! We have fewer “independent” researchers—on paper, but not in reality. In the UK, suppressing “independent” status is also a useful tactic for gaming the REF, as long as postdocs always co-author with their PIs. (If they don't, their publications are curiously lost into the REF-ether.) Again, the “paper ontologies” are a poor caricature of reality.

Myth: the opposite of “postdoc position” is “permanent position”. This comes up time and time again, but is completely false, at least in the UK. In all institutions I know of, academics (i.e. “lecturers or equivalent”, to borrow an RCUK phrase) may be appointed on limited tenure. They remain first-class citizens for the purposes I've been describing. Yet the justification for depriving postdocs of any given right or privilege is usually “they're not permanent” (a line often pulled out on-the-hoof, rather than reflecting any real rule). In fact, many postdocs are permanent, legally speaking, thanks to the 1999 EU Directive on Fixed-Term Work. Even those who aren't have a legal right not to be treated less favourably. Sneaky tricks skirting or infringing the edges of these laws are among the many ruses used by universities to keep their research staff dangling.

Myth: postdocs are itinerant, unlikely to be at the University in a few years' time, and/or are otherwise “not committed” to the university. To the extent that this is true, it is circular: funders' policies, and institutions' interpretations of them, base themselves on the assumption that postdocs will move on, and proceed to help make that assumption true. In Cambridge earlier this year, a certain fly-sheet had the temerity to claim that Research Associates did not deserve representation because they had not shown “commitment” to the institution (and that the university was not at all complicit in the underlying funding changes that had precipitated the growth in postdoc numbers; no, certainly not). An academic need not be “committed” to an institution beyond their contractual notice period. But a postdoc who spends years at an institution that can only offer them dribs and drabs of extension is showing a very strong commitment to that institution indeed.

Myth: postdocs are provided for by their PIs, so do not need representation, recognition or autonomy. There is a widespread strange belief that a postdoc's PI will “speak for them” and reliably look out for their interests. Again, this came up in certain governance debates in Cambridge. It is obviously false; a PI is only ever a partial ally, and can just as easily be an adversary. Yet these debates threw up bogus arguments hilariously reminiscent of those opposed to female suffrage—exclaiming in outrage, “they will just vote the same way as their husband!” and in another breath, equally outraged, “they might vote a different way than their husband!”. (Yes, this was a real argument of the day.)

Myth: increase in postdoc numbers is somehow a “natural” phenomenon. It's easy to encounter the belief that some sort of bumper PhD harvest, no doubt caused by a mixture of climate change and modern agriculture, has led to there being “too many people chasing too few positions”, and that is why so many people are employed on exploitative terms. This is an appealing folk theory, but it simply does not explain what is happening. Positions are not created by nature; they are created by money, spent according to policies. Suppose there are many qualified candidates competing for a fixed number of jobs. Of course, the more candidates there are, the more competition for the jobs. But it doesn't follow that the jobs' terms will become increasingly exploitative, such as being offered in shorter term, with less pay, lower status and fewer benefits. That requires a separate decision to degrade the offering. That is exactly what's happened in early-career research, by turning the knob in favour of creating only these lesser positions. Why so? It's the same story as in the wider economy these past decades: maximising the exploitable workforce, concentrating capital (research funds) among relatively few incumbents. Anyone who tries to explain it purely as supply and demand, or even “nature”, is either ignorant or is trying to sell you something. (Perhaps they're Liz Elvidge, “Head of Postdoc Development” at Imperial College London, who has a book to sell? I haven't read it, but based on her NPM 2017 performance, I assume it's peddling nonsense like this.)

Myth: postdocs are pro-postdoc. Nobody in their right mind actually wants to be a postdoc per se, with all that that implies. People mostly become postdocs because they want to do research. If there were fewer postdoc positions but overall a better path for research careers, few current postdocs would oppose it.

Myth: “nothing can be done”, or “there isn't enough money”. This is the academic version of Theresa May's “magic money tree” line, and is equal nonsense. The issue here is not the amount of money, but about how the money is spent. Policy knobs are very obviously available, but are being turned only in the wrong directions. This is a failure at the top, since that's where the knobs are. All this is outwith the control of research councils, who (despite their many gaffes) just allocate the budget they're given in the ways they know how. The blame lies with central government. In 2002, the House of Commons Science & Technology Committee produced an astonishing report which is entirely damning of the status quo and skewers the problems of short-term research positions. Government's response was a case of fingers-in-ears. Sadly the report dates itself by its naive optimism that the EU Directives I mentioned above would help; we now know that they can be bypassed. In the 17 years since, we've had no action, beyond creation of another worthless pile of paper.

Myth: postdocs just want to stay in the same job or city forever, but that's clearly unsustainable. It's particularly easy to encounter this belief in Cambridge. But the number of postdocs in Cambridge is a function of money, not of wishes. What's really unsustainable is piling all the research money into a small number ever-fatter institutions, on terms that permit only junior and short-term appointments. These institutions gain a large workforce skewed towards the relatively young and exploitable. Later these workers face a vexed choice: either be spat out to make room for the next lot of eager young things, or (if you're lucky) project-hop to another exploitative job in the same city or institution. In contrast with typical PhD-age life stages, postdocs are generally old enough to have put down roots, or to want to. Special as Cambridge is, it is nonsense to credit it with what primarily a desire for stability in one's personal life. Funnelling the bulk of research money to a select few institutions, and primarily on a project basis, is the underlying mistake.

Myth: institutions are doing what they can to support postdocs. In fact the big institutions are heavily invested in suppressing postdocs' career development. Our “leading” universities are the prime foot-draggers and non-movers in this game, and it's not hard to see why: their senior academics profit from cheap highly-skilled labour serving their research empires. Government will only change tack if academics speak to it, but those with the controlling voice have a vested interest. Of course, these already-established research agendas are not necessarily the ones most deserving of support. And even worse, project-based funding bakes in huge inefficiencies which harm outcomes.

Myth: increase in postdoc numbers is essential to creating an agile, global workforce of the future. This sort of neoliberal nonsense is popular among administrators buying the usual wrong assumptions of elasticity and fungibility of people—in short, treating people like a commodity. But on the ground, it's clear that this is a poor model of how research works. Thinly sliced short-term project-based funding not only creates poor-quality jobs, making for unhappy people, but also gives poor research outcomes. Despite the (mythical) “bumper PhD harvest”, (we) academics tend to bemoan how hard it is to find “a good postdoc” to work on their Highly Specific Project X, starting at Highly Specific Start Date D. With those constraints, that's hardly surprising. So begin the compromises. Many postdoc appointments are major compromises on both sides. Sometimes it even works out. But the failure modes are many: people don't fit the project or the PI; they become unhappy; they jump ship or jump career. Then someone new gets hired on an even shorter contract! No doubt the leaving postdoc also spent a good chunk of their work time applying for other stuff. As a result of this churn, much time is lost and much research goes unfinished or unwritten-up; this is “academic backlog” mentioned at length by Dorothy Bishop here and in this talk. Many small grants also push absurdly detailed administrative work onto PIs. All in all, it's an insane way to spend our research budgets.

Given all these problems, what should we be doing? I believe we need a substantial transfer back to core funding and away from research councils. In the UK, a little-known story of the last 40 years has been a series of transfers from core funding to project grants. Where forty years ago it was 60:40 in core funding's favour, now the balance is reversed, and the “match funding” demands of 80% FEC makes the effective core funding level far lower. My investigations suggest that this has not been driven by policy (with one notable exception), so much as it has occured through accidental drift (over roughly three further distinct periods). To fully reverse this, and provide true core funding, we must eliminate the de facto use of core funds as match funding for projects. Projects must be costed at 100% FEC, even if that reduces the fundable volume. In any case, that fundable volume should be reduced! The balance must be made up by increased core funds that institutions can use to hire and retain researchers on better terms, not simply to top up their projects. I'll write more about these issues in a future post.

I fear another “accident” is brewing. Among UK academics disgruntled by REF, it's increasingly popular to say that we should just allocate the whole budget to research councils. No matter how flawed the REF, wholesale transfer to research councils would be a terrible mistake. REF has problems, but the dominance of project grants creates far more problems. It is the over-use of projects, with their thin slicing, and not limited overall spending, that has weakened career structures. In any case, academic time spent writing and reviewing low-success-rate grant proposals dwarfs that spent on REF. The only way to provide sensible careers is an institution-oriented and actively redistributive model making proper use of core funding. It does follow that REF or its successor must leave behind its current rhetoric on “excellence” and emphasis on scores (metrics), since these also exacerbate preferential attachment. Instead it must actively spread out core research funds according to a broad assessment of an institution's capacity for quality research, including for potential and planned growth. Research funding should be a question of steady nurture and wise investment, not a laissez-faire market for powerful rentiers to battle over. The UK is fortunate, in that there is (for now) no shortage of research talent wanting to work here. It will only flourish if we can provide not just jobs, but careers.

[/highered] permanent link contact

Fri, 23 Aug 2019

Research travel, climate change, and why we must educate our institutions

Like many academics, I travel quite a lot. I'd rather do so in a way that is environmentally sustainable. Most conferences I travel to are sponsored by ACM SIGPLAN, which in recent years has developed a ad-hoc committee on climate issues.

It's great that SIGPLAN has taken this issue seriously. I do agree that carbon offsetting, SIGPLAN's current recommended action, can be worth doing. The following is my greatly simplified summary of the committee's argument for recommending this action. (For the actual argument, which is far more carefully argued, see the preliminary report.)

  1. Research benefits from high-quality communication, which entails travel, e.g. to conferences. [This is clearly true, for now at least.]
  2. Travel “very often involves flying”. [My quibble in this post: let's not assume this is a variable over which we have no control. In practice, many of us have some choice at least some of the time.]
  3. Flying is dependent on burning fossil fuels, therefore on emitting carbon.
  4. Therefore, as a short-term action it is feasible only to offset carbon emissions, not to reduce them.

To me this comes across as a North American perspective, and overlooks the transport situation in Europe. Within Europe it is largely possible to avoid flying. Here are some conference trips I've made by train (and sometimes ferry) in the past few years. In each case I was starting in south-east England, either from Cambridge or Canterbury. For clarity: many of these were not SIGPLAN-sponsored events, but I don't think that detracts from my point.

Hopefully the above shows that a lot of travel within Europe can be done without flying. So what stops people? My best guesses are the following.

The first step to changing this is to recognise that we live in a “flying first” culture. The power of unfamiliarity and inertia, both individual and institutional, is considerable.

The second step is to challenge it. This means asking for our institutions' support, and questioning policies that appear unsupportive. My previous institution's travel guidance sanctions Eurostar trains only for journeys “to France”, apparently not aware that they also run direct to Brussels. No doubt minds would be blown if I were to describe how by changing trains, you can reach even more countries. I wrote to the page's maintainers in March 2018, but despite a friendly-seeming response, there has been no change as of August 2019. Perhaps I'm optimistic, but I believe that institutions will learn we keep telling them. Flying in Europe mostly isn't necessary. The last time I chose to fly within Europe was in September 2012. Since then, I've done it exactly five further times (one way), Aside from the case of industrial action noted above, these have been where the choice was effectively made by some institution or other: if I'd taken the train I likely wouldn't have been reimbursed.

The third step is to accept taking a short-term hit in time and money. Even if our time overheads are somewhat increased in real terms, we should see sustainable choices as an intrinsic good, and choose them to the extent we personally can (which will vary; it's easier if you have your own travel budget). Not only do academics have a responsibility to society, but also we are unusual in the extent to which we can not only “afford” additional travel time, but perhaps even benefit—given extra time spent on trains or ferries, we might even get more work done! I admit that the above record of relatively little flying has only been achievable because I've sometimes ponied up some extra cash myself—but this shouldn't be necessary.

A fourth step is more political, but is necessary to make that hit “short term”: demand progress! Solidarity around sustainable transport is sorely needed. Even in Europe, loss of political favour has been causing the offering to go backwards. Cross-border train and ferry services are overall becoming more sparse—especially night trains, as I noted above. Sometimes, faster daytime trains are a suitable replacement, but often this is not the case. “Voting with your feet” is the most basic and important form of this, but other things, like supporting campaign groups or writing to your representatives, can make a difference. It's the sort of thing most of us could do more of—myself definitely included. The reasons for the cutbacks I mentioned are complex, but are in large part regulatory and therefore political. (Some useful articles about the DB cutbacks are here and here. Despite this, there are reasons to be cheerful: see here and here, where there is a very nice map.)

Coming back to SIGPLAN: I think there is room to develop practices that encourage SIGPLAN members to use more sustainable travel options where they exist. A positive framing is necessary, which will take some careful thought. I believe a financial incentive is also necessary—but straight-up differentiated registration fees might be hard to administer, and perhaps unpopular.

As one possible positive framing, perhaps SIGPLAN could develop a pool of funds, analogous to PAC, but used to pay top-up contributions for those who choose to travel sustainably and cannot otherwise fund the difference. One way this might work is that after a conference, those who submit evidence that their travel was sustainable would a earn a small rebate on their registration fee. This would be offered partly as a token redeemable against future registrations (benefitting the institution paying for the travel) and partly as a deposit into the pooled contribution pot I mentioned. For the latter, the individual or institution receives no money but gets some kind of “points certificate” (to help gamify things). I note the mooted ACM-directed carbon pricing recently proposed in a SIGPLAN blog post; which would also generate some pooled funds. I'm not yet sure whether it should be the same pool; perhaps not.

In October I will be travelling to SPLASH in Athens. From the UK this is one of the more challenging European destinations for surface travel. But it is certainly feasible and I'll certainly be doing it! The core of the journey is an overnight train from Paris to Milan, a train to Bari or perhaps Brindisi, and a ferry (again overnight) from there to Patras. Let me know if you're interested in joining me or learning more.

[/research] permanent link contact

Wed, 15 May 2019

Research careers in UK Universities: questions few are asking, part one

(Those who follow me on Twitter may have a sense of déjà vu about this, but I thought it worth elevating to blog level. I wrote most of it over a year ago... must get better at timely blogging.)

Back in September 2017 I attended the UK's inaugural National Postdoc Meeting organised by the Postdocs of Cambridge (PdOC) Society. We were fortunate to receive a flying visit from Borys, a.k.a. Professor Leszek Borysiewicz, at that time the Vice-Chancellor of the University of Cambridge. This was fortunate in that it is always illuminating to hear the thoughts of those in a position of academic leadership. It was also unfortunate, in that what he had to say showed a distinct lack of leadership. Alarm bells sounded in what he piped up from the floor during the questions in the previous session, which was about career development opportunities. His contribution struck me an astonishingly ill-considered; I would later tweet-summarise his most clueless moments, of which the first two were during this question session. Freed from character limits, here the wording is elaborated slightly.

  1. “Postdocs shouldn't be allowed to hold grants because then the institution must commit to employ them for the duration; that's too much commitment.” (tweet)
  2. “This cannot be solved by making grants more portable. This would do harm by removing incentives from smaller Universities—whose grants could be poached by competitors.” (tweet)

What was most alarming was the self-assurance with which he conveyed these completely bogus arguments. He continued the theme during his own slot, with another clanger.

  1. “We can't avoid putting research staff on short-term contracts, because we lack the funds to create more academic positions.”
  2. (tweet)

It's difficult to overstate how wrong all this is, and doubly so as an outlook for a supposed “leader”. My responses, again more-or-less as tweeted, were as follows.

To (1) (tweet): why wouldn't institutions be prepared to commit to that employment, if the grant-winning postdoc brings in their own salary, overheads, and maybe others' salaries too? (I'm aware of the alleged <100% FEC issue; is that the reason?)

To (2) (tweet), this naturally equilibriates: grant success makes institution more attractive; some do stay, so expect net gain in research power. The institutional incentive is still very much there.

Point (3) is (tweet) the classic canard. “We can't create more academic jobs” is neither here nor there. Improving researchers' lot is not about funding more positions; it's about how a fixed research budget is spent. Current research politics say, implicitly, that we should choose to employ lots, cheaply and precariously. This maximises the raw number of research staff, but keeps turnover high and skews towards the young and inexperienced. Is this really optimal? It seems hard to believe. But it is quite well explained by the incentives faced by established PIs.

What can we do about all this? The first step is clearly to challenge these bogus arguments. Sadly, the world of higher education is full of oft-repeated lines that talk about policy as if it were a law of nature. The “FEC issue” my tweet alluded to is one such line: that “universities lose money on research, since grants bring in less than the full economic cost”. Although of course grants do not always pay full FEC as costed, it is a nonsense to say we “lose money”. Universities are funded from the public purse precisely in order to host research activities (among others). So the worst that can be said is that each research undertaking consumes some of that block funding, and must be fairly accounted internally. To say “consuming funding” is fine, but to frame it as “losing money” is overlooking the very mission of a university. There is a some argument that the FEC issue is ultimately a construction of accounting; one that (I cynically surmise) is convenient to policymakers and administrators because it keeps academics subordinate.

(I have witnessed FEC arguments deployed by administrations to create the impression “you should be grateful” or “we're doing you a favour”, then used as a devious pretext for cutting the support offered to researchers—increasing the pressure to raise even more funds. That specific issue was around the costs of “training and personal development” which, it had been argued, should be costed on grants now that ring-fenced Roberts money for training courses and the like was no longer available. Of course, such courses had been offered before Roberts money existed, and in any case would merit consumption of general funds since they obviously support the mission of the University. Even worse, creating pressure to raise external funds for such things is hiding a giant double standard: that administrative functions of the university rarely face equal challenges to justify their own costs. What grants are the administration applying for, to pay for their own staff's training and development courses? That's obviously an absurd idea, but my point is that value-for-money on overheads is already highly dubious. FEC arguments allow the administration to vent budgetary pressure back out onto academics—“raise more funds”—instead of creating internal pressures. Given the patchy quality of “training courses” and “career development” sessions I've attended, such pressure could hardly fail to be constructive. But instead, we're expected to raise funds for yet more of the same.)

Let's get back to Borys. There is a general pattern that those seeking to protect the status quo—whether owing to vested interests or plain unimaginative conservatism—often deploy “fear, uncertainty and doubt” tactics. They make a vague argument as to why an alternative “would never work”. I have always found this mindset particularly jarring when encountered among researchers, whose very job is exploring unproven ideas. But it is exactly the tactic Borys was deploying in his second point. To me it appears completely implausible that grant portability would take away incentives from institutions. Poaching is already rife (thanks to the REF, which remains unreformed on this point), But even the biggest institutions are not indefinitely elastic. Maybe indeed a certain fraction of grantholders would be poached, but that is second-order effect and is likely far outweighed by the first-order benefits of increased research income. Meanwhile, it's true that growing inequality among institutions is a problem, but measures to help postdocs receive grants would work to lessen this, not worsen it. That's because current grant-awarding policies contribute massively to the “rich get richer” phenomenon, owing partly to the weight placed on track record. Spreading grant money out further down the career ladder necessarily means putting greater weight on other factors (or perhaps greater intentional use of randomness) which will favour smaller institutions. All this is also presuming a lot about the overall system of project-based grants, which, as I'll note, is far from untouchable.

Borys painted the issue as one of funding more academic positions. That is not the issue at all. The real issue is this: how should we spend the amount we currently spend? It's rhetorically convenient to take Borys's approach, painting this issue as “postdocs making demands”—for more money, or more academic jobs, or the moon on a stick. Then it can easily be dismissed. Most of our “leaders”, like Borys, are invested in the status quo. This isn't a tirade against Borys: as Vice-Chancellors go I think he was not too bad. But even a “good” V-C was happy either knowingly to advance a bogus argument to protect that status quo, or more charitably, to attend a policy-oriented meeting of postdocs without engaging his brain sufficiently on the issues of what better policies might be.

It's not just Borys. Lack of leadership is a sector-wide problem. At the same National Postdoc Meeting, one panel included a Liz Elvidge, apparently “Head of Postdoc Development” at Imperial College London. She claimed that the dire situation is “the nature of the beast”. But it is not nature! It is nurture. It is a consequence of policy; the policies could easily be different. Of course, policies won't change if the people with influence hold opinions like these. It is a perversity typical of the present state of affairs that a reputable institution would create a “postdoc development” role whose de-facto job is to further entrench a system that is actively hostile to such development.

(The notion “professional development” is routinely used as a fig leaf. Institutions love to proclaim their commitment to professional development, to cover for their inaction on the core policy issues. We saw this yet again recently, in science minister Chris Skidmore's speech about “Securing the research talent of tomorrow” It begins to acknowledge the structural problem, but rather than facing up to it pivots straight to patching symptoms—mental health and wellbeing, “career development” initiatives—and generating mountains of worthless paper, namely the Concordat. The Concordat is full of worthy-sounding requests, but it is doomed to be ineffective as long as there is no change to the funding regime. There's no recognition from Skidmore that how the money is spent is not only the cause of the problem, but is something government directly controls.)

Today's incumbents—the ageing distinguished professors who now have the strongest academic voice in shaping future policy—mostly grew up under a different system. Where was the mandate for changing it? I'm still looking—this is a topic I'm investigating as part of my part-time PGCHE studies here at Kent (yes, I'm a student again)—but I suspect the story is one of creeping change without any meaningful mandate. Our funding landscape today is dominated by project grants, rather than personal or institutional grants. But it need not be so, and was far less so when our current senior citizens were starting out. For example, although it is rarely remarked, in Cambridge, until the 1980s it was common for academics to start their career on five-year appointments as “Senior Assistant in Research” or “Assistant Director of Research”; or as “Assistant Lecturer”. These posts have disappeared, for reasons owing directly and indirectly to funding policy changes. I will leave a full exposition of that to a future post in this series, but this 2004 report published by HEPI is a good starting reference. The story in brief is that the split of public research funding between Funding Councils (core funding) and Research Councils (project grants) has shifted substantially over the last forty years—but not on the basis of any clear mandate or strategic recommendation that I've yet managed to find. I'll report back with more detailed findings in a future post.

[/highered] permanent link contact

Fri, 11 Jan 2019

Mildly profane meta-advice for beginning PhD students

A while back I chanced upon yet another “advice for PhD students” article, which cajoled me into finally writing my own. I should mention that I don't really like this sort of article; as a side-effect of this cognitive dissonance, the text below will be somewhat profane.

(Just as many such articles contain unacknowledged US-centrism, so mine contains some UK-centrism. I hope you can deal with it.)

Figure out how to make yourself work effectively. If you're relatively young when you start, a surprisingly large fraction of your PhD could go by before you get there. For me, eating properly and sleeping properly are the hardest things to get right, and (of course) a lot flows from them. It might sound silly, but I was in my third year before I mostly cracked this. (Ten years later, I still struggle from time to time.)

Read. Knowing the prior work means really knowing it; not just the last ten years' worth or the high-profile stuff. To merit the PhD degree, I'd argue it's actually more important to be an expert in your field than it is to advance human knowledge. (Some institutions' regulations even reflect this. Not that that's relevant... in practice, your examiners will be guided far more by the folklore of their particular research culture than by any regulations. There's more about examination below.)

Learn how to give good talks. The easiest way is to give talks, and get frank feedback. Also, attend talks and pay attention to which ones work. (Flip side: even after learning this skill, not every talk you give will be good. Don't beat yourself up.)

Attend talks. Cultivate your breadth as well as your depth. This can be easier or more difficult, depending on the particular academic environment you're in. I was lucky on this one.

Don't let anyone tell you “how research must be done” or “what good research looks like”. There are a huge number of different styles of research, and most people know only a few. The more prescriptive someone's take, the more narrow-minded that person and the more loudly your bullshit detector should be sounding. Example: once I was advised that good work should generate “graphs that go up and to the right”. If we only had the kind of research that does generate such graphs, the world would be a poorer place, no matter how good that work. To this day, there are few graphs in my work (and those that do exist don't look like that).

Find your research community (or communities). This relates to the previous point: even people claiming to be doing similar research and/or have similar goals might actually have very different attitudes and methods. Communities have their own vibes and personalities even beyond those issues. It took me a few years, and a couple of wrong attempts, to find the community for me. I was glad when I did. No community is perfect, of course, and I still feel like an alien from time to time.

Don't let the bastards grind you down. Chances are, if your work is at all interesting, some people will want to shit on it. More generally, the research world attracts a lot of extreme personality types (including narcissists, bullies, hypercompetitive arseholes, manipulative gaslighters, etc.). Learn how to see them for what they are—even the ones who are well-regarded famous people.

Get access to good feedback. It doesn't have to be your supervisor (who, realistically, you might well have chosen based on limited information and/or without really knowing what you were doing). It doesn't have to be one person; it's probably better if it isn't. Despite good intentions all round, I found that my supervisor's feedback wasn't very useful (I have no hard feelings). But I found that through my surrounding research group (groups, in fact) I could crowdsource a lot of extremely valuable feedback.

Know how to disregard unhelpful feedback. Some people won't have anything constructive or insightful to say, but will still be happy to pick imaginary holes, make you feel small and/or make themselves feel clever. Don't let them sap your energy.

Learn how to defend against probing and/or hostile questions. In a talk situation, or perhaps a viva/defence situation, you will get these from time to time. It's rarely a fair fight; often, the hardest questions to answer are the most sloppily formulated or the ones founded on wrong assumptions that remain implicit. There are some tactics for handling these. The simplest is batting them back for clarification or re-statement. Another is offering back the question that you think was intended (and, hopefully, one that actually would have been a sane question). The more expert technique is on-the-hoof picking-apart of exactly what those wrong assumptions were. Some of my most awkward moments as a young researcher were when my lack of skill at this was exposed. So, the other point is not to sweat it when this happens.

Don't be afraid to talk to people whose work you admire—even by sending cold e-mail. Even “famous” people are mostly approachable and are happy to hear if you like their work and have something interesting to say. (And if they're not, then it's their problem not yours.) I was far too timid about this. You never know; it might lead to something. And even if not, there is value in just saying hi. I held back too much.

Socialise your work, and yourself. Get yourself invited to give talks at other universities or research labs. Talk at workshops, give demos or posters; take the opportunities. Although it's possible to overdo this—getting some time for actual work is important too!—it's easy to underdo it, especially if you're shy or not a natural self-publicist. (Sad but true: the more people know you personally, the more citations your work will attract, especially earlier on.)

When communicating with others, learn how to focus your thoughts, and tailor them to the audience. This is partly about having not just one but many “elevator pitches” (or, as we Brits say, “lift pitches”). But it's more than that. One of the things I used to get wrong when talking to people, especially by e-mail, was to let my message degenerate into a brain dump. There's always more you can say... but often, less is more. Select the thoughts or points that you're most interested/excited about, and filtered by relevance to or expected resonance in the recipient. Have faith that the other stuff will come out over time if it's fruitful. There will be a lot of implicit cues, in how you're writing or talking, hinting to the recipient that there is more where that came from.

Learn when administrative rules are not really rules. Even the best institutions will have over-narrow admin-defined ideas about how research students' journey should go. They might assign you to one research group even though your interests span several. They might refuse to fund your travel to a particular event that's considered out-of-scope. Towards the end of your PhD, they might neglect to provide extension funding, or not connect you with the help you need in applying for your next funding. In all cases, the rules often appear inflexible when you read them, but they are often fluid in practice, so be bold enough to ask the question anyway. (One example: I was told I had only a three-year studentship that could not be extended. But I asked for extension money anyway. The rules didn't allow extending me, but they did allow retroactively increasing my stipend, paying me effectively as additional back-pay; this got me an extra 4.5 months of funding.) Usually it helps to be on good terms with the relevant administrators—go and talk to them in person, be unassuming, try to understand where they're coming from and what the real constraints are. It can also help to have a supporting professorial type (again, need not be your supervisor) who will be your advocate and help you poke the administrative gears.

Know how your PhD will be examined, and do what you can to influence and optimise for that. Under the UK system, in practice it is your supervisor who will choose your examiners, but you may have some influence over that choice. Sadly, “examination-friendly research” and “good research” are overlapping but non-equal sets. This is particularly dangerous in the UK system where the examiners are all-powerful, and doubly so if your supervisor's social capital is low enough that the examiners do not see fit to moderate themselves on that account. It's pragmatic to see your thesis as a thing to be examined, rather than a thing you're writing for yourself (even though the latter may make you feel fuzzier and more motivated). This also comes back to the “many styles of research” point: if an examiner isn't used to work of your particular style, you will get a rougher ride than you deserve. Make sure the choice of your examiners is made with due consideration of this; it shouldn't simply be whoever your supervisor randomly bumped into and asked on a whim.

Be sceptical about advice. There's a lot of it about, given freely and often smugly by people who took a single path. This is generally well-meaning, but the people concerned don't necessarily know much about the other paths. Successful people often don't truly know why they were successful. I received huge amounts of bad advice during my PhD, which has made me a cynic about advice-giving and mentorship to this day. (Can you tell?)

Clearly I'm a hypocrite too, otherwise I wouldn't have written this. My so-called career path, though not without its good points, has definitely not been one to emulate. Sadly, I can't blame all of its failings on following other people's bad advice. Still, I hope this article has been different enough that it makes a useful counterpoint to advice in the usual vein.

[/research] permanent link contact

Wed, 17 Jan 2018

How to be a Scrutineer (and a better one than I managed to be)

From December 2015 until October 2017 I served on the University of Cambridge's Board of Scrutiny. This is a high-level governance body that is “the University's chief internal mechanism for ensuring transparency and accountability in all aspects of University operations”. The role of the Board is to “examine the way in which the University is run and to comment on this to the University's governing body, the Regent House”, taking its lead principally (but not necessarily exclusively) from the Annual Reports of the Council and the General Board, and from reports concerning accounts and allocations. My service on the Board was hugely valuable experience to me personally. I hope it was also, in some smaller way, of value to the University.

One of the most difficult things was not so much discharging the responsibilities as figuring out how best to do that in the first place. So I've written the following retrospective notes on how, in my opinion, one might serve on the Board effectively. They are more about reach than grasp. Doing all of them perfectly is impossible. Doing even most of them pretty well is extremely difficult. But it's what I would aspire to if I were to re-join the Board.

Serving on the Board is a combination of learning, doing good (according to the University's mission and hopefully one's own principles) and doing necessary: putting in service that others in the University will benefit from. When I joined, I was rather underestimating the significance of the latter. Doing a service to fellow Regents is the most constructive spirit in which to operate. As a non-obvious example, one of the pragmatic roles of the Board, even in a well-functioning University democracy (alas, currently lacking) is to communicate to the University community things that the administration can't say, whether for legitimate political reasons or for regrettable lack of spine (often the reality is somewhere in the middle). One example is criticising the national government. Another is admitting the University's errors in PR-sensitive arenas (e.g. the North-West Cambridge debacle).

Understand the formal structure of the University. I spent many hours studying various web pages before I began to get my head around this. The “Officers Number” issue of the Reporter, published every Lent Term (currently: this one), is perhaps overkill but useful. The governance web site has a gentler introduction. The Statutes and Ordinances are worth consulting and getting to know your way around (roughly). One of the difficult things for me has been to remember which stuff is in Statutes, which in Special Ordinances and which in Ordinances... it's easy to be scratching your head about why you can't find something you saw previously, until you realise that it's in one of the others.

Understand both the Board's formal remit (as specified in Statute A chapter VII and Ordinances Chapter 1 section 6) and (separately) its de-facto role—including differing opinions on what the latter should be. This has been controversial since the early days of the Board. Perhaps the best reading material is the Remarks at the Discussion of the Fifth Report, which showcase the whole spectrum of opinions. Gordon Johnson had previously (at the Discussion of the Fourth Report) espoused the view that the Board should not write a report, simply lead Discussions. Among his concerns was that the University's “officers... devote a tremendous amount of time and effort to meeting the Board, providing it with information, and answering its questions”. At the next year's discussion, then-Chair Michael Potter countered that claim, after which, Johnson laid into the Board's newest Report quite ferociously, reiterating his views that its approach is wrong. Richard Stibbs countered again with the view of the Board as “interested amateurs”, in both senses of the latter word. This is a view I personally find to be realistic, again in two senses. None of this is very conclusive, but it makes clear the spectrum of opinion and strength of feeling.

Understand the Board's history. Johnson was part of the Wass Syndicate on whose recommendations the Board was created. In the aforementioned Discussion, Johnson is the most forthright in this discussion; he is not necessarily correct. I personally found it very odd that he considered it the Board's responsibility to “to explain more in layman's terms the background to what was being proposed [in the Annual Reports of the Council, General Board and of Allocations from the Chest] and the reasonableness or otherwise of [their] content”. I see no reason why these reports cannot and should not be made readable to the lay-Regent within themselves, including their own background. And of course, the reasonableness of their content has very much been the subject matter of every Board's Report. In the present era, where these Reports contain more spin than ever, it becomes ever more important to highlight their omissions rather than explain their merits. The CAPSA debacle is often cited as one of the Board's major successes to date, since the Board was part responsible for instigating the Shattock and Finkelstein reports. (I must admit I have yet to understand by what mechanism the Board did this.) This Discussion of March 2003 is good too, covering post-CAPSA attempts at governance reforms which were rather mishandled.

Obviously, read the Reporter every week. Slightly less obviously, read the Council's minutes every month. Talk to people who have been through the political mill. But also, talk to the rank-and-file, such as colleagues in other departments. Talk to as many different kinds of staff and student as you can. I did not do wonderfully at this. One thing in that spirit I did, for a period, was to work from the Combination Room one morning a week; this was partly for focus, and partly in the hope I would overhear information about what was concerning the rank-and-file admin staff at the Old Schools. I'm not sure I overheard very much of interest, but I did get a sense of what I could (somewhat blandly) call the “different culture” that operates in the Old Schools and UAS worlds compared to elsewhere in the University.

Obtain minutes. Where minutes are not made available (whether publicly, only to the University, or only to the Regent House), consider finding out why not. For example, in August 2017 I noticed that the Council's Resource Management Committee appeared to have a practice of publishing its minutes online, but also hadn't archived any since 2012 in its online archive. Was this because the committee did not meet? That seemed unlikely. Had it changed its policy on publishing minutes? There was no indication. I queried this with the Registrary's Office; for four months I received no reply, until I re-sent my reply and (not coincidentally) spoke at a Discussion (not on anything related, I might add). I then received a reply saying roughly “we can't do anything about committees choosing not to publish their minutes”. But, coincidentally I'm sure, a big pile of RMC minutes did appear in the archive at around the same time. If there's a moral to this, it's that perseverance and a certain amount of pig-headedness are necessary to help ensure diligence is applied where needed.

It's not just minutes. Other widely-circulated documents can be telling about what is going on centrally. The University's Risk Register conveys a lot about current priorities, and is available online (internally only). The annual Planning Round guidance is also telling, and usually appears online too. (In general, “following the money” is illuminating.) There's probably others that I'm overlooking here.

Make the Reporter and collected minutes searchable, in a way that works for you. In my case this means saving them on my laptop and converting them to plain text. Automating this is not too hard, because the web pages have a fixed structure. (Ask me for my Makefile!)

Know how to juggle hats. When interacting with University staff to whom it might matter, be clear whether you're acting for the Board, or personally, or on behalf of some other group.

Keep at arm's length from the Council, but have your sources. The Board's main sources of information on Council business are of course the Proctors, but they often have reasons to be discreet. A lot of University business can easily pass you by unless you go out of your way to hear it.

Protect the Board's neutrality and perceived neutrality. Avoid appearances of conflicts of interest, even if there is no conflict. For example, in hindsight, it was a mistake of mine to nominate a candidate for election to Council while I was serving on the Board. Even though there is something of a a revolving door between a subset of the Board and a subset of Council, it is best to avoid being so nonchalant about this as to plant doubts in people's minds. (As an example of something less blatant: one of my nominators for election to the Board later joined Council part-way through my term. If anybody had been paying sufficient attention, this might have been perceived as a conflict of interest, but the chances are much lower. I hasten to add there never has been any conflict on that account; the question is one of appearances.)

Develop a mental model of Council's operation, and check it with actual Council members. One thing that I learnt only quite late was the complex dynamic between the Registrary's Office and Council. The Registrary's Office has huge power over the University, since it does much of the “real work” behind business that is ostensibly done by the Council. For example, as I understand it, responses to Discussion Remarks are drafted by the Registrary's Office, as are parts of the Annual Reports of the Council and General Board. The influence of that office is huge. On certain occasions there is tension between members of Council and that Office; usually the “official story” communicated to the University, such as in the Reporter, reflects the Registrary's Office's side of any such contended matter.

Understand the HE landscape beyond the institution. Initiatives in the Old Schools often have their origins in sector-wide movements, either directly provoked by government policy or indirectly emerging from ongoing cultural shifts. A good example is the “100% FEC” issue: the claim that the University loses money on research. If I had read this article in the Times Higher Ed, or these slides, I might have pressed certain people harder when this line came out of their mouths.

Understand personal dynamics, and compensate. Like any committee, not everyone on the Board will be equally engaged or equally effective at any given time (and for various good reasons, too). These need not be in proportion to the loudness of their voice or their apparent authority in the meeting room. Compensating for this is part of the Chair's job; but everyone can help.

See membership of the Board as both an individual and a collective matter. The Board will never be of one mind. Signing off on the same Report is the basic standard of consensus. Beyond that you should not expect necessarily to have the same priorities or same position as your fellow Board members. At the risk of displeasing the Chairs I have served under, who did not hold with this policy, I believe individual Board members should be free to exercise their right to see documents—as long as this is with a valid concern in mind. I don't see why any member of the Board could not be trusted to use this privilege reasonably. Seeing this privilege as only a collective Board matter seems to me to close off the opportunity for individuals to follow their noses. I must admit it is not clear to me to what extent there was a statutory intention to empower Board members as individual rather than just collectively. But at least, I perceive that individual dimension is recognised in the case of Council.

Don't be collectively timid. During my time, I felt we were too timid on certain occasions. As one example of a potential timidness that Council and/or the Registrary's Office may play to, the Board should never accept redacted documents, unless the redacted content is “irrelevant”—the latter being the only statutory grounds for withholding information from the Board without the VC's written permission).

Don't be ashamed to have issues you care about. Obviously there are some relevance criteria. Actions of the Board should be in the interests of the University, within the Board's remit, and allocate attention proportionate to issue's importance. But if you believe something is important by those criteria, it probably is. Don't be afraid to raise it even if other Board members appear uninterested. Put differently: being a diverse cross-section of the Regent House, different Board members will care about different issues; the intention of having a twelve-member board is to obtain a cross-section. Conversely, this means allowing time to other Board members' concerns even if they don't resonate with one's own experience.

Develop a habit of contributing to the Board's ongoing recruitment and succession. The Board is continually struggling to recruit. Many people seem not to care about whether their University is governed democratically and accountably. Many do care, but haven't yet realised that the Board might be a good way to act on that. Every Board member should take care to notice anyone who seems to be an Interested Sort of Individual. Find them and encourage them to serve. Note that filling the spaces is not enough; the Board will only be in good health when elections are contested. Having a choice of membership and some effective choice of officers (Chair and Secretary) are important. (I write that as a Secretary who felt somewhat roped into it, on the grounds that “nobody else will do it”. In hindsight I think it was a positive experience, but it was unwelcome at the time, and I don't think I'm being uncharitable to myself in saying I was not the ideal candidate.)

Don't believe that everybody knows what they're doing, or that you do. When I started on the Board, things were done a certain way; it had its benefits but it also had several drawbacks. In my second year on the Board we changed a lot of things, generally to the Board's benefit. I suspect several logical continuations of those changes will come about in the next year. If I'd known earlier how much flexibility there was in how the Board conducts its duties, I might have helped push some of those changes along sooner.

Doubt people, charitably. This extends to oneself, one's fellow Board members and also to the many senior figures whose work the Board scrutinizes—who are grappling with very difficult and demanding jobs. However formidable they are as individuals, they will invariably be grappling imperfectly and may even be happy to admit as much. Among its many other modes of operation, the Board has the potential to help people out, by nudging the gears into better alignment. (That is not to say that raising the red flag or kicking backsides are never appropriate modes of operation....)

Be cooperative where possible, but don't be a pushover. One of the trickiest and most controversial aspects of the Board's operation is the question of how much it should be a savage bulldog, and how much it should be a conciliatory force. At different times, both can be necessary; good taste is required. This is a difficult balancing act.

Believe in the Board's importance, so that others will. I was quite surprised at my first Board meetings to detect hints of nerves on the part of some interviewees—people who were very senior figures in the University. But also, in those early days, we had one instance of a senior figure “shutting us out” owing to accumulated bad relations. The Board's power may be limited to giving voice to issues and arguments, but that power is still considerable. There is much latent capability for doing good, but this is slippery; harnessing that capability effectively is difficult. If the Board acts and acquits itself with fitting diligence and seriousness, including a sense of realism about what options are available to Council, it can be taken seriously by others. It stops being effective when its demands are impossible, or when it seeks to direct attention where it is ill spent. In an era when both Council and ordinary Regents are ever more disempowered by central managerialism, a strong Board is essential to elevate the University's supposedly democratic business from mere theatre to the actual rule of good sense. During my time, the Board was an effective body only intermittently. Hopefully it can be made more consistently effective; the University needs it more than ever.

[/highered] permanent link contact

Tue, 09 Jan 2018

Undergraduate admissions in “computer science”: a plea

Dear Bogdan,

Thank you for this update on our undergraduate admissions testing, and very well done on what must have been a mammoth effort. Seeing the CSAT happening at the CL, I could see it did succeed in creating a friendly environment, and I was impressed by the organisation effort which had clearly been necessary to put on such a smooth operation over a fairly long period.

However, let me also sound a note of alarm.

Reading the CSAT web pages, it is (still) not even mentioned that the test is about mathematics. Rather, it is left implicit—as if it didn't even occur that it might be necessary to mention or justify this. The Frequently Asked Questions page repeatedly use phrases like “at A-level”, “at school” without even seeing fit to mention that school mathematics is what you're talking about. Your starting assumption has apparently been that mathematics skills are 100% of what we are looking for.

I believe this is completely wrong, and it really worries me that you are apparently not thinking about it at all. Computing needs a diversity of skills and viewpoints. It is a horrible mistake to assume that it is simply a question of mathematics. I've personally known many computer scientists whose interest in (natural) languages, philosophy, history, biology and other sciences has been far stronger than their interest in mathematics. Meanwhile, “systems thinking” is a large part of many areas of computer science and requires analytical skills in human domains as well as mathematical ones. Why can't we admit people based on their excellence in those areas?

I am also incredibly worried about the withdrawal of the 50% option, which will further narrow the horizons of our students. As we admit a larger cohort of students, we should be acquiring the necessary support staff to scale up our offering's diversity. We should not be perversely restricting it on the grounds that this is the only way to cope without any extra staff. Our admissions and intake are booming; we should be coming out with a better offering than ever. You might counter that few students have chosen it the 50% option this year, but that only points again to the narrowness of our selection criteria. (I realise that this decision was not yours, but am raising it because I'm sure you have more oversight of this than I do.)

Cambridge will not suffer directly from these mistakes, because we will always admit smart individuals and turn them out even smarter. But our discipline will suffer, in practice and in research. Consequently, society will suffer. I need hardly remind you of the huge power that technologists have acquired in recent years, and the cluelessness with which many wield it. This is one of the huge social problems of our time, and Universities are far from blameless in it. Technology shapes our society. Study of the historical, human, psychological, biological and other topics all provide necessary and complementary viewpoints. It is a recipe for disaster to make computer science a discipline of mathematical thinking alone.

I hope (albeit without optimism) that this will make some sense to you. I should mention that I have also posted this e-mail on my blog. Yours in desperation,


[/teaching] permanent link contact

Wed, 18 Oct 2017

Some were meant for post-hoc reflections

It was great to get so much positive reaction to my essay “Some were meant for C”, so thanks to everyone concerned. In fact, thanks to everyone who's read it, even if you didn't like it! There were lots of things the essay didn't cover, and a few which it did only on a way that perhaps wasn't very obvious. And there were a few common patterns running through various reactions. All these seemed worth addressing directly in a few paragraphs, which I'll hopefully keep brief.

What about C++? Firstly, it's true that C++ offers the same “communicativity” benefits as C. We all know C++ was designed, and continues to be designed, to support systems programming (among other flavours). So, the reason I focused on C is not that C is unique in this benefit; but simply that C is the more paradigmatic example. (Of what? Well, of systems programming languages in the “unsafe school”... see also the next paragraph.) C++, on the other hand, is more things to more people, so would require a much broader and more discursive treatment than my essay had space for. For the record, I use C++ quite a lot myself. When I use it, it is (I hope) because it is the better tool for a task. These tend to be algorithmically complex tasks, and tend not to be systems-programming ones. I wouldn't say I enjoy using C++ as much as I do using plain C, but that is largely a consequence of the tasks concerned (I hate algorithmically hard coding!) and not the languages. More generally, I think it's uncontroversial that the huge complexity of the C++ language is a source of both strengths and weaknesses.

What about Rust? If C is a paradigmatic example of oldskool languages in the “unsafe” systems programming language, then Rust is surely a paradigmatic example of a new school. Or is it? Actually, the benefits of Rust are pretty much orthogonal to everything I talked about in the essay, although I agree that this fact is non-obvious. Rust's selling point is its static checking, which I certainly agree is neat. This particularly suits critical systems programming applications that want a high degree of static assurance while avoiding run-time baggage. However, this static checking comes with a similar “must oversee the whole world” requirement as do managed languages—albeit overseeing it at compile time, rather than at run time. When using Rust for systems programming of the communicative kind that my essay talks about, one is dealing with foreign objects not defined within the Rust language; therefore one is necessarily using what I call “Unsafe Rust”, i.e. the subtly different sublanguage one gets inside unsafe blocks—at least within some “interface layer” in the code. (Of course one can build abstractions that are checkably safe assuming correctness of the unsafe interface layer.) In this respect, Unsafe Rust has the same properties as C: the dynamic safety that I talk about in the later sections of the essay would be equally useful, and equally possible, within Unsafe Rust. (That would make it Dynamically Safe Yet Not Statically Safe Rust—a catchy name indeed.)

Are you proposing a “safe subset” of C? No. The whole point is that no subsetting is required. That is a consequence of how C is specified; to assume that C “specifies” its own unsafety is a confusion, albeit a common and understandable one, between specification and implementation. If you want to get super-technical about the C language specification, you could push me to “almost no subsetting”, since there are some tiny corner details that might need to be tweaked to get the best implementability and efficiency trade-offs. One key area I'm thinking about is the rules to do with effective types, which I'd rather had a tiny bit more ceremony attached. The same goes for unions. Still, these changes would be working at the same scale as the present language standard's evolutionary process; the essence of the C language, and the deliberately underspecified style in which it is defined, are amenable to a fairly efficient, dynamically safe implementation.

So you think C is wonderful, right? Not at all! C has plenty of flaws and I'm more than happy to point them out (er, some other time). I don't think anybody out there believes C could not be substantially improved if we were to redesign it with several decades' hindsight. But my point is rather the converse: that we should recognise the non-obvious ways in which we might get these redesigns and replacements wrong, and recognise why people continue to use C despite today's candidate replacements. Among these reasons, the one I chose to expound on was communicativity, because it is among the most non-obvious (to me, of those I know). There are other reasons why C is an effective tool in ways that certain other languages aren't—one respondent helpfully pointed out that the preprocessor allows certain cases of portable code, for example, which is a very true and important observation. Overall, any attempted replacement must preserve (or ideally improve on!) all these strengths. And of course, it is worth re-emphasising a million times: this is not simply a matter of performance.

What about... (some other language)? Although I'm always interested in and (usually) sympathetic to new language designs, one of the main messages of the essay is (or was supposed to be) that we give too much attention to languages. So, asking me about languages has been a somewhat frustrating pattern in the responses I've got. It's understandable to some degree. As humans, we like to use labels to partition the world in front of us (into species, tribes, taxonomies, etc.). Languages are a readily available label, and the differences among them are very visible. But sometimes this desire to label and partition leads us astray; the point about how we confuse C with implementations of C, on which the essay labours at some length, is a key instance of this. So forgive me if asking me about this or that language makes me a bit grumpy, even if that language is really nice.

Aren't language wars a waste of precious time and energy? Yes! I've already hinted at my distaste for language wars in the previous paragraph. Still, I suspect this reaction came from people who read the essay's title but not its content.

[/research] permanent link contact

Sun, 17 Sep 2017

Project suggestions (for Part II and MPhil students)

I have several project ideas to suggest this year. Any of them I could be available to supervise, for the right student (and not too many!).

A caveat: most of these are researchy project ideas. Part II students should read the pink book carefully, and also read my advice about Part II projects from a student's perspective, including the potential dangers of getting too researchy. Nowadays as a researcher, I feel a strong incentive to suggest projects that verge on the over-challenging or overly research-led. We should discuss carefully any project idea before writing the proposal, to ensure it's within your interests and (projected) ability.

See also my rather cryptic and occasionally out-of-date list of other standing suggestions.

An observable OCaml

This is about an OCaml-to-C compiler that allows the generated code to profit from DWARF-based debugging support of C compilers, while still providing a coherent OCaml view of the debugged code. A very nice job of this project was done last year by Cheng Sun, but there is plenty of room for variation; please ask me.

A fuzz-tester over API traces

This is renamed from “DWARF-based”. I've done some more thinking about this project since the linked write-up a few years ago, so let me expand. Most fuzzers work by perturbing the data given to a particular piece of code. This project is about fuzzing library interfaces by perturbing the trace of calls. This may include fuzzing data, but is not limited to it.

A random test case generator for linkers

(This is probably more of an MPhil project than a Part II project, though it could be attempted as either.)

Tools like Csmith have found considerable use in testing C compilers. A similar tool could be useful for linkers, which continue to be buggily extended and reimplemented, e.g. in the LLVM project's LLD linker.. This project would be focused on ELF-based linking on Unix platforms.

Linking is rather simpler than compilation in some respects, but exhibits a large number of possible feature interactions. For example, how do symbol visibilities interact with weak definitions?

One challenge is to generate only link jobs that “should” work. For example, the job should usually (if linking an executable) contain non-weak undefined symbols that lack a matching defined symbol.

Another challenge is to generate code for which some test of behavioural correctness of the output program is not only possible, but also usefully exercises the linker's work in some way: we should obviously generate code that involves cross-module referencing, but it's not clear what else it should do.

A third challenge is to achieve a bias towards “interesting” intersections of features, to improve the probability of finding bugs.

Useful outcomes from a randomized test case would include crashing one or both linkers, or producing divergent behaviours between two existing linkers. Evaluation could be done in terms of coverage of the feature space, and also in bugs found (including known bugs, e.g. in older versions). A starting point would be to extract a corpus of bugs from the change history of LLD.

A testable specification of C and C++ ABI details

Compilers use documented ABI specifications, such as the System V ABI, its x86-64-specific adjunct, and the Itanium C++ ABI, to achieve compatibility at link time and run time. These fix standardised details for representing data, calling code, interfacing with the operating system and achieving link-time effects (such as uniquing of inlineable function instantiations, dynamically interposable bindings, etc..). Given a particular C or C++ source file, what linker symbols should we find in the output binary (.o file), with what attributes and satisfying what properties? What data layouts should be used for various data types? ABIs answer these questions, at least in theory.

Since ABIs are specified only informally and partially, and are not rigorously tested, divergence in these details currently causes compatibility problems between compilers, even if they are intended to implement the same ABIs. This project would involve writing a formal specification of (some of) these details, partly from ABI documentation and partly by observing the output of real compilers, then using differential testing to identify divergences and (perhaps) report bugs in real compilers. This includes such issues as how particular source language features map to particular linker-level features, across differnet code models and when built for static or dynamic linking. Some experimental work will be necessary particularly in the area of extended/non-standard C features (packed structures, bitfields, ...) and C++ (vtables, typeinfo, ...), with differential testing against gcc and LLVM toolchains.

A lightweight user-space virtualisation system using the dynamic linker

The dynamic linker (ld.so) is a critical component of commodity Unix platforms. It is used to interpret almost all binaries on the system, and contains the earliest-run user-space instructions. It is therefore a potential interface for virtualisation: a modified dynamic linker can provide unmodified application code with the illusion of an extended kernel interface and/or differing instruction semantics. As with hypervisors, this can be done with either trap-and-emulate or binary-rewriting techniques, or both. This project concerns building a usable dynamic linker which virtualises the user-space environment on a mainstream Linux or FreeBSD platform and adds one or more “added value” services or tooling features. The choice of added value is somewhat free, but a simple suggestion is a memory profiler, similar to Valgrind's massif but offering much lower run-time overhead. Another might be a filesystem extension something like flcow or fakeroot.

Either way, the project will involve implementing an extended version of some existing dynamic linker. To be useful, this should include (1) self-propagation support, so that child processes are executed under the new dynamic linker; (2) instrumentation of the necessary instructions or system calls to deliver the added value; and (3) an implementation of the added value itself. The starting point will include, should the student find it useful, a basic proof-of-concept tarball for extending the GNU C library's dynamic linker on Linux. This project will demonstrate understanding of operating systems, assembly-level programming, compilers, virtualisation, and instrumentation-based programming tools. There are lots of possible extensions, including instruction emulation, system call interposition (particularly challenging in multithreaded contexts), supporting statically linked binaries, supporting setuid binaries, adding simple instrumentation primitives (e.g. call relinking, GOT/PLT redirection), and integrating heavyweight instrumentation systems (by plumbing in a system like Valgrind, DynamoRIO or FastBT).

A testable formal specification for dynamic linking

(This one is probably more of an MPhil project.)

Dynamic linking exhibits subtle variations both across different Unix platforms (such as GNU/Linux, FreeBSD, OpenBSD etc.), and across different dynamic linkers available even within just the GNU/Linux platform (such as musl, uClibc and glibc).

This project will use a combination of documentation and experimental methods to produce a precise specification (yet “loose”, i.e. accommodating allowable variation) of what happens when a dynamic linker loads an executable and its constituent libraries.

Given an input binary, the output is a memory image in the address space containing the dynamic linker. A host of details matter to this transformation: symbol visibility, relocation semantics, program header semantics, and so on. We currently have a specification (in Lem) of some of these, which has been used as a testable specification for static linking. This project will most likely benefit by extending these, albeit in the context of a new executable specification simulating the dynamic linker rather than the (static) “batch” compile-time linker. Another useful component would be ptrace()-based tool for testing against the memory image actually produced by real dynamic linker.

Initially one would probably factor out the filesystem aspects, i.e. assume that the output of ldd is known and ignore features such as preloading and auditing that can affect this. These could then be added in later as extensions. One would probably also want to work with preexisting binaries; specifying and checking the (static) production of dynamically-linkable binaries could again be a useful optional extra, for which we have some initial work.

A machine-readable, testable system call specification usable by instrumentation-based tools

Many programming tools, including debuggers and tracers and also, especially, tools built with instrumentation frameworks such as Valgrind or DynamoRIO, require knowledge of the system call interface. Currently these use hand-maintained knowledge of syscalls which is often incomplete and/or incorrect. Pooling and automating these specifications would improve quality and completeness, reducing the costs of keeping tools up-to-date. Candidate tools/frameworks include qemu, DynamoRIO, Valgrind, gdb, strace, and our own ppcmem/rmem.

A system call's register footprint (arguments in/out, clobbers) is defined by the kernel ABI. Most system calls also have a memory footprint, which is much harder to describe. Some tooling for describing and testing Linux system call interface and a very partial specification of calls' read/written memory footprints can be supplied as a starting point.

Other useful information includes argument types (mostly done already), possible error codes and their necessary and/or sufficient preconditions, the latter perhaps initially limited to argument well-formedness but extending later to details of the existing memory map, file descriptor state, resource limits, signal dispositions, and so on. (Of course error conditions also depend on filesystem state, which is rather complex by itself, and subject to interesting race conditions; a thorough job of this is not feasible within one project, but some cases could be addressed.)

One extension of memory footprints concerns the well-definedness of the contents of memory written and/or read—for example Valgrind's Memcheck would ideally consume a specification of output (written) definedness as a function of input (read) definedness.

System calls are “downcalls”; the specification could also consider system “upcalls”, mostly meaning kernel-generated signals, the conditions on which they might occur and the properties expected to be true of the process state at those times.

This project would ideally pick a couple of tools and a couple of architectures (including AArch64) and produce a testable specification usable by both tools on both architectures as a replacement for current ad-hoc architectures, including specification of those aspects of the interface that matter to those tools. We assume the host tool/framework already has mechanisms for recognising when a system call is occurring, and what state the process memory/registers are in. We also assume the tool has a means to request system calls (typically emulated by issuing host system calls, perhaps doing a verbatim or modified version of the guest code's system call, e.g. to do guest/host address translation).

Maintaining testability of the overall spec is useful, e.g. using SystemTap or DTrace to check actual system state against what the specification says should happen. Again a proof-of-concept for memory footprints will be provided for extension.

[/research] permanent link contact

Tue, 14 Feb 2017

Custom ELF program headers—what, why and how

A couple of times recently I've found myself wanting to create ELF files with custom program headers. This post explains what that means, why you might want it, why there's a practical difficulty in doing so using a standard linker, and my slightly hacky way of solving it.

Program headers are metadata within ELF files. They describe a contiguous chunk of the file's address space. There are several different kinds of program header, of which the most critical is LOAD, signifying that the relevant part of the file should be loaded (or mapped) into memory, forming a segment within the program's memory image. Besides LOAD, other program headers are used to attach particular meanings to specific ranges within the binary. For example, the DYNAMIC program header tells the loader where the dynamic-linking metadata resides. The INTERP program header identifies to the (in-kernel) loader a string naming the program interpreter (the dynamic linker, which will actually do the loading proper). Some vendor extensions exist for adding features within the loading process, such as GNU_RELRO (which records a range that can be made read-only after relocation, bringing security benefits) or GNU_STACK (which does not identify an associated range, but exists as a flag to tell the loader whether the stack needs to be made executable). If you run readelf -l on some ELF binary, you'll see a dump of its program headers. On my machine I see something like this.

$ readelf -Wl /bin/true
Elf file type is EXEC (Executable file)
Entry point 0x401432
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
  INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0058b4 0x0058b4 R E 0x200000
  LOAD           0x005e10 0x0000000000605e10 0x0000000000605e10 0x000404 0x0005f0 RW  0x200000
  DYNAMIC        0x005e28 0x0000000000605e28 0x0000000000605e28 0x0001d0 0x0001d0 RW  0x8
  NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R   0x4
  GNU_EH_FRAME   0x004b4c 0x0000000000404b4c 0x0000000000404b4c 0x00023c 0x00023c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x005e10 0x0000000000605e10 0x0000000000605e10 0x0001f0 0x0001f0 R   0x1

Two reasons for wanting to add custom program headers are to add custom features to the loading process, or to make new uses of features it already provides. My first use case was an instance of the latter: to create a heap area at load time, as an additional segment, rather than the usual way of making explicit syscalls during execution. I was building ELF files for the ppcmem emulated execution environment. This doesn't implement any system calls, but does implement an ELF loader. By adding an extra LOAD program header providing an allocation arena, I could write a malloc() that does not require system calls—its heap arena is mapped at start-up by the ELF loader inside ppcmem. (Although I could have allocated space using a big char array, or by hacking the linker script to leave a big gap, a separate segment is logically cleaner, and a disjoint address range for heap allocations is helpful.)

The second use case for custom program headers comes from liballocs. When passed through the liballocs toolchain, a binary also generates extra metadata binaries which can optionally be loaded at run time to supply extra metadata similar to debugging information. Currently these files live outside the main binary, in a separate /usr/lib/allocsites hierarchy, which is a pain for deployment: if you move or copy an executable, you must move or copy its metadata as an extra step. I'd like to bundle the metadata into the binary somehow, and record its presence using a custom program header.

The way to specify the program headers is in the linker script. The problem with creating custom program headers is that a GNU-style linker's script language requires you to specify the entire set of program headers if you specify any program header at all. You can't just add one extra. Worse, the default set of headers is not even made explicit anywhere. The PHDRS command is what specifies the program headers, but the default linker script (try ld --verbose to see it) doesn't use it. Rather, it falls back on the linker's hard-coded default behaviour, which is to create all the familiar headers that we see above.

If we write our own PHDRS command we can create all the program headers we want—but we have to create them all ourselves, meaning manually creating all the ones that the linker would otherwise create for us. The set of headers created by hard-coded logic in the linker has grown over the years, to include things such as the GNU_STACK and GNU_RELRO headers I mentioned earlier. Re-creating this “standard” set of program headers is therefore quite a detailed platform-specific task, and is subject to change as new features are added; a burden we don't want to take on.

So, I came up with a way of getting the effect we want: “please create this extra program header, in addition to all the normal ones”. It is not pretty and not perfect, but it works as follows. Run the linker once, without any custom program header directives. Remember what program headers the linker generated, and what sections it assigned to them. Dump the linker script used during that link, and annotate it so that each output section is labelled with its program header assignment. Then, apply a diff to the linker script adding only the program header we care about. Finally, re-link with the diffed script.

The obvious downside is that we link twice. We're also a bit fragile to the organisation of the default linker script. Still, I'm prepared to pay these prices. I've implemented this approach with a pleasingly nasty (but small) pile of shell scripts, make rules and m4 macros. It now works pretty well for me. Here's a brief tour.

First, we have some makerules to dump the default linker script.

default-dynexe.lds: $(shell which ld)
    $(LD) --verbose 2>&1 |\
      sed -e '/^=========/,/^=========/!d;/^=========/d'\
          -e 's/\. = .* + SIZEOF_HEADERS;/& _begin = . - SIZEOF_HEADERS;/'\
      > "$@"

(Thanks to Dan Williams for the sed script, which I took from his page about creating a custom dynamic loader, which is well worth a read.)

Next, we have some makerules to turn such a script into an m4'ised macro-script. Essentially, this macroisation adds two “hooks” into the script: one to include a file describing the “normal” program headers and insert any new ones; and one to wrap each output section in a macro invocation that will decorate it with its corresponding program header names. This macroisation rewrite is easily done with regular expressions.

%-dynexe.lds.phdrify.m4: %-dynexe.lds
    cat "$<" | sed 's#SECTIONS#PHDRS\n{\n\tinclude(phdrs_dynexe.inc)\n\tinclude($(srcdir)/phdrs_extra.inc)\n}\nSECTIONS#' | \
    tr '\n' '\f' | sed -r \
    's@(\.[-a-zA-Z0-9\._]+)[[:space:]\f]+(([^\{]*[[:space:]\f]*)?\{[^\}]*\})@expand_phdr([\1], [\2])@g' | \
    tr '\f' '\n' > "$@"

Next, our phdrified linker script contains output section blocks that used to look like this

  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }

now wrapped in macro invocations, like this.

  expand_phdr([.rodata], [: { *(.rodata .rodata.* .gnu.linkonce.r.*) }])

Once we have a macroised linker script, we need to m4-process it with some macro definitions that turn it back into the script we want. We need two kinds of definition for this. The first is a list of the program headers in the file as it was originally linked. The easiest way is to use readelf to dump the existing ones. As a wart: since (on my machine at least) the linker doesn't understand the full range of symbolic (named) values for the program header's type field, we use the C preprocessor to macro-expand the symbolic names down to hex literals (as defined in elf.h), which it does understand. The second kind of definition is going to expand each output section to explicitly mention which program headers it belongs to. Note that a section can be in more than one program header. I used a bash script with associative arrays, again operating over readelf output, which handily outputs the mapping from segments to section lists. The script inverts this mapping, into a section-to-segment-list mapping. This generates a single m4 macro whose body is a giant if-else chain testing the argument section name against all the sections in the file, and yielding the list of program headers that the section is in. The easiest way is to use readelf to dump the existing ones. The result is a chunk of m4-ified text that describes the program headers in the original link, looking something like this.

phdr0 6;
phdr1 3;
phdr2 1;
phdr3 1;
phdr4 2;
phdr5 4;
phdr6 0x6474e550;
phdr7 0x6474e551;
phdr8 0x6474e552;

After all that. we then simply m4-include a second file describing the custom ones, and use m4 to expand out the program header assignments. Our output section before has now become

  .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } :phdr2

i.e. recording that the read-only data is in program header named phdr2 (the first LOAD in the dump we saw early on; this is the text segment, which also holds read-only data).

When we re-link with this script, we can add whatever extra linker inputs we want to go into our custom segment, and also apply a diff to the linker script. A simple diff looks like this, where the stuff in sections whose names begin  .insert will end up in a new segment.

--- default-dynexe.lds  2016-09-14 23:12:31.000000000 +0100
+++ insert-dynexe.lds   2016-09-14 23:13:46.000000000 +0100
@@ -187,6 +187,9 @@
   . = ALIGN(64 / 8);
   _end = .; PROVIDE (end = .);
   . = DATA_SEGMENT_END (.);
+  /* srk: new segment! */
+  .insert  : ALIGN(0x1000) { *(SORT_BY_NAME(.insert*)) } :insert
   /* Stabs debugging sections.  */
   .stab          0 : { *(.stab) }
   .stabstr       0 : { *(.stabstr) }

The .insert output section will also be aligned to a 4kB boundary, and be placed in the program header called “insert” by the script. If the input section is allocatable, it will also go into a LOAD header. By varying this patch, you can put whatever section you like in your program header, aligned however you like.

There are some warts in all this. One is that I found increasing the number of program headers would confuse the GNU linker into placing some of its implicitly generated sections, .gnu.build-id and .note.ABI-tag, at a silly offset in the file, and that would mess up the address assignment, by causing the file header no longer to be mapped at virtual address 0. To hack around this, I inserted a one-line sed command which explicitly places the .note.ABI-tag section. Another wart was that the PHDR header was no longer generated properly: it was coming out with an address of 0, not of the executable's start address (0x400000). This turned out to be because no LOAD header was including the region of the file containing the file and program headers themselves. To hack around this, we add a little more rewriting magic to our shell scripts, to put the PHDR and FILEHDR linker attributes to any program header entry with type PT_PHDR, and also to the text segment's LOAD program header (usually the first), then updating the latter's file offset and memory addresses so that the segment actually spans that part of the file.

I mentioned that I wanted to use custom program headers to embed one ELF file (a metadata object) inside another. How can we do this? Firstly, we embed the metadata ELF file into the enclosing ELF file and mark it with a custom program header. Secondly, at run time, when we want to load the metadata, we locate the program header and get a file descriptor pointing at the embdded content. Then we need a dynamic linker that can dlopen() from a file descriptor; we can hack the existing GNU dynamic linker for this fairly easily (not shown, but I've done it; ask me). Finally though, we need to handle the fact that file offsets in the embedded content will not match file offsets on this file descriptor! So we need to translate all the ELF file offsets in the embedded file by a fixed amount. File offsets live in the ELF header, program headers and section headers. (They might also live in other sections, but I'm not aware of any.) I wrote a simple C program to add a fixed offset to these. There's also some guesswork needed to figure out what that offset should be, since the linker will be in control of the in-file placement of the extra output section. This could be made reliable using one more link pass (bringing the total to three).

There's a GitHub repository containing these recipes and examples, here. It's all a bit GNU-specific and underdocumented for now.

All this effort to achieve a fairly simple task: package one ELF object inside another so that the “inner” object is loadable when running the “outer” object. What's another way of looking at this problem? We can think of ELF files as describing an initial state of the program (or, in the case of a library, a fragment of state). What we'd like them to be able to do is specify that this state includes a given file of a particular name and contents. In other words, we'd like them to describe not only aspects of the memory state, but also aspects of the filesystem state. Linkers already do something a bit this, namely static linking: objects that might not be on the destination filesystem can be bundled into the file, albeit losing their identity as files. Thinking of binaries as “partial initial program states” in this way, there's a clear relationship between ELF executables, Docker images and other collections of memory(-mappable) objects.

Our “embedded ELF file” requirement would be trivial if segments were nameable as files in the first place. This is of course a design feature of Multics, as was dynamic linking. Although the latter has now been picked up by Unix, files and segments have still not been fully re-unified. Maybe it's time? Dynamic linking is one of two key mechanisms, along with filesystems in userspace (whether this or this) that allow us to prototype extensions to Unix without pushing our changes into the kernel. There are usually performance, security or correctness reasons why such extensions are better off ultimately residing at least partly in the kernel. But making a playground entirely in user space is valuable for research and experimentation, and is something I'm working up to. The “embed part of a filesystem inside your binary” idea is one example of the sort of feature that we want to be easily prototyped in such a system. The core features we need—serving the content, and then binding it into the process's namespace”are already provided by the OS, so it's a matter of joining them together.

[/devel] permanent link contact

Mon, 30 Jan 2017

Debugging with the natives, part 2

Some aeons ago, I wrote a piece giving a rough outline of how a native debugger like gdb works, and promised a follow-up that looks at the same pile of techniques in a more principled way. I can't really excuse the delay, but if anyone's still listening, here goes.

Source-level debugging of native code is supported on (I make it) four different different levels in a complete hardware–software stack.

We can split these four into two pairs. The first two implement machine-level primitives and their “upward mapping” (virtualisation) into the operating system's process abstraction. The second two implement the source-level view that the programmer usually prefers, again mapping upwards from binary- to source-level.

Let's take the machine-level pair to begin with. From the operating system's point of view, all debugging support is designed around the following principles. (Perhaps I shouldn't say “designed” since in reality they “grew”— and only became principles later.)

Some surprisingly strong properties result from this design. Firstly, debugging can be done from a remote process, perhaps on a separate machine from the debuggee, perhaps even a machine of a different architecture. Secondly, debugging can be done post-mortem. Thirdly, the same infrastructure works for many source languages—albeit trivially so far, since we've only seen how to get an assembly-level view. There are some contrasts here with most language virtual machines (think JVM): these implement debugging using in-VM debug servers. These can work across the network, but don't support post-mortem debugging, and typically bake in concepts from source languages.

That's enough about the machine-level part. To go from machine- or assembly-level debugging to source-level debugging, we need help from the compiler. This is designed around the following principles.

Let's do a more concrete run-through of how it works. So far I've been fairly generic, but let's fix on GNU/Linux as our modern Unix—though all ELF-based systems are pretty similar—and a familiar architecture (x86-64) and specific metadata format (DWARF).

When I compile a program with -g it means “please generate metadata”. First, let's try without.

$ cc -o hello hello.c
$ readelf -S hello | grep debug  # no output! no debugging sections

You can still debug this program, at the assembly level, because the OS debugging mechanisms remain available. It's as if the compiler-generated assembly is code that you wrote manually by yourself. You can set breakpoints, watchpoints, single step, and so on.

$ gdb -q --args ./hello
Reading symbols from ./hello...(no debugging symbols found)...done.
(gdb) break main
Breakpoint 1 at 0x40052a
(gdb) run
Starting program: /tmp/hello 

Breakpoint 1, 0x000000000040052a in main ()
(gdb) disas
Dump of assembler code for function main:
   0x0000000000400526 <+0>:     push   %rbp
   0x0000000000400527 <+1>:     mov    %rsp,%rbp
=> 0x000000000040052a <+4>:     mov    $0x4005c4,%edi
   0x000000000040052f <+9>:     callq  0x400400 <puts@plt>
   0x0000000000400534 <+14>:    mov    $0x0,%eax
   0x0000000000400539 <+19>:    pop    %rbp
   0x000000000040053a <+20>:    retq 

Now let's compile with debug information.

$ cc -g -o hello hello.c
$ readelf -S hello | grep debug
  [28] .debug_aranges    PROGBITS         0000000000000000  00001081
  [29] .debug_info       PROGBITS         0000000000000000  000010b1
  [30] .debug_abbrev     PROGBITS         0000000000000000  00001142
  [31] .debug_line       PROGBITS         0000000000000000  00001186
  [32] .debug_frame      PROGBITS         0000000000000000  000011c8
  [33] .debug_str        PROGBITS         0000000000000000  00001210

What's in these debug sections? There are three main kinds of information. Firstly, there are files and line numbers (.debug_line). These encode a mapping from object code addresses to source coordinates (file, line, column). You can dump it fairly readably, as follows.

$ readelf -wL hello 
Decoded dump of debug contents of section .debug_line:

CU: hello.c:
File name                            Line number    Starting address
hello.c                                        4            0x400526
hello.c                                        5            0x40052a
hello.c                                        6            0x400534
hello.c                                        7            0x400539

Secondly, there is frame information (often this comes out in a section called .eh_frame so I cheated a bit above; to get exactly the above with a current gcc, you should add the -fno-dwarf2-cfi-asm switch). This tells the debugger how to walk the stack. What does walking the stack mean? Roughly it means getting a sequence of stack pointers paired with their program counter values, for each call site that is active on the callchain. The old stack pointer and program counter are always saved somewhere, otherwise we couldn't return from the call. To walk the stack we start from the current “live” register file given to use by ptrace(), which holds the “end” stack pointer and program counter. The DWARF then describes how to “rewind” these register values, and/or any other registers whose job is to record the callchain (rbp on x86-64; other callee-saves are often included too) back to their state at the previous call site in the chain. The description of this unwinding is logically a table, which you can see below for the main function. The cells are expressions describing how to compute the caller's value for a register, here rbp value (frame pointer) and also the caller's program counter, i.e. the return address (given as ra). The computations are both factored into two. Firstly we calculate a “canonical frame address” from the current frame's register values (see the CFA column): it's a fixed offset from rsp and rbp, and is actually a fixed address on the stack, but the expression changes from instruction to instruction as the stack pointer gets adjusted. Secondly we obtain the saved values we want by reading from the stack at fixed offsets from that address (c-8 means 8 bytes down). This factoring helps compactness, because the CFA-relative offsets don't change when the stack pointer moves; only the CFA column needs to describe that. However, although “stored at some offset from the CFA” covers a lot of cases, sometimes more complex computations are required, which usually appear as DWARF bytecode expressions.

$ readelf -wF hello
00000088 000000000000002c 0000001c FDE cie=00000070 pc=0000000000400526..000000000040053b
   LOC           CFA      rbp   ra    
0000000000400526 rsp+8    u     c-8 
0000000000400527 rsp+16   c-16  c-8 
000000000040052a rbp+16   c-16  c-8 
000000000040053a rsp+8    c-16  c-8 

The .debug_info section is the biggie. It describes the structural detail of the source program along both source and binary dimensions. It has a list of source files, but also a list of compilation units. The latter is where most of the structure is. It describes functions/methods, data types, and all the language-implementation decisions that the compiler took when generating binary code: how data types are laid out, which registers or stack slots hold each local variable over its lifetime, and so on. Although not shown much in the simple case shown below, addresses of program variables are described in a Turing-powerful stack machine language which is essentially a bytecode; the DW_OP_call_frame_cfa below is one operation, which simply says “push the address of the frame base, as recorded by the frame info”. The tree-like structure of the information also describes detailed static structure of code, including function inlining, the in-memory locations corresponding to particular lexical blocks in the code, and so on. (It's worth asking whether DWARF info it might usefully bundle the source code itself. I've never seen this done, but it would make a lot of sense to me.)

$ readelf -wi hello
Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:        0x8d (32-bit)
   Version:       4
   Abbrev Offset: 0x0
   Pointer Size:  8
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_producer    : (indirect string, offset: 0x2f): GNU C 4.9.2 -mtune=generic -march=x86-64 -g -fno-dwarf2-cfi-asm -fstack-protector-strong
    <10>   DW_AT_language    : 1        (ANSI C)
    <11>   DW_AT_name        : (indirect string, offset: 0x88): hello.c
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0xb5): /tmp
    <19>   DW_AT_low_pc      : 0x400526
    <21>   DW_AT_high_pc     : 0x15
    <29>   DW_AT_stmt_list   : 0x0
 <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
    <2e>   DW_AT_byte_size   : 8
    <2f>   DW_AT_encoding    : 7        (unsigned)
    <30>   DW_AT_name        : (indirect string, offset: 0x0): long unsigned int
 <1><34>: Abbrev Number: 2 (DW_TAG_base_type)
    <35>   DW_AT_byte_size   : 1
    <36>   DW_AT_encoding    : 8        (unsigned char)
    <37>   DW_AT_name        : (indirect string, offset: 0xa2): unsigned char
 <1><3b>: Abbrev Number: 2 (DW_TAG_base_type)
    <3c>   DW_AT_byte_size   : 2
    <3d>   DW_AT_encoding    : 7        (unsigned)
    <3e>   DW_AT_name        : (indirect string, offset: 0x12): short unsigned int
 <1><42>: Abbrev Number: 2 (DW_TAG_base_type)
    <43>   DW_AT_byte_size   : 4
    <44>   DW_AT_encoding    : 7        (unsigned)
    <45>   DW_AT_name        : (indirect string, offset: 0x5): unsigned int
 <1><49>: Abbrev Number: 2 (DW_TAG_base_type)
    <4a>   DW_AT_byte_size   : 1
    <4b>   DW_AT_encoding    : 6        (signed char)
    <4c>   DW_AT_name        : (indirect string, offset: 0xa4): signed char
 <1><50>: Abbrev Number: 2 (DW_TAG_base_type)
    <51>   DW_AT_byte_size   : 2
    <52>   DW_AT_encoding    : 5        (signed)
    <53>   DW_AT_name        : (indirect string, offset: 0x25): short int
 <1><57>: Abbrev Number: 3 (DW_TAG_base_type)
    <58>   DW_AT_byte_size   : 4
    <59>   DW_AT_encoding    : 5        (signed)
    <5a>   DW_AT_name        : int
 <1><5e>: Abbrev Number: 2 (DW_TAG_base_type)
    <5f>   DW_AT_byte_size   : 8
    <60>   DW_AT_encoding    : 5        (signed)
    <61>   DW_AT_name        : (indirect string, offset: 0xb0): long int
 <1><65>: Abbrev Number: 2 (DW_TAG_base_type)
    <66>   DW_AT_byte_size   : 8
    <67>   DW_AT_encoding    : 7        (unsigned)
    <68>   DW_AT_name        : (indirect string, offset: 0xb9): sizetype
 <1><6c>: Abbrev Number: 2 (DW_TAG_base_type)
    <6d>   DW_AT_byte_size   : 1
    <6e>   DW_AT_encoding    : 6        (signed char)
    <6f>   DW_AT_name        : (indirect string, offset: 0xab): char
 <1><73>: Abbrev Number: 4 (DW_TAG_subprogram)
    <74>   DW_AT_external    : 1
    <74>   DW_AT_name        : (indirect string, offset: 0xc2): main
    <78>   DW_AT_decl_file   : 1
    <79>   DW_AT_decl_line   : 3
    <7a>   DW_AT_prototyped  : 1
    <7a>   DW_AT_type        : <0x57>
    <7e>   DW_AT_low_pc      : 0x400526
    <86>   DW_AT_high_pc     : 0x15
    <8e>   DW_AT_frame_base  : 1 byte block: 9c         (DW_OP_call_frame_cfa)
    <90>   DW_AT_GNU_all_tail_call_sites: 1
 <1><90>: Abbrev Number: 0

That's it for the tour. Let me finish with some reflections on what's good and bad about this way of doing things.

Descriptive debugging: the good

Why do debugging this convoluted, long-winded way, with all this metadata, instead of the apparently simpler VM way of doing things? In VMs, debug servers are integrated into the runtime, and offer a fixed, high-level command interface. This allows the VM's compile-time implementation decisions to stay hidden. There is no need to tell the debugger how storage is laid out, where local variables live, and so on, because the command interface is pitched at a higher level of abstraction; those details remain internal to the VM. This is convenient for the VM implementer, since generation of debugging information is onerous to implement. But it also requires cooperation from the debuggee, couples debuggers to a fixed command language or wire protocol, and presents strictly less information to the developer. While VM debuggers are designed around the abstraction boundaries of the source languages, metadata-based debugging actively enables descending through these boundaries. It is sometimes very useful for debugging tools to expose implementation details in this way. The most obvious case is when faced with a compiler or VM bug; the user would like to “shift down” to the lower level to inspect the assembly code or VM state. At other times, there are performance bugs that the developer has a hunch are about cache or paging effects; being able to see the raw addresses and raw memory contents can help here, even when the program is running on a VM.

Being highly descriptive, debugging metadata documents a large number of implementation decisions taken by the compiler, so is useful not only to debuggers but also to profilers, language runtimes (C++ exception handling is usually built on DWARF frame information), other dynamic analysis tools such as Valgrind family, and so on.

Debugging optimised code (without deoptimising)

Debugging metadata must describe optimised code. By contrast, VM debug servers typically arrange that debug-server operations only need to deal with unoptimised stack frames and at most simply-generated code (e.g. from a template-based JIT). Confusingly, even the “full-speed debugging” feature of HotSpot uses dynamic deoptimisation to get back to unoptimised code—the earlier approach was to run the whole program under the interpreter whenever you wanted a debuggable execution. In general, a debuggable VM instance must either refrain from optimisation, or know how to dynamically undo that optimisation when a debugger is attached. So, dynamic deoptimisation is not exactly “full speed”—unlike with native debuggers, execution still slows down significantly when a debugger is attached. Having the VM implement debug operations only over unoptimised code is a restriction that helps make the debug server simple, at some cost in debug-time performance.

The flip side is that VM debugging is pretty good at precisely maintaining the source-level abstraction (modulo VM bugs), without complicating the task of implementing optimisations. Meanwhile, in Unix-land, the debugging experience remains best-effort and only as good as the compiler-generated metadata, which is sometimes wrong or incomplete following complex transformations. When optimisation and debuggability are in tension, debuggability usually takes the hit, so a smooth debugging experience still sometimes relies on a separate unoptimised “debug build”. Tail call optimisations are a classic debugging-hindering optimisation, since they rely on eliding stack frames, meaning the debugger cannot know how many recursive calls are logically on the (source-level) stack. Instruction scheduling is another: the order of operations in the executed code need not match source program order, and this can make for debug-time confusion.

The control problem

Some other properties are incidental but tend to be true of current Unix-style debugging. Debugger support for exercising the target language's control features (exceptions, threads, allocations, ...) is uneven, because this can't be described purely as metadata; there needs to be some interaction with the host runtime. DWARF and similar debugging information is good at the “statics” of describing how to decode the program state, but not good at describing the protocols of interaction with a language runtime, necessar for performing operations such as spawning a thread, allocating an object or throwing an exception. These tend to be difficult unless these happen to be cleanly exposed as entry points in the language runtime. In practice debuggers usually achieve these things by having magic knowledge of particular implementations.

At least one semi-portable interface has emerged with the aim of encapsulating run-time control operations for debuggers' benefit. I'm thinking of libthread_db, best described by Joe Damato's excellent article. Unfortunately it's an abomination, because it violates the principle that implementation details are described by architecture-independent metadata. An odd but cleaner and more consistent alternative would be to bundle snippets of DWARF bytecode for doing these runtime interactions—perhaps in the debug information of a language runtime, either simply calling into the runtime (for cleanly abstracted operations) or doing something more complex. But that is only a technical possibility; there are no proposals or working demos of that as far as I'm aware (maybe I'll make one). This might sound wacky, but if you know about the early history of Java, in Oak and the Green Project, you'll see a certain uncanny similarity in these ideas.

Levels of abstraction

Debugging at multiple levels of abstraction is a neat facility of Unix-style debugging, but is also a difficult power to control. It can be useful to switch down to the assembly-level view, or to switch languages, but this capability doesn't generalise to the case where many abstraction layers are built up within the same language (think C++). The debugger will let you “continue to the next source line”, but it doesn't know how to keep you at the same abstraction level. If the next source line is deep inside a library implementing something fairly basic like a smart pointer, it will skip only as far as that line, whereas you probably wanted to stay roughly at the same level of abstraction, or perhaps within the same codebase. Things get particularly bad when there is a lot of inlining (again with C++). The traditional “step over” and “step into” are hints at this need, but are too crude.

Doing better is currently beyond the debugger's ken, but this problem could be solved: perhaps by bringing in the knowledge of libraries and source file trees that the debugger already has, or perhaps most simply by allowing programmers to manually mark out the boundaries between layers. This could be a simple partitioning over source files and binary objects, or could be something more complex, perhaps sensitive to calling context or argument values (consider the case of the same library used from two places in the same application). “Next”-style operations could then be defined in terms of these layers. I'd love to see this, although the details would take a lot of working out.

To be continued?

There's plenty of research to be done on more debuggable language implementations. This research area doesn't seem to get the attention it deserves. One problem is that debugging is usually an afterthought for researchers, even though it is essential for programmers. Another problem is that not many PL researchers have sight of the design space; they're familiar with either the Unix-style approach or the VM-style one. I hope that in the future we can figure out how to get the best of both worlds.

[/devel] permanent link contact

Wed, 21 Sep 2016

Project suggestion: run-time type information in the FreeBSD kernel

It's past time I wrote up some Part II project suggestions. This post is about a new one. See also one from last year about a debuggable implementation of OCaml, also my suggestions from 2014, another from 2014 and other standing suggestions.

My liballocs project is a library and toolchain extension that allows tracking of run-time allocation types in large C programs, precisely and efficiently. On top of this, the libcrunch system uses this information to do run-time type checking, such as checking that a pointer cast matches the type of the actual pointed-to object. The idea is to make C coding a lot more easily debuggable, by providing clean failure messages at run time, rather like ClassCastException or ArrayIndexOutOfBoundsException in Java. There are various other applications in debugging and program comprehension, since liballocs effectively adds an oracle capable of answering the question “what is on the end of this pointer?”.

Both of these systems have been developed for user-space code but could usefully be applied in the kernel. The toolchain extensions, which gather type information from binaries and analyse allocation sites in C source code, should work with kernel code mostly unchanged. However, the liballocs runtime is written against user-level interfaces and cannot be used as-is.

There are two obvious approaches to creating a liballocs runtime suitable for the kernel.

One is simply to modify the existing liballocs runtime to support kernel-side interfaces. This should yield a liballocs.a library that can be linked into the kernel, adding code to observe the kernel's allocations and maintain type metadata on them as it executes. The main differences will be how the runtime allocates memory, and how it locates and loads type information. Likely, the latter will be trivial (the type information can simply be linked in to the kernel), except perhaps for loadable modules (they must be created with care, so that they contain any additional type information). The former is more complex, since the current runtime makes extensive use of Linux's unreserved virtual memory (via MAP_NORESERVE) which is not available as such in the FreeBSD kernel, although equivalent alternatives are likely possible with some hacking.

The other option is to use DTrace to write a non-invasive “monitor” process, started early during boot, that captures equivalent information to the liballocs runtime. One could then implement a client library which mimics the liballocs API and supports the same kinds of query. This approach is likely simpler to get started, and allows for a completely external, non-invasive client that can be used without recompiling the kernel. However, it is less reliable because it is likely to miss certain kernel allocation events, such as those happening early during kernel initialisation, or those that DTrace drops because of buffer contention. These will lead to incomplete type information. It is also unlikely to be able to answer queries coming from the kernel itself, so is useful for debugging and comprehension tasks (queried by a user-space program) but not for run-time type checking.

Both versions will require careful understanding of the FreeBSD kernel's allocation routines, for which some familiarisation time must be allowed. The DTrace option would be a good vehicle for exploring this, after which an executive decision could be made about whether to switch to the full in-kernel option.

[/research] permanent link contact

Thu, 25 Feb 2016

Debugging with the natives, part 1

By popular request, I'm going to run a couple of articles explaining how native Unix debuggers like gdb or lldb work. In this one, we'll take a quick cross-section of what's going on. I'll say “gdb” but I could be talking about any similar debugger.

Starting things up

When you start a program in a debugger like gdb, it does a variation of the usual fork()exec() trick, in which it sets itself up as a tracer of the child process before it exec()s the program you want to run. Alternatively, the same API can attach to a process already running. Either way, the effect is that any signals received by the “tracee” process will be delivered first to the debugger, and any system calls made by the tracee will also generate a signal. These signals can be inspected, discarded or modified as the debugger wishes, using the ptrace() API. In effect, the process (tracee) and the debugger (tracer) execute by mutual hand-off, suspending and resuming each other.

Walking the stack

Using the same API, gdb can access the raw memory image of the debugged program, and its register file. (Note that when gdb is active, the program is suspended, so its registers are saved in memory.) It can use this to walk the stack and print a backtrace: the register file gives us the program counter and stack pointer. If we have a nice simple stack which saves frame pointers, it simply walks the chain in memory. To print symbolic function names, it can use either the symbol table (if we must) or the compiler-generated debugging information (better; more on that in a moment).

Reading local state

To print a local variable, the debugger uses the same compiler-generated debugging information to look up where that variable is currently located. This lookup is keyed on the program's current program counter, to handle the way that locals get moved around between registers and the stack from instruction to instruction. The result of the lookup might be a memory location, a register number, or even, sometimes, something composite like “first 8 bytes in register X, rest in memory location Y”.

Setting breakpoints

To set a breakpoint on a particular source line, gdb uses a different part of the debugging information which encodes a mapping between binary locations (program counter values) and source file/line/column coordinates. A small subset of program counter values are specially identified as “beginning of statement” instructions; these are normally the ones the user wants to break on. Sometimes, hardware debug registers can be used to implement breakpoints (up to some maximum); the CPU generates a trap on reaching the program counter value stored in the register. More classically, the “soft” method is for the debugger to overwrite the instructions with trap instructions, saving the old instruction. This trap will cause a signal to be generated when the program executes it.

Resuming from breakpoints

To resume from a software breakpoint, a bit of juggling is required. Recall that we overwrote the original instruction. We can replace it, then ask the OS to single-step the program (using the hardware's single-step mode), then re-set the trap. This is racy in multithreaded programs when the other threads aren't stopped. So instead a good debugger will emulate the instruction itself! This means using an internal interpreter for the target instruction set, backed by the memory and register file access that it already has via ptrace().

Conditional breakpoints and expression evaluation

If we have a conditional breakpoint, we also need to be able to evaluate expressions when we take the trap (and silently resume if the expression is false). For this, the debugger has a parser and interpreter for each source language it supports. These interpreters are very different from an ordinary interpreter: the expression's environment comes from the binary contents of the debugged program's image, rather than being a structure the interpreter itself maintains. (The interpreter keeps a small amount of local state, such as for subexpression results, but it's mostly in the debuggee.)

Calling functions

To evaluate expressions containing function calls, we need to be able to run code in the debuggee. To do so, we craft a special frame on the debuggee's stack, a safe distance away from its actual top-of-stack. Having evaluated the arguments as usual, we put them in the relevant registers (saving what was there), tweak the saved stack pointer to point to this crafted state, set the instruction pointer to the callee, and set the debuggee going again. The function runs as normal, and on return it uses an on-stack return address that the debugger supplied. This points at a breakpoint-like instruction that raises a signal, returning control to the debugger. This then cleans up the stack as if nothing happened, and resets the registers accordingly. (Disclaimer: I haven't actually dug through gdb's code here, so I might have one or two things subtly wrong. At least, the above is how I'd implement it. Let me know if you know better.)

It goes on...

There's lots more, but if you understand the above, you've probably got the gist.

All that seems like a big bag of tricks. What are the principles at work in this design? Although the system has “grown” rather than being “designed”, and has a generous share of quirks and unstandardised corners, there are some surprisingly strong principles lurking in it. I've written a little about these in my paper at Onward! last year. Next time, I'll cover briefly some of the same observations, and a few others, hopefully thereby re-explaining how debuggers work, a bit more systematically .

[/devel] permanent link contact

Wed, 30 Sep 2015

Project suggestion: an observable OCaml, using liballocs

It's the season to suggest student projects, and this post is about one which follows up a series of blog posts (1, 2, 3, 4). Feel free to see also my suggestions from last year, another from last year and other standing suggestions

This post is about a suggested project which will produce a working implementation of the OCaml programming language, using an existing front-end but a very different back-end.

Unlike the mainstream implementation, this implementation will be optimized for debuggability, or more generally “observability”. The compilation strategy will be to translate to C, hence obtaining the debuggability (and, to some as-yet-unknown extent, the performance) of the C compilation toolchain.

This means at least three key differences. Firstly, all locally-bound names will map to C local variables. Secondly, OCaml's lexical closures will be implemented in a highly compatible way, perhaps using the technique of Breuel (which is already implemented in the GNU C compiler, but with stack allocation, whose lifetime semantics are not appropriate for OCaml). Thirdly, all allocations will have associated type metadata at run time, using liballocs This will allow parametric polymorphism to be “seen through” by a debugger—so it can tell you, for example, that some stack frame of a function whose type is 'a -> 'a is actually activated with 'a as int, say. (This may or may not be the right formulation of “seeing through”—stronger and weaker versions are possible; ask me!)

Key ingredients are the following:

Optional “extension”-style challenges might include the following:

Evaluation of the system is mostly via performance—the goal would be to get within a reasonable factor of the OCaml native-code compiler. We can also do some experiments to measure observability of our system. One simple experiment might interrupt both versions of a program at randomly chosen call sites and count how much of the local state we could recover (number of frames in the backtrace, number of locals in those frames; perhaps even doing a diff on their values). Call sites are a good choice because all but one frame of the stack is always stopped at one, and instrumenting the native code so that we can do lock-step breakpointing, by counting call invocations, should not be too difficult (though it might mean disabling inlining, which does affect what is being measured). Usually there'll be no contest, since the original OCaml native compiler doesn't let you observe locally bound values at all. There is, however, a development branch which does, and comparing with the bytecode debugger (ocamldebug) is also a good bet.

[/research] permanent link contact

Wed, 16 Sep 2015

Partially evaluating a bytecode interpreter using C++ templates

I should start this by saying that I'm definitely not a C++ metaprogramming genius. People have done outrageously clever and diverting things with C++ templates, and I am not about to outdo anyone. Nevertheless, this might be interesting.

I was facing the task of taking code in a simple but obscure stack machine language—the stack machine from DWARF, if you couldn't guess—and turning it into vaguely efficient machine code. My thinking was spurred on by systems like Truffle, or PyPy's tracing JIT, which produce compiled code from only source code and an interpreter for the source language. Unlike those systems, I didn't need dynamic compilation—offline is fine. But also, I wanted a system with easy support for C and C++ intrinsics, zero runtime baggage, and something I could prototype in a day or so. It turned out that writing the interpreter in C++ templates, and using the C++ compiler as a partial evaluator, was reasonably successful at all these. Here's how it works.

To start with, I simplified right down from DWARF to an analogous but much simpler stack machine where all operands are passed on the stack. We model instructions using simply an enumeration. This is not essential, or even desirable, except for exposition.

/* instruction -- a simple enumeration */
enum instr
    PLUS, /* add top two stack slots and push result */
    BZ,   /* pop val and dest off stack; if val is zero, branch to dest */
    PEEK, /* pop an address off the stack and push its contents */
    DUP2  /* clone the top two elements of the stack */

[I've edited this slightly from my initial version of this post, to add the DUP2 instruction. This allows me to include a slightly more meaningful example program—thanks to Ross Bencina for prompting that.]

The four instructions are representative of the classes of operation I care about: arithmetic, control, C intrinsic and stack manipulation. They're not a very complete machine, but they suffice to illustrate things. To witness the power of this somewhat-operational stack machine, a simple program which combines all four opcodes is as follows.

typedef program< PLUS, program< DUP2, program< BZ, program< PEEK, void > > > > prog;

This program add two numbers taken as input, and if they sum to something non-zero, it interprets that as an address and peeks it, returning what's stored there. Otherwise it returns zero. Note that since we only have a computed branch instruction, not a direct branch, the branch target exists as data in the initial stack. From top to bottom, our initial stack must be [arg1, arg2, 4], where arg1 and arg2 are the two input values, 4 is the program counter to branch to (the end of the program). We don't need to store the literal zero in this case (I'll let you figure out why).

So, clearly, a program is a sequence of instructions. We model it in the C++ compiler as a template-level sequence, with the opcode as one template argument and the program “tail”, i.e. the following instructions, as another template argument.

/* program -- a sequence of opcodes */
template <unsigned Op, class Next>
struct program
    static const long val = Op;
    typedef Next tail;

In effect, this gives us a “linked list” structure but at the template level. We can write some simple helper functions over this kind of structure, using a functional style in which template specialization is how we do case-splitting and template instantiation is how we do recursion.

template <class Seq> 
struct sequence_length
    static const long value = 1 + sequence_length<Seq::tail>::value;

template <>
struct sequence_length<void>
    static const long value = 0;

/* Utility for getting the tail of a type-level sequence. */
template <typename Seq>
struct tail_of
    typedef typename Seq::tail type;
template <>
struct tail_of<void>
    typedef void type;

/* Utility for getting the nth of a sequence, then the rest;
 * -- it's like "let s = drop n Seq in (hd s, tl s)" */
template <unsigned n, class Seq> // n > 0 case
struct nth
    static const long val = nth<n-1, typename tail_of<Seq>::type>::val;
    typedef typename tail_of< nth<n-1, typename tail_of<Seq>::type> >::type tail;
/* termination case: n == 0 */
template <class Seq>
struct nth<0, Seq>
    static const long val = Seq::val;
    typedef typename tail_of<Seq>::type tail;

The main use of sequences is to model the program itself, a sequence of instructions. We also use them to model the program counter, encoding it simply as the “tail” of the program, i.e. the subsequence of instructions from the current position to the end of the program. (This won't stop us from jumping backwards, because we'll always pass around the whole program too. I'll come on to jumps a bit later.)

What's our program state? It's a program counter and an operand stack. Since, in general, our stack will not be statically known, we have to model it at the value level, passing it around as “real” arguments at run time rather than template arguments. (I did create a version where a parallel family of template specializations could be used to encode statically known stacks. That's neat because it you can get the C++ compiler to partially evaluate the stack-machine program for known input. But I'm skipping that extra complexity here.)

To actually represent the stack at run time we'll use a plain old array of long. I'll show you that in a second. Before I do, let's start to piece together what our evaluator looks like. It's a template that takes arguments standing for the whole program and the program tail, i.e. program counter. The whole program is always statically known, so is a template argument. There are finitely many possible values of the program counter, so we can make the compiler enumerate them; this means the program tail can be a template argument too. (This works nicely with how we do branches; I'll come on to that.)

/* States in a computation */
template <
    /* Program */ typename Program,
    /* ProgramTail */ typename ProgramTail    /* i.e. a suffix of the Program, 
                                                 beginning with the next instruction to run. */
struct state
    static long eval(/* some args... */)
        /* figure out our next state and delegate to it... */

We also want each state to support an eval() method which executes the program's continuation from that state. Most of the time this will work by calculating the next state and delegating to its eval() method. For states that are “end states”, like when we hit the end of the program, we'll instead return something—the value on the top of the stack. We can handle this case using another specialization (which we'll define later).

A simple opcode's eval() method looks something like the following, which implements PLUS. We pass the stack as an argument to the eval() method, in the form of two pointers, bottom and top.

template <
    /* Program */ typename WholeProgram,
    /* ProgramTail */ typename ProgramTail
struct state< WholeProgram, program<PLUS, ProgramTail> >
    static long eval(long *bottom, long *top)
        top[-1] = top[0] + top[-1];
        return state<WholeProgram, ProgramTail >::eval(bottom, top - 1);

Here we destructively update the stack, decrementing the top. Note that the whole interpreter works in continuation-passing style, and the current stack is always passed around explicitly and linearly—all our eval() calls are tail calls. Currently we malloc() the stack eagerly to a large-ish size, and abort on overflow. But linearity means we could even start small and then realloc() it if it overflowed. A neater trick would be to make it an actual stack that is lazily allocated by the OS, using mmap()—this would give us gradual allocation without cluttering our instruction stream with overflow checks or realloc() calls (but I haven't implemented this yet).

That's the basic idea of our evaluator: each opcode can be implemented by a template specialization which gets elaborated differently according to its position in the program. What about branches? We can handle branches by a single specialization for the case where the PC is not known statically. As before, we deal with the not-statically-known value by accepting it as a method (value) argument rather than a template (type) argument. The simplest and fastest way to do this is to code up a jump table. For this to work, we also have to bound the size of programs.


struct run_time_pc {};

template <
    /* Program */ typename WholeProgram
struct state< WholeProgram, run_time_pc >
    static long eval(long *bottom, long *top, long pc)
        switch (pc)
#define CASE(n) \
case n: return state< \
        WholeProgram, \
        program< \
            /* We build the tail starting from the nth, using the definitions in `nth<>'. */ \
            nth<(n), WholeProgram>::val, \
            typename nth<(n), WholeProgram>::tail \
        > \
    >::eval(bottom, top);
            /* ... */
            /* MAX_PROGRAM_SIZE */
                assert(pc &gt;= MAX_PROGRAM_SIZE); 
                warnx("branched out of bounds");

Now, to implement a branch opcode, all we need to do is have it call this run_time_pc specialization's eval(), which takes an extra argument: the destination program counter, passed as a value. Passing it as a value is what allows the jumping instruction to actually decide the destination at run time. At worst, the C++ compiler compiles this switch statement down to a jump table which can jump to the code implementing any of the program's instructions. If this happens, the compiler must enumerate all these destinations up-front in order to build the jump table, hence why we had to bound the program size. At best, though, if we're doing a direct branch, the branch target will be statically known in the caller of this eval() template, allowing the C++ compiler to optimise away the dead switch cases, replacing the jump table with a direct branch. (In our current stack machine, all branch destinations are encoded on the stack, so are potentially data-dependent, meaning we don't hit this direct-branch case. But in the actual DWARF stack machine, we can encode direct branch targets into the instruction, so we will hit this case.)

Since each entry in this jump table jumps to a specific part of the program, we immediately transition back to having a statically known PC value. Our implementation of branches is completely self-contained within this one mechanism.

Actually, make that two mechanisms, because I want to add an alternative. We can avoid hackily bounding the size of the program, if we don't mind rewriting our jump table as a chain of ifelse branches. We can use templates to generate this chain of branches, for any length. To do this, we parameterise our run_time_pc type on the upper bound of what program counter value we might be jumping to. Initially this is the length of the program; recursive elaborations' ifelse-tests whittle this down to zero, the lowest-numbered place we could jump to. In this way, the compiler effectively builds our switch statement (as an if–else chain) for us. As before, if it's a direct branch, all the tests will get elaborated away. If it's not, whether we get a nice jump table depends on how clever the compiler is. For me, sadly g++ seems not to generate a jump table, even at -O3, but perhaps it can be coaxed into it somehow (let me know if you know!).

template <unsigned UpperBound> 
struct run_time_pc {};

template <
    /* Program */ typename WholeProgram,
                  unsigned MaxPc
struct state< WholeProgram, run_time_pc<MaxPc> >
    /* This is fast only if the C++ compiler can turn if--else chains back into jump table. */
    static long eval_if_else(long *bottom, long *top, long pc)
        if (pc == MaxPc) return state<
                /* We build the tail starting from the nth, using the definitions in nth<>. */
                nth<MaxPc, WholeProgram>::val,
                typename nth<MaxPc, WholeProgram>::tail
        >::eval(bottom, top);
        else return state<WholeProgram, run_time_pc<MaxPc - 1> >::eval_if_else(bottom, top, pc);
template <
    /* Program */ typename WholeProgram
struct state< WholeProgram, run_time_pc<0> >
    static long eval_if_else(long *bottom, long *top, long pc)
        if (pc == 0) return state<
                /* We build the tail starting from the nth, using the definitions in nth<>. */
                nth<0, WholeProgram>::val,
                typename nth<0, WholeProgram>::tail
        >::eval(bottom, top);
        else abort();

I promised to show how we deal with the end of the program. We just specialize for the case where the program tail is void, i.e. the empty sequence.

/* This specialization handles "end of program". */
template <
    /* Program */ typename Program,
struct state<Program, void>
    static long eval(long *bottom, long *top)
        if (top <= bottom) abort();
        return top[0];

There are a few more fiddly things to figure out to make it all compile, but I've shown you all the interesting stuff. A working snapshot of the implementation is here. It compiled only with g++, not clang++, the last time I tried; fixes are welcome. It implements everything I've mentioned. In the near future I'll turn it into an actual DWARF compiler, and it will appear in liballocs. You might ask: why? In short, compiled DWARF expressions are hopefully a way to achieve faster introspection on complex memory layouts. The liballocs metadata toolchain already “compiles” much DWARF content down to an efficient in-memory representation. By compiling not only manifest DWARF structure but also computational DWARF expressions, we can efficiently introspect on structures whose layout can only be described precisely via a computation—think variable-length arrays, hash tables, and so on. Eventually I also want to support “compiled frame information”, which I hope will allow faster stack walking.

[/research] permanent link contact

Thu, 16 Jul 2015

ELF introspection, robustly and portably

I often find myself writing code that does introspection at the ELF level. This is for things like looking up symbols, walking the link map, figuring out which shared object a particular memory address belongs to, which memory regions are executable, and so on. These are fairly common things to want to do in ptrace()- or LD_PRELOAD-based tools, or binary interpreter-like (Valgrind-style) ones. This post is about how to do these things robustly from within a process, where “robustly” brings some particular restrictions.

What introspection?

What does our address space look like? To spare myself the pain of drawing diagrams, we can ask Linux to dump a somewhat-pictorial view of a simple cat process by doing cat /proc/self/maps. I have reversed the order of the lines so that higher addresses go higher up, and tweaked the formatting slightly.

          address range           flgs  offset   dev   inode                  pathname or other name   
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

    7fff301fe000-    7fff30200000 r-xp 00000000 00:00 0                  [vdso]

    7fff301a5000-    7fff301c7000 rw-p 00000000 00:00 0                  [stack]

    7f4a574bc000-    7f4a574bd000 rw-p 00000000 00:00 0 
    7f4a574bb000-    7f4a574bc000 rw-p 00023000 08:01 889388             /lib/x86_64-linux-gnu/ld-2.19.so
    7f4a574ba000-    7f4a574bb000 r--p 00022000 08:01 889388             /lib/x86_64-linux-gnu/ld-2.19.so
    7f4a57493000-    7f4a574ba000 rw-p 00000000 00:00 0 

    7f4a57298000-    7f4a572bb000 r-xp 00000000 08:01 889388             /lib/x86_64-linux-gnu/ld-2.19.so
    7f4a57293000-    7f4a57298000 rw-p 00000000 00:00 0 
    7f4a57291000-    7f4a57293000 rw-p 001bf000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a5728d000-    7f4a57291000 r--p 001bb000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a5708e000-    7f4a5728d000 ---p 001bc000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a56ed2000-    7f4a5708e000 r-xp 00000000 08:01 889403             /lib/x86_64-linux-gnu/libc-2.19.so
    7f4a56c09000-    7f4a56ed2000 r--p 00000000 08:05 2750795            /usr/lib/locale/locale-archive
        00838000-        00859000 rw-p 00000000 00:00 0                  [heap]
        0060c000-        0060d000 rw-p 0000c000 08:01 286233             /bin/cat
        0060b000-        0060c000 r--p 0000b000 08:01 286233             /bin/cat
        00400000-        0040c000 r-xp 00000000 08:01 286233             /bin/cat

This picture is a refinement of the usual OS textbook diagram of a Unix address space. As usual, we have the executable's text segment (bottom), read-only and then writable data (next two up), heap (next up), a stack (much higher) and the kernel region (up top). (Actually the kernel region is not shown, except for one small part, the vsyscall page, which is the only user-accessible area.) The biggest difference is that we have not only the executable but some libraries—primarily the C library (since cat doesn't use others), but also the dynamic linker (itself a library) and the mysterious vdso. All these are ELF objects. In general, a large subset of a process's structure is actually defined by the collection of loaded ELF objects and a little per-process structure also defined by ELF. Our problem is now: how can we acquire a view on this ELF-level structure, from within the program itself at run time, efficiently and robustly?

(The above listing also shows a few other flavours of mapping that won't concern us here: a memory-mapped file /usr/lib/locale/locale-archive, and two anonymous mappings, beginning at 0x7f4a57493000 and 0x7f4a57293000, which are almost certainly heap areas—we can have many heaps, not just the one that grows out of the executable's data segment. And of course, although we see only one mapping labelled [stack], in a large multithreaded process there will be many such.)

What's ELF?

The basics of ELF linking were first defined in System V Release 4 in 1988, and eventually adopted by non-commercial Unices starting in the mid-1990s (1995 for Linux, with kernel 1.2; 1998 for FreeBSD, with version 3; NetBSD and OpenBSD in 2001 and 2003 respectively). The “classic” picture of a loadable ELF file (direct from the spec) looks like the following.

Execution View
ELF Header
Program header table
Segment 1
Segment 2
Segment 3
Section header table

But actually, that omits some of the most interesting stuff! Let's try again.

Execution View
ELF Header
Program header table
Dynamic linking metadata
Dynamic symbol table
Dynamic relocation table
Segment 1
Segment 2
Segment 3
Section header table
Symbol table

Dynamic symbols, program headers and the miscellaneous dynamic linking metadata are what we're interested in introspecting on. The POSIX libdl API offers one interface for doing introspection at the ELF level, at least for looking up symbols by name. Non-portable extensions, like SunOS/Solaris's dladdr() and the later-added dladdr1(), together with GNU's dlvsym() and dl_iterate_phdr(), get us a bit further, walking program headers and looking up symbols by address or version. We get further still if we allow ourselves OS-specific features, like Linux's helpful maps file in the /proc filesystem that generated the dump above, together with the auxv file or its associated library call getauxval(). If we're happy to parse the ELF file itself, by doing file I/O, we can get at anything we like.

What's “robustly”?

There are problems with all of the preceding approaches. Portability is lacking from many—only the POSIX APIs and ELF file I/O are widely portable. Robustness is lacking from all of them, in the sense that it may not be safe or reliable to use them from “sensitive contexts” where ELF introspection may be required. The main kind of “sensitive context” is one where making system calls or library calls would cause unwanted reentrancy. For example, if we want access to metadata from instrumentation logic that runs in the middle of a malloc() call, calling out to a library which might itself call malloc() is a bad idea. If we're instrumenting the code paths that make system calls (ask me how!), calling out to the OS might also be a bad idea. These unintended reentrancies can easily cause infinite loops, deadlock or state corruption. There are also security reasons why system calls and library calls are less reliable than working entirely within the process: the filesystem might have changed since the filenames in our maps or link map were recorded, for example.

So, how much ELF introspection can we do without the help of system calls, library calls or non-portable code? The answer surprised me: quite a lot, but not quite everything. There is some low-hanging fruit, then some that we can just about reach (provided a couple of assumptions which, although not strictly portable, probably hold for most platforms), then some that absolutely no portable method can let us get (without resorting to file I/O).

Let's state our problem slightly more precisely. Our address space includes a selection of objects (the executable and all loaded libraries), mapped at particular addresses. We'd first like to enumerate these mappings, giving us a look-up from load addresses to the object's originating binary file (pathname)—more-or-less the structure we saw in the /proc/self/maps listing earlier. Furthermore, each object defines some symbols. We'd like to be able to resolve symbols in each object, as well as any other useful other metadata, like their soname, the libraries they depend on, the locations of support structures like procedure linkage table (PLT), and the relocations performed when they were loaded. This is what we saw in the elaborated ELF file picture. The symbols-and-addresses stuff is the most important; the relocation and miscellaneous information isn't useful as often, but we'd like to have it anyway. Finally, there's some useful metadata that the operating system has passed down about the process, in an ELF-defined structure called the auxiliary vector. It turns out that we'll use this to get at some of the other metadata, but some other parts of it are useful in its own right—they tell us the process ID, page size, and various properties of the hardware.

The link map

To enumerate the mapped objects, we need access to a structure called the link map. The portable interface for walking the link map is the mysterious r_debug protocol. This is surprisingly murky in its origins, but it probably comes from SunOS; it was definitely included in System V release 4 (the only solid reference I know is the relevant Programmer's Guide). It's actually not specified in most ABIs, although some seem to do so. The r_debug protocol defines a structure containing pointers to the link map. From an executable, it's easy to find this structure: we usually have a DT_DEBUG entry in PT_DYNAMIC, and the dynamic linker updates to points to a r_debug structure it allocates. This contains a pair of pointers to the link map, represented as a doubly-linked list.

What about from a shared library? Getting a pointer to the r_debug is trickier here, because shared libraries don't usually have a DT_DEBUG entry. This is a chicken-and-egg problem: we need the executable's r_debug to get to other objects, but if we're not starting in the executable, how do we get there? So here comes our first somewhat-non-portable assumption: that a global symbol _r_debug is available, and marks the r_debug structure in the dynamic linker. This symbol is commonly present, but I don't believe it's standard—POSIX generally specifies programmer-facing APIs, not linker-level symbols. (As we'll see shortly, another method for getting at it is available, but it is also non-portable and imposes some other requirements. Anyway, by combining these two options, we have a pretty good chance that we can find the link map. So, let's continue on the assumption that we have the link map.)


Once we have the link map, we'd like to do symbol lookups in each object. For this, we need (at least) the dynamic symbol tables, which we can find via the PT_DYNAMIC segments of each object. (We probably also want the symbol hash tables; the method is similar.) The link map helpfully keeps a pointer to this specific segment (rather than to the overall program header table). The segment's content consists of key-value pairs, and it's the pair whose key is DT_SYMTAB that points to the dynamic symbol table. (On most ABIs, a short cut to get a pointer to the containing object's PT_DYNAMIC segment is to make a link-time reference to the symbol _DYNAMIC. But this only lets us get the containing object's PT_DYNAMIC, whereas walking the link map gets us all of them.)

One caveat about symbol lookups is that only dynamic symbols are looked up. The other symbols, if there are any (not needed for dynamic linking, but useful for debugging), live in a different table, .symtab, which needn't be mapped at run time. Depending on how your object was linked, perhaps most symbols became dynamic symbols (--export-dynamic) or perhaps only a minimal selection did. If you've ever tried to look up stat() in the GNU C library, using dlsym(), and been surprised not to find it, this is why. The stat() function is one of a few that are always linked statically, even when the rest of the C library is linked dynamically. In general, not all code or data in our dynamic objects is actually described by dynamic symbols. I'll return to this shortly.

The auxiliary vector

The link map gave us a list of loaded objects and their base addresses, but it doesn't tell us where each individual segment is mapped. This is where we begin our voyage into the not-really-portable. In the case of the executable, we can get the program headers via another useful metadata structure: the auxiliary vector. This is created by the operating system to supply various information of use to startup code (typically the bowels of the C library) and the dynamic linker (which, if one is used, also mutates the metadata to make it look as though the executable was loaded directly). (If we failed to get the link map using the previous _r_debug symbol method, we can also use the auxiliary vector to get to the executable's DT_DEBUG, and hence to the link map.) Here's a very nice article all about auxiliary vectors.

The auxiliary vector lives high on the initial thread's stack, a fixed offset above the stack pointer when the program's first instruction receives control. If we could walk the main thread's stack right to the top, we could find the auxiliary vector easily. However, nowadays, since we lack frame pointers, that means making a library call to something like libunwind, and that might allocate memory. Even then, it's not guaranteed that we can walk the stack all the way to the top, since unwind information might be patchy up there. So we'll need a different approach.

I devised a nasty but ultimately pleasing hack for this. When we say “auxiliary vector” we really mean specifically a list of key-value pairs containing the useful metadata I mentioned. But there's also a bunch of other things up there, in a large contiguous structure (see the helpful article for a diagram): the environment strings (a blob of characters), argument strings (ditto), and vectors of pointers into these blobs (our initial environ and argv). Instead of walking the stack, can we get hold of a pointer into one of these blobs, and then parse the surrounding memory to find the start of the key-value pairs? It's very likely that we can, since we can expect to find a handy global variable, environ, that points into them.

Now, it's important to note that this isn't completely robust. The auxiliary vector only records the initial environment. A program can modify the current environment strings to its heart's content by using C library calls like putenv() and getenv(). These will mutate and, if necessary, reallocate the vector pointed at by environ. However, unless a program deletes its entire initial environment, and assuming the C library doesn't eagerly copy things out of it, that vector should always contain one or more pointers into the initial environment blob. So, we want to walk environ and look for such a pointer.

The next problem: how will we recognise such a pointer when we see it? For this, we need to fix some more very-likely-true assumptions. Firstly, we assume either that the initial stack is the highest user mapping in the process, or that if something is higher, it's a shared object (perhaps Linux's vdso) which we can identify in the link map. (Put differently: what's not allowed is for there to be a memory-mapped file above the initial user stack, other than a shared object. This doesn't seem too restrictive; we could even imagine an ABI-level guarantee from the kernel that it will never create such mappings.) This assumption gives us an upper bound to compare our environment pointers against. What about a lower bound? For that, we assume that the caller can supply us one, in the form of a pointer higher than any other stack-allocated string in environ. The address of a local variable in main() could be one such pointer. In fact, any address on the initial stack will probably be good enough, since putting a stack-allocated string in your environment would be a very quirky thing to do. Anyway, we suppose that our caller, the client code which wants to get hold of the auxiliary vector, can give us a such a pointer.

Now we're ready to go. We scan *environ for pointers to strings within these bounds. Then we navigate downwards in memory to find the end of the auxiliary vector, then keep going to find its beginning, after which we'll see the terminator of the argument or environment pointer vector.

Once we have the auxiliary vector, we have the executable's program headers, from which it's easy to get at most other things we need, again using the link map to access other objects.

Program headers

One challenge remains for us: getting at shared objects' program headers. The problem here is that these headers needn't, in principle, be found anywhere in memory. (The ELF format allows a file to require that its program headers are mapped, using PT_PHDR, but it's not required and its use is uncommon.) Typically a dynamic linker will always keep each object's program headers in memory somewhere, but it's not obliged to do so. Even if it does, that memory might be anywhere. The GNU dynamic linker sometimes stores just a malloc'd copy of the data, perhaps originally read through a file descriptor rather than a memory mapping. Either way, the pointer it maintains to the header data lives in private state (in extra non-public fields in the link map structure); there's no portable access to it. So, to get shared objects' program headers there's no avoiding the use of some platform-specific (and loader-specific) code to fish it out.

The major limitation of doing without program headers is that we don't have a complete map of the address space. The link map lets us enumerate objects and their start addresses, and we can then use the symbol tables to do a “forward lookup” from symbols to addresses. We can invert the symbol tables to give us a partial backward lookup from addresses to symbols. But we can't do quite as complete a backward lookup as the dladdr() function can. If you give dladdr() an address that falls within an object's mappings but isn't covered by an exported symbol, it can still tell you which object it's in. Only the program headers contain the information necessary for this. Another thing we can't figure out is which memory access permissions a segment was mapped with—again, that means looking at the relevant program header.

Maximising ELF introspectability

Armed with these observations, we could imagine dynamically rewriting the binaries on our system slightly to optimise for ELF introspectability. Firstly we would insert a PT_PHDR, and define a symbol (maybe _PHDR), roughly analogous to _DYNAMIC), to help us find it. Secondly, going back to the restriction that only dynamic symbols were available for lookup, we could export all .symtab entries as “hidden” dynamic symbols. The obvious objection to this is that it will slow down linking. However, symbol lookup by name happens via the hash table (DT_HASH or GNU_HASH). I haven't checked all the details yet, but it appears that adding them to the symbol table needn't necessarily mean including them in the hash table. There has to be a hash-chain “next pointer” per symbol, but it's permitted not to hook that entry into any bucket. Since link-time lookups occur via the hash table, leaving the hidden symbols out of the hash table seems to defeat the objection. They'd be present for a “deep” introspector to find, by scanning the symbol table. But they wouldn't interfere with the symbol lookup process, nor with code using dlsym().


If you'd like to use this introspection goodness, I've coded it all up in a single header, relf.h, which is fairly self-contained and should be self-explanatory. Please let me know if you find it useful!

[/devel] permanent link contact

Tue, 30 Jun 2015

A position on licensing research software

I have a habit of putting code on my github page without any statement of license. Sometimes this is oversight or laziness, but sometimes it's entirely deliberate. It's not because I don't want to open-source or otherwise license my code; I do. But, as this post covers in some depth, the Devil is in the details.

In particular, I believe that software whose purpose is scientific or researchy—to exist, to prove a concept, to be extended by other researchers perhaps, but not to acquire users—can be better off without an explicit license, at least while it remains relatively immature and has only a small number of contributors. In short, this is because there's little practical loss from not licensing at this stage, whereas by forcing a choice of license, it's possible to choose wrongly.

For some reason, this belief, at least when compressed into 140-character snippets, attracted quite a lot of opprobrium. So allow me to explain my justification and respond to the criticisms. Before I go further, I should add that I'm definitely not pretending to be an expert on licenses, law, tech transfer, aggregate properties of big companies' IP policies, or anything else I'm writing about. Everything you're about to read is informed by limited personal experience and anecdote. In fact that's how I got here: I'm very aware that licensing is something I don't fully understand. For me, that's a strong motivation for avoiding making premature commitments that I might later regret. You have the opportunity of enlightening me, and might even change my position—but first please understand what my position is and why I hold it.

So, here goes: I claim that early-stage scientific proof-of-concept software is often better off without an explicit license. Naysayers (some hypothetical, but mostly real) claim the following.

“But then people can't legally use your software?” They can, but they should ask first.

“But people rarely bother to ask.” That's fine; this software is not aiming at getting lots of users.

“But you might miss out on users.” I don't want users.

“But you might miss out on collaborators.” I doubt it. Collaborators have to communicate, not just consume. They get in touch. Or, more likely, I meet them through other channels than their browsing my repositories online.

“But people aren't allowed to look at it without a license.” No, that's not how copyright works—quite the opposite. The code is published, so you're allowed to look.

“But company policies mean employees won't be allowed look at your code, even though it's allowed by law.” I've worked at two large IP-paranoid companies, and in both cases employees would happily exercise discretion about whether to look at code lacking a permissive license. (This was true even if official guidance left no room for their discretion.) Dealing with third-party code was also a well-established part of corporate culture. Among other things, this makes these developers much more likely to ask for permission than others. During my three months at one such company, in a very junior position, I was encouraged to make enquiries not only about licensing third-party software that appeared useful, but also about perhaps contracting the developer to support our effort. This surprised me at the time. But being a corporate entity dealing in non-permissively-licensed IP makes you more amenable to dealing with other people's IP in whatever way is appropriate. As I'll cover shortly, the well-known copyright paranoia (including what I'll call “GPL terror”) arises because of cases where the license on offer doesn't suit the prospecting company, and where there's good cause to fear litigation.

“But if they look at your code, they risk creating a derived work, so could get sued by your employer!” Companies know the pragmatics of IP better than this. If I were talking about Big Company Labs's github page, I'd agree that explicit licensing would be highly desirable. Big companies tend to be litigious. In all cases, the question is: who is liable to get sued by whom? But we're not talking about that. The risk to a company from simply looking at my code (emphasis important) is miniscule. The risk to me and to the public good, from premature commitment to a given license, is non-trivial, because it risks closing off opportunities later.

“Why not GPL your code?” I'm not against doing so in principle. In fact, even though I'm told it's falling out of popular favour, the GPL is one of my favourite licenses. But if I GPL'd my code, many more companies' employees would be scared off. Many big companies' legal departments have spawned corporate policies which foster a terror of GPL'd code. Partly this terror is justified: much GPL'd code is copyrighted by large competing companies, who don't want to relicense it and do want to sue infringers. However, the terror is not justified in all cases. I'm not a competing company, I am amenable to relicensing, and I'm not likely to sue infrigers anyway. So companies have no reason to be terrified of looking at my code. Unfortunately, many company procedures are written in a blanket style, with rules like “don't look at GPL'd code!”. that are not sensitive to these nuances.

“How is not putting any license possibly better? It confers strictly fewer rights!” I know it's perverse. But since these “don't look at GPL” policies exist, the absence of any GPL notice would actually make it more defensible, according to these policies, for employees to look at my code in any depth. The real problem here is that, as far as my anecdotes can see, many corporate legal people don't even understand the basic properties of the domain. Software doesn't “have” a license; it is licensed and can be licensed differently to different people. I sometimes say that licensing is “bilateral”, in an attempt to convey this multi-party “edge, not vertex” nature. Being GPL'd doesn't exclude other licensing possibilities. Sadly, corporate legal types tend to act as though it does. They're not the only ones. Some of my peers appear keen to (as I'll call it) “licensify” all scientific software. They seem to be acting on similar misunderstandings. I've even seen it written that one should not use the CRAPL because it allegedly makes it “impossible” to re-use the software. If you actually read and understand the CRAPL, it does no such thing (notwithstanding that as a license it is basically pointless). Even if the CRAPL were restrictive, the kind of software it's intended for—scientific software, as in my case—would almost always be available for relicensing anyway.

“Why not say “GPL, or other licenses by negotiation”?” It's not a terrible idea, but again, lack of nuance among corporate legal departments makes me believe that this will scare people off unnecessarily. I could be wrong about that.

“Why not MIT- or BSD-license your code?” I'm not against doing so in principle. But as a blanket policy, I worry that it'd weaken my ability to successfully do tech transfer later. In particular, the option of dual-licensing code as useful if you have already gone with a permissive license. This might weaken the position of any company set up to do the tech transfer. As I see it, this is the key difference between BSD and GPL from a researcher's point of view. In particular, note that pursuing tech transfer can be in the public interest (and bigger competitors are likely to be less public-spirited than a small start-up founded by scientists). I've nothing against permissive licenses in general; they just feel like too great a commitment to make in the early stages of a project. I happily release non-research software under these licenses (or even public domain) from time to time, as you can see. And I'll happily contribute under these licenses to other projects that have an established tech transfer model. The key points are that my early-stage projects don't yet have this, that there's more than one way to do it, and that licenses make a difference to what options are available.

“So you might later close off your research outputs for your own commercial gain?” Not at all. I'm dedicated to using my research outputs to create the greatest possible public social good. The question is how to do that. Whether BSD-style (permissive) or GPL-style (copyleft) licensing creates the most social good is still a matter of debate. It almost certainly varies case by case. Meanwhile, two other social goods need to be considered: being able to do more research, and actually getting research results transferred into practice. The commercial model can be useful in both of those regards, even though, ideologically speaking, it's not the model I prefer. It would be foolish to jeopardise these opportunities at the moment I publish my code, especially since nowadays the norm is to publish code early (no later than publishing the paper). Very often, at that stage it won't yet be clear how best to harness the code for social good.

“Aren't you inviting people to take the unreasonable risk of using your code without a license, which is breaking the law?” No. Anyone concerned with staying within the law is welcome to contact me about licensing terms. Anyone worried that I might sue them is welcome to do the same. This is no hardship: anyone interested in running or building on my experimental team-of-one research code almost certainly needs to contact me anyway—if not to build it, then to do anything useful with it. There's also an important separate point, which I'll return to shortly: scientific endeavour is built on personal contact and trust relationships, far more than either commercial or open-source software are built on those things. Even in the case of people who don't bother contacting me, the “reasonableness” of the alleged risk needs to be assessed in the context of the scientific culture.

“But in practice, if everyone followed your approach, won't people end up breaking the law more often?” Perhaps so. Most people break the law just going about their everyday business. Few of my countryfolk over the age of 30 have not transgressed our laws in countless ways: copying a tape, keeping a recording of a TV programme for longer than 28 days, wheeling a bike along a public footpath, borrowing their neighbour's ladder without asking, backing up a shop-bought DVD to their hard disk, and so on. Fortunately, the process of law enforcement involves discretion at the point of prosecution. This discretion is kept in check by social dynamics which, while mostly outside legal system, are very real. If my neighbour prosecuted me for borrowing his ladder, it'd render future reciprocal altruism considerably less likely. Suing good causes creates bad press, regardless of whether it can legally succeed. The law is overapproximate out of necessity, so that the genuinely problematic infringers can be pursued, not so that absolutely all contravening activity can be shut down. It presents the option to prosecute, not the imperative to do so. That's not an argument for egregious overapproximation in our laws. Obviously, legal overapproximateness can be vulnerable to abuse, and should be minimised by a process of continuous refinement. But aspiring towards eventually reaching a perfectly precise law is a hopeless ideal. The law is a practical tool, not a formal system—it is both ambiguous and inconsistent. The letter of the law needs to be viewed in the context of the legal process—one carried out with the frailty and imprecision that come from human weakness, and the malleability that comes from human discretion. In short, we are doomed to living with a background level of lawbreaking, and it's in our interest to live this way. With licensing, the right question to ask is: how can our use of licenses affect scientific research, and how should it? This means thinking about process.

“But don't big companies have good reason to be conservative? Won't this harm your relationship with them?” Yes, they do, but no, I don't believe so. Conservative policies are justified in a corporate context. For big companies in competitive markets, the stakes are high, and the capital cost of pursuing legal cases is affordable. If an equally large competitor had such a case, it could turn into a very costly court battle (regardless of who wins). That's why big companies are paranoid about IP: they're afraid of competitors. But they're not afraid of small guys, like researchers or universities. If I had a credible case that a big company were infringing my copyright, the company would very likely settle with me out of court. In any case, as I covered earlier, if they want to use my software, they will want to negotiate terms with me anyway.

“Isn't your attitude socially irresponsible?” I believe I have a stronger social conscience than most people I meet. Everything I've just said is a position I hold in the interest of maximising public good, whether or not my beliefs are correct or effective. So no, I don't believe I'm socially irresponsible at all. Here are some things that I believe are socially irresponsible: to allow corporate capture of publicly funded research outputs; to make licensing commitments of publicly funded work without very careful consideration; to needlessly jeopardise the future potential to exploit publicly-funded work to public-good ends; to allow the practices of big-company legal departments to get in the way of scientific endeavour.

“Licences are good, m'kay?” This seems to be some people's position, but it's not an argument.

“All sensible software projects have licenses!” Software that seeks users is one thing. I've already covered why I'm talking about something else.

“Why are you resisting doing the normal, “responsible” thing?” My situation isn't normal. In this abnormal situation, the normal thing is not necessarily the most responsible thing.

“GPL is not free, troll troll...” I'm trying not to get into that. But I do suspect that this whole debate is really GPL versus BSD in disguise. If you don't believe in copyleft, then the only open-source licenses you're likely to use are BSD or MIT, and there's not much useful flexibility to be gained by deferring the decision. So you can just license all your code straight away. BSD in effect chooses a fixed attitude to commercialisation (“anything is fine; what goes around comes around”) whereas GPL tends to create a fork in the road. I want to hang on to my option on that fork, until I know that doing away with it is the right thing to do. The fact that the permissive-versus-copyleft debate never dies is a powerful argument against coming down firmly on either side. It's better to take decisions in context, case by case. That's exactly the position I'm defending. Note that Bruce Montague's article advocating the BSD license model for tech transfer is right if you're talking about big projects—it's harder to commercialise GPL'd code than BSD'd code, and deliberately so—but completely wrong for most small research projects written by a small group of collaborators. In those cases, it's much easier to relicense the lot, in which case your options for transfer include a dual-license arrangement in the manner of Qt or MySQL. This still permits GPLing of the ongoing commercial development (or not), while retaining the option to license it for commercial development. It says a lot about how subtle the issues are that a simple tweak of context can more-or-less invert the up- and down-sides of the two licenses. (I actually find Montague's article quite fuzzy, even FUDdy, about the details—it only really says that BSD is “simpler”, which any old Einstein should be quick to challenge.)

“I once had licensing trouble, bad enough to require lawyers. Therefore, clearly explicit licensing is essential!” This is a complete non-sequitur. I don't doubt how problematic these things can get. But it doesn't mean that any blanket policy will improve things. If IP lawyers have come into a scene where science is supposed to be happening, it means something has already gone wrong. The culture clash is between science—where trust, good faith and cooperation are routinely assumed—and corporate licensing, where you can't trust anyone beyond what their contract or license binds them to. In short, this happens when big companies get involved and impose inappropriate licensing terms. These companies are where you should be directing your opprobrium—not at me.

To properly explain the latter point, I need to ditch the question-and-answer format (which was already becoming tedious, I'm sure). The argument came up in reference to the Richards benchmark. Let me summarise my understanding of it (corrections welcome). Martin Richards wrote and distributed some benchmarking code. It was distributed as a tarball, in the scientific spirit, with no license stated. It turned out (later) that he was happy to release it into the public domain. Trust, goodwill, reputation, responsibility, ethicality are currency among scientists. Martin's release occurred within that culture. My github repositories do likewise.

Some time later, Mario Wolczko produced a Java version of the Richards benchmark, while working at Sun Labs. Working for a big company makes a person more IP-fastidious than average. Mario very conscientiously sought permission from Martin (perhaps unnecessarily—I'm not convinced he was actually making a derived work, but perhaps he was; no matter). He was of course granted Martin's blessing. He circulated his new code, including to Martin. Martin even began distributing it in his tarball.

The catch was that Mario's employer was a large company, and would only allow the code to be released under some restrictive licensing terms. You can still read them. They come in two parts; the first page is statutory US export regulations while the second page contains Sun's own licensing terms. These include the following restriction.

“Licensee may not rent, lease, loan, sell, or distribute the Software, or any part of the Software.”

This is a severe restriction. It's especially inappropriate for a work of this kind—a benchmark. Benchmarking is a concern that cuts across research areas: it's not just people working on a particular technique or problem that want benchmarks. Anyone working on any aspect of a language implementation might want to apply some “standard” benchmarks. A benchmark is more useful the more things it has been applied to. That's why we often like to group them into suites and circulate them widely. This license is effectively incompatible with that use.

I'm sure Mario tried his best against the might of Sun legal. It's telling that his current web page advises the “caveat emptor” approach: make your own mind up about how likely you are to be sued, and act accordingly. I'd agree it's not good at all that we are being forced to take this approach around a big, litigious company (as Sun's new owner Oracle certainly is). But it seems bizarre to blame this on Martin's original sans-license release. The blame is entirely on Sun. The problem is not a lack of license statement from the little guy; it's an inappropriate license imposed by the big guy. The right response is to decline to use Sun's code. (That said, it's worth noting that Martin's redistribution of the Sun code, although technically a breach of the license terms, has not resulted in Sun or Oracle suing him. So perhaps they, too, understand the value of scientific culture and have no interest in disrupting it.)

To put all this more succinctly, the onus to license doesn't weigh equally on all parties. For artifacts coming out of big players, there's good reason to demand licenses—not just any license, but fit-for-purpose licenses—precisely because big players can afford to sue. By contrast, even the University of Cambridge is a comparatively little player. It is vanishingly unlikely to sue anyone pursuing public-good scientific or research agendas, not only because it's expensive to do so, but because the reputational damage would be huge. And even if you disagree that this particular university isn't a big player, that's immaterial because it doesn't own my research work's copyright anyway; I do.

I, like Martin, work within the scientific culture as a non-litigious “little guy”. It should go without saying that redistribution of my software towards scientific ends will always attract my blessing, not lawsuits—even if people don't ask first (mildly rude as that may be). Meanwhile, for commercial use, I'd like to defer fixing the exact terms until there's serious interest—so that I have some idea of how it's going to be used, and can choose terms appropriately. The pragmatics of “no license specified” and “restrictive license specified” are wildly different, even though a hard-headed legal interpretation makes the former a degenerate case of the latter.

The push towards “licensification” of scientific software actually works against this culture, since it erodes the norm that says that permission will be granted for doing reasonable things for scientific purposes. That's yet another reason why I resist this way of working. Licensification is also viral. If you believe strongly that all released artifacts should “have” “a license”, then you have to follow this to its transitive closure: to release anything that builds on anything, you must obtain a precise statement of license from all derived-from works. This is justified for commercially valuable software, but I don't believe it's the way science needs to work. And if you share my belief that premature commitment can do more harm than good, then we agree that science ought not to work that way.

You might complain that this “scientific culture” I've been talking about is nebulous. I'd agree. But capacity and inclination to sue are reasonable proxies. Call me naive, but if you really worry about the legality of doing science with artifacts released publicly by a reputable scientist working for a university, either you worry too much or the world has gone very wrong. On the other hand, it is reasonable to worry about artifacts released by big companies, because the culture they operate in is very different. That's why we need to pressure them into offering reasonable terms for the things they release, and decline to use their artifacts if they don't. That's what conspicuously failed to happen in the Sun/Richards case.

As more and more scientific research involves code, licensing of code among scientists is going to become much more of an issue. Misunderstandings are already rife: a quick bit of web searching will easily turn up things like this and this. We need to have a frank discussion about what the real issues are, and an open acknowledgement of their subtlety. I believe that blanket licensification is not the answer. It forces a decision to be taken too early, when too little information and expertise are available. I also fear that it will encourage a culture of litigiousness and bureaucracy that is neither desirable nor affordable within science.

As one tentative suggestion for how to make the law work better for us, I note that US copyright law's “fair use” allowance already covers educational purposes. Ideally it should also cover public-good non-commercial scientific purposes. In fact, this already is a factor in assessing fair use. But it's ill-defined, and I'm not aware of any useful case law on that front (are you?). If we want something to work towards, this kind of fair use law is what it should be—not the up-front licensification of every published scientific artifact.

Just as a coda: the above is what I currently believe, based on limited study and my very limited awareness of evidence, case law and the like. Although that belief has withstood a certain exposure to others telling me how wrong I am, it's not a conviction I'm strongly attached to. I went to the effort of writing the above not because I'm convinced I'm right, but because I don't like being branded “socially irresponsible”. Challenge my argument if you like, but don't doubt that I treat this issue with due care and attention. I'm still waiting for the counter-argument that will make me change my position. You might well have it. If you do, please let me know.

[/research] permanent link contact

Tue, 26 May 2015

Polymorphism and observability

[In an earlier post, I talked about debugging, and more generally “observability”, in ML-family languages. Later, I also clarified what I think by “polymorphism” most usefully means. This post explores the less obvious relationships between polymorphism and debugging.]

When I talk to language-minded people about debugging code in ML-family languages, they tend to think that the difficulties are something to do with “the type system” and “polymorphism”. This is only half right. Polymorphism does complicate debugging, but it does so in every language, even ones with no “type system” of any kind. (As a sanity check, ask: does BCPL support polymorphism? The answer is clearly yes, at least according to my earlier definition.)

The axe I'm grinding in this series of posts is that “polymorphism” or “type-mumble” are no excuse for a lack of decent observability in ML-like languages. There are no major obstacles to implementing a nicely debuggable ML, and certainly none regarding polymorphism. Intuitively, this makes sense if we remind ourselves that polymorphism is to do with abstraction, whereas observation is done on the concrete: we're observing a concrete program state, laid out in front of us. (Of course, there are some unfortunate decisions taken by existing implementations of ML-like languages, that make retrofitting debuggability more difficult than it might be. That's a very different problem!)

A similar viewpoint explains why other kinds of fancy types present no obstacle. Even using fancy features of OCaml or Haskell, like GADTs or type classes, our program state still boils down to a big pile of atoms, sums, products and functions. The innovations in “type systems” have been in reasoning, specifically about safety. This is a question of dynamics: checking at compile time that invalid structures will never be constructed at run time. Observability isn't concerned with dynamics; it's about looking at the present, not the future. All we want to do is decode a static snapshot of the program. (Here I'm excluding the debugger feature of “altered execution”, i.e. enacting debug-time side-effects on the debugged process. How to do this safely is an interesting question, but I'm not going to dwell on it here.)

Can we expose more clearly why polymorphism isn't a problem? As I covered last time, “polymorphism” is a fancy word for deferring specialisation. Specialisation can be done in a compiler or at run time. At run time, specialisation means execution: execution is a specialisation process that culminates in the program's result. We can also think of this process as “decision-taking” in response to input.

Polymorphism during execution, witnessed in program snapshots, is very different from polymorphism in source programs. In source programs, the whole program is oriented around an unknown future: programs describe dependency on an “input”, supplied later. By contrast, observing a program state at run time is about decoding a present, not a future. Moreover, to help us do that decoding, we can exploit all the decisions that have been taken so far, i.e. all the specialisation that has occurred, during both compilation and execution, to reach the present state. Some of this specialisation can be called “monomorphisation”, because it has taken generic code and applied it in a specific context.

As before, I'll focus on OCaml. The OCaml compiler turns polymorphic source-level functions into generic run-time function objects (instruction sequences). Similarly, for polymorphic data types in source code, the compiler selects a size and layout, independent of any type variables that might be parameterising the definition. As we would expect, this is achieved using indirection: the fixed size and layout ensure that locally, storage can always be allocated and accessed generically. The specialised part of the data—for example, the payload of a list node—is indirected away, using pointers. If we were talking about C, these pointers would be pointers to void. OCaml's source language of data types lets us be more precise about these, by introducing type variables. But that's a meta-level bonus that helps us reason about dynamics. A snapshot of an OCaml program still reveals it as consisting of allocated objects and pointers between them, just like in C.

Viewed as source code, a program consists largely of functions and data types. But during execution, we have other things too: activations of functions and instances of data types. It's usually these that we want to inspect when debugging. For example, a backtrace is a list of function activations. The heap is a collection of values—instances of some type. (It also contains closures, which are a bit of both; I'll ignore them, for simplicity, but you can ask me at the end.)

Here is the key observation about polymorphism at run time. Whenever a polymorphic function is activated, or when a polymorphic data type is instantiated, some instantiation of its type parameters is morally decided. “Morally” means that we could define an oracle, using the semantics of the language, that tells us how they are being instantiated. For example, it could tell us that at some given allocation site, we're creating a list of int rather than a list of 'a (whereas the latter is all the source can tell us). Exactly what this means, however, is slightly subtle.

One subtlety is that the code doing the allocation doesn't necessarily know what this instantiation is. That code might itself be generic! So maybe we're building an int list list out of some int lists. The code doing this might only know it's building an 'a list list, but our oracle would still tell us that the allocation “morally” has the more precise type int list. Another subtlety is that, of course, there's no guarantee at the implementation level that our runtime is helpfully defining any such oracle for us. Nor need the compiler have provided us any output that would help us implement the oracle. In the case of OCaml, these are definitely not true, and that's precisely why it's difficult to add debugging support to the current OCaml toolchain.

Another subtlety is that the “instantiation” does not necessarily yield something free from type variables. Although our int list list example got rid of all the variables, in some other cases we might find the best we can do is to instantiate 'a with 'b -> 'c, say. But this turns out not to stop us from observing anything we might logically be able to observe. I'll return to this shortly.

One way to make OCaml debuggable might be to directly implement this oracle, by maintaining extra state at run time. Whenever we call a polymorphic function or instantiate a polymorphic data type, we could stash information somewhere that explicitly records how the type parameters are being instantiated. Something quite similar was done a while back in the HashCaml project. Unfortunately, it's a fairly invasive change to the compiler. It's likely to meet resistance via a performance argument, which you can think of this as the “frame pointer” debate but for type information. Pushing around extra information creates a bit more pressure on registers and memory, so typically shaves a few percent off performance. In return, we make observability massively more straightforward. Apparently opinions differ on whether this is a good trade. All I'll say is that if frame pointers are good enough for the Linux kernel, they're good enough for me.

Instead of tracking allocation types up-front, one could do a deeper analysis to recover the same information on demand. If we assume we have a backtrace, that gives us a powerful chunk of context: it describes a nest of function activations. The top-level activation (which would be main(), if OCaml had main()) is always monomorphic, so we should be able to figure out all the subsequent instantiations all the way down the stack. Or, we can flip that around: starting from a given activation, we should be able to figure out any type variable instantiations by looking some distance up the stack, and in the worst case, all the way to the top. Currently this is what my OCaml-implementing colleagues prefer; they expect it can work by looking no more than a few frames up the stack in the common case. The logic involved is basically the same as that of the compile-time type checker—which now needs to be replicated in the debugger and/or the language runtime. That's an annoying chunk of replicated stuff, which I find distasteful. Also, this inference might be expensive—fine for an interactive debugger, but poor for other applications of run-time type information (like serialization routines or a tracing tool, say). The advantage is that it requires fewer changes to the compiler.

A third option would be to relax our aim of recovering source-level types. In practice, we don't necessarily care, at debug time, that we're looking at an int list. It might be enough to look at each list node individually, seeing that it's a Cons, and then, separately, discover that each Cons points to an int. In this way we've avoided “typing” at the same granularity that the OCaml language does typing, but we've still recovered a somehow “typed” view of the program (i.e. one interpreted in terms of source-level data types). Put differently, source-level types like list encode information about a whole structure, spanning multiple allocations. Perhaps all we really need is a piecewise, per-allocation view. Currently, OCaml's tagged-pointer implementation ensures that at word granularity, we can distinguish integers from pointers. That's not enough, because we can't, say, distinguish the first variant of type T from the first variant of type U, nor from the integer 0: all are encoded as a zero word. But if we add local tracking of ADT variants and a few other things, that might be enough for observability purposes, and would be less invasive than a full HashCaml-style solution. I find this promising, although I'm still working through the consequences.

Suppose we stick with our oracle-based approach, tracking a source-level type for each allocated value. There seems to be a complication. I mentioned that type parameters are decided at instantiation points. but also that we might only be deciding that 'a becomes 'b -> 'c, say—we're not fully monomorphising them. This makes sense, and just reflects the nature of functions. Suppose we have a list of functions. A perfectly valid such list might contain the list head function hd. That's a generic function of type 'a list ->'a. When we instantiate our 'a list to one that can hold this function, we've specialised type parameter 'a to 'b list -> 'b. Our list is still polymorphic: we haven't got down to a monomorphic type. Does that mean we're lacking the ability to observe something in our program state? The answer is a resounding “no”! I mentioned that when debugging, we're looking at the present and not the future. The polymorphism in hd encodes the unknown future: we don't yet know what types of arguments the functions in the list will be applied to (it hasn't happened yet!). So, these polymorphic-at-run-time values in our heap represent the residual genericity in delayed computations, i.e. in functions. Functions encode things our program hasn't done yet, but might. They don't present an obstacle to decoding the current program state. In practice, any function has a name, even if it's a fake one generated by the compiler from the source code coordinates of a lambda. If we're in a debugger, getting the name of that function (or those coordinates) is comfortably good enough.

There's a final apparent complication. What about the empty list, or the null pointer? These seem to be polymorphic values. But unlike functions or data types, they're not going to get specialised further by activation or instantiation. The simplistic answer is that these values are okay because they're degenerate cases. It's not a practical loss of observability at run time if we can't answer the woodchuck-esque question of “what type of non-empty list would this empty list be if it wasn't empty?”. A more subtle answer is that these values aren't really polymorphic at all. If we think of how we would define the data type 'a list, we see that the Nil constructor, viewed in isolation, isn't polymorphic—it doesn't use 'a. In the context of this constructor, 'a is a “don't care” or ignored argument. An unparameterised constructor is only vacuously polymorphic: its meaning doesn't actually depend on the parameter. (This view, which sees constructors as somewhat independent of the type definition that encloses them, is one which OCaml's polymorphic variants directly build on.)

Finally, I alluded to some parallels with C. Just as the pointers which allow generic layouts for ML data types are equivalent to void pointers, so we have a similar problem when debugging C code: what's on the end of a void*? If I'm looking at a generic linked list node in my debugger, say, the debugger won't let me follow the pointer to the payload data. For that, we would need some run-time service that can look up some metadata about arbitrary memory locations and tell us what's stored there. Java-style VMs solve this problem using object headers. Clearly we don't have this in C; we need some extra infrastructure to answer these questions. I've been working on it: it's called liballocs. By dynamically tracking allocations in our running program, and using some carefully crafted associative data structures, we can build up a fast mapping from arbitrary pointers to metadata about the pointed-to allocation.

In fact the reason I got interested in this topic was that I wanted to make liballocs understand allocations made by OCaml programs. One of the complications liballocs has to deal with is polymorphic allocation sites. These sometimes occur in C code. For example, we might malloc() an array of generic void*, say, but actually use it to hold some specific kind of pointer. Genericity like this is occasional in C, and commonplace in ML. But there's no fundamental difference: code can be generic regardless of whether our source language includes a type language for describing that genericity. Genericity itself is what makes debugging tricky, because it indirects away concrete details in a way that some implementations (both C and OCaml) make hard to recover at run time. The presence of a fancy type system isn't the problem.

[/research] permanent link contact

Fri, 27 Feb 2015

Talking about polymorphism

[In my last post, I talked about debugging, and more generally “observability”, with reference to ML-family languages. This post is continuing the theme.]

Before I go into what's different about polymorphism between compile time and run time, I need to get some terminology straight. What is polymorphism anyway?

People traditionally talk about “ad-hoc polymorphism” and “parametric polymorphism”. These terms go back a long way— to Strachey in the the mid-1960s. They're unhelpful today, for two reasons. Firstly, and most trivially, some flavours of apparently non-parametric polymorphism are nevertheless rather systematic, so ideally shouldn't be called “ad-hoc”. Type classes in functional languages and method overriding in object-oriented languages both illustrate this.

Secondly, calling polymorphism “parametric” confuses people into thinking that if code doesn't have type parameters, it's not parametrically polymorphic. But, to me at least, this kind of polymorphism is not about how code is presented; it's a deeper property, about how code is expressed. It's easy to find code expressed in apparently polymorphic ways in all languages—even ones where there is no additional metalanguage of types to describe that polymorphism. Some C code is chock-full of such polymorphism—take Linux, for example. In such code there are no type parameters, only void. Similarly, one of the main strengths of dynamic languages is that code often comes out with some degree of polymorphism by default. If you write a sort function, say, you tend to get something that can operate on any kind of ordered sequence.

So although “polymorphism” sounds fancy, it's really just talking about delaying specialisation. If we're write code in a way that avoids commitment to details that would specialise it to one kind of data or another, we're writing polymorphic code. Fancy type systems are a means of reasoning about polymorphism, hence enabling static safety guarantees about polymorphic code. But they're not what enables polymorphism itself.

(I am probably departing from Strachey at this point, although it's not clear. His presentation doesn't directly address the question of whether polymorphism is really polymorphism without a metalanguage to describe it. For the purposes of my subsequent writings, I declare that it is, and that the metalanguage is a separate concern. Even if you disagree, it won't affect the substance of anything that I'm about to say; only which choice of words is an agreeable way to say it.)

Expressing code in a way that defers specialisation invariably means introducing some kind of indirection. Here I mean “indirection” in a very general sense. Run-time indirection via pointers is certainly one very familiar way of introducing polymorphism: a list node having a void* payload has avoided specialisation, whereas one with int hasn't. But there can also be compile-time indirection, such as between an identifier and its binding. And then there is also run-time indirection across not only storage, but also computation. (The latter is the trick which, when applied to its fullest extent, gives us objects in the sense of Cook.)

Somewhere along the line I have stopped using the qualifier “parametric”. In fact, the view I've just expounded—that polymorphism is the deferral of specialisation—covers ad-hoc polymorphism too, if we look at it the right way. When we say a source language supports “ad-hoc polymorphism”, it means that it lets us group together collections of specialised definitions. Such a collection of definitions, viewed as a unit, start to look like a single, unspecialised definition—in other words, a polymorphic definition. Again, indirection is what enables this polymorphism. If I write the token “+” in a language with overloading or type classes—Haskell or C++, it doesn't matter—it denotes the whole collection of defined addition operators. The code thereby avoids directly selecting any particular one. Typically, the selection is done indirectly. Usually it happens in the compiler, since the language defines some rules for selecting which one will actually be used. But it might happen very late in compilation since these rules might allow dependencies on the code's context of use (think how “+ is resolved in a C++ template definition, for example). And all this could equally well happen at run time; multimethods allow us to do this for binary operators like plus, while the usual single dispatch is implementing precisely the restricted unary case: deferring specialisation, on (only) the leftmost argument, until the execution of each individual call.

If polymorphism is this incredibly general thing, we need some way of getting down to specifics—like the question I started with, about what “polymorphism” is present during execution of compiled OCaml programs on a particular OCaml runtime. One thing that helps is to qualify any mention of polymorphism with a particular time: perhaps compile time, perhaps run time. At any given time, we can describe a polymorphic entity (perhaps a definition in source code, perhaps an object at run time) as somewhere on a spectrum between strong “specialisation” and strong absence of specialisation, which we can happily call “genericity”. (I say “strong” not “full” because the logical extremities of this spectrum tend to be degenerate, like “the output of the program for given input”—the most specialised state!—and “not written any code yet”—the most generic possible state.)

This “one time” restriction forces us to distinguish a source-level view from a run-time view. It's clear that this distinction matters. Polymorphism in source code can sometimes be implemented by specialisation before execution. C++ templates are the classic example: I can define a generic template, but the compiler will implement it by generating lots of specialised instances. Meanwhile, certain whole-program optimisations in ML compilers, like MLton, do effectively the same. This kind of compile-time specialisation is sometimes called “whole-program monomorphisation”. Just as “polymorphism” is a fancy word for the deferral of specialisation, “monomorphisation” is a fancy word for applying some kind of specialisation.

Polymorphism only defers specialisation; it doesn't avoid it. As I'll talk about more in the next post, all polymorphism eventually ends up being specialised away somehow. What varies is where and when this specialisation occurs. It might be done early, in a compiler, or late, during execution, by “elaborating” a particular trace of machine instructions in a particular machine context—i.e. by actually running the generic code! In the latter case, each execution is “specialised” in the sense that it runs in a different machine context and so produces a different outcome. Either way, execution is a gradual process of specialisation, entailing the gradual removal of polymorphism.

Phew. Perhaps that was a lot of belabouring the point. But it certainly helped me clarify (to myself) in what ways “polymorphism” might be present in ML code at run time. I'll revisit this precise question in the next post.

[/research] permanent link contact

Fri, 20 Feb 2015

Putting observability first

[Update: this article has now been translated into Russian! Thanks, Vlad.]

Last week I had a fascinating conversation with Mark Shinwell and Stephen Dolan, two colleagues who know the OCaml runtime inside-out. We were talking about ways to make compiled OCaml code observable: making it support interactive debugging and various other dynamic analyses (notably memory profiling).

It turns out that although this is not impossible, it is very difficult. It's fair to say that the OCaml runtime has not been designed with observability particularly high on the wishlist. It's far from unique in this: ML-family languages are usually not implemented with observability in mind. In fact you could argue that ML is explicitly designed for non-observability. The orthodoxy of the day held that tagged storage was the naive implementation technique, and erasing tags at run time was a good thing—mainly since it enabled a modest performance gain. Moreover, this meant that the provable erasability of tags came to be considered a valuable mathematical result, about language designs in general and ML in particular. The lingering side-effect is that this erasure also removes the ability to decode program data straightforwardly from its in-memory encoding. In the absence of some other mechanism—which mainline OCaml doesn't currently have—this greatly compromises observability.

ML is not alone in this. Although I'm not so familiar, I'll wager that Haskell, in GHC, suffers similar observability limitations—or perhaps worse, since it does more extensive compile-time transformations, and these cannot easily be “seen through” at run time. This makes it even harder to explain the state of an executing program, seen at run time, in terms of the source code. Also, I'm not singling out functional languages. Java has serious flaws in the observability mechanisms exposed by its virtual machine, as I once wrote a short paper about.

C, as usual is an interesting case. On the surface, it appears to be designed for little or no observability. It certainly doesn't generate much run-time metadata. This omission could be seen as an ML-like erasure of implicit tags. Yet this is not an accurate representation. Pragmatic working programmers, like Ritchie and Thompson, know well the value of interactive debugging. The first edition of Unix included db, an assembly-level interactive debugger mostly used on coredumps. By the eighth edition two debuggers, adb and sdb, were documented in section 1 of the manual, the latter supporting source-level debugging, while a third debugger (pi; built on Killian's procfs) was bundled in the extras which would become Plan 9. More modern debuggers have kept pace (more-or-less!) with more advanced compiler optimisations, using complex metadata formats like DWARF to describe how to see through them. (This isn't bulletproof, and has its own problems, but works remarkably well for C.) The result is that C code has been consistently highly observable—albeit not without forty years' continuous toil to co-evolve the necessary language- and system-level infrastructure.

This co-evolution is interesting too. The mechanisms for achieving observability in C code lie outside the language. Coredumps and symbol tables are system mechanisms, not language mechanisms. Observability in Unix has been part of the design, down in the fabric of the system; it is not an afterthought. An opinionated system designer (okay, I'll step forward) might claim that observability is too important to leave to language designers. There are, however, some gaps to plug and some mindsets to bridge in order to apply Unix-style debugging infrastructure to ML-like languages. In another post in the near future, I'll dig a bit deeper into OCaml, by considering polymorphism (and why it's not quite the problem it seems to be).

[/research] permanent link contact

Mon, 02 Feb 2015

Thoughts on peer review

My PLDI submission was rejected. I'm not too sad about this, since the reviews were basically encouraging, and on balance I agree that the paper can still be improved and was arguably not (quite) ready yet.

However, despite certain innovations, namely two-phase review and rebuttal, the conference review process is as creaky as ever. This is rather dispiriting, and the longer I spend doing computer science research, the more bizarre our system seems. I filled in the anonymous post-conference survey, but since my comments apply more widely than to a single instance of a single conference, I feel like forgoing my anonymity and re-posting them here (lightly edited). I hope they don't have too much of a sour-grapesy flavour—at least, that's not really how I'm feeling.

Something I didn't elaborate in my survey response: what would this “progressive model” be? It might involve reserving a smaller part of the programme for short presentations of recent journal publications, and a larger part of the programme for short presentations of high-quality in-progress work. This would be work at a substantially more advanced stage than a typical workshop paper—possibly reasonably mature already, but not yet accepted for publication in a journal. Initially, it seems important to ring-fence the slots for the latter kind of presentation, to avoid having already-published work always trump the not-yet-published stuff. Eventually, in a well-functioning community there would not be much call for the former kind of slot, since most journal publications would describe things that had been presented at previous conferences.

[/research] permanent link contact

Powered by blosxom

validate this page