Diverting trains of thought, wasting precious time
Before I begin, here's an executive summary. Metacircularity is not about self-interpretation at all. Rather, it's an engineering approach to re-using as much code as possible between different parts of a toolchain (including compiler, runtime and debugger). This is noble, but limiting ourselves to working in a single language is needlessly restrictive. If we get over our presumptions about “language barriers” (cf. Oracle's disappointing attempt at explaining metacircularity), we can apply the same re-use philosophy to supporting a diversity of languages, not just one.
I've recently found myself wanting to understand the concept and purpose of metacircularity in language implementations. This is because I've become interested in understanding the Maxine JVM, which is often described as having a metacircular design.
All this meant to me at the time was that it's written in Java---making it a self-interpreter, certainly. But is it meta-circular? What does that mean? Why might it be a good thing? As I will show, if we're even just a tiny bit pedantic (which I hope we are), then metacircularity is not really a special case of self-interpretation, but a completely separate concept.
I've found two very helpful papers describing metacircularity. The first is that of Chiba, Kiczales and Lamping talking about the “meta-helix”. The other is Ungar, Spitz and Ausch describing the Klein VM, a metacircular implementation of Self. The first paper really emphasises the idea of implementation by extension which is at the heart of metacircularity. They note that the use of metacircularity is “to allow extensions to be implemented in terms of the original non-extended functionality”. As the paper goes on to discuss, there is a tricky bootstrapping problem inherent in this. If we don't keep careful track of the dependencies between all these extensions, subtle and not-so-subtle bugs, most obviously infinite recursions, can arise. The paper is about avoiding confusion of metalevels, and, as they propose, the shape of a helix, not a circle, makes much more sense in describing what supposedly meta-circular systems are actually doing.
The second paper, by Ungar et al., is more of a practitioners' view: it shows what VM builders consider to be a metacircular design, and what they hope to achieve by it. After reading these two papers, reading some other things, and scratching my head a lot, it became apparent that the primary goal of metacircularity is to solve two very practical engineering problems concerning re-use: re-use of code and re-use of debugging tools. They mention the code re-use issue directly, by saying that in traditional designs “an operation such as array access must be implemented once for the runtime, once for the simple compiler, and once for the optimizing compiler”. The question of tool support is also an explicit motivation in the same work: they lament that in the non-metacircular Self VM, “to inspect a Self object... [in] an application that has crashed the VM, one must invoke a print routine in the VM being debugged, [an approach of] dubious integrity”, and that the VM “must be able to parse both Self and C++ stack frames”.
So, perhaps prematurely, I'd like to propose a new characterisation of metacircular VMs that I think captures their true nature. Metacircular VMs are interpreters (in the general sense) that are carefully constructed to have a dense re-use structure. They do this by expressing as much as possible---front-end, compiler, runtime---as extensions over a small common core. This core has some interesting properties: it is designed explicitly for extension, and includes the necessary foundations of debugger support. It is the former which allows code re-use, and the latter which enables the same tools to see all the way up the stack, from various levels of VM code up to application-level code.
Note that this is fundamentally different from a self-hosted compiler. Under self-hosting, the compiling compiler is not re-used at all in the compiled compiler. It is just used to implement a translation function, from end to end; how it does it is completely opaque. By contrast, in a metacircular VM, you can invoke your hosting runtime to perform part of your work---asking it to “do what you would do in case X” by calling the corresponding function (or sending the corresponding message, for Smalltalkers). The trick is to ensure that these helper requests are correct and well-defined, meaning they do not cause infinite regress (the obvious bug) and do not confuse meta-levels (the more subtle bugs mentioned by Chiba et al).
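To make the “helper request” idea concrete, here is a hand-wavy C sketch (not taken from Maxine, Klein or any real VM; all names are made up): the extended facility is written in terms of the core one, and a bootstrap flag marks the point below which requests stop delegating, so there is no infinite regress.

#include <stddef.h>
#include <stdlib.h>

static int core_booted;              /* set once the core heap is usable */

static void *core_alloc(size_t sz)   /* the original, non-extended facility */
{
	return malloc(sz);           /* stands in for the core's own allocator */
}

void *vm_alloc(size_t sz)            /* the extension, built on the core */
{
	if (!core_booted)
		return core_alloc(sz);   /* during bootstrap, go straight to the core */
	/* once booted, the extension can add barriers, accounting and so on,
	   but the request still bottoms out in the core rather than in itself: */
	return core_alloc(sz);
}

void vm_bootstrap_done(void)
{
	core_booted = 1;
}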
As a consequence of this fundamental difference, the key challenge of metacircularity is not just that of “implementing language X in language X”; it's dealing with the bootstrapping problem. What is a suitable common core? How can we make it small? What extensions must it permit, or may it permit? How can we structure the extensions on top of one another, so that they can express what we want, re-using what we want to re-use, and efficiently?
So, we've established that “self-interpretation” is an irrelevance. But it seems that most metacircular designs are, in fact, self-interpreters, right? I actually consider this to be false. Even when staying within “one language”, the fundamentals of the bootstrapping process mean that at a given level in the interpreter, certain language features may only be used in restricted ways. Sometimes these restricted subsets are given a name, like “RPython” in the PyPy project. In other cases, they are not named. But in all cases, there are restrictions on what functionality, at a given level in the system, it is safe and meaningful to invoke at the metalevel. Indeed, this is exactly the “helix” shape that Chiba et al were describing. In other words, different parts of the interpreter are written in different sub-languages, precisely in order to avoid infinite regress. Just because there is continuity between the core language and the eventual top-level language doesn't make them “the same”, and for this reason, metacircular VM designs are not self-interpreters.
If I were to write a ranty summary of the above paragraphs, it would be that the apparently “beautiful”, head-twisting, recursive, quasi-mathematical aspects of the metacircular design---the things which language nerds get excited about---are both irrelevant and illusory. Metacircularity is motivated by engineering pragmatics, not “deep” linguistic or mathematical concepts. (Homoiconicity, itself a concept of overrated interest, is an orthogonal concept to metacircularity, despite what at least one blogger has written.) I believe this fixation with superficial observations about language stems from the documented inability of many programmers to divorce concepts from language. (For “documented”, I can say at least that Lamport has commented on the problem, and in this case I agree with him. I have big disagreements with other parts of the same article though. I will post about those in the near future.)
So, having stated that re-use is the good thing about metacircularity, why is re-use so much easier in a metacircular design? The reason is that we have a common core providing a coherent and adequate set of services---the services embodied in the “bootstrap image”. And I say “services” and not “language” for a reason. The core really is a set of runtime services. As I have explained, it is only a distant relation of whatever high-level language the author is intending to realise. In our current technology, re-using code and re-using tools across languages is hard, and so “build everything in the same language!” seems like a useful answer to a VM author's problems of API-level interoperation and of tool support. Metacircular designs are the result (because it's the closest you can get to doing everything in one language). But as I've just described, the “same language” property is an illusion, and there are inevitably many languages involved. It just happens that in current projects, those languages are designed to be as similar as possible to one another---featurewise increments, in effect. But instead of this unimaginative perspective, anyone building a metacircular VM should ask themselves: how can I design my core services---the core of the VM---to support as many different languages as possible?
This will sound familiar to anyone (so, hmm, maybe ten people on the planet) who has read my “Virtual Machines Should Be Invisible” paper. Although it doesn't approach the problem from a metacircularity perspective, this paper is all about building an infrastructure that can support a diverse variety of languages, sharing code and tools between all of them.
Currently, our shared base infrastructure is a POSIX-like operating system. Every VM author (even those interested in Windows, which I'm cool with) implicitly targets this infrastructure. Unfortunately, these systems don't provide enough abstractions. As such, different language implementors build their own infrastructure which reinvents similar abstractions in incompatible ways---including functions, objects, garbage collected storage, run-time self description, exceptions, closures, continuations, and so on. We can clearly avoid this pointless diversity without sacrificing innovation. Just as with operating system interfaces, there is never complete quiescence or consensus, but we still manage to share a lot more software between OSes than we did in the pre-Unix or pre-POSIX days.
One of the mitigating techniques which my VMIL paper describes but which metacircular designs don't use is: describe your implementation decisions. Don't encapsulate them! If you implement a certain language feature a certain way, describe it. There is nothing fragile about this, because your descriptions will be written in a standard way and consumed by an automated interpreter---called a debugger. This is what native debugging infrastructure does. VM-hosted debuggers, of the Java or Smalltalk flavours, don't do this. To make the value of this approach clear, let me finish with another example from the Ungar paper, where they proudly state that Klein VMs can be debugged remotely, and in a post-mortem fashion, using another Klein or Self VM. “A separate, possibly remote, Self VM hosts an environment that manifests the innards of the Klein VM at the source level. Thanks to Klein's metacircularity and Self's mirror-based reflection model, Klein can reuse a vast amount of already-written Self programming environment code.”
What the authors are forgetting here is that this is not a new facility. Native debuggers have long had the capacity to inspect remote processes. Smalltalk-, Self-, and Java-like designs took a retrograde step by forcing debugging to exploit the help of a server within the VM. Although this has the benefit of allowing the debugger implementation to share the introspection services already present inside the VM, it requires a core of the VM to remain working correctly, even after a failure, which precludes many cases of post-mortem debugging. By contrast, trusty (or crusty? your choice) old native debugging is necessarily designed for this as a common use-case.
The approach I advance instead, as described in the VMIL paper, is to implement introspection on the same infrastructure that supports remote inspection---which happens to be the DWARF infrastructure, in the case of DwarfPython and modern Unix-based compiler toolchains. This is very similar to the Klein approach, in which mirror objects may reflect both local and remote state. But it completely avoids the myth that we should implement everything in a single language. Indeed, debugging information formats like DWARF are actively concerned with supporting a wide variety of languages. One Klein process can inspect another because they share a set of implementation decisions. By contrast, a native debugger need share nothing with its debuggee, because native debugging infrastructure includes a facility which is fundamentally omitted from VM-hosted debuggers: the language implementation explicitly describes its own implementation decisions. It does this down to the machine level, and moreover, up from the machine level.
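To make “describing your implementation decisions” concrete, here is a small hedged example using the ordinary native toolchain. Compile any C definition with -g and the compiler emits DWARF spelling out the layout it chose, readable by any consumer, not just the compiler that produced it.

/* layout.c -- compile with: cc -g -c layout.c */
struct point {
	char tag;    /* the compiler decides padding and offsets here */
	int x;
	int y;
};
struct point origin;   /* ensure the type is actually emitted */

Running readelf --debug-dump=info layout.o should then show a DW_TAG_structure_type entry for point, with a DW_AT_data_member_location (a byte offset) recorded for each member: the implementation decision is described, not encapsulated.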
The result is that given a memory image of a crashed program, we can recover a source-level view of its state at the time of a crash. VM-hosted debuggers are fine for user code because encapsulation and memory-safety protect enough of the VM implementation that the debug server can still work. (Notice I don't say “type-safety”! Type-safety is just an enforcement mechanism for encapsulation, not the key property that ensures encapsulated state is not corrupted.) These VM-level guarantees do not have such a nice property if the failure was due to a bug in the VM itself. This is because the invariants of the VM's own data structures are by definition broken in this case. Some might argue that this is a minority use case, so VM-hosted debugging is fine for general use. Personally I don't mind, as long as I have a debugger that can see all the way down. Currently this doesn't include any VM-hosted debugger, but perhaps it could do. (One of my perhaps-future small projects is to create an implementation of JDWP that knows how to answer queries about native code.)
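For a flavour of what this looks like with plain native tooling, here is a sketch for a typical GNU/Linux setup (myapp is a hypothetical crashing program, and the core file's name depends on your core_pattern setting):

$ ulimit -c unlimited     # allow the kernel to dump a core file on crash
$ ./myapp                 # ... crashes somewhere, perhaps deep in VM-level code ...
$ gdb ./myapp core        # post-mortem: nothing in the dead process need cooperate
(gdb) bt                  # a source-level backtrace, reconstructed from the image plus DWARF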
In summary, I think of the solution to re-use proposed by metacircular designs as a degenerate case of the approach I am pursuing. It sounds strange to most people, but it is not too much to ask for a language-agnostic runtime infrastructure that supports a plurality of language implementations (going right down to native code), direct sharing of code and data, and orthogonality of tool support from language. As I ranted about in the VMIL paper, this infrastructure is a modest and very feasible generalisation of what already exists, with basically only performance questions outstanding. (I'm working on it.) Given this infrastructure, the same careful bootstrapping approach can be used to share code and retain tool support throughout higher-level language implementations. But we can do this without the requirement that everything be in a single language, which doesn't make sense anyway.
[/research] permanent link contact
Initialization order of static state is a thorny problem. It's particularly tricky to get right portably. But until recently I didn't realise how tricky it could be even when restricting oneself to GNU tools on Unix platforms. Consider the following three-part program, consisting of an executable prog and two shared libraries lib1 and lib2. The dependency order is left-to-right in that list: prog depends on lib1 which depends on lib2.
/* prog.c */
#include <stdio.h>

/* from lib1 */
void greeting(void);

/* constructor */
static void init(void) __attribute__((constructor));
static void init(void)
{
	fprintf(stderr, "Initializing prog\n");
}

int main(void)
{
	greeting();
	return 0;
}
/* end prog.c */

/* lib1.c */
#include <stdio.h>

/* from lib2 */
void hello(void);

/* constructor */
static void init(void) __attribute__((constructor));
static void init(void)
{
	fprintf(stderr, "Initializing lib1\n");
}

void greeting(void)
{
	hello();
}
/* end lib1.c */

/* lib2.c */
#include <stdio.h>

/* constructor */
static void init(void) __attribute__((constructor));
static void init(void)
{
	fprintf(stderr, "Initializing lib2\n");
}

void hello(void)
{
	printf("Hello, world!\n");
}
/* end lib2.c */
Here's a GNU Makefile to tie it all together.
CFLAGS := -g -fPIC
LDFLAGS := -L$$(pwd) -Wl,-R$$(pwd)
LDLIBS := -l1 -l2

default: lib1.so lib2.so prog

%.so: %.c
	$(CC) $(CFLAGS) -shared -o "$@" "$<"

clean:
	rm -f lib1.so lib2.so prog
Now when you do make (or gmake) it will build a program that initializes its libraries in right-to-left order: from the “most depended on” to the “least depended on”. We can verify this by running the program.
$ ./prog
Initializing lib2
Initializing lib1
Initializing prog
Hello, world!
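Why care about the order? Suppose, hypothetically (neither of the listings above actually does this), that lib1's constructor consumed state which lib2's constructor sets up. Then running lib2's initializer first is not just tidy but necessary. A sketch of such a dependency:

/* lib2.c (hypothetical variant) */
static int table_ready;                 /* set by lib2's constructor */
static void init(void) __attribute__((constructor));
static void init(void) { table_ready = 1; }
int lookup(int key) { return table_ready ? 2 * key : -1; }

/* lib1.c (hypothetical variant) */
int lookup(int key);                    /* from lib2 */
static int cached;
static void init(void) __attribute__((constructor));
static void init(void) { cached = lookup(42); /* -1 if lib2 is not yet initialized */ }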
Moreover, if you try flipping around the link order in the LDLIBS line, the link will fail with undefined reference to `hello', because the undefined reference to hello (introduced by lib1) only appears after lib2 has already been scanned, and the linker's defined behaviour is not to go back and re-scan earlier libraries for newly introduced undefined references---it's up to the invoker to order the libraries so that this works.
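Incidentally, to see which libraries the link actually recorded, and in what order, you can inspect the dynamic section of the resulting binary; something like the following should list lib1.so and lib2.so in the order they appeared on the link line (the exact output format varies by binutils version):

$ readelf -d prog | grep NEEDED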
Let's try this on a BSD system. I have a NetBSD 5.0 VM hanging around, so I'll try that. It has recent GNU make, GCC and GNU binutils installed.
$ gmake
cc -g -fPIC -shared -o "lib1.so" "lib1.c"
cc -g -fPIC -shared -o "lib2.so" "lib2.c"
cc -g -fPIC -L$(pwd) -Wl,-R$(pwd) prog.c -l1 -l2 -o prog
$ ./prog
Initializing lib1
Initializing lib2
Initializing prog
Hello, world!
Strangely, our initialization order is flipped. This doesn't matter for our program, but if lib1 consumed some static state in lib2, it would matter quite a bit. What happens if we flip the link order around to compensate? We edit the LDLIBS line and re-make.
$ nano Makefile
$ gmake clean && gmake
rm -f lib1.so lib2.so prog
cc -g -fPIC -shared -o "lib1.so" "lib1.c"
cc -g -fPIC -shared -o "lib2.so" "lib2.c"
cc -g -fPIC -L$(pwd) -Wl,-R$(pwd) prog.c -l2 -l1 -o prog
$ ./prog
Initializing lib2
Initializing lib1
Initializing prog
Hello, world!
This has done what we want. But what's going on? This link order didn't even work on GNU/Linux. Not only does it work on BSD, but it's required if we want a sensible initialization order. Our initializers run in left-to-right order, so we need to put the “most depended on” libraries first, not last. This isn't a BSD quirk per se, because we're using the GNU linker in both cases. I suspect the linker scripts are nevertheless different in the two cases. However, I haven't had time to look into the details of why. I'd be interested to hear, if anyone knows. I guess this is the sort of peculiarity that gives libtool a reason to exist.
[/devel] permanent link contact