Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 06 Jul 2011

Functionality, not (just) reliability

I'm still a newcomer to verification. The more reading on verification I do, the more evident it is that most work in the area is interested in checking fairly boring properties of programs: usually that it doesn't crash, don't corrupt memory, or that some simple assertions hold. In this post I'll argue that thinking of a provable absence of these properties as “job done”, or even a primary goal, is counterproductive: it overlooks the hidden costs of verification, and overstates the value of such proof. There are far more important top-level goals than verifying these properties, yet there seems to be a real danger that research is overlooking these almost entirely.

To explain myself, I'm going to highlight three distinctions I've come up with about different kinds of work that can loosely be called “verification” or “formal reasoning about programs”. None of them are commonly mentioned, and certainly not by the names I've given them. But they seem fairly important to me, and in each case, one side of the divide is sorely neglected.

Positive specification versus negative specification

I call this kind of “doesn't fail” property “negative specifications”---they say what the software shouldn't do. I'm not trying to denigrate any work that verifies against these specs. It's often more than difficult enough to check these “boring” properties statically, without making things any more “interesting”. Nevertheless, focusing only on negative properties seems to neglect the classical goal of verification, which is to check that an implementation satisfies a specification which captures its intended functionality.

Tackling this means addressing “positive specifications”: what the software should do. This is a similar distinction to that between liveness properties and safety properties. But it is not the same: specifying liveness in the sense of “not deadlocking” or “not hanging” is still a negative specification. Really, what defines a positive property is that it has something to do with functionality, and not whether it's stated in a way that uses logical negation. We should aim to specify positive properties that capture an isolated facet of our system's intended functionality, expressed in application-level terms: perhaps small behaviours that a program simulates (but doesn't bisimulate), or small functional dependencies that a program reproduces.

Conservative versus “liberal” software construction

I had a hard time thinking of the right antonym to “conservative”, and “liberal” will have to do (as “reckless” was my other candidate).

Some classical approaches to programming have a highly conservative property that I call compulsory proof: the programmer cannot make progress without ensuring that some specification remains satisfied. Static typing is the most familiar example: type-checking is, from Curry-Howard, proving a property about your program. You can't do much with a program that doesn't type-check. Refinement-based synthesis methods are another example: a refinement step is only valid if it preserves the specification, and provably so.

As we scale up to larger and more complex programs, and larger and more complex specifications, these approaches start to inhibit progress. It's interesting how dependent types are still a “nearly there” feature---nobody has made them usable yet. I'm told that making programs type-check becomes harder and harder under increasingly dependent types. This is unsurprising: we're making the specifications (a.k.a. types) more complex, so harder to prove satisfaction for. It seems that the most pragmatic solution so far, is to relinquish the insistence on proof at all times, a compromise adopted by Edwin Brady's Idris.

The idea of compulsory proof is flawed because programming is usually exploratory. As a programmer, I often start coding before I fully understand the problem. The process of programming provides necessary illumination. So, presupposing any specification that I must doggedly adhere to at each stage, whether from the tyranny of a type-checker or the tyranny of a refinement procedure, is anathema---it actively stops me from acquiring the understanding I need to complete the task. In the worst case, compulsory proof is a net loss: it slows us down far more than it helps us. What's unfortunate is that our research evaluation methods don't account for this. No programming research considers the net worth to a human programmer. Instead it prefers the mathematical orthodoxy of derivable properties such as preservation of type safety. I suspect these are poorly correlated.

The polar opposite to these conservative approaches, which I call “liberal”, is exemplified by dynamic languages. Here no proof of adherence to any specification (type-based or otherwise) is statically enforced. The programmer is free to break his abstractions at any time---even at run time. The only specifications of that are enforced are those of pre-existing machine-level abstractions---integers, pointers and floating-point numbers---whose integrity is enforced by dynamic checks only.

Like most polar extremes, neither fully-conservative nor fully-liberal approaches are often optimal in practice. I think the consensus from mainstream languages (C, Java, C++) is that static checking using a simple type system is a good idea (and I'm sure even most die-hard Python programmers often yearn for some basic static checking). The jury is still out on the cost/benefit ratio of more complex type systems, even such as Java's generics. Ken Arnold vociferously argues against Java generics , and when I first read his opinion, I was sceptical of his view---surely more static checking can't hurt? Nowadays I can see his points about the cost in complexity and, particularly, in the proof burden on programmers.

Meanwhile, from the liberal end of the spectrum, bug-finding tools like KLEE are interesting: we liberally (or recklessly) allow bugs into our program, then try to whittle them down. It's bug-finding and not verification because we can happily specify properties that the tool can't conclusively show are not violated. A consequence is that in practice KLEE doesn't terminate for most nontrivial programs. On the other hand, the space of assertions that we can use KLEE to check is large: at least in theory it subsumes type-safety (noting that a specification of type-safety can be encoded by instrumenting an untyped program with assertions using a suitably-defined instanceof-like operator) and arbitrary application-specific assertions. There are a few problems with KLEE: it's limited by what we can express as assertions (or built-in properties); it doesn't terminate in enough cases; and it tends to find uninteresting bugs which, while not technically false positives, might well be if we could write a more refined input specification. Nevertheless, I like it because it treats perfection (i.e. proof) as the limit case, not the immediate goal. (It's no coincidence that I'm basically working on addressing these weaknesses right now, including making it terminate for more programs---or at least, I would be if I wasn't wrestling with LLVM bugs the whole time.)

Constructed reliability versus emergent reliability

Using any of these techniques takes time. Writing specifications takes time. Even if you say “my verifier requires no annotations!” (like a lot of Dawson Engler's work, including KLEE), you probably rely on assertions. Even if you restrict yourself to “belief”-style latent specifications (like Engler's 2001 SOSP paper), they got there by somebody writing code. If you rely on “mined” specifications, as recovered by a bunch of work (like this and this and this) you have to gather a corpus and run a mining algorithm and check for false positives and then use some other technique to hunt the false negatives.

In other words, despite half-claims to the contrary, we make our software reliable by pouring some amount of labour into it. New, exciting techniques are exciting because they deliver more with less labour.

But there is an appalling lack of holism here. (Aside: Thomas Ball's call for holism is an interesting read and I applaud it; here I like to think I'm going much further!) What other approaches do we have for making software reliable? How about making it easier to write the code, so that we have more labour to pour into the reliability side?

In other words, nonfunctional concerns trade off against each other, and also against functional concerns! Time spent coding new features is time taken away from static analysis or testing or any other reliability-focused work, and vice-versa. It's also time taken away from profiling and optimisation, from refactoring and documentation, and so on. So in other words, it all helps. Software's value isn't a sum of independent parts. The parts are interdependent; what benefits one benefits all. Reliability can be deliberately constructed by applying specific reliability-focused tools, and hacking at code until it passes muster by these tools. But also, it can emerge from superior development processes that made it easier for programmers to build in reliability in the first place.

Now let me lament my own position. In the research world, only a relatively narrow selection of approaches get any funding. Reliability is a perennially hot topic. It's unquestionedly considered sound research motivation to trot out lines about the importance of reliable cars or reliable trains or reliable nuclear power plants. Similarly, it's routine to trot out analogies with civil engineering, bridge-building and the like. Reliability is important, for sure. But thinking holistically, that doesn't mean we have to attack these problems by building tools with the narrow remit of making sure nothing bad happens in whatever code they're given. Improving all parts of the development process can contribute to these goals, and have untold other economic benefits in the process. Too many researchers' mission statements list reliable software as top priority. But reliability is just a means to economic (or “value”) gain. Hardly any researchers will say their goals are “quality software”, or “functioning software”, or “economical software development”. Why not?

A senior researcher here in Oxford recently pitched his work (to industrial visitors) by saying that verification is ensuring that “nothing happens”. I hope that was a gross simplification for pitching purposes, because we can do a lot better than that.

To finish, let me shamelessly bring on some of my current “background” research interest and activities. I'm interested in good ways of re-using existing code; good ways of adopting programming language innovations without rewriting the world; good editing tools (including refactoring and a whole lot more---I'll blog shortly there); good dynamic analysis tools, including a good debugger (again, I'll blog more shortly). Of course, I didn't manage to find a job on any of these ideas. Since my PhD, days, I've felt as though I was perhaps the only programming researcher in the world whose top priority is not specifically code that is reliable, per se, but the bigger yet seemingly more obvious goal of code that does what you want it to.

So, am I crazy? I took a job in verification because I thought I'd learn some program analysis skills that would be generally useful, including (later) application to the problems nearest my heart, not just to approaches sanctioned by the reliability orthodoxy. But it's a draining experience to be railing against orthodoxy all the time, especially when you feel like the only lunatic in the asylum. I'm not sure how much longer I can take it.

[/research] permanent link contact

Powered by blosxom

validate this page