Rambles around computer science

Diverting trains of thought, wasting precious time

Wed, 02 Jul 2014

Why isn't verification standard practice?

Yesterday, Steve Crocker gave a very stimulating talk sharing thoughts—and also soliciting thoughts—about why verification isn't standard practice. He began with an anecdote about last year's BIND vulnerability (CVE-2013-2266) that could allow a single malicious packet to crash a DNS server. The problem is a simple bug in a regular expression library. Malicious clients can exploit the bug to make the process allocate huge amounts of memory, and hence kill it. How can we justify the fact that such simple bugs get into deployed code? Why can't we verify the absence of at least these simple bugs in anything that we deploy?

There were many useful contributions from the audience. It took me a while to collect my thoughts, but here are some personal responses that weren't discussed at the time.

Coding versus deployment

The BIND problem is primarily one of deploying shoddy software, and only secondarily one of programming shoddy software. Even with the best possible tools, making something robust will always take time and effort. Our problem is that we don't have good ways of making an informed decision about whether something is good enough (and, if not, what needs fixing). This re-states a theme of the discussion: our problem is at best only partly a technical one. I do believe that technical solutions and cultural solutions must go hand-in-hand: changing the technology can change the culture.

Proof as an activity much like programming

Considerable progress has been made in proof assistants like Coq and Isabelle, where proof starts to look like a programming task. In particular, the machine helps us do it and helps us check it. This is a really useful advance. Programmers are already used to satisfying a proof procedure, if they use any language with a type checker. But that doesn't mean we need to restrict ourselves to designing all our proof systems to be type checkers or things like them! I'll elaborate on this shortly.

One size doesn't fit all

I believe it's too limiting to expect a single compile-time analysis to tackle all proving and proof-checking workloads. If we write some immature code that's doing something subtle, we might expect machine proof (or disproof) to take some time. As it gets more mature, we can add more annotations and generally structure the code better, to help make the proof fast. We can't just forbid ourselves from writing or executing immature code. Of course if we consider the BIND scenario, meaning the case of mature, production software, then the deployed code should be sufficiently mature that a fast compile-time analysis is capable of producing and/or checking the proof. But we need tools that let us progress code across the maturity spectrum, not just demand a fixed level of maturity.
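
To make the progression a bit more concrete, here is a toy C sketch of what I have in mind; the function is invented for illustration, and the assert-based notation is just one possible way of writing the annotation.

    #include <assert.h>
    #include <stddef.h>

    /* "Immature" version: the bounds property is checked only at run time,
       inside the loop. Without a stated precondition, a checker has to
       reason about every caller to establish it. */
    int sum_prefix_v1(const int *xs, size_t n, size_t upto)
    {
        int total = 0;
        for (size_t i = 0; i < upto; ++i) {
            assert(i < n);           /* run-time check; may fire during testing */
            total += xs[i];
        }
        return total;
    }

    /* "Mature" version: an explicit precondition, stated once, makes the
       same property easy for a fast compile-time analysis to discharge. */
    int sum_prefix_v2(const int *xs, size_t n, size_t upto)
    {
        assert(upto <= n);           /* precondition: implies i < n throughout the loop */
        int total = 0;
        for (size_t i = 0; i < upto; ++i) total += xs[i];
        return total;
    }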

Language-independent analyses

One of the reasons that we get hung up on language so easily is that static analysis systems like to have a source-level input language. There's no reason why they need to, though. I'd argue that we programming researchers have never been very good at recognising the need to accommodate diversity of programming languages in most of the infrastructure we design. This can and should change. (It's something I'm working on, intermittently.) One approach I think makes sense is “binary-level analysis, source-level properties”. We can annotate source code, push those annotations through into binaries, and check that the properties hold of the binary. Binaries are what is deployed, after all, and, as I've argued already, deployment is the point where the assurances are most needed. It also defuses complaints about the ambiguity of source languages. Binaries are a lot less ambiguous (and we're working on improving that too). While source-level correctness for all deployment environments, i.e. “correctness portability”, is a nice property, it's a harder problem and not always important. We should walk before we run.
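
As a very rough illustration of what “pushing annotations through into binaries” could look like, here is a hypothetical C macro (GCC-specific; the macro and section names are made up) which both checks a property at run time and records it as data in a dedicated section of the binary, where a binary-level tool could later find it and try to prove or instrument it.

    #include <assert.h>

    /* Hypothetical: record each asserted property, as data, in its own
       section of the binary, so that a binary-level analysis can recover
       the source-level property without needing the source language. */
    struct source_property {
        const char *expr;
        const char *file;
        unsigned line;
    };

    #define SOURCE_PROPERTY(e) do { \
        static const struct source_property prop_here \
            __attribute__((section("source_props"), used)) \
            = { #e, __FILE__, __LINE__ }; \
        assert(e); \
    } while (0)

    /* Example: the bounds property survives into the deployed binary as
       data, not just as a compiled-in branch. */
    void bounded_copy(unsigned char *dst, unsigned dst_size,
                      const unsigned char *src, unsigned len)
    {
        SOURCE_PROPERTY(len <= dst_size);
        for (unsigned i = 0; i < len; ++i) dst[i] = src[i];
    }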

Simple and not-so-simple bugs

Is it enough to consider only the simple bugs? During the talk, Jon Crowcroft rightly put it like this (I'm severely paraphrasing): rewriting everything in a better language would bring us to the next level of problems, and these would be more interesting than most of the ones we currently face, but (by implication) such problems will exist. It's not clear to me that a similarly nightmarish scenario could not occur in BIND owing to some much more subtle bug than a buffer overflow (or similar). If what I just said happens not to be true for BIND (again, following Jon's comments that DNS and other core internet services are simple), it's probably true for some other critical system (like the web).

The static-to-dynamic continuum

Some discussion of assertions during the talk was interesting. One audience member commented that some software crashes more than it would if it didn't contain assertions, either because the assertions are wrong or because the properties they check aren't critical enough to justify terminating the program. I find it hard to blame the assertions for this. It's true that if we must tolerate a risk that assertions might fail in production code, "log and [attempt to] continue" is a marginally better strategy than "abort". But Steve Crocker countered that for critical, deployed software, assertions should be proved to hold—a position I agree with. That's a moral "should"; in practice we have to be very clever about how we enforce this. For example, we wouldn't want to unwittingly encourage programmers to delete assertions that they couldn't prove to hold. More importantly, to repeat my last paragraph, we need to allow developers to progress a codebase from immature to mature states. We might only deploy it when it's mature, but during development we need something which admits run-time failures yet is still executable. This is another reason why I believe a fixed compile-time analysis isn't the right approach, and that we should instead choose the level of assurance we want at deployment time. The tool that establishes (or denies) this assurance might also be a compiler, but needn't be.
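
By way of illustration, a "soft" assertion of the kind just discussed might look something like the following in C. The DEPLOYMENT macro and the CHECK name are invented, and real code would want a proper logging channel rather than stderr; the point is only that the same property can be treated differently at different points on the maturity spectrum.

    #include <assert.h>
    #include <stdio.h>

    /* Development builds: behave like assert() and abort on failure.
       Deployment builds (selected here by the made-up DEPLOYMENT macro):
       log the failure and attempt to continue. */
    #ifdef DEPLOYMENT
    #define CHECK(e) \
        ((e) ? (void)0 \
             : (void)fprintf(stderr, "check failed: %s at %s:%d\n", \
                             #e, __FILE__, __LINE__))
    #else
    #define CHECK(e) assert(e)
    #endif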

Assertion languages, not annotation languages

Even though we don't currently prove assertions to hold, I'd argue that assertions are a great success because they elicit specification from programmers. Most programmers use them, and they capture a lot of subtle properties which might otherwise never get explicitly written down. Improving the expressiveness of assertions is an approach that I'm particularly keen on. The TESLA work of Jon Anderson, Robert Watson and colleagues, on temporal assertions, is a great example. I have some work brewing about expressing type correctness using assertions.

The neat thing about assertions is that they are expressed in the programming language, and easily understood by any programmer. I believe that building specification and verification infrastructure out of programmer-friendly constructs, like assertions, is a necessary step to making them standard practice. We should avoid forcing programmers to use formalisms that don't map very clearly to the program. Again, TESLA is an example of this. Reportedly, programmers find it easy to write TESLA automata. They might not find the same thing easy to write in LTL or CTL*, even though those formalisms might be more friendly to somebody who writes verification algorithms. So, when I hear mention of “annotation languages” as a separate thing that needs to be bolted on to a source language, my reaction is to ask whether these annotations can be expressed in the source language itself, augmented by only a minimal and programmer-understandable set of new primitives. In my book, such a property then becomes an assertion rather than an annotation.
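
To give a flavour of what "expressed in the source language itself" can mean, here is a tiny, hypothetical C example (the struct and function are invented). The temporal property "no data is sent before the connection has been authenticated", which one would otherwise state with temporal operators in a separate formalism, becomes an ordinary assertion over program state that any programmer can read.

    #include <assert.h>
    #include <stdbool.h>

    struct conn {
        bool authenticated;
        int fd;
    };

    void conn_send(struct conn *c, const char *buf, unsigned len)
    {
        assert(c->authenticated);   /* the temporal property, as a plain assertion */
        /* ... write buf to c->fd ... */
        (void)buf; (void)len;
    }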

The performance culture

This is a bit of a departure, but I can't help throwing it in. It's generally considered that infrastructure that slows programs down by more than a few percent is not acceptable for deployment use. This is in spite of the relative plenty of computing power and the relatively huge expense of simple bugs like buffer overflows. Suppose we reach a point where all the “basic” bugs are routinely proved absent via verification. There might be some less basic properties which we can only check dynamically, at nontrivial cost. Do we leave the checks in, or not? I'd argue that the performance nuts are usually getting the calculations wrong. The cost of extra servers pales next to the cost of downtime, security compromises and the like, and it also pales next to the cost of debugging subtle failures that are not caught cleanly. Unfortunately, saying that the slowdown is too great is still a popular way to reject research papers. The commonly accepted criteria about “suitability for deployment” are finger-in-the-air stuff.

The I/O bottleneck

This was the least obvious of my thoughts, but is for me one of the most important. Let's consider the Heartbleed bug, which is triggered when a malicious client supplies a bogus length field and causes the server to overrun its buffer. This particular overrun would have been prevented by a memory-safe language, because the buffer copy would be bounds-checked. But why do people write code at the level of bytes and buffers anyway? Despite what some people write, “type safety” is not quite the issue here. Bytes and buffers are fundamentally semantically impoverished things, and yet we find ourselves writing this kind of code in every language, no matter how “type safe”, and no matter how expressive its data abstraction features might be. Coding at the level of bytes and buffers is always error-prone, even if one or other kind of error is ruled out by one or other kind of check mandated by a given language.

The reason we do it is I/O. I/O happens on bytes and buffers, because that's what the libraries expose. In turn, they blame it on the operating system: that's what Unix exposes. Once upon a time there was a lot of work on persistent programming languages, but they have not taken off. I suspect it's because they require too much buy-in: we don't want our data to be an opaque box that is managed only by a language runtime. What we need instead is to add specification to the byte-streams we already have. If we produce or consume byte-streams, we should specify their format, both syntactically and in terms of some basic semantic properties. If we're OpenSSL, we'd specify that the first two bytes of the heartbeat message are a length field, and that it should be equal to the remaining length of the buffer. This can already be expressed in languages like Datascript. We might be able to come up with something even better (ask me!), but in any case, we also need to push this kind of specification facility deeper into our libraries, language runtimes and operating systems. Once we've done so, it's trivial to insert bounds checks. We can even check subobject bounds inside the payload, i.e. referential integrity within the payload itself. In general, once we know the meaning of the bytes, we can treat I/O payload data abstractly. By contrast, even a “type safe” language currently only achieves abstraction by copying the data into data structures that it manages. Doing so unavoidably requires interpreting those bytes, and doing so unsafely—any checks that are done are entirely down to the programmer.
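
To make the heartbeat example concrete, here is a minimal C sketch of the check that falls out once the format is specified. It follows the simplified description above rather than OpenSSL's actual record layout, and the names are invented; the point is that, once the meaning of the length field is stated, generating this kind of check is mechanical.

    #include <stddef.h>
    #include <stdint.h>

    /* Simplified heartbeat-style message: a two-byte length field followed
       by that many bytes of payload. The specification says the declared
       length must match what was actually received; the check below is the
       kind of check whose absence caused Heartbleed. */
    struct heartbeat {
        uint16_t payload_len;
        const unsigned char *payload;
    };

    int parse_heartbeat(const unsigned char *buf, size_t buf_len,
                        struct heartbeat *out)
    {
        if (buf_len < 2) return -1;                      /* too short to hold the length field */
        uint16_t declared = (uint16_t)((buf[0] << 8) | buf[1]);
        if ((size_t)declared != buf_len - 2) return -1;  /* declared length must match what arrived */
        out->payload_len = declared;
        out->payload = buf + 2;
        return 0;
    }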

It's no coincidence that I've advocated this before. I'm working on it—after I work on all the other things that come before it in my list. If somebody wants to join me, that would be great....
