Diverting trains of thought, wasting precious time

I've been thinking a lot about various techniques in program analysis, transformation and verification recently. There's certainly a lot to think about.

One idea I'm exploring is looking at verification problems as program
specialization exercises. There is a recurring two-stage process in
verification. First, transform your program so that a single execution
captures *all* possible inputs. For an explicit-state model
checker like CMC,
we do this by putting our program in a harness that systematically
explores its state space. Alternatively, for approaches based on
predicate abstraction, we replace all input-dependent transitions in the
program with nondeterministic choice. The effect is the same: we now
have one program encoding all possible behaviours. The second step is
then to specialize our program for answering the question we care about,
such as “does this assertion ever fail?”. We rely on this
specialization to give us a new, simpler, faster program that we can
exhaustively check, or can check to greater depth, without exhausting
resources (time or memory).
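To make the first stage concrete, here is a minimal sketch (a toy system invented for illustration, not CMC's actual harness) of the idea: every input-dependent choice becomes a systematically explored nondeterministic one, so a single run of the harness covers *all* input sequences up to a bound.

```python
# One execution that covers all inputs: enumerate every input sequence
# up to a depth bound, driving a toy transition system through each.
from itertools import product

def step(state, inp):
    """Toy transition: a saturating counter an input can bump or reset."""
    return min(state + 1, 3) if inp else 0

def violates(state):
    """The property we check: the counter never reaches 3."""
    return state == 3

def explore(initial=0, depth=4):
    """Systematically run every input sequence up to `depth`, returning
    the first sequence that drives the system into a bad state."""
    for inputs in product([False, True], repeat=depth):
        state = initial
        for inp in inputs:
            state = step(state, inp)
            if violates(state):
                return inputs   # counterexample found
    return None                 # property holds up to this depth

print(explore())         # -> (False, True, True, True)
print(explore(depth=2))  # -> None: too shallow to reach the bad state
```

Note that nothing here is specialized: the harness runs the full computation for every branch of the nondeterminism, which is exactly the cost the second stage tries to cut.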

It's the specialization step I'm thinking about right now. How much of
our program's computation can we throw away, while still computing the
answer to our question? CEGAR approaches work from the bottom up: we
start from a trivial abstraction and refine it until we arrive at
something close to the smallest program that either establishes the
absence of bugs or exhibits at least one non-spurious bug. This process
need not terminate; I'm not yet clear on its other failure modes, but am
fairly sure there are some.
Meanwhile, a top-down approach also exists. CMC is a useful tool even
though it doesn't do *any* specialization of the computation per
se. (It does support some other kinds of abstraction for reducing the
state space by defining equivalences, which have a similar effect but
are of limited applicability.) To improve on this, we could exploit the
fact that throwing away unwanted computation is something we already
know how to do: compilers have been doing it since compilers began.
“Program specialization” is a term used mainly by
compiler-minded people rather than verification people. Can we apply
ideas from one world to the other?
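As a reminder of how routine this is on the compiler side, here is a minimal dead-code-elimination sketch over an invented three-address-style IR: a single backward pass keeps only the assignments whose results can reach an output we still observe.

```python
def eliminate_dead(program, observed):
    """Keep only assignments whose targets can flow into `observed`.
    `program` is a list of (target, operands) pairs in execution order,
    assumed to assign each target at most once."""
    needed = set(observed)
    kept = []
    # Walk backwards: an assignment is live iff its target is needed,
    # and a live assignment makes its operands needed in turn.
    for target, operands in reversed(program):
        if target in needed:
            kept.append((target, operands))
            needed.update(operands)
    kept.reverse()
    return kept

prog = [
    ("a", ["x"]),        # a := f(x)
    ("b", ["y"]),        # b := g(y)    -- dead if only 'a' is observed
    ("c", ["a", "b"]),   # c := h(a, b) -- dead if only 'a' is observed
]
print(eliminate_dead(prog, {"a"}))  # -> [('a', ['x'])]
```

Observing everything (here, `{"c"}`) keeps the whole program; the specialization only bites once we declare some outputs uninteresting.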

“Program specialization” in the literature is often used to
mean partial evaluation. With partial evaluation, we take a program of
*n* inputs, say, and then produce a smaller, simpler, faster
version where some of these inputs are replaced by fixed values. This is
typical of optimisation problems, where “faster” is the key
requirement, and the input constraints have usually been derived from
some other analysis. However, there is a converse case of program
specialization which the same literature often ignores. This is where we
take a program of *n* *outputs*, and then produce a smaller,
simpler, faster version where we “don't care” about some of
these *outputs*. This is typical of verification problems, where
“simpler” is the key requirement, and the selection of
don't-care outputs is a consequence of the specification being
considered.
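The input-fixing case can be sketched with the textbook example of partial evaluation: specializing a power function for a statically known exponent. The names here are illustrative, not any particular specializer's API.

```python
def power(base, exp):
    """The general program: two inputs."""
    result = 1
    for _ in range(exp):
        result *= base
    return result

def specialize_power(exp):
    """With `exp` fixed at specialization time, the loop unrolls away
    and we emit a residual program over the remaining input, `base`."""
    body = " * ".join(["base"] * exp) if exp > 0 else "1"
    src = f"def power_{exp}(base):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)  # compile the residual program
    return namespace[f"power_{exp}"]

power_3 = specialize_power(3)  # residual body: base * base * base
print(power_3(5))              # -> 125, same as power(5, 3)
```

The residual program is straight-line and loop-free: "faster" falls out of knowing one input.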

Predicate abstraction is doing this, but with some added room for manoeuvre---since it's open to finding sound approximations rather than precise specializations---and also with some added constraints, since it's interested in predicates that can be input to a SAT or SMT solver to perform the abstraction-refinement. Dave provided a valuable link in a productive coffee break this morning, by noting that program slicing is also an instance of specializing for don't-care outputs. What happens if we use slicing techniques to do a top-down specialization? I'm worried the answer is “not enough” or “strictly worse than abstraction-refinement”, but I'll keep thinking about it.
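A rough sketch of the converse, output-directed case: backward slicing over an invented three-address-style program. Starting from the variables the assertion mentions, we keep only the statements they transitively depend on. (A real slicer must also track control dependence, which this toy data-dependence-only version ignores.)

```python
def backward_slice(program, criterion):
    """Keep only statements on which the `criterion` variables depend.
    `program` is a list of (target, operands) pairs in execution order."""
    relevant = set(criterion)
    sliced = []
    # Walk backwards: a statement stays iff its target is relevant,
    # and a kept statement makes its operands relevant in turn.
    for target, operands in reversed(program):
        if target in relevant:
            sliced.append((target, operands))
            relevant.update(operands)
    sliced.reverse()
    return sliced

program = [
    ("t", ["x"]),         # t := x + 1
    ("msg", ["t", "y"]),  # msg := format(t, y)  -- logging only
    ("z", ["t"]),         # z := t * 2
]
# The question we care about is "does assert(z >= 0) ever fail?", so
# the slicing criterion is {"z"} and the logging statement falls away.
print(backward_slice(program, {"z"}))  # -> [('t', ['x']), ('z', ['t'])]
```

The sliced program computes the same `z` as the original, so checking the assertion on the slice is as good as checking it on the whole program -- only cheaper.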

*[/research]*

One of the reasons why I'm not a theoretical computer scientist is that I am very, very averse to mathematical notation. “It's like Greek to me!”---no applause, please. Certainly, it's common to see very highly abbreviated notation that takes some serious cognitive gear-turning to decode. If I'm faced with a Greek-heavy paper, I usually skim over the symbolic stuff and look for an explanation in words. Sometimes it's there, and sometimes it isn't. In the cases where it's not, I rarely have the stamina to wade through the Greek.

Natural language, for all its imprecision, is---unsurprisingly---more “natural”! In fact, I'll wager that most of the infamous imprecision found in natural language specifications could be fixed by more precise natural language. Perhaps a semantic checker for English is in order. Diagrams are even better than natural language, of course, although they rarely stand alone.

It strikes me that formalism is primarily useful for avoiding mistakes. By turning complex reasoning into simple pattern-recognition and symbol-pushing, correctness can be checked fairly dumbly. The cost is that although it's hard to make mistakes, it's hard to make progress: there are reams of applicable rules, and expressing anything complex requires a whole lot of symbols. So I'm going to go out on a limb and claim that formalism is notably not very good for acquiring understanding. In a lecture, diagrams and examples and words have always been far more useful to me than slides full of Greek.

I'm also going to assert (without proof!) that formalism is not useful for artifact construction, except where mistake-avoidance is paramount. We should allow programmers to make imprecise statements, and refine them later, because humans can be a lot more productive this way. In particular, we can make progress before we fully understand the problem! Only when the cost of the smallest mistake is so great that we really want to rein things in should we resort to fully rigorous constructive methods (such as formal refinement processes, the B method, etc.). This argument also encompasses many of the usual arguments in favour of dynamic languages over statically typed ones.

Of course, that doesn't mean that any formal notation is to be avoided. For whatever quirk of evolution, humans have some aptitude for written language---and that includes more mathematical-style symbolic notations just as well as plain old words made of letters. So mathematical notation is fine if it stays within a particular comfort zone. I can read basic logic and basic algebra without much cognitive burden. Only when the formal notation passes a certain density threshold do I suddenly hit problems. I suspect that most theoretical computer scientists (and mathematicians) have a much higher threshold than I do.

*[/research]*