SpeculativeAnalysis

Guido Chari edited this page Mar 26, 2018 · 15 revisions

Example

The idea of speculative analysis is to allow static analysis to be optimistic where precision would otherwise be lost.

Let's consider an example:

f <- function(foo) {
  x <- 1
  foo()
  x
}

wellBehaved <- function() 1
illBehaved <- function() assign("x", 666, sys.frame())

f(wellBehaved)

We would like to constant propagate x <- 1. But this is only correct if foo is wellBehaved, not if it is illBehaved. To solve this issue we can place a guard:

f <- function(foo) {
  assume(foo == wellBehaved)
  foo()
  1
}

Guido: I think I'd use guard here instead of assume based on my understanding of both terms. Anyway, perhaps we are looking for a new type of a guard...

That works, but is overly conservative. For example, it will fail for f(function() 42), even though the function we pass is, in a sense, well behaved too.

What we want instead is a condition on foo that is sufficient for our optimization to still be correct.

f <- function(foo) {
  assume(doesNotAlter("x", foo))
  foo()
  1
}

Let's call the sum of all conditions, such as doesNotAlter, the spec that foo needs to adhere to. Note that checking the spec is probably expensive. But with (foo == wellBehaved || doesNotAlter("x", foo)) there exists a check that is both efficient and as weak as possible.
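The layered check could be sketched in R roughly as follows. All names here are illustrative, and doesNotAlter is stubbed by a crude syntactic scan of the function body, not a real analysis:

```r
wellBehaved <- function() 1

# Stub for the expensive analysis: conservatively approximate
# "does not alter `name`" by scanning the deparsed body for a
# textual assign("name", ...) call. A real analysis would do far more.
doesNotAlter <- function(name, f) {
  !any(grepl(paste0('assign("', name, '"'), deparse(body(f)), fixed = TRUE))
}

# Two-tier check: a cheap identity test first, falling back to the
# expensive analysis only on a miss.
checkSpec <- function(foo) {
  if (identical(foo, wellBehaved)) return(TRUE)  # fast path
  doesNotAlter("x", foo)                         # slow path
}

checkSpec(wellBehaved)                                # fast path: TRUE
checkSpec(function() 42)                              # slow path: TRUE
checkSpec(function() assign("x", 666, sys.frame()))   # slow path: FALSE
```

The point is only the shape of the check: the common case is decided by a pointer comparison, and the expensive condition is evaluated only for previously unseen callees.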

Guido: We already mentioned this, but a cost model would be great for this kind of guard. Thinking out loud: I haven't seen studies of how much existing guards cost. It may be interesting to know...

Specs

How do we derive a spec?

There are a number of properties that we want from it:

  • It is a sufficient condition for our optimization to be correct
  • We can verify it at runtime
  • For practical reasons: once we have verified it, there is a quick check

From Static Analysis

If we think in terms of static analysis, then the above function f can be analyzed as follows:

x <- 1   ---[x=1]--->   foo()   ---[??]--->   x

Adding our hunch about foo being wellBehaved, while staying conservative, we get the following:

Conservative Analysis:

                      .-->   wellBehaved()   --[x=1]--.
                     /                                 \
x <- 1   ---[x==1]--|                                   |---[??]--->   x
                     \                                 /
                      `-->   foo()   ----------[??]---’

Obviously, the fact we need about x is lost at the merge between the wellBehaved and foo branches. We can see this by looking at the difference between the results of an optimistic analysis and a conservative one. To get an optimistic analysis, we just replace foo with abort.

Optimistic Analysis
                      .-->   wellBehaved()   ---[x=1]--.
                     /                                  \
x <- 1   ---[x==1]--|                                    |---[x=1]--->   x
                     \                                  /
                      `-->   abort()   ---------[⊥]----’
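The difference between the two merges can be illustrated with a toy constant-propagation join. This is a hypothetical sketch of the lattice operation, not the actual analysis; TOP stands for "unknown" (?? in the diagrams) and BOTTOM for "unreachable":

```r
# Toy constant-propagation lattice: a fact is either a known constant,
# TOP ("could be anything"), or BOTTOM ("this branch never runs").
TOP <- "??"; BOTTOM <- "bottom"

join <- function(a, b) {
  if (identical(a, BOTTOM)) return(b)   # unreachable contributes nothing
  if (identical(b, BOTTOM)) return(a)
  if (identical(a, b)) a else TOP       # disagreement loses the fact
}

join(1, TOP)      # conservative merge with foo(): "??"  (fact about x lost)
join(1, BOTTOM)   # optimistic merge with abort(): 1     (fact preserved)
```

This is exactly why the optimistic analysis keeps x=1 at the merge: the abort branch contributes BOTTOM, which is the identity of the join.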

To find a minimal spec, we need to back-propagate the facts that we used in optimizations. In this case we find that foo() needs to produce a state where x=1. We can wrap the call to foo in a spec that enforces this.

                     .->  wellBehaved()  ---------[x=1]------.
                    /                                         \
x <- 1   --[x==1]--|                                           |--[x=1, ??]-->   x
                    \                                         /
                     `-->  specWrapper(foo())  ---[x=1, ??]--’

Concretely, we could use the following spec:

specWrapper <- function(contextInterference) {
    contextInterference()
    if (x != 1)
        specViolation()
}

The problem with this is that we can't actually execute such a specWrapper in general. To understand why, consider the following example:

specWrapper <- function(contextInterference, arg) {
    contextInterference(arg)
    if (sentANetworkPacket())
        specViolation()
}

specWrapper(foo, highlySensitiveDoNotLeak)

More realistically, in the case of a compiler, we might want to assert that a function does not print anything, does not alter the heap, and so on. All of those are effects which, when detected retroactively, can't be undone.

However, we could try to prove that a particular foo we observe at runtime, does satisfy the spec. For example using a static analysis that verifies that specViolation is unreachable.
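As a toy illustration of such a check (a sketch, not a real analysis), one could walk the AST of the observed foo and verify that no call to specViolation occurs syntactically. A real implementation would of course have to handle higher-order calls, eval, aliasing of the function, and so on:

```r
# Toy unreachability check: recursively walk an expression and report
# whether it syntactically contains a call to `target`.
canReach <- function(expr, target = "specViolation") {
  if (is.call(expr)) {
    fn <- expr[[1]]
    if (is.name(fn) && as.character(fn) == target) return(TRUE)
    return(any(vapply(as.list(expr)[-1], canReach, logical(1), target)))
  }
  FALSE
}

# The spec holds for f if specViolation is (syntactically) unreachable.
proveSpec <- function(f) !canReach(body(f))

good <- function() 1 + 1
bad  <- function() if (runif(1) > 0.5) specViolation()

proveSpec(good)  # TRUE
proveSpec(bad)   # FALSE
```

Note that this check is purely static over the observed function: it never runs foo, so no irreversible effect can escape while deciding whether the spec holds.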

Guido: I lost the discussion so I may be saying something stupid. But usually guards are simple enough that they can be tested at run time. Here you are mentioning a particular kind of condition with stronger constraints. And actually, it seems a very particular case. So I wonder if we could have something like a family of conditions. Or perhaps devise some efficient condition that first checks whether the argument is not special.

o: As you say, traditionally what we think of as guards is a condition on some runtime value. Here I am trying to extend that notion and make it a condition on an analysis result that will only become available at runtime. The idea is that we have an incomplete static analysis that is completed at runtime, and the condition is on the delayed completion of the analysis. Efficiency is an additional concern.

Issues

Let's try to come up with an implementation strategy. Assume a static analysis proveSpec that tries to statically prove that a given expression cannot reach specViolation.

delayedCheck <- quote(specWrapper(unquote(foo), unquote(highlySensitiveDoNotLeak)))
assume(proveSpec(delayedCheck))
foo(highlySensitiveDoNotLeak)

That's better, but still fails too often. So far we focus only on the previously unknown piece foo, but disregard the context in which it is instantiated (i.e., f). Thus, we get a less precise result than we would have gotten if we had known foo statically. Consider:

f <- function (foo) {
    GLOBAL <- 1
    delayedCheck <- quote(specWrapper(unquote(foo), unquote(doNotLeak)))
    assume(proveSpec(delayedCheck))
    foo(doNotLeak)
    return(0)
}

aFoo <- function(x) if (GLOBAL != 1) leak(x)
f(aFoo)

In this case the function foo does not leak the argument, but our analysis cannot know this without also considering the body of f. So we also want to include some context.

Maybe a solution would be to partially evaluate f and then analyze the residual function. Conceptually, such a solution would behave along these lines:

f <- function (foo) {
    GLOBAL <- 1
    delayedCheck <- quote({
        GLOBAL <- 1
        specWrapper(unquote(foo), unquote(doNotLeak))
        return(0)
    })
    assume(proveSpec(delayedCheck))
    foo(doNotLeak)
    return(0)
}

We probably do not want to keep a copy of the whole function for delayedCheck, but rather use the function itself. In any case, we need to argue why it should be possible to complete the original analysis on the optimized version of f.

Guido: I got lost with the last two sentences. Anyway, what I see is that for these special cases, what we need is something that is not optimistic but precise, i.e., you know that foo will not abort. So you do not need the complex guard before calling the original function. I guess a guard is needed just to test that the argument is the same foo?

o: So I wouldn't call them special cases, because those are the cases that we are interested in. I would like to find a generic model where the simple guards are the special cases. Also, what do you mean by "you know that foo will not abort"? We don't know what foo is. The question is: will a particular foo that we observe at runtime adhere to the spec?

Guido: So, what I meant here is that I do not think that completing the analysis would be very fast. Making it as efficient as possible is an interesting challenge. But for sure it will have a cost. And in a language that heavily relies on method calls, that will affect the throughput, I guess. So I see this as a special kind of type analysis on the parameter. You run the analysis when you call the function, and then you are sure at run time that foo will not abort. Then you should add a simpler guard to that function that tests that "foo has not changed". This avoids running the analysis again. In case it changed (it is not exactly the same function), you run the analysis again.
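The suggestion above, run the analysis once and guard subsequent calls with a cheap identity test, could be sketched as follows (all names hypothetical; the expensive analysis is stubbed out and instrumented so the caching is visible):

```r
# Wrap an expensive analysis `prove` so it runs at most once per
# observed callee; later calls are guarded by a cheap identity test
# against the cached, already-verified function.
makeCachedSpec <- function(prove) {
  verified <- NULL                      # last function that passed the analysis
  function(foo) {
    if (identical(foo, verified)) return(TRUE)  # quick guard: same foo as before
    ok <- prove(foo)                    # slow path: (re-)run the analysis
    if (ok) verified <<- foo
    ok
  }
}

# Stub analysis that counts its invocations, so the effect is observable.
calls <- 0
prove <- function(foo) { calls <<- calls + 1; TRUE }
check <- makeCachedSpec(prove)

f <- function() 1
check(f)   # slow path: analysis runs
check(f)   # quick guard: analysis is skipped
calls      # 1
```

If a different function shows up, the identity test fails and the analysis runs again, which is exactly the deoptimize-and-reverify behavior described above.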
