Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support 3.12 using the new monitoring API. #47

Closed
markshannon opened this issue Sep 11, 2023 · 9 comments
Closed

Support 3.12 using the new monitoring API. #47

markshannon opened this issue Sep 11, 2023 · 9 comments

Comments

@markshannon
Copy link

Short sales pitch 🙂

sys.monitoring offers support for the sort of instrumentation that SlipCover uses, but built into the VM, giving even better performance without the need for import hooks, rewriting bytecode and other hacks inelegant approaches.

The ability to disable each point of instrumentation means that cold code remains instrumented, while hot code runs at full speed, even in the same function.

Feel free to ping me if you have any questions. I'm sure that the docs could be improved, but they should be enough to get you started.

@emeryberger
Copy link
Member

Hi @markshannon ! Thanks for the kind offer! We have been following the discussion of this feature for some time (which I'd like to think we had some role in getting adopted :) , and indeed we have already experimented with it; @jaltmayerpizzorno is working on incorporating it.

@jaltmayerpizzorno
Copy link
Collaborator

Hi @markshannon, thank you for reaching out, and sorry for taking a while to respond. I put some effort towards supporting 3.12 with the new API, but had to shift focus... I realize 3.12 is essentially out, but here are my thoughts so far on it:

  • the LINE events work beautifully, and make it very easy and efficient to implement line/statement coverage (nice!);
  • for branch coverage, things are less straightforward:
    • first, there is the issue that one easily ends up measuring clause coverage instead;
    • second, there is the issue of attribution... the code offset indicated by the branch callback may not lay on a line that the developer would recognize as a branch destination. In bytecode resulting from for statements, for example, one branch destination usually lays in the bytecode that sets up the iterator. One could look into the bytecode to try and determine where that is going, but (a) this would require more bytecode-level knowledge, defeating what the API was intended to allow, and (b) the compiler can make one's life difficult;
    • third (but somewhat less important), the API still requires the application to keep track whether the branch executed all possible ways to know when to disable it.

What I would love to be able to do is to be able to insert callbacks at the branch destinations (see for example Figure 4 on our paper) indicating the branch taken (source and destination lines). These could then individually disabled as soon as they are reached. Can you think of some way to accomplish that using (or extending) the API?

SlipCover does that (for 3.8-3.11) by going back to the AST (which has some downsides). For 3.12, I started exploring adding statements whose line numbers really encode a branch, in order to get a LINE event, but as of when I interrupted the work it was not clear to me whether the line numbers will give me enough space to do so. I would much rather have a cleaner, supported way of doing this. Also, there doesn't seem to be any way to insert a NOP through the AST, so these statements I insert cost a few cycles that I would rather not waste...

@markshannon
Copy link
Author

first, there is the issue that one easily ends up measuring clause coverage instead

If I understand correctly, you want to record the branch out of a and b, but not from a to b in if a and b: ...
Is filtering out those edges, either in the callback, or in post-processing especially difficult?

second, there is the issue of attribution... the code offset indicated by the branch callback may not lay on a line that the developer would recognize as a branch destination. In bytecode resulting from for statements, for example, one branch destination usually lays in the bytecode that sets up the iterator. One could look into the bytecode to try and determine where that is going, but (a) this would require more bytecode-level knowledge, defeating what the API was intended to allow, and (b) the compiler can make one's life difficult;

Would filtering out branches that do not go to the start of a line be sufficient?

third (but somewhat less important), the API still requires the application to keep track whether the branch executed all possible ways to know when to disable it.

There is only ever two ways a branch can go, so it can be done with a single dictionary mapping source to destination.

What I would love to be able to do is to be able to insert callbacks at the branch destinations (see for example Figure 4 on our paper) indicating the branch taken (source and destination lines). These could then individually disabled as soon as they are reached. Can you think of some way to accomplish that using (or extending) the API?

How does that work for branches where control flow merges?
Take the simple example:

if cond:
    block
merge

There is an edge from cond to merge and from block to merge how can a node be inserted at the destination of one edge, without also being inserted at the destination of the other edge?

@jaltmayerpizzorno
Copy link
Collaborator

first, there is the issue that one easily ends up measuring clause coverage instead

If I understand correctly, you want to record the branch out of a and b, but not from a to b in if a and b: ...
Is filtering out those edges, either in the callback, or in post-processing especially difficult?

Right... generally, as we record bytecode-based coverage, we need a way to map that back to source code, as that is how developers think about it. For branches, certainly just using the line information isn't enough. Perhaps if we used the new column information in addition, and mapped it to AST portions... I haven't tried that. Or if the compiler were to save meta-information. In any case, it is much simpler (and thus less error prone) to insert these in the AST and recompile.

second, there is the issue of attribution... [...]
Would filtering out branches that do not go to the start of a line be sufficient?

Not in general, given same-line branches like if foo: bar().

third (but somewhat less important), the API still requires the application to keep track whether the branch executed all possible ways to know when to disable it.

There is only ever two ways a branch can go, so it can be done with a single dictionary mapping source to destination.

Yes, the application can keep track of branch status and disable when possible, but the branch keeps costing cycles for the destination that has already been seen. Ideally we'd want callbacks that can be disabled on a per destination basis... that's why SlipCover inserts probes within the blocks (see below).

I think you meant bytecode branches above... in terms of the source code, match does lead to multiple edges.

How does that work for branches where control flow merges? Take the simple example:

if cond:
    block
merge

It's on our paper... we split such "critical" edges, turning that into

if cond:
   (probe)
   block
else:
   (probe)
merge

You know, maybe we should be promoting clause coverage as an option... I wonder if we'd see adoption.

Anyway, I think support for inserting sys.monitoring callback points at the AST level could be really useful in general. Something that is a no-op in terms of execution, but that causes the callback (ideally also passing custom data in) when execution reaches the point, if monitoring for that event type was requested.

@markshannon
Copy link
Author

sys.monitoring is designed to be used on unmodified code, without causing needing to modify the bytecode, so adding callback points at the AST level isn't really viable.

If you really think the cost of checking repeated checking branches is too high, and if you are willing to modify the AST and insert probes, as you do for 3.11, you could insert JUMP_FORWARD 0 as your probe, and set JUMP events.
No guarantee that it will work for 3.13, although it probably will, but it should be fast for 3.12.
The callback function can be something like:

def probe(code, src, dst):
    if dst - src == 2:
        # do probe stuff
    return DISABLE

Or, if you are willing to do some up front analysis, you only need to check branches where the destination is the destination of more than one branch, and use INSTRUCTION to record all instructions executed.

@jaltmayerpizzorno
Copy link
Collaborator

sys.monitoring is designed to be used on unmodified code, without causing needing to modify the bytecode, so adding callback points at the AST level isn't really viable.

I get it that might require its own PEP, given that it'd be AST-, and thus language-based. But it would a way to support more custom, high-performance dynamic analysis.

If you really think the cost of checking repeated checking branches is too high, and if you are willing to modify the AST and insert probes, as you do for 3.11, you could insert JUMP_FORWARD 0 as your probe, and set JUMP events.[...]

You mean modify the bytecode, right? Or is there a way to add something in the AST that compiles to JUMP_FORWARD 0?

Or, if you are willing to do some up front analysis, you only need to check branches where the destination is the destination of more than one branch, and use INSTRUCTION to record all instructions executed.

I haven't measured, but I think that using INSTRUCTION (which, if I understand it correctly, calls back for every instruction) will make us much slower than the current approaches...

Ugh, so, in summary, no easy way to use the new APIs for branch coverage?

@markshannon
Copy link
Author

Using INSTRUCTION makes a callback for every instruction, until disabled. I assume you will disable it immediately, so it should have reasonably low overhead (apart from memory use).

No easy way to use the new APIs for branch coverage?

It depends on what you mean by a branch.

If you mean a regular branch in the control flow, then the easy way is to use BRANCH events and track which way the branch has gone. Although, you seem concerned that will be too slow, as it might well be, but I think it is worth trying.

If branches are more limited so that in the code if a and b: body ; next the branch from a to next is not considered, then no it is not so easy. I think post-processing is your best approach in that case.
You can map instructions to source locations using code.co_positions() which should allow you to filter out such branches.

@jaltmayerpizzorno
Copy link
Collaborator

jaltmayerpizzorno commented Oct 8, 2023

It depends on what you mean by a branch.

If you mean a regular branch in the control flow [...]

I think it only makes sense for a tool to report branches in terms of the source code... no matter what branch or other instructions the compiler generates.

@jaltmayerpizzorno
Copy link
Collaborator

Done in #48, more or less.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants