-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix more FPs in cpp/iterator-to-expired-container
#16255
Fix more FPs in cpp/iterator-to-expired-container
#16255
Conversation
…ve the call context all the way though from the source to the sink.
predicate isSink(DataFlow::Node sink) { destroyedToBeginSink(sink, _) } | ||
predicate isSink(DataFlow::Node sink, FlowState state) { | ||
// Note: This is a non-trivial cartesian product! | ||
// Hopefully, both of these sets are quite small in practice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a tiny bit concerned that in a database with lots of results for this query, both sets might be distinctly "medium" rather than clearly "small". Mind you a cartesian product of even a few thousand nodes / states still shouldn't be enough to matter.
Based on the stage timings table on the DCA run, I decided to have a closer look at an abseil-cpp-linux
run locally. It took one second to execute (with a cache warmed up on another dataflow query), no more than 122ms on any single predicate. So I'm reassured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, this is basically the same CP as the one we fixed for a different query in #10123. We could make this a lot safer by doing a similar transformation, but I didn't want to do it before we had seen a DB that needed this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also consider doing this transformation before we pull the query out of experimental. But in any case, I would prefer to keep that transformation out of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like it might be worth doing before we pull the query out of experimental, or alternatively we can rely on QA to catch any problems. Either way you're right, it doesn't need addressing here.
state1 = TempToDestructor() and | ||
state2 = DestroyedToBegin(node2) and | ||
node2 = getADestroyedNode(node1) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what exactly is different now compared to when we had two sequential dataflows? The source of the second flow (DestroyedToBeginFlow
) still had to be a reached sink of the first (TempToDestructorFlow
) flow. I can see that we now know the correct overall source ... but that isn't reported by the query anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this hypothetical example:
int id(int x) {
return x;
}
void f1() {
int x2 = id(source());
safe(x2);
}
void f2() {
int x1 = id(0);
sink(x1);
}
And consider a QL query that has two configurations:
- One configuration (
C1
) fromsource()
tox
inreturn x
(inid
), and - One configuration (
C2
) starting at any sink fromC1
to the argument ofsink
.
C1
finds a flow path from f1
to id
, and C2
finds a flow path from id
to f2
. So if we simply use the result of C2
we get a FP saying that there's flow from source()
to sink(x1)
.
However, if we combine these into a single configuration then the dataflow library won't lose track of the call context. This means flow can no longer enter id
from f1
and exit id
to f2
.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that makes sense and fits with what's going on in the full test case. Thanks!
This fixes almost all the FPs on MRVA. In particular, we go from 365 results to 24 results, and all of the removed FPs are instances of the FP added in the first commit.