
Fix more FPs in cpp/iterator-to-expired-container #16255

Merged

Conversation

MathiasVP
Contributor

This fixes almost all the FPs on MRVA. In particular, we go from 365 results to 24 results, and all of the removed FPs are instances of the FP added in the first commit.

…ve the call context all the way through from the source to the sink.
@MathiasVP MathiasVP requested a review from a team as a code owner April 19, 2024 07:30
@github-actions github-actions bot added the C++ label Apr 19, 2024
predicate isSink(DataFlow::Node sink) { destroyedToBeginSink(sink, _) }
predicate isSink(DataFlow::Node sink, FlowState state) {
  // Note: This is a non-trivial cartesian product!
  // Hopefully, both of these sets are quite small in practice
Contributor

I was a tiny bit concerned that in a database with lots of results for this query, both sets might be distinctly "medium" rather than clearly "small". Mind you, a cartesian product of even a few thousand nodes/states still shouldn't be enough to matter.

Based on the stage timings table on the DCA run, I decided to have a closer look at an abseil-cpp-linux run locally. It took one second to execute (with a cache warmed up on another dataflow query), no more than 122ms on any single predicate. So I'm reassured.

Contributor Author

Indeed, this is basically the same cartesian product as the one we fixed for a different query in #10123. We could make this a lot safer by doing a similar transformation, but I didn't want to do it before we had seen a DB that needed this.

Contributor Author

We can also consider doing this transformation before we pull the query out of experimental. But in any case, I would prefer to keep that transformation out of this PR.

Contributor

Sounds like it might be worth doing before we pull the query out of experimental; alternatively, we can rely on QA to catch any problems. Either way, you're right: it doesn't need addressing here.

  state1 = TempToDestructor() and
  state2 = DestroyedToBegin(node2) and
  node2 = getADestroyedNode(node1)
}
Contributor

So what exactly is different now compared to when we had two sequential dataflows? The source of the second flow (DestroyedToBeginFlow) still had to be a reached sink of the first flow (TempToDestructorFlow). I can see that we now know the correct overall source ... but that isn't reported by the query anyway.

Contributor Author
@MathiasVP MathiasVP Apr 22, 2024

Consider this hypothetical example:

// Declarations for the helpers used below: source() yields tainted data,
// sink() is the dangerous use, and safe() is a benign use.
int source();
void safe(int);
void sink(int);

// An identity function that both callers pass their values through.
int id(int x) {
  return x;
}

void f1() {
  int x2 = id(source());
  safe(x2);
}

void f2() {
  int x1 = id(0);
  sink(x1);
}

And consider a QL query that has two configurations:

  • One configuration (C1) from source() to x in return x (in id), and
  • One configuration (C2) starting at any sink from C1 to the argument of sink.

C1 finds a flow path from f1 to id, and C2 finds a flow path from id to f2. So if we simply use the result of C2, we get an FP saying that there's flow from source() to sink(x1).

However, if we combine these into a single configuration, the dataflow library won't lose track of the call context. This means flow can no longer enter id from f1 and exit id to f2.
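
For illustration only, the combined flow-state configuration might look roughly like the sketch below. The module and helper names (CombinedConfig, isFirstStageSource, isSecondStageSink, TFlowState) are made up for the sketch; the FlowState constructors and getADestroyedNode are the ones from the diff above.

// Rough sketch, not the exact code in this PR: both stages become flow states
// of one configuration, so the dataflow library keeps the call context from the
// original source all the way to the final sink.
import semmle.code.cpp.dataflow.new.DataFlow

module CombinedConfig implements DataFlow::StateConfigSig {
  // Assumes the query defines a FlowState newtype with the two branches used below.
  class FlowState = TFlowState;

  predicate isSource(DataFlow::Node source, FlowState state) {
    // hypothetical helper for the sources of the first stage
    isFirstStageSource(source) and state = TempToDestructor()
  }

  predicate isSink(DataFlow::Node sink, FlowState state) {
    // hypothetical helper for the sinks of the second stage
    isSecondStageSink(sink) and state = DestroyedToBegin(_)
  }

  predicate isAdditionalFlowStep(
    DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2
  ) {
    // the hand-off between the two stages switches the flow state
    state1 = TempToDestructor() and
    state2 = DestroyedToBegin(node2) and
    node2 = getADestroyedNode(node1)
  }
}

module CombinedFlow = DataFlow::GlobalWithState<CombinedConfig>;

Because everything goes through a single GlobalWithState instantiation, call and return edges are matched across both stages, which is exactly what rules out the f1-to-f2 path in the example above.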

Does that make sense?

Contributor

Ah yes, that makes sense and fits with what's going on in the full test case. Thanks!

@MathiasVP MathiasVP merged commit 799c380 into github:main Apr 22, 2024
13 checks passed