-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C#/Java: Content based model generation improvements. #17363
C#/Java: Content based model generation improvements. #17363
Conversation
76b9cdd
to
8473b8c
Compare
8473b8c
to
412551f
Compare
b5456a3
to
b7d00a4
Compare
4521dfd
to
30f3923
Compare
30f3923
to
0abc08c
Compare
DCA looks good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments.
* detection instead, as reads and stores would use a significant | ||
* part of an objects internal state. | ||
*/ | ||
private class ContentDataFlowSummaryTargetApi extends DataFlowSummaryTargetApi { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could make the restriction per parameter. I.e., if a method has only one summary for Argument[0]
then include it, but if it has more than, say 3, summaries for Argument[1]
then exclude those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that could also be interesting.
Would it be acceptable if I do that experiment as a follow up?
As it is now we "only" exclude approximately 2% of all the possible target APIs with this limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
hasSyntheticContent(read) and | ||
hasSyntheticContent(store) and | ||
( | ||
step(t1, read, t2, store) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we restrict to syntPathEntry
here? Otherwise it looks like we will compute O(n^2)
paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh - this was intended as a helper predicate, but it looks like we need to fold it into the declarations where it is used to avoid the O(n^2).
* if both of these methods exist. | ||
*/ | ||
pragma[nomagic] | ||
predicate acceptReadStore( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is simply enough to say that we will include a synthetic field if it is part of some input specification and part of some output specification. That should also handle cases such as
setAll // input: Argument[0], output: ReturnValue.SyntheticField[Foo].Element
get // input: Argument[0].SyntheticField[Foo], output: ReturnValue
list = setAll("taint");
x = list[0];
sink(get(x));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought about that as a possible "improvement" (instead of using "path" equality we could "hash" paths with synthetics (basically just print the synthetics in the order they are shown in the path and then compare that)). This would allow path continuations as the one you mention there.
Is it acceptable that we attempt this as a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking of something even simpler where the order is not taken into account, but yes follow-up is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also try without order; However, I am quite sure that we need the "chaining" logic. It turned out that it is not uncommen that we produce summaries like SyntheticField[A] -> SyntheticField[A]
(without other mentions on the synthetic field A
) or SyntheticField[A] -> SyntheticField[B]
and SyntheticField[B] -> SyntheticField[A]
, if we don't restrict the use of synthetics.
* be translated into a synthetic field. | ||
* | ||
* This is needed because we don't want to include summaries that reads from or | ||
* stores into a "dead" synthetic field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"internal" instead of "dead"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! I will add a commit to the re-factor PR that changes this.
If we apply the content based model generator to
Many of these models are undesirable, primarily because of one of the following reasons
With the changes proposed in this PR, applying the content based model generator to
The issues are addressed in the following way:
get
andset
methods that uses a private backing fieldX
. In this case we would like to generate the summariesset : Argument[0] -> Argument[this].SyntheticField[X]
andget : Argument[this].SyntheticField[X] -> ReturnValue
, but we only want to do this in case both methods exists. If only theget
method exists, then it retrieves information from a "dead" synthetic field. The synthetic chaining is currently based on access path equality but that restriction could potentially be loosened a bit, but I suspect that this only has limited impact on the generated summaries.Note that the changes in this PR doesn't change the production version of the model generation and only impacts model generated by the recently introduced
--with-contentbased-summaries
.Further changes are still required for this to be production ready, but the changes in this PR are self contained.