Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow: Remove tracked types from Access Paths, track tainted object type, and tweak type pruning. #18179

Merged
merged 8 commits into from
Dec 5, 2024

Conversation

aschackmull
Copy link
Contributor

This PR changes how type pruning is done in the data flow library. The primary change is that access paths no longer carry information about tracked types. On its own, this is a precision reduction, so it's combined with a separate precision improvement in that we now track the type of the originally tainted object - this corresponds to the innermost type in an access path, if available, prior to this PR. These two changes mostly balance each other in terms of overall precision, but of course they also affect performance. For Java on MRVA top 1000, this effect was slightly negative with two projects having big degradations in terms of number of computed tuples in certain queries. However, without types in access paths, this now allows other changes, so this PR contains a third tweak that postpones type pruning until stage 5. And this latter tweak regains the lost performance and then some, so the overall effect looks like a net win on almost all of MRVA top 1000.

I also checked impact on C# MRVA top 1000, but there the effects were negligible. And other languages are likely even less affected, since this only deals with type pruning, which isn't used much beyond Java and C#.

Commit-by-commit review is encouraged. The first two commits are simple refactorings. And the subsequent 3 commits implement the 3 changes described above.

@aschackmull aschackmull force-pushed the dataflow/accesspath-notypes branch from 5feb737 to b65a4e4 Compare December 3, 2024 09:59
@aschackmull aschackmull requested review from a team as code owners December 3, 2024 11:58
Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great; a nice overall simplification of our code.

@@ -3006,6 +3050,12 @@ module MakeImpl<LocationSig Location, InputSig<Location> Lang> {
)
}

private string ppStored() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate is duplicated, so perhaps pull it out via

private class TypOption extends TypOption::Option {
  string pp() {
    exists(string ppt | ppt = this.toString() |
      if this.isNone() or ppt = "" then result = "" else result = " : " + ppt
    )
  }
}

@@ -891,6 +891,8 @@ module MakeImplCommon<LocationSig Location, InputSig<Location> Lang> {
nodeDataFlowType(this.asNode(), result)
or
nodeDataFlowType(this.asParamReturnNode(), result)
or
isTopType(result) and this.isImplicitReadNode(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we certain that isTopType always has results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonably sure, but not completely. There are two options: Either we make it part of the language-specific input, or we make it a consistency check. I think I like the latter to avoid add yet another predicate in the input.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A consistency check sounds fine to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found to fail on C# with

DataFlowType should have one top type that's compatible with all other types, but has 3.

That's harmless, but not intentional, so agreed to add this to the input signature instead as a followup.

@aschackmull aschackmull force-pushed the dataflow/accesspath-notypes branch from 29d3c51 to c187a7a Compare December 4, 2024 10:23
@aschackmull aschackmull merged commit 4bf63fe into github:main Dec 5, 2024
45 checks passed
@aschackmull aschackmull deleted the dataflow/accesspath-notypes branch December 5, 2024 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants