Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataflow: Support alert provenance #15501

Merged
merged 23 commits into from
Apr 12, 2024

Conversation

aschackmull
Copy link
Contributor

@aschackmull aschackmull commented Feb 1, 2024

This adds support for alert provenance in path problem queries using the data flow library. Plumbing for all languages using models-as-data is added such that the MaD rows contributing to an alert are included in the PathGraph. Provenance from QL-based models is also included and can be adjusted on a language-by-language basis depending on which models are desirable to include in the provenance data - generally we should not include provenance for basic steps such as variable assignments and similar self-evident flow.

Full support for C#, Java, Python, Go, and Ruby.
Swift and C++ don't support MaD yet and hence there's no MaD provenance to propagate, but provenance from QL-based models is supported.
(JS is of course not supported as it doesn't yet use the shared data flow library).

Edit: Notes on the interesting qltest changes:

  • All languages (with MaD): Source/Sink provenance is somewhat approximate since the model isn't preserved through a configuration, so for source/sink nodes with multiple models for different purposes we see duplicate provenance. This is potentially fixable by merging the models and managing their different purpose through a hierarchical relationship.
  • Some languages also highlight MaD/QL redundancies.

@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch 5 times, most recently from 6b8c216 to 3f986bd Compare February 7, 2024 10:38
@aschackmull aschackmull changed the title Dataflow/Java: Support alert provenance Dataflow: Support alert provenance Feb 7, 2024
@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch from 3f986bd to 1464560 Compare February 7, 2024 14:16
@github-actions github-actions bot added the JS label Feb 7, 2024
@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch 3 times, most recently from f35a98c to 25e2e07 Compare February 9, 2024 12:08
@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch 2 times, most recently from 0bf1d34 to 979e8c9 Compare March 6, 2024 13:16
@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch 5 times, most recently from 05450de to 43a334d Compare March 22, 2024 09:14
@@ -8,13 +8,15 @@
*
* The kind `remote` represents a general remote flow source.
*/
extensible predicate sourceModel(string type, string path, string kind);
extensible predicate sourceModel(

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for madId, but the QLDoc mentions remote
@@ -23,7 +25,9 @@
* `kind` should be either `value` or `taint`, for value-preserving or taint-preserving steps,
* respectively.
*/
extensible predicate summaryModel(string type, string path, string input, string output, string kind);
extensible predicate summaryModel(

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for madId, but the QLDoc mentions taint
@@ -8,13 +8,15 @@
*
* The kind `remote` represents a general remote flow source.
*/
extensible predicate sourceModel(string type, string path, string kind);
extensible predicate sourceModel(

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for madId, but the QLDoc mentions remote
@@ -23,7 +25,9 @@
* `kind` should be either `value` or `taint`, for value-preserving or taint-preserving steps,
* respectively.
*/
extensible predicate summaryModel(string type, string path, string input, string output, string kind);
extensible predicate summaryModel(

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for madId, but the QLDoc mentions taint
@@ -8,13 +8,15 @@
*
* The kind `remote` represents a general remote flow source.
*/
extensible predicate sourceModel(string type, string path, string kind);
extensible predicate sourceModel(

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for madId, but the QLDoc mentions remote
@@ -23,7 +25,9 @@
* `kind` should be either `value` or `taint`, for value-preserving or taint-preserving steps,
* respectively.
*/
extensible predicate summaryModel(string type, string path, string input, string output, string kind);
extensible predicate summaryModel(

Check warning

Code scanning / CodeQL

Missing QLDoc for parameter Warning

The QLDoc has no documentation for madId, but the QLDoc mentions taint
@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch 3 times, most recently from 07b4f13 to 0e5b37f Compare April 9, 2024 06:56
@aschackmull aschackmull force-pushed the dataflow/alert-provenance branch from 0a475a3 to b4e23d9 Compare April 12, 2024 07:21
@aschackmull aschackmull merged commit 854dfb3 into github:main Apr 12, 2024
44 of 46 checks passed
@aschackmull aschackmull deleted the dataflow/alert-provenance branch April 12, 2024 09:17
@michaelnebel
Copy link
Contributor

I do worry about the burden of updating all the MaD indexes every time that new models are added though.

Yeah. I do have a plan for that: We'll make a parameterised module that can be applied to a PathGraph that replaces the numbers by something more readable and stable - that way we'll get stable test output that also shows actual provenance. The downside is that we'll need to update a bunch of tests.

I ran into this problem immediately 😂 . Are you by any chance already working on the plan for creating a PathGraph wrapper?

@aschackmull
Copy link
Contributor Author

Are you by any chance already working on the plan for creating a PathGraph wrapper?

I was just about to start.

---
category: feature
---
* The `PathGraph` result of a data flow computation has been augmented with model provenance information for each of the flow steps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case that users who write custom queries and corresponding qltests may have to update their .expected output as a result of this PR? If so, this should be mentioned prominently in the change note to avoid surprises and confusion by users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an amendment: #16211

redsun82 added a commit that referenced this pull request Apr 12, 2024
Not being triggered by changes in shared was making it possible to not
notice changes in `shared` having effect on Swift tests. For example
[this PR](#15501) introduced a
test change that was fixed [here](#16197).
@aschackmull
Copy link
Contributor Author

Are you by any chance already working on the plan for creating a PathGraph wrapper?

I was just about to start.

Here: #16210

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants