Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS: Improve handling of spread arguments and rest parameters [shared data flow branch] #17213

Merged
merged 37 commits into from
Sep 9, 2024

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Aug 13, 2024

(Targeting js/shared-dataflow-branch)

This greatly improves our ability to handle spread arguments and rest parameters, as well as .apply() calls and the arguments array.

The basic idea is that each call materialises an array of arguments using a bunch of store edges, which is read back using read edges at the callee. But as usual there are a few complications which I'll talk about below

Split into static and dynamic argument arrays

The initial pruning phase of data flow does not track contents at all. So if we always pass arguments via contents, then all argument positions will get mixed up during the initial forward pass. This would happen even for call sites and functions that don't actually use spread/rest.

To ensure "simple" call sites and functions are handled precisely in the initial phase, we pass in two argument arrays, one for "static" arguments and one for "dynamic" arguments.

Arguments passed in via spread arguments or .apply() are called "dynamic arguments". Rest parameters and the arguments object are said to be "dynamic parameters".

  • At each call site:
    • Arguments passed in via a spread operator, or appear after a spread operator, are stored in the dynamic argument array.
    • The second argument of .apply() flows directly into the dynamic argument array.
    • All other positional arguments are stored in the static argument array.
  • At each function:
    • Contents from the dynamic parameter array flow into the plain positional parameters.
    • Contents from the static and dynamic parameter arrays flow into rest parameters and arguments.

Shifting indices

When a rest parameter or spread argument appears at an index >0 we need to shift array indices. This currently requires synthesising a bunch of nodes and store/read edges for the first 10 array indices. This is similar to how Ruby does it.

When a rest parameter or spread arguments appears at index 0, we just generate a local flow step instead so we avoid a redundant zero-shift.

Super calls in default constructors

Part of the reason for doing this is that we have better support for flow through super() calls, and since default constructors desugar into code containing a super(...args) call, we get spurious flow if we don't also have precise handling of spread arguments and rest parameters. Tbh I was expecting a bigger impact from this; in the end only 3 FPs were fixed, but still worth doing now that the work is done.

Removing the implicit taint rule for arrays

With the old data flow library we mostly followed by the rule that if any element of an array is tainted, then the whole array is tainted. This rule has been gradually relaxed as we've added support for precise tracking of array contents, but a few holdovers remain, such as array literals being tainted if any of their members were tainted.

Taint tracking just doesn't play nice with rest/spread if we keep that rule, so I'm moving towards removing it in this PR. To avoid breaking everything however, I've added array element contents to defaultImplicitTaintRead, meaning that any sink or configuration-specific additional taint step can implicitly read array elements. Note that only configuration-specific taint steps can use the implicit read, steps from library models do not, so we may have some work to do in updating our models. But that's for a future PR.

Note that there is a known bug in how implicit reads interact with use-use flow, and we get use-use flow from the capture library in some cases. I'm working on a fix for that in a separate PR. This DCA run shows the effect of merging that PR into this branch, and it shows a significant speed-up for vscode.

Performance

Although the potential performance fixes are still in flux, I think it would make sense to get this PR through review.

Results

Results from the above two evaluations show:

  • 32 new TPs
    • 30 new TPs from detecting prototype pollution in various copies of the $.extend() from a vulnerable version of jQuery.
    • 1 new TP from __dirname used in a shell command without escaping.
    • 1 new TP from js/xss-through-dom though not sure if exploitable
  • 3 fixed FPs, due to spurious flow through spread/rest parameter where indices got messed up
  • 5 new FPs (since we discover more flow paths in general)
    • 2 where a URL redirection is prevented by encodeURIComponent which is not currently a sanitiser.
    • 1 due to call/return mismatch after flow into a.href where the a element is shared in a captured variable, but data can't propagate across calls.
    • 2 due to complex code: a small templating engine and one flowing on the wrong side of a goog.isString(x) check
  • 3 new "low quality" alerts where the data flow path is valid, but the query seems too broad or it goes through a test case

@asgerf asgerf force-pushed the jss/spread-argument branch from 71b5736 to f65879e Compare August 27, 2024 10:19
asgerf added 2 commits August 29, 2024 13:42
Tainted URL suffix steps are added as configuration-specific additional
steps, which means implicit reads may occur before any of these steps.

These steps accidentally included the legacy taint steps which include
a step from 'arguments' to all positional parameters. Combined with the
implicit read, arguments could escape their array index and flow to
any parameter while in the tainted-url flow state.
@asgerf asgerf marked this pull request as ready for review August 30, 2024 09:45
@asgerf asgerf requested a review from a team as a code owner August 30, 2024 09:45
@asgerf asgerf requested a review from erik-krogh August 30, 2024 09:45
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most solid, well documented, and thoroughly tested piece of work that I've reviewed in quite a while ❤️

Everything is teste, well motivated, and you've nicely split the work into commits.
And the end result is clearly better than what we had before.

It was at times a little hard to follow, but I think that's mostly from me not working a lot with the shared dataflow library.

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few drive-by comments, overall LGTM.

Am I correct in understanding that when a function with a rest parameter

function foo(first, ...rest)

is called like

foo(x, ...xs)

then xs is not matched directly against rest, but rather via shifting at both the caller and the callee?

exists(Function f |
// When the first parameter is a rest parameter, flow into the rest parameter as a local flow step
// to ensure we preserve knowledge about array indices
(node1 = TStaticParameterArrayNode(f) or node1 = TDynamicParameterArrayNode(f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parens can be removed.

c = ContentSet::arrayElementUnknown()
)
or
// Prepare to store spread arguments into the dynamic arguments array, when it isn't the initial spread argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this rather be: when it isn't the initial argument?

then
c = ContentSet::arrayElement() and // unknown start index when not the first spread operator
storeContent.isUnknownArrayElement()
else storeContent.asArrayIndex() = n + c.asArrayIndex()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or storeContent.isUnknownArrayElement() and c.isUnknownArrayElement()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good catch. Added a test and fixed.

Comment on lines 20 to 21
FlowSummaryPrivate::Steps::summaryStoreStep(node1.(FlowSummaryNode).getSummaryNode(),
ContentSet::arrayElement(), node2.(FlowSummaryNode).getSummaryNode())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not related to this PR): I'm surprised that this line is here. In contrast, I would expect all read steps involving array elements (not just those from flow summaries) to be lifted to taint steps (like Ruby, C#, and Java does).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sort of is related to this PR actually, as this implements part of the "implicit array-taint rule" I mentioned in the PR description, which we're trying to move away from. I might remove this line in a future PR.

@asgerf
Copy link
Contributor Author

asgerf commented Sep 5, 2024

(@hvitved) Am I correct in understanding that when a function with a rest parameter

function foo(first, ...rest)

is called like

foo(x, ...xs)

then xs is not matched directly against rest, but rather via shifting at both the caller and the callee?

Yes, unlike Ruby we just do the shift in both directions. We generally need the shifting nodes anyway since we have to materialise them before knowing the call graph.

@asgerf asgerf force-pushed the jss/spread-argument branch from 63bd757 to fb9732a Compare September 6, 2024 06:43
@asgerf
Copy link
Contributor Author

asgerf commented Sep 6, 2024

In the process of writing new tests I noticed another issue with array indices, but to avoid creeping this PR any further, I've added a test and a TODO comment in fb9732a and plan to address this in a future PR.

@hvitved and @erik-krogh would either of you like to have a look at the new commits?

@asgerf asgerf merged commit 094112c into github:js/shared-dataflow-branch Sep 9, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
JS no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants