-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JS: Improve handling of spread arguments and rest parameters [shared data flow branch] #17213
JS: Improve handling of spread arguments and rest parameters [shared data flow branch] #17213
Conversation
620987b
to
4de444b
Compare
4de444b
to
36457f6
Compare
Shows the effect of github#17262
71b5736
to
f65879e
Compare
Tainted URL suffix steps are added as configuration-specific additional steps, which means implicit reads may occur before any of these steps. These steps accidentally included the legacy taint steps which include a step from 'arguments' to all positional parameters. Combined with the implicit read, arguments could escape their array index and flow to any parameter while in the tainted-url flow state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the most solid, well documented, and thoroughly tested piece of work that I've reviewed in quite a while ❤️
Everything is teste, well motivated, and you've nicely split the work into commits.
And the end result is clearly better than what we had before.
It was at times a little hard to follow, but I think that's mostly from me not working a lot with the shared dataflow library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few drive-by comments, overall LGTM.
Am I correct in understanding that when a function with a rest parameter
function foo(first, ...rest)
is called like
foo(x, ...xs)
then xs
is not matched directly against rest
, but rather via shifting at both the caller and the callee?
exists(Function f | | ||
// When the first parameter is a rest parameter, flow into the rest parameter as a local flow step | ||
// to ensure we preserve knowledge about array indices | ||
(node1 = TStaticParameterArrayNode(f) or node1 = TDynamicParameterArrayNode(f)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parens can be removed.
c = ContentSet::arrayElementUnknown() | ||
) | ||
or | ||
// Prepare to store spread arguments into the dynamic arguments array, when it isn't the initial spread argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this rather be: when it isn't the initial argument
?
then | ||
c = ContentSet::arrayElement() and // unknown start index when not the first spread operator | ||
storeContent.isUnknownArrayElement() | ||
else storeContent.asArrayIndex() = n + c.asArrayIndex() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or storeContent.isUnknownArrayElement() and c.isUnknownArrayElement()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch. Added a test and fixed.
FlowSummaryPrivate::Steps::summaryStoreStep(node1.(FlowSummaryNode).getSummaryNode(), | ||
ContentSet::arrayElement(), node2.(FlowSummaryNode).getSummaryNode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not related to this PR): I'm surprised that this line is here. In contrast, I would expect all read steps involving array elements (not just those from flow summaries) to be lifted to taint steps (like Ruby, C#, and Java does).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sort of is related to this PR actually, as this implements part of the "implicit array-taint rule" I mentioned in the PR description, which we're trying to move away from. I might remove this line in a future PR.
Yes, unlike Ruby we just do the shift in both directions. We generally need the shifting nodes anyway since we have to materialise them before knowing the call graph. |
This happened as a result of the bugfix in the previous commit
63bd757
to
fb9732a
Compare
In the process of writing new tests I noticed another issue with array indices, but to avoid creeping this PR any further, I've added a test and a TODO comment in fb9732a and plan to address this in a future PR. @hvitved and @erik-krogh would either of you like to have a look at the new commits? |
(Targeting
js/shared-dataflow-branch
)This greatly improves our ability to handle spread arguments and rest parameters, as well as
.apply()
calls and thearguments
array.The basic idea is that each call materialises an array of arguments using a bunch of store edges, which is read back using read edges at the callee. But as usual there are a few complications which I'll talk about below
Split into static and dynamic argument arrays
The initial pruning phase of data flow does not track contents at all. So if we always pass arguments via contents, then all argument positions will get mixed up during the initial forward pass. This would happen even for call sites and functions that don't actually use spread/rest.
To ensure "simple" call sites and functions are handled precisely in the initial phase, we pass in two argument arrays, one for "static" arguments and one for "dynamic" arguments.
Arguments passed in via spread arguments or
.apply()
are called "dynamic arguments". Rest parameters and thearguments
object are said to be "dynamic parameters"..apply()
flows directly into the dynamic argument array.arguments
.Shifting indices
When a rest parameter or spread argument appears at an index >0 we need to shift array indices. This currently requires synthesising a bunch of nodes and store/read edges for the first 10 array indices. This is similar to how Ruby does it.
When a rest parameter or spread arguments appears at index 0, we just generate a local flow step instead so we avoid a redundant zero-shift.
Super calls in default constructors
Part of the reason for doing this is that we have better support for flow through
super()
calls, and since default constructors desugar into code containing asuper(...args)
call, we get spurious flow if we don't also have precise handling of spread arguments and rest parameters. Tbh I was expecting a bigger impact from this; in the end only 3 FPs were fixed, but still worth doing now that the work is done.Removing the implicit taint rule for arrays
With the old data flow library we mostly followed by the rule that if any element of an array is tainted, then the whole array is tainted. This rule has been gradually relaxed as we've added support for precise tracking of array contents, but a few holdovers remain, such as array literals being tainted if any of their members were tainted.
Taint tracking just doesn't play nice with rest/spread if we keep that rule, so I'm moving towards removing it in this PR. To avoid breaking everything however, I've added array element contents to
defaultImplicitTaintRead
, meaning that any sink or configuration-specific additional taint step can implicitly read array elements. Note that only configuration-specific taint steps can use the implicit read, steps from library models do not, so we may have some work to do in updating our models. But that's for a future PR.Note that there is a known bug in how implicit reads interact with use-use flow, and we get use-use flow from the capture library in some cases. I'm working on a fix for that in a separate PR. This DCA run shows the effect of merging that PR into this branch, and it shows a significant speed-up for vscode.
Performance
Although the potential performance fixes are still in flux, I think it would make sense to get this PR through review.
Results
Results from the above two evaluations show:
$.extend()
from a vulnerable version of jQuery.__dirname
used in a shell command without escaping.js/xss-through-dom
though not sure if exploitableencodeURIComponent
which is not currently a sanitiser.a.href
where thea
element is shared in a captured variable, but data can't propagate across calls.goog.isString(x)
check