-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swift: implement type pruning for dataflow #14592
Swift: implement type pruning for dataflow #14592
Conversation
Note that this currently includes some commits from #14570, which will hopefully be rebased away once that PR is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial review. This looks good to me, though (inevitably on a PR of this size and complexity) I've made quite a lot of minor comments / questions. Testing will be important here - though I gather you've done quite a bit of debugging already. I'll have a quick go with the branch myself later today.
} | ||
|
||
predicate typeStrongerThan(DataFlowType t1, DataFlowType t2) { none() } | ||
predicate typeStrongerThan(DataFlowType t1, DataFlowType t2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this predicate for? Should it have QLDoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in a few places in the dataflow library to select the more precise of two types, for instance when runtime overload resolution can tell us that a parameter has a more specific type than the argument that flows to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I requested QLDoc when it was introduced here: #13083 (comment), but it was postponed until we had turned the dataflow library into a shared qlpack. This has now been the case for a while so we should probably add QLDoc to this one now (cc @aschackmull)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you go: #14612
class DataFlowType extends TDataFlowType { | ||
string toString() { result = "" } | ||
class DataFlowType extends Type { | ||
DataFlowType() { this.getCanonicalType() = this } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the canonical type of a type exactly? (the QLDoc sadly reads "Gets the canonical type of this type.").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the unique type after resolving aliases and desugaring. For instance, if we have typealias MyInt == Int
, then [MyInt?]
has the canonical type Array<Optional<Int>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to add getCanonicalType
to Expr
, Pattern
, and whatever form of Decl
makes sense, in the same way that we have getUnspecifiedType
for C++
predicate compatibleTypes(DataFlowType t1, DataFlowType t2) { | ||
exists(DataFlowType commonSub | | ||
commonSub.getABaseType*().getCanonicalType() = stripType(t1) and | ||
commonSub.getABaseType*().getCanonicalType() = stripType(t2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about performance here, though probably the pragma[inline]
keeps it relatively constrained. I guess DCA will confirm or refute my concerns...
@@ -385,7 +385,6 @@ module MakeImplCommon<InputSig Lang> { | |||
result = viableCallableLambda(call, _) | |||
} | |||
|
|||
cached | |||
private module Cached { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about the name of this module if it's no longer cached
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I did that for debugging and didn't mean for it to land in the final commit.
or | ||
t1 instanceof AnyType | ||
or | ||
t2 instanceof AnyType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to account for stuff like AnyType?
here?
f1.getResult() = f2.getResult() and | ||
not exists(int i | i in [0 .. f1.getNumberOfParamTypes()] | | ||
f1.getParamType(i) != f2.getParamType(i) | ||
) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected the =
and !=
in here to be recursive calls to compatibleTypes
. Should they be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally they would... Unfortunately doing that forces the relation to be materialized, which seems to be infeasible.
@@ -1062,7 +1116,46 @@ string ppReprType(DataFlowType t) { none() } | |||
* a node of type `t1` to a node of type `t2`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth stating that this relationship is not commutative, i.e. in some cases compatibleTypes(a, b)
is not the same as compatibleTypes(b, a)
? I found this surprising (but I think I understand why).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current implementation it's actually commutative. It isn't transitive, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the use of implies
in the bit for a FunctionType
makes it not commutative. compatibleTypes(notThrowingFunction, throwingFunction)
might hold while compatibleTypes(throwingFunction, notThrowingFunction)
does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compatibleTypes
is actually required to be symmetric/commutative:
* This predicate must be symmetric and reflexive. |
or | ||
this.asExpr() instanceof InOutExpr | ||
or | ||
this.asExpr() instanceof InOutToPointerExpr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly ForceValueExpr
, OptionalSomePattern
, AnyTryExpr
, VarargExpansionExpr
??? I'm really not sure though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, all of those can change the type
if this.hasSelfParam() | ||
then result = this.getInterfaceType().(AnyFunctionType).getResult().(AnyFunctionType).getResult() | ||
else result = this.getInterfaceType().(AnyFunctionType).getResult() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be widely useful. 👍
@@ -0,0 +1,7 @@ | |||
private import swift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the qldoc checks that this file and the AnyType
class should have QLDoc. It can be short and to the point.
) and | ||
(f1.isThrowing() implies f2.isThrowing()) and | ||
(f1.isAsync() implies f2.isAsync()) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just noticed t1
, t2
are not bound to f1
, f2
in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh jeez, that might be why it was blowing up...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm seeing performance issues as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for taking a few days to get back to this. I've asked a couple of questions to aid my understanding.
We will need a new DCA run on this PR, when you think it's ready.
not t instanceof AnyFunctionType | ||
} or | ||
TTopFunctionType() or | ||
TDataFlowFunctionType(DataFlowCallable c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between TTopFunctionType
and TDataFlowFunctionType
? Why do we need both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTopFunctionType allows any function type, DataFlowFunctionType is a specific callable (and can be used to improve precision of lambda call resolution)
) | ||
or | ||
rk.(ParamReturnKind).getIndex() = -1 and | ||
// c.getSelfParam().isInout() and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this line and 5 lines above get commented out?
exists(t) and | ||
exists(pos) and | ||
result.asType() instanceof AnyType | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate only applies to nodes that are synthesized from a callback in a modeled function. The AnyType
is the top of Swift's type hierarchy (although it's not included in the getABaseType
relationship), so we're allowing any type here. It's certainly possible to be more precise, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a cartesian product between all DataFlowType
s and all ArgumentPosition
s, right? This was fine before because DataFlowType
was a unit type, but this looks like a real CP now, though? Or is the assumption that this fine because there aren't that many ArgumentPosition
s?
32ed674
to
1ee6a03
Compare
subType(t1) = subType(t2) | ||
or | ||
exists(BoundGenericType bound1, BoundGenericType bound2 | | ||
stripType(t1.asType()).(BoundGenericType) = bound1 and |
Check warning
Code scanning / CodeQL
Redundant cast Warning
BoundGenericType
or | ||
exists(BoundGenericType bound1, BoundGenericType bound2 | | ||
stripType(t1.asType()).(BoundGenericType) = bound1 and | ||
stripType(t2.asType()).(BoundGenericType) = bound2 and |
Check warning
Code scanning / CodeQL
Redundant cast Warning
BoundGenericType
/* | ||
* The Swift `Any` type, which is a supertype of all other types. | ||
*/ |
Check warning
Code scanning / CodeQL
Block comment that is not QLDoc Warning
the below code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments (mostly just questions to help me understand the code), but I'm really happy that we're almost there 🎉
result.asCallable().asSourceCallable() = expr | ||
or | ||
result.asCallable().asSourceCallable() = expr.(DeclRefExpr).getDecl() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need both cases here? Is that to handle each callable having their own type?
In any case, a comment on this would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's handling lambdas vs references to functions
@@ -97,6 +120,10 @@ private class SsaDefinitionNodeImpl extends SsaDefinitionNode, NodeImpl { | |||
override DataFlowCallable getEnclosingCallable() { | |||
result = TDataFlowFunc(def.getBasicBlock().getScope()) | |||
} | |||
|
|||
override DataFlowType getTypeImpl() { | |||
result = getDataFlowType(this.asDefinition().getSourceVariable().asVarDecl().getType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all SourceVariable
s has a result for asVarDecl
(in particular, some of them are related to key paths: https://github.com/github/codeql/blob/main/swift/ql/lib/codeql/swift/dataflow/Ssa.qll#L28).
It seems like these don't have a type now? Should we instead have a getType
predicate on SourceVariable
(that all instances of the abstract class then implements) so that this predicate could instead be implemented as getDataFlowType(this.asDefinition().getSourceVariable())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
override DataFlowType getTypeImpl() { | ||
exists(BoundGenericType dictType, TupleType tupleType | | ||
dictType = expr.getBase().getType().getCanonicalType() and | ||
tupleType.getType(0) = dictType.getArgType(0) and | ||
tupleType.getType(1) = dictType.getArgType(1) and | ||
tupleType.getNumberOfTypes() = 2 and | ||
result.asType() = tupleType | ||
// TODO: handle generic parameter types | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very complicated 😨. Since DictionarySubscriptNode
is really just an intermediate node that follows a subscript so that we can push both a TupleContent
and a CollectionContent
: could we instead simply get the type here by forwarding the call to the node representing the subscript expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so - the type of this node needs to be a tuple type so the read steps work out correctly.
@@ -749,6 +811,10 @@ private module ReturnNodes { | |||
override string toStringImpl() { result = exit.toString() } | |||
|
|||
KeyPathExpr getKeyPathExpr() { result = exit.getScope() } | |||
|
|||
override DataFlowType getTypeImpl() { | |||
result = getDataFlowType(this.getKeyPathExpr().getType().(BoundGenericType).getArgType(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be getArgType(1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the above comment.
@@ -79,6 +98,8 @@ private class KeyPathComponentPostUpdateNode extends TKeyPathComponentPostUpdate | |||
} | |||
|
|||
KeyPathComponent getComponent() { result = component } | |||
|
|||
override DataFlowType getTypeImpl() { result = getDataFlowType(component.getComponentType()) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to implement getTypeImpl
on all the post-update nodes? Can the result of getTypeImpl
on a post-update node not simply be the result of calling getTypeImpl
on the pre-update node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what we do actually implement this predicate on PostUpdateNode
s: https://github.com/github/codeql/pull/14592/files#diff-514dd0988f7b47e40ff8c1e66ea886d1f947d047da108270ed0ddec75c2856c1R1531. Do we really need this (and similar) overrides then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not. I can delete and recheck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was actually only on ExprPostUpdateNode
s. Moving it to PostUpdateNodeImpl
simplifies some things, although it leads to a couple of duplicate overrides. It also fixed a bug with the postupdate nodes on keypaths.
exists(TupleType tuple1, TupleType tuple2 | | ||
stripType(t1.asType()) = tuple1 and | ||
stripType(t2.asType()) = tuple2 and | ||
compatibleTuples(tuple1, tuple2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar question here: I think that compatibleTuples
holds when there is a common index in which both tuple types is either the any type or an unbound type. When no such index exists, does that mean they're compatible because of the first condition in compatibleTypes
?
/** | ||
* Gets the type of this node. | ||
*/ | ||
DataFlowType getType() { result = this.(NodeImpl).getTypeImpl() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to expose the DataFlowType
type to users?
exists(t) and | ||
exists(pos) and | ||
result.asType() instanceof AnyType | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a cartesian product between all DataFlowType
s and all ArgumentPosition
s, right? This was fine before because DataFlowType
was a unit type, but this looks like a real CP now, though? Or is the assumption that this fine because there aren't that many ArgumentPosition
s?
DataFlowType getCallbackReturnType(DataFlowType t, ReturnKind rk) { | ||
exists(t) and | ||
exists(rk) and | ||
result.asType() instanceof AnyType | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This looks like a CP between all types and all return kinds. I guess this is fine if the getCallbackParameterType
case is also fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was fine but we don't actually need these anymore - they were part of the old API but not the new one.
TupleType asUnlabeled() { | ||
result.getNumberOfTypes() = this.getNumberOfTypes() and | ||
not exists(int index | result.getType(index) != this.getType(index)) and | ||
not exists(result.getName(_)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- This looks like something that could blow up with a bad join order, right? That is, it we start by joining together all
TupleTypes
that has the same number of types. Does the joins look okay in this case? - Does the unlabeled version of a tuple type always exist? Or should the
QLDoc
say[...], if any.
? And if this predicate really is partial: Is that a problem for the way we use it instripType
?
Today is my last day before Christmas, so I'm probably not going to get chance to review the final version of this. I just want to post something to make it clear I do not expect you to wait for my final 👍. I trust @rdmarsh2 s work and @MathiasVP s judgement, so once you're both happy I'm happy. I do suggest you do another DCA run at some point though, and make sure any remaining regressions in performance or query results are at least understood. |
I think there are some comments above that haven't yet been addressed. Please would you reply to them, fix the failing checks, and state when this PR is ready to review again? Is there anything we can do to help? |
This PR implements type pruning for dataflow by adding
DataFlowType
, which corresponds to only the canonical types in the Swift type hierarchy, adding a newgetType
member predicate toDataFlow::Node
, and using them to implement thegetNodeType
predicate that type pruning relies on.