Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift: implement type pruning for dataflow #14592

Closed

Conversation

rdmarsh2
Copy link
Contributor

This PR implements type pruning for dataflow by adding DataFlowType, which corresponds to only the canonical types in the Swift type hierarchy, adding a new getType member predicate to DataFlow::Node, and using them to implement the getNodeType predicate that type pruning relies on.

@rdmarsh2
Copy link
Contributor Author

Note that this currently includes some commits from #14570, which will hopefully be rebased away once that PR is merged.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review. This looks good to me, though (inevitably on a PR of this size and complexity) I've made quite a lot of minor comments / questions. Testing will be important here - though I gather you've done quite a bit of debugging already. I'll have a quick go with the branch myself later today.

}

predicate typeStrongerThan(DataFlowType t1, DataFlowType t2) { none() }
predicate typeStrongerThan(DataFlowType t1, DataFlowType t2) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this predicate for? Should it have QLDoc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in a few places in the dataflow library to select the more precise of two types, for instance when runtime overload resolution can tell us that a parameter has a more specific type than the argument that flows to it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I requested QLDoc when it was introduced here: #13083 (comment), but it was postponed until we had turned the dataflow library into a shared qlpack. This has now been the case for a while so we should probably add QLDoc to this one now (cc @aschackmull)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you go: #14612

class DataFlowType extends TDataFlowType {
string toString() { result = "" }
class DataFlowType extends Type {
DataFlowType() { this.getCanonicalType() = this }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the canonical type of a type exactly? (the QLDoc sadly reads "Gets the canonical type of this type.").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the unique type after resolving aliases and desugaring. For instance, if we have typealias MyInt == Int, then [MyInt?] has the canonical type Array<Optional<Int>>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to add getCanonicalType to Expr, Pattern, and whatever form of Decl makes sense, in the same way that we have getUnspecifiedType for C++

predicate compatibleTypes(DataFlowType t1, DataFlowType t2) {
exists(DataFlowType commonSub |
commonSub.getABaseType*().getCanonicalType() = stripType(t1) and
commonSub.getABaseType*().getCanonicalType() = stripType(t2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about performance here, though probably the pragma[inline] keeps it relatively constrained. I guess DCA will confirm or refute my concerns...

@@ -385,7 +385,6 @@ module MakeImplCommon<InputSig Lang> {
result = viableCallableLambda(call, _)
}

cached
private module Cached {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about the name of this module if it's no longer cached.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I did that for debugging and didn't mean for it to land in the final commit.

or
t1 instanceof AnyType
or
t2 instanceof AnyType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to account for stuff like AnyType? here?

f1.getResult() = f2.getResult() and
not exists(int i | i in [0 .. f1.getNumberOfParamTypes()] |
f1.getParamType(i) != f2.getParamType(i)
) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected the = and != in here to be recursive calls to compatibleTypes. Should they be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally they would... Unfortunately doing that forces the relation to be materialized, which seems to be infeasible.

@@ -1062,7 +1116,46 @@ string ppReprType(DataFlowType t) { none() }
* a node of type `t1` to a node of type `t2`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth stating that this relationship is not commutative, i.e. in some cases compatibleTypes(a, b) is not the same as compatibleTypes(b, a)? I found this surprising (but I think I understand why).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation it's actually commutative. It isn't transitive, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use of implies in the bit for a FunctionType makes it not commutative. compatibleTypes(notThrowingFunction, throwingFunction) might hold while compatibleTypes(throwingFunction, notThrowingFunction) does not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compatibleTypes is actually required to be symmetric/commutative:

* This predicate must be symmetric and reflexive.
. I do think that this relation is symmetric now, though, right?

or
this.asExpr() instanceof InOutExpr
or
this.asExpr() instanceof InOutToPointerExpr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly ForceValueExpr, OptionalSomePattern, AnyTryExpr, VarargExpansionExpr??? I'm really not sure though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, all of those can change the type

if this.hasSelfParam()
then result = this.getInterfaceType().(AnyFunctionType).getResult().(AnyFunctionType).getResult()
else result = this.getInterfaceType().(AnyFunctionType).getResult()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be widely useful. 👍

@@ -0,0 +1,7 @@
private import swift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the qldoc checks that this file and the AnyType class should have QLDoc. It can be short and to the point.

) and
(f1.isThrowing() implies f2.isThrowing()) and
(f1.isAsync() implies f2.isAsync())
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just noticed t1, t2 are not bound to f1, f2 in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh jeez, that might be why it was blowing up...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm seeing performance issues as well.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for taking a few days to get back to this. I've asked a couple of questions to aid my understanding.

We will need a new DCA run on this PR, when you think it's ready.

not t instanceof AnyFunctionType
} or
TTopFunctionType() or
TDataFlowFunctionType(DataFlowCallable c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between TTopFunctionType and TDataFlowFunctionType? Why do we need both?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TTopFunctionType allows any function type, DataFlowFunctionType is a specific callable (and can be used to improve precision of lambda call resolution)

)
or
rk.(ParamReturnKind).getIndex() = -1 and
// c.getSelfParam().isInout() and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this line and 5 lines above get commented out?

exists(t) and
exists(pos) and
result.asType() instanceof AnyType
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here?

Copy link
Contributor Author

@rdmarsh2 rdmarsh2 Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate only applies to nodes that are synthesized from a callback in a modeled function. The AnyType is the top of Swift's type hierarchy (although it's not included in the getABaseType relationship), so we're allowing any type here. It's certainly possible to be more precise, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a cartesian product between all DataFlowTypes and all ArgumentPositions, right? This was fine before because DataFlowType was a unit type, but this looks like a real CP now, though? Or is the assumption that this fine because there aren't that many ArgumentPositions?

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh2/swift/dataflow-type-pruning branch from 32ed674 to 1ee6a03 Compare December 13, 2023 19:40
subType(t1) = subType(t2)
or
exists(BoundGenericType bound1, BoundGenericType bound2 |
stripType(t1.asType()).(BoundGenericType) = bound1 and

Check warning

Code scanning / CodeQL

Redundant cast Warning

Redundant cast to
BoundGenericType
.
or
exists(BoundGenericType bound1, BoundGenericType bound2 |
stripType(t1.asType()).(BoundGenericType) = bound1 and
stripType(t2.asType()).(BoundGenericType) = bound2 and

Check warning

Code scanning / CodeQL

Redundant cast Warning

Redundant cast to
BoundGenericType
.
Comment on lines +6 to +8
/*
* The Swift `Any` type, which is a supertype of all other types.
*/

Check warning

Code scanning / CodeQL

Block comment that is not QLDoc Warning

Block comment could be QLDoc for
the below code
.
Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of comments (mostly just questions to help me understand the code), but I'm really happy that we're almost there 🎉

Comment on lines +50 to +52
result.asCallable().asSourceCallable() = expr
or
result.asCallable().asSourceCallable() = expr.(DeclRefExpr).getDecl()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both cases here? Is that to handle each callable having their own type?

In any case, a comment on this would be helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's handling lambdas vs references to functions

@@ -97,6 +120,10 @@ private class SsaDefinitionNodeImpl extends SsaDefinitionNode, NodeImpl {
override DataFlowCallable getEnclosingCallable() {
result = TDataFlowFunc(def.getBasicBlock().getScope())
}

override DataFlowType getTypeImpl() {
result = getDataFlowType(this.asDefinition().getSourceVariable().asVarDecl().getType())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all SourceVariables has a result for asVarDecl (in particular, some of them are related to key paths: https://github.com/github/codeql/blob/main/swift/ql/lib/codeql/swift/dataflow/Ssa.qll#L28).

It seems like these don't have a type now? Should we instead have a getType predicate on SourceVariable (that all instances of the abstract class then implements) so that this predicate could instead be implemented as getDataFlowType(this.asDefinition().getSourceVariable())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +424 to +433
override DataFlowType getTypeImpl() {
exists(BoundGenericType dictType, TupleType tupleType |
dictType = expr.getBase().getType().getCanonicalType() and
tupleType.getType(0) = dictType.getArgType(0) and
tupleType.getType(1) = dictType.getArgType(1) and
tupleType.getNumberOfTypes() = 2 and
result.asType() = tupleType
// TODO: handle generic parameter types
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very complicated 😨. Since DictionarySubscriptNode is really just an intermediate node that follows a subscript so that we can push both a TupleContent and a CollectionContent: could we instead simply get the type here by forwarding the call to the node representing the subscript expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so - the type of this node needs to be a tuple type so the read steps work out correctly.

@@ -749,6 +811,10 @@ private module ReturnNodes {
override string toStringImpl() { result = exit.toString() }

KeyPathExpr getKeyPathExpr() { result = exit.getScope() }

override DataFlowType getTypeImpl() {
result = getDataFlowType(this.getKeyPathExpr().getType().(BoundGenericType).getArgType(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be getArgType(1)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the above comment.

@@ -79,6 +98,8 @@ private class KeyPathComponentPostUpdateNode extends TKeyPathComponentPostUpdate
}

KeyPathComponent getComponent() { result = component }

override DataFlowType getTypeImpl() { result = getDataFlowType(component.getComponentType()) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to implement getTypeImpl on all the post-update nodes? Can the result of getTypeImpl on a post-update node not simply be the result of calling getTypeImpl on the pre-update node?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what we do actually implement this predicate on PostUpdateNodes: https://github.com/github/codeql/pull/14592/files#diff-514dd0988f7b47e40ff8c1e66ea886d1f947d047da108270ed0ddec75c2856c1R1531. Do we really need this (and similar) overrides then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. I can delete and recheck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was actually only on ExprPostUpdateNodes. Moving it to PostUpdateNodeImpl simplifies some things, although it leads to a couple of duplicate overrides. It also fixed a bug with the postupdate nodes on keypaths.

exists(TupleType tuple1, TupleType tuple2 |
stripType(t1.asType()) = tuple1 and
stripType(t2.asType()) = tuple2 and
compatibleTuples(tuple1, tuple2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here: I think that compatibleTuples holds when there is a common index in which both tuple types is either the any type or an unbound type. When no such index exists, does that mean they're compatible because of the first condition in compatibleTypes?

/**
* Gets the type of this node.
*/
DataFlowType getType() { result = this.(NodeImpl).getTypeImpl() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to expose the DataFlowType type to users?

exists(t) and
exists(pos) and
result.asType() instanceof AnyType
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a cartesian product between all DataFlowTypes and all ArgumentPositions, right? This was fine before because DataFlowType was a unit type, but this looks like a real CP now, though? Or is the assumption that this fine because there aren't that many ArgumentPositions?

Comment on lines 81 to 85
DataFlowType getCallbackReturnType(DataFlowType t, ReturnKind rk) {
exists(t) and
exists(rk) and
result.asType() instanceof AnyType
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This looks like a CP between all types and all return kinds. I guess this is fine if the getCallbackParameterType case is also fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was fine but we don't actually need these anymore - they were part of the old API but not the new one.

Comment on lines +13 to +17
TupleType asUnlabeled() {
result.getNumberOfTypes() = this.getNumberOfTypes() and
not exists(int index | result.getType(index) != this.getType(index)) and
not exists(result.getName(_))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  • This looks like something that could blow up with a bad join order, right? That is, it we start by joining together all TupleTypes that has the same number of types. Does the joins look okay in this case?
  • Does the unlabeled version of a tuple type always exist? Or should the QLDoc say [...], if any.? And if this predicate really is partial: Is that a problem for the way we use it in stripType?

@geoffw0
Copy link
Contributor

geoffw0 commented Dec 15, 2023

Today is my last day before Christmas, so I'm probably not going to get chance to review the final version of this. I just want to post something to make it clear I do not expect you to wait for my final 👍. I trust @rdmarsh2 s work and @MathiasVP s judgement, so once you're both happy I'm happy. I do suggest you do another DCA run at some point though, and make sure any remaining regressions in performance or query results are at least understood.

@geoffw0
Copy link
Contributor

geoffw0 commented Jan 4, 2024

I think there are some comments above that haven't yet been addressed. Please would you reply to them, fix the failing checks, and state when this PR is ready to review again? Is there anything we can do to help?

@rdmarsh2 rdmarsh2 closed this Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants