Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add type tracking for content #15711

Merged
merged 23 commits into from
Apr 9, 2024
Merged

Python: Add type tracking for content #15711

merged 23 commits into from
Apr 9, 2024

Conversation

RasmusWL
Copy link
Member

@RasmusWL RasmusWL commented Feb 23, 2024

Adds support for tracking more than just attribute content in type-trackers.

Adding the store/read steps is the most important part.

The tests I wrote along the way were against the call-graph. This was convenient since I knew them already, and I it's the motivation for adding this functionality... it might have been more prudent to expand the core type-tracking tests initially instead, but that got done as the very last commit only.

The fact that we don't handle my_dict.get("some_key") was very surprising. I don't want to handle that aspect in this PR though, but will examine it in a followup PR. For now it lives in my own fork of the repo: RasmusWL#113

RasmusWL added 17 commits March 14, 2024 10:42
Due to the char-pred of Content, this change should keep exactly the
same behavior as before.
This should not result in many changes, since store/load steps are still
only implemented for attributes.
While it might be useful to track content to any lookup, it's not
something we do right now.
Instead of just relying on the call-graph tests
I was initially surprised to see that this didn't work, until I
remembered that type-tracking only works with content of depth 1.
We do this to remove the inconsistencies, and to be ready for a future
where type-tracking support content tracker of depth > 1.

It works because targets of loadSteps needs to be LocalSourceNodes

predicate loadStep(Node nodeFrom, LocalSourceNode nodeTo, Content content) {
@RasmusWL RasmusWL marked this pull request as ready for review March 15, 2024 13:15
@RasmusWL RasmusWL requested a review from a team as a code owner March 15, 2024 13:15
@RasmusWL
Copy link
Member Author

DCA evaluation looks fine, new results looks like TPs 👍

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice development and good tests. It will be great to get this in! I have left some questions.

@@ -5,6 +5,7 @@

private import internal.TypeTrackingImpl as Impl
import Impl::Shared::TypeTracking<Impl::TypeTrackingInput>
private import semmle.python.dataflow.new.internal.DataFlowPublic as DataFlowPublic

/** A string that may appear as the name of an attribute or access path. */
class AttributeName = Impl::TypeTrackingInput::Content;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe deprecate this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, very good point 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

Comment on lines 644 to 655
/**
* Subset of `storeStep` that should be shared with type-tracking.
*/
predicate storeStepCommon(Node nodeFrom, ContentSet c, Node nodeTo) {
tupleStoreStep(nodeFrom, c, nodeTo)
or
dictStoreStep(nodeFrom, c, nodeTo)
or
moreDictStoreSteps(nodeFrom, c, nodeTo)
or
iterableUnpackingStoreStep(nodeFrom, c, nodeTo)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not contain attributeStoreStep?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can we characterise the steps that should go here? Are they the ones where c is precise content? (This question is partly about maintenance of this predicate.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point 👍 let me write a comment 👍

About the attributeStoreStep, as you already found out, it's already covered in type-tracking with slightly different logic, so it would be a mistake to also include it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

Comment on lines 77 to 78
or
this instanceof IterableSequenceNode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what the consequences of this is, did you test it in isolation? A new LocalSourceNode feels like potentially a big thing, since many predicates rely on this definition. Also, it should fit with the semantics, that a value is introduced in this place, but I guess that is arguably the case as it pops out of some container...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what the consequences of this is, did you test it in isolation?

I did not. I think I did that part rather hastily, and it should have deserved more attention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed 👍

Comment on lines +185 to 190
exists(DataFlowPublic::AttrWrite a, string attrName |
content.(DataFlowPublic::AttributeContent).getAttribute() = attrName and
a.mayHaveAttributeName(attrName) and
nodeFrom = a.getValue() and
nodeTo = a.getObject()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it could be subsumed by attributeStoreStep (which could be part of storeStepCommon), if we are willing to replace mayHaveAttributeName with the slightly stricter getAttributeName. Do we remember why we are strict using dataflow and not using type tracking? Is it simply that we do not yet (as in evaluation order) have access to this small local flow reasoning during dataflow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be interesting to be able to share this part as well between dataflow and type-tracking, but for this PR I didn't want to deal with that complexity, so I just kept the existing behavior.

Comment on lines +105 to +118
class Content extends DataFlowPublic::Content {
Content() {
// TODO: for now, it's not 100% clear if should support non-precise content in
// type-tracking, or if it will lead to bad results. We start with only allowing
// precise content, which should always be a good improvement! It also simplifies
// the process of examining new results from non-precise content steps in the
// future, since you will _only_ have to look over the results from the new
// non-precise steps.
this instanceof DataFlowPublic::AttributeContent
or
this instanceof DataFlowPublic::DictionaryElementContent
or
this instanceof DataFlowPublic::TupleElementContent
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is better to define a subset of Content called PreciseContent and refer to it here. I guess it depends on whether we expect to update content and want to remember also updating the list of precise content before we start editing this definition to not refer to precise content anymore. And on whether the concept of precise content might have value on its own...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, did you consider using ContentSet here like Ruby does (and like we do above for SummaryTypeTracker::Input)? It seems like, since we are in the area, we might as well be future proof...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we start defining the same set of "precise content" again in the future, I'm all for it 👍

I think it would be nice to move to `ContentSet like Ruby, but let's keep that to an other PR 👍

Comment on lines +212 to 217
exists(DataFlowPublic::AttrRead a, string attrName |
content.(DataFlowPublic::AttributeContent).getAttribute() = attrName and
a.mayHaveAttributeName(attrName) and
nodeFrom = a.getObject() and
nodeTo = a
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as to the store step.

RasmusWL added 2 commits April 2, 2024 13:26
When looking things over a bit more, we could actually exclude the steps
that would never be used instead. A much more involved solution, but
more performance oriented and clear in terms of what is supported (at
least until we start supporting type-tracking with more than depth 1
access-path, if that ever happens)
@RasmusWL RasmusWL requested a review from yoff April 2, 2024 14:51
@RasmusWL
Copy link
Member Author

RasmusWL commented Apr 2, 2024

Did new performance test just to ensure the new step exclusions are not causing really bad join orders 🤔

EDIT: Performance looks comparable with the new commits 👍

Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach, excluding impossible steps. I guess that if we revamp our use of content, simplifying to fewer kinds, and if that then makes simplification of iterable unpacking possible (because we can get rid of conversion steps) we might be able to delete these exclusions again...

@yoff yoff merged commit 1048cf7 into github:main Apr 9, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants