Go: extract and expose struct tags, interface method IDs #17357

Merged

Conversation

smowton
Contributor

@smowton smowton commented Sep 3, 2024

This enables us to distinguish all database types in QL. Previously, structs with the same field names and types but differing tags, and interface types with matching method names and at least one non-exported method but declared in differing packages, were impossible or only sometimes possible to distinguish in QL. With this change these types can be distinguished, and queries can also examine struct field tags, e.g. to read JSON field name associations.

This is a prerequisite to (some approaches to) dealing with Go 1.23's more direct exposure of type aliases, since it enables us to distinguish in QL all types that are distinct in the database, and therefore to implement up-to-aliasing type matching, known in the Go spec as type identity.
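
For illustration only (this is not code from the PR): a minimal Go sketch of the language-level facts the description relies on, using the standard `reflect` package. The variable names `short` and `long` and the helpers `jsonNames` and `sameType` are hypothetical. It shows that two struct types differing only in their tags are distinct types, and how a `json` tag can be read off a field.

```go
package main

import (
	"fmt"
	"reflect"
)

// Two anonymous struct types with the same field name and field type,
// differing only in the field tag (hypothetical example values).
var (
	short struct {
		Name string `json:"name"`
	}
	long struct {
		Name string `json:"full_name"`
	}
)

// jsonNames returns the raw `json` tag of every field of the struct value v.
func jsonNames(v any) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Tag.Get("json"))
	}
	return names
}

// sameType reports whether a and b have the same dynamic type.
func sameType(a, b any) bool {
	return reflect.TypeOf(a) == reflect.TypeOf(b)
}

func main() {
	fmt.Println(jsonNames(short), jsonNames(long)) // [name] [full_name]
	fmt.Println(sameType(short, long))             // false: the tags differ
}
```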

@smowton smowton requested a review from a team as a code owner September 3, 2024 11:19
Contributor

@owen-mc owen-mc left a comment

Longer review to follow.

go/extractor/dbscheme/tables.go (outdated)
Member

@mbg mbg left a comment

Broadly looks good! Thank you for improving this and moving it out into its own PR. I just have a few suggestions in addition to @owen-mc's comments, which also make sense.

Also, to sanity check: in the PR description you discuss that part of the motivation here is to be able to distinguish types better. That makes sense and I found the relevant part of the Go specification for this in https://go.dev/ref/spec#Type_identity. For structs:

Two struct types are identical if they have the same sequence of fields, and if corresponding fields have the same names, and identical types, and identical tags. Non-exported field names from different packages are always different.

For interfaces:

Two interface types are identical if they define the same type set.
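
As a rough illustration of these two rules (a sketch using the standard `go/types` package, not code from this PR), the following builds the relevant types programmatically and compares them with `types.Identical`. The package paths and the helper names `structsIdentical` and `interfacesIdentical` are hypothetical.

```go
package main

import (
	"fmt"
	"go/token"
	"go/types"
)

// structsIdentical builds two `struct { Name string }` types carrying the given
// tags and reports whether they are identical, with and without tags considered.
func structsIdentical(tagA, tagB string) (strict, ignoringTags bool) {
	pkg := types.NewPackage("example.com/p", "p") // hypothetical package
	mk := func(tag string) *types.Struct {
		f := types.NewField(token.NoPos, pkg, "Name", types.Typ[types.String], false)
		return types.NewStruct([]*types.Var{f}, []string{tag})
	}
	a, b := mk(tagA), mk(tagB)
	return types.Identical(a, b), types.IdenticalIgnoreTags(a, b)
}

// interfacesIdentical builds `interface { notExported() int }` in two packages
// with the given import paths and reports whether the two types are identical.
func interfacesIdentical(pathA, pathB string) bool {
	mk := func(path string) *types.Interface {
		pkg := types.NewPackage(path, "p")
		result := types.NewTuple(types.NewVar(token.NoPos, pkg, "", types.Typ[types.Int]))
		sig := types.NewSignatureType(nil, nil, nil, nil, result, false)
		m := types.NewFunc(token.NoPos, pkg, "notExported", sig)
		return types.NewInterfaceType([]*types.Func{m}, nil).Complete()
	}
	return types.Identical(mk(pathA), mk(pathB))
}

func main() {
	fmt.Println(structsIdentical(`json:"name"`, `json:"full_name"`)) // false true
	fmt.Println(interfacesIdentical("a", "b"))                       // false
	fmt.Println(interfacesIdentical("a", "a"))                       // true
}
```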

Looking over the tests here, I can see that the tests exercise the new functionality and that seems to behave as expected. Do the tests cover the new ability to decide (in)equality that you are hoping for? Could you comment on how the tests cover that?

go/extractor/extractor.go (outdated)
go/extractor/dbscheme/tables.go (outdated)
go/ql/lib/semmle/go/Types.qll (outdated)
go/ql/lib/semmle/go/Types.qll (outdated)
go/ql/test/library-tests/semmle/go/Types/InterfaceIds.ql (outdated)
@owen-mc
Contributor

owen-mc commented Sep 3, 2024

Tests failing:

The following files need to be reformatted using gofmt or have compilation errors:
./ql/test/library-tests/semmle/go/Types/pkg2/tst.go
Error: make: *** [Makefile:15: check-formatting] Error 1
./ql/test/library-tests/semmle/go/Types/struct_tags.go
Error: Process completed with exit code 2.

@smowton smowton changed the base branch from rc/3.15 to main September 3, 2024 15:45
@smowton
Contributor Author

smowton commented Sep 3, 2024

Retargeted this against main because we're currently not expecting to need this for rc/3.15 if we go for a simpler alias-erasing approach in the interim.

Contributor

@owen-mc owen-mc left a comment

Good work spotting these problems and fixing them. A few small suggestions for improvement.

Also, shouldn't the label for struct types include the tag of each field? Since differing tags make it a different struct type? Ideally this would have a test as well. This could be done as a follow-up, but it also fits in pretty naturally with this PR.

go/extractor/extractor.go (outdated)
go/extractor/dbscheme/tables.go (outdated)
@smowton
Contributor Author

smowton commented Sep 20, 2024

Note: at #17341 I added a stats update since there are new db tables and they ought to have associated stats. This evidently caused join-order problems since DCA flipped from just fine to catastrophic, so join order fixery (or simply accepting missing stats -- I note C# recently totally removed them and simply manually hacked their join orders where necessary) will be necessary before this can be merged.

@smowton smowton force-pushed the smowton/feature/go-indistinguishable-types branch from fe2ef27 to 41ffbdc Compare September 30, 2024 16:25
@smowton
Contributor Author

smowton commented Sep 30, 2024

@mbg @owen-mc all comments applied.

I have also taken the liberty of resurrecting #9386 and including it here since we have a dbscheme change afoot anyway.

@smowton
Contributor Author

smowton commented Sep 30, 2024

Oh, except: @mbg, no, there is no direct test of distinguishing two types using these functions, since there isn't a direct identical-type predicate in this PR to test. And @owen-mc, getTypeLabel already adds struct-type tags into the label, which is why before this PR two structs that differed only in their tags would be different QL entities, yet no QL predicate could tell them apart.

Contributor

@owen-mc owen-mc left a comment

A few optional suggestions and one change I feel more strongly about.

go/extractor/dbscheme/tables.go (outdated)
go/extractor/dbscheme/dbscheme.go
Comment on lines 781 to 785
* For example, `interface { Exported() int; notExported() int }` declared in two
* different packages defines two distinct types, but they appear identical according to
* `getMethodType`. If the packages were named `a` and `b`, `getMethodType` would yield
* `notExported -> int` for both, whereas this method would yield `a.notExported -> int`
* and `b.notExported -> int` respectively.
Contributor

[Optional] I think this example would be a bit clearer without the exported method.

Suggested change
- * For example, `interface { Exported() int; notExported() int }` declared in two
- * different packages defines two distinct types, but they appear identical according to
- * `getMethodType`. If the packages were named `a` and `b`, `getMethodType` would yield
- * `notExported -> int` for both, whereas this method would yield `a.notExported -> int`
- * and `b.notExported -> int` respectively.
+ * For example, `interface { notExported() int }` declared in two different packages
+ * defines two distinct types, but they appear identical according to `getMethodType`.
+ * If the packages were named `a` and `b`, `getMethodType` would yield
+ * `notExported -> int` for both, whereas this method would yield `a.notExported -> int`
+ * and `b.notExported -> int` respectively.

Contributor Author

I've added a mention of Exported instead.
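
For comparison, Go's own `go/types` package qualifies names in the same spirit: `types.Id` prefixes a non-exported name with its package path and leaves an exported name bare. A small sketch, assuming packages with import paths `a` and `b` as in the doc comment (the helper `methodID` is hypothetical, and this illustrates `go/types` behaviour rather than the QL predicate's implementation):

```go
package main

import (
	"fmt"
	"go/types"
)

// methodID returns the identifier go/types uses to compare a method or field
// called name that is declared in the package with import path pkgPath.
func methodID(pkgPath, name string) string {
	return types.Id(types.NewPackage(pkgPath, pkgPath), name)
}

func main() {
	fmt.Println(methodID("a", "notExported")) // a.notExported
	fmt.Println(methodID("b", "notExported")) // b.notExported
	fmt.Println(methodID("a", "Exported"))    // Exported
}
```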

go/ql/lib/semmle/go/Types.qll (outdated)
owen-mc
owen-mc previously approved these changes Oct 1, 2024
@smowton
Contributor Author

smowton commented Oct 1, 2024

(I'll hold off on merging this for now since there are still moderate performance issues brought about by the stats update)

@smowton
Contributor Author

smowton commented Oct 2, 2024

DCA is now pretty good after the latest wave. Considering all the join-order tweaks needed based on the new stats, I'll do a QA run to get a larger performance sample.

@smowton
Contributor Author

smowton commented Oct 2, 2024

QA results were broadly very strong -- analysis time reductions were much more common than increases -- but I've debugged a few of the noticeably slower projects and made one last DCA run to retest the usual suite plus those projects showing a slowdown on QA.

mbg
mbg previously approved these changes Oct 3, 2024
Member

@mbg mbg left a comment

Happy with this for when you're happy with the performance results

owen-mc
owen-mc previously approved these changes Oct 4, 2024
Contributor

@owen-mc owen-mc left a comment

Merge when performance is good enough.

@smowton smowton force-pushed the smowton/feature/go-indistinguishable-types branch from bc49db4 to 837387a Compare October 8, 2024 18:23
@smowton
Copy link
Contributor Author

smowton commented Oct 9, 2024

At long last -- performance results are good. QA shows an overall -8.5% time spent running queries, though over half of that comes from one leviathan project whose 2h30 analysis now takes 15 minutes. There are a small number of QA projects that recurred across a few runs showing moderate-sized (20-second or so) increases in runtime, but which didn't have an obvious cause looking at predicate timing tables, or comparing the RA of the most expensive predicates against the RA generated on main -- my guess is that these are cases where scheduling and/or cache eviction is perturbed for the worse, and for now I'm going to have to let them go, to revisit if we see a more flagrant problem.

@smowton smowton merged commit 58fd1a2 into github:main Oct 9, 2024
19 checks passed