Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: Support Go 1.23 (Explicit aliases) #17058

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Go: Support Go 1.23 (Explicit aliases) #17058

wants to merge 27 commits into from

Conversation

mbg
Copy link
Member

@mbg mbg commented Jul 24, 2024

This PR makes minimal changes for us to tolerate Go 1.23. Go 1.23 makes changes to how aliases are represented by the compiler. This PR modifies the extractor to extract them as a new kind of type.

This is just a first stab at extracting the alias types and does not include parameters or arguments, yet.

Things to check:

  • Is it sensible for us to extract the alias types or just resolve them? I think it makes sense to extract them to mirror how the compiler represents the type AST.
  • In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?
  • Should we extract the type object for aliases?
  • Should we extract the names of aliases using the existing table for named types or a separate table?
  • Some existing tests fail with the current changes where interface{} types change to any. I am not entirely sure why this is.

@github-actions github-actions bot added the Go label Jul 24, 2024
@mbg mbg force-pushed the mbg/go/1.23 branch 5 times, most recently from ec0dcc1 to 51db53f Compare July 24, 2024 10:42
@mbg mbg force-pushed the mbg/go/1.23 branch 2 times, most recently from dbe8434 to eb4e1d1 Compare August 19, 2024 15:27
@mbg mbg marked this pull request as ready for review August 19, 2024 17:16
@mbg mbg requested review from a team as code owners August 19, 2024 17:16
Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first-pass review of the code that is here, without thinking about what remains to be done.

Comment on lines 1809 to 1814
b.WriteString(tp.Obj().Id())
// Ensure that the definition of the alias gets extracted,
// which may be an alias in itself.
extractType(tw, tp.Rhs())
// Construct the label for this type alias.
lbl = tw.Labeler.GlobalID(fmt.Sprintf("%s;typealias", b.String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just use the string directly, rather than making a string builder. Presumably this pattern has been copied from the case above, where it is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had expected to be building up a string with more than just the name (e.g. to include the number of type parameters), but didn't end up working in that yet / not sure more than the name is actually needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a combination of the name and the type label for the rhs? But I guess there could be two alias types with the same name and the same rhs in different packages. Hmmm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and ended up copying what we do for named types, since we also extract the object for alias types.

go/ql/test/library-tests/semmle/go/Types/Aliases.ql Outdated Show resolved Hide resolved
@owen-mc
Copy link
Contributor

owen-mc commented Aug 20, 2024

  1. Is it sensible for us to extract the alias types or just resolve them? I think it makes sense to extract them to mirror how the compiler represents the type AST.

I agree that it makes sense to extract them. I believe this compiler change is paving the way for aliases of generic types where some of the type parameters have been specified. We may find it hard to support that if we go too far from how the compiler represents things.

  1. In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?

If we're lucky then we've already put getUnderlyingType in all the places that need it. We may well find some more things we need to do.

  1. Should we extract the type object for aliases?

I think that we should, for the same reason as 1. (Assuming the compiler creates objects for them.)

@smowton
Copy link
Contributor

smowton commented Aug 20, 2024

Taken a look over the code; no additional review points from me. Agree with Owen's take on Qs 1-3 above. To Q4:

Should we extract the names of aliases using the existing table for named types or a separate table?

I'd say use the existing table. It's redundant w.r.t. getEntity().getName() anyway, but this is consistent with other named types.

@owen-mc
Copy link
Contributor

owen-mc commented Aug 21, 2024

  1. In light of that, do we need to make any adjustments (beyond implementing getUnderlyingType) to account for this so that queries which look for certain types don't get fooled by aliases?

I've added a predicate type unalias(Type t) which looks through any number of aliases till it gets to a non-alias type. (This is copied from the Go library.) Currently this is only needed for one experimental query. We may find we need to sprinkle it in a few more places, probably where we're trying to say is "is t this particular type" and we actually want to say "is t this particular type, or an alias of it, or an alias of an alias of it, ...". I'll run DCA and then QA to try to identify anywhere this happens.

  1. Some existing tests fail with the current changes where interface{} types change to any. I am not entirely sure why this is.

Having looked into it, I think this is what I'd expect. any is an alias for interface{}. When any is used as the type for anything, and we try to get information about it, it used to give the information about the thing it was an alias for, but now it gives information about itself.

@owen-mc
Copy link
Contributor

owen-mc commented Aug 22, 2024

The DCA results are fine. In one project the number of extractor errors changed, and we lost 5 FPs in unimportant queries.

@smowton
Copy link
Contributor

smowton commented Aug 22, 2024

Do you know why we saw result changes? Were they the same project as that with extraction errors?

@owen-mc
Copy link
Contributor

owen-mc commented Aug 22, 2024

Yes, same project. I didn't dig into it deeply. My guess is that we were able to extract a little bit more. All the FPs involved mistakenly thinking that two very similar expressions were actually the same. (None of them involved alias types.)

@smowton
Copy link
Contributor

smowton commented Aug 22, 2024

OK, that's reassuring. QA to confirm we're in a good state, then ready to merge?

@smowton
Copy link
Contributor

smowton commented Aug 26, 2024

Evaluation 1 had two key problems:

  1. It was contaminated by known problems with the 2.18.3 release (MaD conversion introducing unintentional changes)
  2. At least the Type.implements(Type) predicate was broken due to failing to account for identical types being sufficient for interface implementation -- so in particular, F(IntAlias) can implement F(int), and similarly F([]IntAlias) vs F([]int) and so on. Branch https://github.com/smowton/codeql/tree/smowton/admin/go1.23-wip has a crude first stab at implementing the needed deep-unalias routine (including getting the extractor to ensure the deep-unaliased type actually exists in the DB).

https://github.com/github/codeql-qa-ops/issues/190 is a new experiment to check if the combination of a tighter diff (i.e., not including the MaD changes) and fixing Type.implements is sufficient to fix alert differences, or if there are more missing unalias calls out there.

Note the working branch linked above also includes a couple of other small changes needed to make this work, in particular extracting struct-field tags, which were previously represented in trap labels but not in the database, making some types impossible to distinguish.

Things remaining to do:

  1. The implementation is noticeably a bit slow -- the deep-unalias operation could probably be better implemented, particularly the forall statements expressing deep-unaliasing w.r.t. function, struct and tuple types.
  2. Against at least repo https://github.com/cosmos/cosmos-sdk I noticed a concerning sign, that there existed aliases with more than one RHS, and consequently a small number of types with multiple deep-unaliases.
  3. We still need to do something about the new packages.Load crashes, ideally file an upstream bug.

@smowton
Copy link
Contributor

smowton commented Aug 27, 2024

Evaluation 2 reveals one interesting finding: 30 or so of our test repositories are using https://github.com/crypto-com/cosmos-sdk-codeql. Unless that pack is recompiled and the users upgrade, that means our database downgrade script will be used in anger to execute their queries, since the newer CodeQL will build a database with type-aliases in it, then the CLI will downgrade it so we can execute their queries that expect the old dbscheme.

I note the current downgrade script isn't very faithful -- rather than delete rows that refer to an alias type, we should instead replace the alias with its RHS, thus making it look as if aliases are transparent.

There are also a small number of alert changes, which I'll look into now.

@smowton
Copy link
Contributor

smowton commented Aug 27, 2024

Checklist of things to take care of before this can be merged:

  • The deepUnalias implementation is currently quite time-consuming. A recommended optimisation is to replace forall in the recursive method with recursion over component indices. Fixed by implementing a fast path that unpacks the first few elements of signatures and struct types. Extraction performance was also fixed by checking if alias types are involved before either requesting or extracting a transparent-alias version of a type.
  • Update the upgrade and downgrade scripts to (a) account for the dbscheme gaining the component_tags table and (b) on a downgrade, replace alias types with unalias(ty) instead of just dropping them as is done currently.
  • Investigate, report upstream and perhaps manage to work around the packages.Load panic seen on ~10 repositories in both QA experiments. Answer: this is go/types: 'under' panics on Alias type golang/go#68877 which will be fixed in go1.23.1. We will wait for its expected release on Tuesday to start the release. (We can merge this PR before then)
  • Investigate why on cosmos-sdk the unalias and AliasType.getRhs predicates are not functional (a small (~5) number of types have more than one alias RHS) @mbg established this is down to multiple versions of the same package being used in the context of a single extraction, a more general problem that isn't currently addressed for Go. The same issue also causes database inconsistencies in Java, and perhaps in other languages.
  • Investigate outstanding alert differences seen in the round 2 QA run, either fixing or change-noting as appropriate. Complete, see comment below.
  • Add consistency queries checking that unaliasing and deep-unaliasing are single-valued and total
  • Verify that all actions in extractType are idempotent now that they can be hit twice, once in the explicit and once in the transparent aliases path. They weren't; omitted them from the extract-type-with-transparent-aliases path.

@smowton
Copy link
Contributor

smowton commented Aug 28, 2024

Residual changes:

  • bhirbec/bitbot, twitchyliquid64/bob-the-builder: positive changes (new TPs) due to apparent changes in how part-broken projects are represented (function targets are successfully extracted that weren't before). Needs a change note.
  • At least some string literals seem to have started getting their Types[...].Value populated, leading to population of the constvalues table and therefore a result from getExactValue. This has fixed FPs from various redundancy / identical-expression quality queries, which were conflating all string literals because they relied on getExactValue to distinguish them. I suggest we keep the new behaviour (but change-note it).
  • Found several bugs with transparent-alias type extraction now fixed at https://github.com/smowton/codeql/commits/smowton/admin/go1.23-wip/

Started https://github.com/github/codeql-dca-main/issues/23315 to determine which projects still have alert differences.

DCA results to investigate:

  • MainfluxLabs/mainflux: loss of SQL injection. Cause: MongoDB models needed an unalias
  • casdoor/casdoor: loss of request-forgery. Cause: pointer-content read and store steps need to match types, which requires us to do a deep-unalias
  • go-gitea/gitea: loss of uncontrolled-allocation-size Same cause as casdoor
  • jackluo2012/datacenter: loss of incorrect-integer-conversion Same cause as casdoor
  • google/gapid: gain of useless-assignment Probable side-effect of compiler changes; database is highly broken
  • spotlightpa/almanack: gain of log-injection Correct new results due to project requiring Go 1.23 ==> incompletely extracted under 1.22

mbg and others added 24 commits September 1, 2024 21:06
This makes it clearer when changes in test results
are due to changes in the standard library.
This changed from uint in 1.22 to uint8 in 1.23.
Read through as many aliases as possible before getting
the underlying type.
Restricting to alias types in the source code stops the results from
changing when the standard library changes.

Using `pp()` instead of `getName` lets us see the results where the
underlying type doesn't have a name.
Use object like we do for named types.
@mbg mbg mentioned this pull request Sep 3, 2024
@@ -15,7 +15,7 @@ local_path_override(
# see https://registry.bazel.build/ for a list of available packages

bazel_dep(name = "platforms", version = "0.0.10")
bazel_dep(name = "rules_go", version = "0.49.0")
bazel_dep(name = "rules_go", version = "0.49.0-codeql.1")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that https://github.com/bazelbuild/rules_go/releases/tag/v0.50.0 has been released, can we just use that?
This also means we can remove the 0.49.0-codeql.1 release from the registry (also added in this PR).

Copy link
Member Author

@mbg mbg Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting us know 👍🏻 I'll drop @redsun82's fix in the next rebase and bump this to the new version instead.

@mbg mbg changed the title Go: Support Go 1.23 Go: Support Go 1.23 (Explicit aliases) Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants