Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust: Improve lines-of-code counts. #17588

Merged
merged 10 commits into from
Oct 2, 2024
Merged

Rust: Improve lines-of-code counts. #17588

merged 10 commits into from
Oct 2, 2024

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Sep 25, 2024

Improve lines-of-code counts:

  • remove Module from the count, which was spuriously adding +1 LOC in main.rs.
  • add back calls to println! in the tests, since these should be deterministically (if not correctly) extracted now.
  • coincidentally the total LOC counted changes after the first commit, but changes back exactly after the second. These changes are not doing nothing!

@aibaars - also note that we have a spuriously located MacroRules (on my_macro.rs line 1) and Struct (on my_struct.rs line 2), which I discovered while debugging this query. I haven't done anything about them, I believe the right fix would be in the extractor.

Obviously types are still not extracted, which is also affecting the counts.

@geoffw0 geoffw0 added the Rust Pull requests that update Rust code label Sep 25, 2024
rust/ql/lib/codeql/files/FileSystem.qll Outdated Show resolved Hide resolved
@aibaars
Copy link
Contributor

aibaars commented Sep 27, 2024

In general using start and end lines of AstNodes to count lines will never work 100%. Currently we're overcounting a bit for items with attached comments. However, we may also change the location of top-level items to be the span of their names only. That gives a much better user-experience (so we highlightithe name of the function instead of the entire function body). Another tricky thing to deal with are multi-line string literals. The only reliable way to calculate LOC is to look at token, which we do not extract (though we might at some point).

A pretty simple way to deal with this problem is to ignore the start line for top-level items. That would under count in cases like

fn
my_function() {...}

but that would be extremely rare.

aibaars
aibaars previously approved these changes Sep 27, 2024
Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. However, you could implement special handling of top-level items to avoid counting doc comments as a line.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 1, 2024

Merged main, resolved conflicts and excluded top-level AST elements from the count (for now).

In general using start and end lines of AstNodes to count lines will never work 100%.

True, but notably this has proven to be a good enough approximation in other languages. Multi-line string literals should be pretty rare, and we might even add special logic for them if it turns out to be a thing that bothers us.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 2, 2024

Though I am a bit worried the attached comments issue is going to turn out to affect things other than top-level items, I'll be happier when the issue with locations itself is fixed.

I think I'm seeing this in the wild - things other than Items with attached comments causing us to flag comment lines as code. I wondering if the right answer for now is to only include end locations (as attached comments presumably always come before the thing they're attached to)? For single line items this won't make a difference, it'll avoid the attached comments issue, though we'll miss things like lines containing only {.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 2, 2024

Done. @aibaars I'd appreciate your thoughts on this.

rust/ql/lib/codeql/files/FileSystem.qll Dismissed Show dismissed Hide dismissed
@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 2, 2024

With the latest changes we finally get an under-count rather than an over-count of lines of code compared to cloc. This is what we should be getting.

@aibaars
Copy link
Contributor

aibaars commented Oct 2, 2024

I think I'm seeing this in the wild - things other than Items with attached comments causing us to flag comment lines as code. I wondering if the right answer for now is to only include end locations (as attached comments presumably always come before the thing they're attached to)? For single line items this won't make a difference, it'll avoid the attached comments issue, though we'll miss things like lines containing only {.

That would probably work. We could include the start line for BlockExpr and friends to avoid missing out on { .

I suspect that the following node kinds may all start with a comment:

 git grep 'HasDocComments for '
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Const {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Enum {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for ExternBlock {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for ExternCrate {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Fn {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Impl {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for MacroCall {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for MacroDef {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for MacroRules {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Module {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for RecordField {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for SourceFile {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Static {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Struct {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Trait {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for TraitAlias {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for TupleField {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for TypeAlias {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Union {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Use {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Variant {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Adt {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for AssocItem {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for ExternItem {}
crates/syntax/src/ast/generated/nodes.rs:impl ast::HasDocComments for Item {}

Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Perhaps include the start-line of BlockExpr and any other things that may start with a single { on a line.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 2, 2024

I think I'd rather merge this now, further improvements can follow later if necessary. When can I anticipate the locations themselves being fixed? If it might be in the next few weeks, I'm inclined not to spend any more time iterating on workarounds.

@geoffw0 geoffw0 merged commit f7db47b into github:main Oct 2, 2024
14 checks passed
@geoffw0 geoffw0 deleted the loc2 branch October 29, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants