Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Reduce FPs in cpp/wrong-type-format-argument due to extraction errors #17775

Merged
merged 13 commits into from
Oct 24, 2024

Conversation

calumgrant
Copy link
Contributor

@calumgrant calumgrant commented Oct 15, 2024

Fixes false positives cause by buildless extraction. Commit-by-commit review recommended.

Removes 1008 false positives in the asterisk/asterisk projects as observed on codeql-qa.

Pull Request checklist

All query authors

Internal query authors only

  • Autofixes generated based on these changes are valid, only needed if this PR makes significant changes to .ql, .qll, or .qhelp files. See the documentation (internal access required).
  • Changes are validated at scale (internal access required).
  • Adding a new query? Consider also adding the query to autofix.

@calumgrant calumgrant force-pushed the calumgrant/bmn/wrong-type-format-arguments-test branch from 526aaff to 23970f1 Compare October 15, 2024 14:53
@calumgrant calumgrant force-pushed the calumgrant/bmn/wrong-type-format-arguments-test branch from 23970f1 to 1e459ac Compare October 15, 2024 15:09
@calumgrant calumgrant force-pushed the calumgrant/bmn/wrong-type-format-arguments-test branch from 1e459ac to 853128c Compare October 16, 2024 08:46
@calumgrant calumgrant marked this pull request as ready for review October 16, 2024 09:59
@calumgrant calumgrant requested a review from a team as a code owner October 16, 2024 09:59
@@ -300,6 +300,11 @@ class FunctionCall extends Call, @funbindexpr {
this.isVirtual() or
this.getTarget().getAnAttribute().getName() = "weak"
}

override predicate mayBeFromImplicitlyDeclaredFunction() {
this.getType() instanceof IntType and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be anything else than int?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can be, so perhaps this test is redundant. But, I'm wondering what happens if a function is implicit in some compilation units, but explicit in others?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sound like that's something that need further thought, especially with regard to the implications that has for the changes made here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test for this scenario, and removed the redundant test. So it seems that Function::getType() can return multiple things if there are conflicting declarations.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks somewhat odd, as the modifications differ across sprintf and fprintf, and as printf itself is left untouched.

@@ -50,6 +50,8 @@ private class Fprintf extends FormattingFunction, NonThrowingFunction {
override int getFormatParameterIndex() { result = 1 }

override int getOutputParameterIndex(boolean isStream) { result = 0 and isStream = true }

override int getFirstFormatArgumentIndex() { result = 2 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks odd. It feels like there might be a bug in one of the super classes. Otherwise, why doesn't Printf have a similar predicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this, but it's what's needed to remove false-positives on the asterisk project. The declaration of int fprintf() in the last commit (5315a5c) replicates what happened in the database. I briefly forgot why I needed that line.

This line is safe because class Fprintf always has 2 arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause is actually that the extractor sometimes fails to mark function declarations as implicit. Hmm. Maybe that's the fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed instead using 4341fab. I couldn't find a nice way to identify these extraction errors but hopefully the solution makes sense.

@@ -534,6 +534,9 @@ class Expr extends StmtParent, @expr {

/** Gets the function containing this control-flow node. */
override Function getControlFlowScope() { result = this.getEnclosingFunction() }

/** Holds if this expression could be the return value of an implicitly declared function. */
predicate mayBeFromImplicitlyDeclaredFunction() { none() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make this a predicate of Call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need to ask an Expr if it came from an implicitly-declared function. I suppose, we could have done arg.(Call).mayBeFromImplicitlyDeclaredFunction()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that would be better.

---
category: minorAnalysis
---
* Fixed false positives in the `cpp/wrong-type-format-argument` ("Wrong type of arguments to formatting function") query if there are extraction errors in the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically not true, as there are valid cases where a function is implicitly defined and correctly being used as a argument in a printf call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it might also reject true-positives. Do you have a suggested text I could use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we mostly want to be explicit about what we ignore, i.e., calls to functions that are implicitly declared, as in the majority of cases we see that these are due to extraction errors, and not because the codebase is pre-C99 C-code, which allows implicit function declarations.

printf("%s", 1); // BAD
printf("%s", implicit_function()); // GOOD - we should ignore the type
sprintf(0, "%s", ""); // GOOD
fprintf(0, "%s", ""); // GOOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is now good, because int fprintf() now no longer yields a value for getFirstFormatArgumentIndex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@calumgrant
Copy link
Contributor Author

Will re-run DCA after current changes.

@@ -125,6 +125,7 @@ abstract class FormattingFunction extends ArrayFunction, TaintFunction {
// The formatting function either has a definition in the snapshot, or all
// `DeclarationEntry`s agree on the number of parameters (otherwise we don't
// really know the correct number)
result > 0 and // Avoid invalid declarations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make more sense to me to make this condition part of the charPred of this abstract class. Reasoning: there are no arguments besides vararg ones, then there is no explicit argument for the formatting string, which would be very odd for a formatting function.

Maybe something like:

abstract class FormattingFunction extends ArrayFunction, TaintFunction {
  int firstFormatArgumentIndex;
  int numberOfExplicitParameters;

  FormattingFunction() {
    forex(FunctionDeclarationEntry fde | fde = this.getAnExplicitDeclarationEntry() |
      numberOfExplicitParameters = fde.getNumberOfParameters()
    ) and
    (
      if this.hasDefinition()
      then firstFormatArgumentIndex = this.getDefinition().getNumberOfParameters()
      else firstFormatArgumentIndex = numberOfExplicitParameters
    ) and
    firstFormatArgumentIndex > 0
  }

and implement the relevant get predicates in terms of firstFormatArgumentIndex and numberOfExplicitParameters.

I think it would also make sense at this point to move getAnExplicitDeclarationEntry to Function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also work for intrinsic functions like __builtin___sprintf_chk? I will try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intrinsic functions should still have a prototype in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out, and encountered a couple of problems:

  1. Certain dataflow functions no longer identify the formatting functions as sinks if the functions have not been declared. Maybe this is ok, but I needed to fix up some of the qltests to add missing function declarations.
  2. The builtin functions have no declaration entries - I tested this using count(fn.getADeclarationEntry()) which returned 0. This is fine, but I did need to add a special case to FormattingFunction.

I'll need to run DCA on this (once the tests pass), fix changenotes, and fix the tests on semmle-code.

Are you still ok with this design? @jketema

Copy link
Contributor

@jketema jketema Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builtin functions have no declaration entries - I tested this using count(fn.getADeclarationEntry()) which returned 0. This is fine, but I did need to add a special case to FormattingFunction.

That's an extractor bug, if it's reproducible in an integration test. Otherwise it's a bug in how we run tests.

Copy link
Contributor

@jketema jketema Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at this a bit:

  • The question mark in #114=@"fun_decl___builtin___sprintf_chk(?)" is there, because the compiler generated function declaration is a C function, so name-mangling of the parameters should be ignored.
  • For compiler generated functions we apparently do not output a row in the fun_decls relation, which explains why you get a count of zero.
  • We do output parameter information directly on the function:
    #13d=@"fun_decl___builtin___snprintf_chk(?)"
    .push *
    functions(#13d, "__builtin___snprintf_chk", 6)
    link_parent(#13d, #101)
    function_return_type(#13d, #10e)
    #13d_13e=@"{#13d}_0_par"
    params(#13d_13e, #13d, 0, #116)
    #13d_13f=@"{#13d}_1_par"
    params(#13d_13f, #13d, 1, #119)
    #13d_140=@"{#13d}_2_par"
    params(#13d_140, #13d, 2, #10e)
    #13d_141=@"{#13d}_3_par"
    params(#13d_141, #13d, 3, #119)
    #13d_142=@"{#13d}_4_par"
    params(#13d_142, #13d, 4, #11c)
    .pop
    

Questions (I'll also have a look at whether I can answer this myself):

  • Why do we look at the declaration entries and not at the functions?
  • Should we sometimes fallback to the parameter information of the Function if there are no declaration entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We look at declaration entries in order to figure out if the declaration was implicit or not. There can be spurious function parameters caused by extraction errors, so we need to double-check the function declaration entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened a PR for an additional internal test related to this. I think the right solution here is to use the information from the Function and not the FunctionDeclarationEntry when we see that the function is a builtin function. In that case the information from the Function should be reliable, because the function's signature is under control of out frontend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally added if this instanceof BuiltinFunction then firstFormatArgumentIndex = this.getNumberOfParameters(), because getNumberOfParameters() was reliable for builtins. I'm happy to add this back, because we shouldn't rely on FunctionDeclarationEntry for builtins. If it's ok with you, shall this be the final solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good.

---
category: minorAnalysis
---
* Fixed false positives in the `cpp/wrong-type-format-argument` ("Wrong type of arguments to formatting function") query if there are extraction errors in the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we mostly want to be explicit about what we ignore, i.e., calls to functions that are implicitly declared, as in the majority of cases we see that these are due to extraction errors, and not because the codebase is pre-C99 C-code, which allows implicit function declarations.

@calumgrant calumgrant force-pushed the calumgrant/bmn/wrong-type-format-arguments-test branch from 6463531 to c5a082f Compare October 21, 2024 12:36
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could we run a final DCA experiment to see everything is ok?

@calumgrant calumgrant merged commit a8f1d57 into main Oct 24, 2024
16 checks passed
@calumgrant calumgrant deleted the calumgrant/bmn/wrong-type-format-arguments-test branch October 24, 2024 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants