Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Reduce FPs in cpp/wrong-type-format-argument due to extraction errors #17775

Merged
merged 13 commits into from
Oct 24, 2024
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: feature
---
* Added the predicate `mayBeFromImplicitlyDeclaredFunction()` to the `Expr` class to represent expressions that may be the return value of an implicitly declared C function.
4 changes: 4 additions & 0 deletions cpp/ql/lib/semmle/code/cpp/exprs/Call.qll
Original file line number Diff line number Diff line change
Expand Up @@ -300,6 +300,10 @@ class FunctionCall extends Call, @funbindexpr {
this.isVirtual() or
this.getTarget().getAnAttribute().getName() = "weak"
}

override predicate mayBeFromImplicitlyDeclaredFunction() {
this.getTarget().getADeclarationEntry().isImplicit()
}
}

/** A _user-defined_ unary `operator*` function. */
Expand Down
3 changes: 3 additions & 0 deletions cpp/ql/lib/semmle/code/cpp/exprs/Expr.qll
Original file line number Diff line number Diff line change
Expand Up @@ -534,6 +534,9 @@ class Expr extends StmtParent, @expr {

/** Gets the function containing this control-flow node. */
override Function getControlFlowScope() { result = this.getEnclosingFunction() }

/** Holds if this expression could be the return value of an implicitly declared function. */
predicate mayBeFromImplicitlyDeclaredFunction() { none() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make this a predicate of Call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need to ask an Expr if it came from an implicitly-declared function. I suppose, we could have done arg.(Call).mayBeFromImplicitlyDeclaredFunction()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that would be better.

}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ private class Sprintf extends FormattingFunction, NonThrowingFunction {
override int getFirstFormatArgumentIndex() {
if this.hasName("__builtin___sprintf_chk")
then result = 4
else result = this.getNumberOfParameters()
else result = this.getNumberOfExplicitParameters()
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,7 @@ abstract class FormattingFunction extends ArrayFunction, TaintFunction {
// The formatting function either has a definition in the snapshot, or all
// `DeclarationEntry`s agree on the number of parameters (otherwise we don't
// really know the correct number)
result > 0 and // Avoid invalid declarations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make more sense to me to make this condition part of the charPred of this abstract class. Reasoning: there are no arguments besides vararg ones, then there is no explicit argument for the formatting string, which would be very odd for a formatting function.

Maybe something like:

abstract class FormattingFunction extends ArrayFunction, TaintFunction {
  int firstFormatArgumentIndex;
  int numberOfExplicitParameters;

  FormattingFunction() {
    forex(FunctionDeclarationEntry fde | fde = this.getAnExplicitDeclarationEntry() |
      numberOfExplicitParameters = fde.getNumberOfParameters()
    ) and
    (
      if this.hasDefinition()
      then firstFormatArgumentIndex = this.getDefinition().getNumberOfParameters()
      else firstFormatArgumentIndex = numberOfExplicitParameters
    ) and
    firstFormatArgumentIndex > 0
  }

and implement the relevant get predicates in terms of firstFormatArgumentIndex and numberOfExplicitParameters.

I think it would also make sense at this point to move getAnExplicitDeclarationEntry to Function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this also work for intrinsic functions like __builtin___sprintf_chk? I will try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intrinsic functions should still have a prototype in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this out, and encountered a couple of problems:

  1. Certain dataflow functions no longer identify the formatting functions as sinks if the functions have not been declared. Maybe this is ok, but I needed to fix up some of the qltests to add missing function declarations.
  2. The builtin functions have no declaration entries - I tested this using count(fn.getADeclarationEntry()) which returned 0. This is fine, but I did need to add a special case to FormattingFunction.

I'll need to run DCA on this (once the tests pass), fix changenotes, and fix the tests on semmle-code.

Are you still ok with this design? @jketema

Copy link
Contributor

@jketema jketema Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The builtin functions have no declaration entries - I tested this using count(fn.getADeclarationEntry()) which returned 0. This is fine, but I did need to add a special case to FormattingFunction.

That's an extractor bug, if it's reproducible in an integration test. Otherwise it's a bug in how we run tests.

Copy link
Contributor

@jketema jketema Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at this a bit:

  • The question mark in #114=@"fun_decl___builtin___sprintf_chk(?)" is there, because the compiler generated function declaration is a C function, so name-mangling of the parameters should be ignored.
  • For compiler generated functions we apparently do not output a row in the fun_decls relation, which explains why you get a count of zero.
  • We do output parameter information directly on the function:
    #13d=@"fun_decl___builtin___snprintf_chk(?)"
    .push *
    functions(#13d, "__builtin___snprintf_chk", 6)
    link_parent(#13d, #101)
    function_return_type(#13d, #10e)
    #13d_13e=@"{#13d}_0_par"
    params(#13d_13e, #13d, 0, #116)
    #13d_13f=@"{#13d}_1_par"
    params(#13d_13f, #13d, 1, #119)
    #13d_140=@"{#13d}_2_par"
    params(#13d_140, #13d, 2, #10e)
    #13d_141=@"{#13d}_3_par"
    params(#13d_141, #13d, 3, #119)
    #13d_142=@"{#13d}_4_par"
    params(#13d_142, #13d, 4, #11c)
    .pop
    

Questions (I'll also have a look at whether I can answer this myself):

  • Why do we look at the declaration entries and not at the functions?
  • Should we sometimes fallback to the parameter information of the Function if there are no declaration entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We look at declaration entries in order to figure out if the declaration was implicit or not. There can be spurious function parameters caused by extraction errors, so we need to double-check the function declaration entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened a PR for an additional internal test related to this. I think the right solution here is to use the information from the Function and not the FunctionDeclarationEntry when we see that the function is a builtin function. In that case the information from the Function should be reliable, because the function's signature is under control of out frontend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally added if this instanceof BuiltinFunction then firstFormatArgumentIndex = this.getNumberOfParameters(), because getNumberOfParameters() was reliable for builtins. I'm happy to add this back, because we shouldn't rely on FunctionDeclarationEntry for builtins. If it's ok with you, shall this be the final solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good.

if this.hasDefinition()
then result = this.getDefinition().getNumberOfParameters()
else result = this.getNumberOfExplicitParameters()
Expand All @@ -143,7 +144,7 @@ abstract class FormattingFunction extends ArrayFunction, TaintFunction {
* from implicit function declarations. If there is some inconsistency in the number
* of parameters, then don't return anything.
*/
private int getNumberOfExplicitParameters() {
int getNumberOfExplicitParameters() {
forex(FunctionDeclarationEntry fde | fde = this.getAnExplicitDeclarationEntry() |
result = fde.getNumberOfParameters()
)
Expand Down
3 changes: 2 additions & 1 deletion cpp/ql/src/Likely Bugs/Format/WrongTypeFormatArguments.ql
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,8 @@ where
) and
not arg.isAffectedByMacro() and
not arg.isFromUninstantiatedTemplate(_) and
not actual.getUnspecifiedType() instanceof ErroneousType
not actual.getUnspecifiedType() instanceof ErroneousType and
not arg.mayBeFromImplicitlyDeclaredFunction()
select arg,
"This format specifier for type '" + expected.getName() + "' does not match the argument type '" +
actual.getUnspecifiedType().getName() + "'."
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* Fixed false positives in the `cpp/wrong-type-format-argument` ("Wrong type of arguments to formatting function") query if there are extraction errors in the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically not true, as there are valid cases where a function is implicitly defined and correctly being used as a argument in a printf call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it might also reject true-positives. Do you have a suggested text I could use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we mostly want to be explicit about what we ignore, i.e., calls to functions that are implicitly declared, as in the majority of cases we see that these are due to extraction errors, and not because the codebase is pre-C99 C-code, which allows implicit function declarations.

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
| file://:0:0:0:0 | <error expr> |
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import cpp

from Expr e
where e.getType() instanceof ErroneousType
select e
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
| file://:0:0:0:0 | There was an error during this compilation |
| implicit.cpp:5:5:5:5 | identifier 'g' is undefined |
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
import cpp

from Diagnostic d
select d
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
| implicit2.c:1:7:1:7 | g | file://:0:0:0:0 | float |
| implicit2.c:1:7:1:7 | g | file://:0:0:0:0 | int |
| implicit.c:1:6:1:6 | f | file://:0:0:0:0 | void |
| implicit.c:3:5:3:5 | g | file://:0:0:0:0 | float |
| implicit.c:3:5:3:5 | g | file://:0:0:0:0 | int |
| implicit.cpp:3:6:3:6 | f | file://:0:0:0:0 | void |
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import cpp

from Function fn
where fn.fromSource()
select fn, fn.getType()
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
void f() {
f();
g();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
// semmle-extractor-options: --expect_errors

void f() {
f();
g();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
| implicit.c:3:5:3:5 | call to g |
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import cpp

from Expr e
where e.mayBeFromImplicitlyDeclaredFunction()
select e
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
float g();
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
| tests.c:7:18:7:18 | 1 | This format specifier for type 'char *' does not match the argument type 'int'. |
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Likely Bugs/Format/WrongTypeFormatArguments.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
// semmle-extractor-options: --expect_errors

int printf(const char * format, ...);
int fprintf();

void f() {
printf("%s", 1); // BAD
printf("%s", implicit_function()); // GOOD - we should ignore the type
sprintf(0, "%s", ""); // GOOD
fprintf(0, "%s", ""); // GOOD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is now good, because int fprintf() now no longer yields a value for getFirstFormatArgumentIndex?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

}
Loading