-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Reduce FPs in cpp/wrong-type-format-argument due to extraction errors #17775
C++: Reduce FPs in cpp/wrong-type-format-argument due to extraction errors #17775
Conversation
526aaff
to
23970f1
Compare
23970f1
to
1e459ac
Compare
C++: Change note
1e459ac
to
853128c
Compare
@@ -300,6 +300,11 @@ class FunctionCall extends Call, @funbindexpr { | |||
this.isVirtual() or | |||
this.getTarget().getAnAttribute().getName() = "weak" | |||
} | |||
|
|||
override predicate mayBeFromImplicitlyDeclaredFunction() { | |||
this.getType() instanceof IntType and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this be anything else than int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it can be, so perhaps this test is redundant. But, I'm wondering what happens if a function is implicit in some compilation units, but explicit in others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sound like that's something that need further thought, especially with regard to the implications that has for the changes made here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a test for this scenario, and removed the redundant test. So it seems that Function::getType()
can return multiple things if there are conflicting declarations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks somewhat odd, as the modifications differ across sprintf
and fprintf
, and as printf
itself is left untouched.
@@ -50,6 +50,8 @@ private class Fprintf extends FormattingFunction, NonThrowingFunction { | |||
override int getFormatParameterIndex() { result = 1 } | |||
|
|||
override int getOutputParameterIndex(boolean isStream) { result = 0 and isStream = true } | |||
|
|||
override int getFirstFormatArgumentIndex() { result = 2 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks odd. It feels like there might be a bug in one of the super classes. Otherwise, why doesn't Printf
have a similar predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this, but it's what's needed to remove false-positives on the asterisk
project. The declaration of int fprintf()
in the last commit (5315a5c) replicates what happened in the database. I briefly forgot why I needed that line.
This line is safe because class Fprintf
always has 2 arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The root cause is actually that the extractor sometimes fails to mark function declarations as implicit. Hmm. Maybe that's the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed instead using 4341fab. I couldn't find a nice way to identify these extraction errors but hopefully the solution makes sense.
@@ -534,6 +534,9 @@ class Expr extends StmtParent, @expr { | |||
|
|||
/** Gets the function containing this control-flow node. */ | |||
override Function getControlFlowScope() { result = this.getEnclosingFunction() } | |||
|
|||
/** Holds if this expression could be the return value of an implicitly declared function. */ | |||
predicate mayBeFromImplicitlyDeclaredFunction() { none() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just make this a predicate of Call
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we need to ask an Expr
if it came from an implicitly-declared function. I suppose, we could have done arg.(Call).mayBeFromImplicitlyDeclaredFunction()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that would be better.
--- | ||
category: minorAnalysis | ||
--- | ||
* Fixed false positives in the `cpp/wrong-type-format-argument` ("Wrong type of arguments to formatting function") query if there are extraction errors in the function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically not true, as there are valid cases where a function is implicitly defined and correctly being used as a argument in a printf call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it might also reject true-positives. Do you have a suggested text I could use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we mostly want to be explicit about what we ignore, i.e., calls to functions that are implicitly declared, as in the majority of cases we see that these are due to extraction errors, and not because the codebase is pre-C99 C-code, which allows implicit function declarations.
printf("%s", 1); // BAD | ||
printf("%s", implicit_function()); // GOOD - we should ignore the type | ||
sprintf(0, "%s", ""); // GOOD | ||
fprintf(0, "%s", ""); // GOOD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is now good, because int fprintf()
now no longer yields a value for getFirstFormatArgumentIndex
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
Will re-run DCA after current changes. |
@@ -125,6 +125,7 @@ abstract class FormattingFunction extends ArrayFunction, TaintFunction { | |||
// The formatting function either has a definition in the snapshot, or all | |||
// `DeclarationEntry`s agree on the number of parameters (otherwise we don't | |||
// really know the correct number) | |||
result > 0 and // Avoid invalid declarations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make more sense to me to make this condition part of the charPred of this abstract class. Reasoning: there are no arguments besides vararg ones, then there is no explicit argument for the formatting string, which would be very odd for a formatting function.
Maybe something like:
abstract class FormattingFunction extends ArrayFunction, TaintFunction {
int firstFormatArgumentIndex;
int numberOfExplicitParameters;
FormattingFunction() {
forex(FunctionDeclarationEntry fde | fde = this.getAnExplicitDeclarationEntry() |
numberOfExplicitParameters = fde.getNumberOfParameters()
) and
(
if this.hasDefinition()
then firstFormatArgumentIndex = this.getDefinition().getNumberOfParameters()
else firstFormatArgumentIndex = numberOfExplicitParameters
) and
firstFormatArgumentIndex > 0
}
and implement the relevant get
predicates in terms of firstFormatArgumentIndex
and numberOfExplicitParameters
.
I think it would also make sense at this point to move getAnExplicitDeclarationEntry
to Function
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this also work for intrinsic functions like __builtin___sprintf_chk
? I will try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intrinsic functions should still have a prototype in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this out, and encountered a couple of problems:
- Certain dataflow functions no longer identify the formatting functions as sinks if the functions have not been declared. Maybe this is ok, but I needed to fix up some of the qltests to add missing function declarations.
- The builtin functions have no declaration entries - I tested this using
count(fn.getADeclarationEntry())
which returned0
. This is fine, but I did need to add a special case toFormattingFunction
.
I'll need to run DCA on this (once the tests pass), fix changenotes, and fix the tests on semmle-code.
Are you still ok with this design? @jketema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The builtin functions have no declaration entries - I tested this using
count(fn.getADeclarationEntry())
which returned0
. This is fine, but I did need to add a special case toFormattingFunction
.
That's an extractor bug, if it's reproducible in an integration test. Otherwise it's a bug in how we run tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at this a bit:
- The question mark in
#114=@"fun_decl___builtin___sprintf_chk(?)"
is there, because the compiler generated function declaration is a C function, so name-mangling of the parameters should be ignored. - For compiler generated functions we apparently do not output a row in the
fun_decls
relation, which explains why you get a count of zero. - We do output parameter information directly on the function:
#13d=@"fun_decl___builtin___snprintf_chk(?)" .push * functions(#13d, "__builtin___snprintf_chk", 6) link_parent(#13d, #101) function_return_type(#13d, #10e) #13d_13e=@"{#13d}_0_par" params(#13d_13e, #13d, 0, #116) #13d_13f=@"{#13d}_1_par" params(#13d_13f, #13d, 1, #119) #13d_140=@"{#13d}_2_par" params(#13d_140, #13d, 2, #10e) #13d_141=@"{#13d}_3_par" params(#13d_141, #13d, 3, #119) #13d_142=@"{#13d}_4_par" params(#13d_142, #13d, 4, #11c) .pop
Questions (I'll also have a look at whether I can answer this myself):
- Why do we look at the declaration entries and not at the functions?
- Should we sometimes fallback to the parameter information of the
Function
if there are no declaration entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We look at declaration entries in order to figure out if the declaration was implicit or not. There can be spurious function parameters caused by extraction errors, so we need to double-check the function declaration entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened a PR for an additional internal test related to this. I think the right solution here is to use the information from the Function
and not the FunctionDeclarationEntry
when we see that the function is a builtin function. In that case the information from the Function
should be reliable, because the function's signature is under control of out frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally added if this instanceof BuiltinFunction then firstFormatArgumentIndex = this.getNumberOfParameters()
, because getNumberOfParameters()
was reliable for builtins. I'm happy to add this back, because we shouldn't rely on FunctionDeclarationEntry
for builtins. If it's ok with you, shall this be the final solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good.
--- | ||
category: minorAnalysis | ||
--- | ||
* Fixed false positives in the `cpp/wrong-type-format-argument` ("Wrong type of arguments to formatting function") query if there are extraction errors in the function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we mostly want to be explicit about what we ignore, i.e., calls to functions that are implicitly declared, as in the majority of cases we see that these are due to extraction errors, and not because the codebase is pre-C99 C-code, which allows implicit function declarations.
6463531
to
c5a082f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Could we run a final DCA experiment to see everything is ok?
Fixes false positives cause by buildless extraction. Commit-by-commit review recommended.
Removes 1008 false positives in the
asterisk/asterisk
projects as observed on codeql-qa.Pull Request checklist
All query authors
.qhelp
. See the documentation in this repository.Internal query authors only
.ql
,.qll
, or.qhelp
files. See the documentation (internal access required).