Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Model Microsoft's "Active Template Library" #18136

Merged
merged 65 commits into from
Dec 9, 2024

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Nov 27, 2024

This PR adds MaD models for most of the relevant classes in Active Template Library (ATL).

In addition I've also added ATL flow sources whenever I came across something relevant.

There are still some models that could do with more MaD rows, but they depend on a couple of features that aren't yet available. In particular:

  • It's not currently possible to add a MaD row for a templated conversion operator. i.e.:
      template<typename T>
      struct MyStruct {
          operator T();
      };
    because the name of the conversion operator is just whatever name T is instantiated with. You can see an example of where this is used here (where you'd like taint from this to the result of the call). A natural way to write this MaD would be:
    ["", "CAtlFileMapping<T>", True, "operator T *", "", "", "Argument[-1]", "ReturnValue[*]", "taint", "manual"]
    However, we don't currently allow using template arguments in the "name" column like we do for the signature column. I plan on tackling this as a follow-up to this PR.
  • Lots of container related models contains code allows you to do stuff like:
    Container container;
    POS pos = container.find(myValue);
    sink(container.lookup(pos));
    we could add a taint-step from find to the return value, and a taint-step from lookup's argument to its return value, in order to get this taint. However, I would much rather do the very small addition to dataflow (that Java has) which adds a MapKey content to express that "this is a key that points to something tainted". So that's another follow-up from this PR.

In addition to adding a bunch of models, this PR also fixes two problems in our interpretation of MaD rows:

  • It was impossible to add non-templated models since we were using isConstructedFrom in a few places to obtain the uninstantiated template of classes and functions. The problem with that is that isConstructedFrom doesn't have a result when the function/class isn't a template instantiation. So I fixed that as part of this work, and you can see the effect of this in the commit that follows the fix.
  • As I mentioned offline, a conversion operator such as:
      using MyInt = int;
      struct S {
        operator MyInt();
      };
    gets the name operator int instead of operator MyInt. So without some special handling a MaD row for operator MyInt would have to use the name operator int which is slightly confusing. So I added special handling of conversion operators so that we get the name operator MyInt instead. The effect of this can also be seen in the commit that follows the fix.

Commit-by-commit review highly encouraged. There is a lot of code, but most of it is just adding tests and models. Every commit can be read and understood in isolation!

@github-actions github-actions bot added the C++ label Nov 27, 2024
@MathiasVP MathiasVP force-pushed the model-active-template-library branch from 257c657 to de75e03 Compare December 4, 2024 12:46
@MathiasVP
Copy link
Contributor Author

I've responded to all the comments now, Geoffrey. While fixing some of them I also spotted a bug in one of the models that I fixed in 279a30c

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another 8 commits reviewed (to the end of CPathT).

cpp/ql/lib/ext/CComBSTR.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CComBSTR.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CComBSTR.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CComBSTR.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CComBSTR.model.yml Outdated Show resolved Hide resolved
- ["", "CComSafeArray", True, "CComSafeArray", "(const SAFEARRAY &)", "", "Argument[*0]", "Argument[-1].Field[*m_psa]", "value", "manual"]
- ["", "CComSafeArray", True, "CComSafeArray", "(const SAFEARRAY *)", "", "Argument[*0]", "Argument[-1].Field[*m_psa]", "value", "manual"]
- ["", "CComSafeArray", True, "Add", "(const SAFEARRAY *)", "", "Argument[*0]", "Argument[-1].Field[*m_psa]", "value", "manual"]
- ["", "CComSafeArray<T>", True, "Add", "(const T &,BOOL)", "", "Argument[*@0]", "Argument[-1].Field[*m_psa].Field[*@pvData]", "value", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on here? I mean, what is .Field[*m_psa].Field[*@pvData] and why do we sometimes reference it instead of just .Field[*m_psa]?

Copy link
Contributor Author

@MathiasVP MathiasVP Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The m_psa field is a public member of CComSafeArray: https://learn.microsoft.com/en-us/cpp/atl/reference/ccomsafearray-class?view=msvc-170#m_psa. So it's possible to do:

CComSafeArray<int> array;
array.Add(42);
LPSAFEARRAY underlyingArray = array.m_psa;
int* data = (int*)underlyingArray->pvData;
printf("%d", *data); // this will print 42

So in order to capture this flow we model the effect of calling Add by making it a write to the m_psa->pvData member.

cpp/ql/lib/ext/CComSafeArray.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CPathT.model.yml Show resolved Hide resolved
@MathiasVP
Copy link
Contributor Author

Wonderful 🦅 eye'ing, Geoffrey! I've responded to all of them now.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review complete. 😅 🎉

Overall this looks fantastic, I expect it will really improve flow (and sources for that matter) in projects that use this stuff heavily.

cpp/ql/lib/ext/CSimpleArray.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CPathT.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CSimpleMap.model.yml Outdated Show resolved Hide resolved
cpp/ql/test/library-tests/dataflow/taint-tests/atl.cpp Outdated Show resolved Hide resolved
- ["", "CAtlFileMappingBase", True, "GetData", "", "", "Argument[-1]", "ReturnValue[*]", "taint", "manual"]
- ["", "CAtlFileMappingBase", True, "GetHandle", "", "", "Argument[-1]", "ReturnValue", "taint", "manual"]
- ["", "CAtlFileMappingBase", True, "MapFile", "", "", "Argument[0]", "Argument[-1]", "taint", "manual"]
- ["", "CAtlFileMappingBase", True, "MapSharedMem", "", "", "Argument[*1]", "Argument[-1]", "taint", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there are potentially some queries we could develop or extend into the area of memory mapped files.

data: # namespace, type, subtypes, name, signature, ext, input, output, kind, provenance
- ["", "CRegKey", True, "CRegKey", "(CRegKey&)", "", "Argument[*0]", "Argument[-1]", "value", "manual"]
- ["", "CRegKey", True, "CRegKey", "(HKEY)", "", "Argument[0]", "Argument[-1]", "taint", "manual"]
- ["", "CRegKey", True, "Attach", "", "", "Argument[0]", "Argument[-1]", "taint", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should also model CRegKey::Create, which I believe is commonly used when adding new data structures in the registry. The lpszKeyName (Argument[*1]) should have taint flow into the qualifier.

(it could also be a query sink, you would not want untrusted data controlling this operation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I've added this summary model in 674dbce. I think this is one that could also be treated with MapKey flow at some point and be made value-preserving.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will be value preserving as you're still appending the key name to an existing registry path represented by the current state of the key. Unless maybe you consider the key to be a set of path edges, in which case I suppose you're adding one value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless maybe you consider the key to be a set of path edges, in which case I suppose you're adding one value.

Yeah, that was my thinking. But I guess we can settle on those details later. I agree that there are multiple ways to think of this.

cpp/ql/lib/ext/CRegKey.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CRegKey.model.yml Outdated Show resolved Hide resolved
cpp/ql/lib/ext/CRegKey.model.yml Outdated Show resolved Hide resolved
@MathiasVP
Copy link
Contributor Author

I believe I've addressed everything @geoffw0. The only discussion point left is #18136 (comment).

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of my concerns have been addressed, I think we're ready to merge. ✨

@MathiasVP MathiasVP merged commit 1266b24 into github:main Dec 9, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants