From 49a4318ab1af354c3b335b448a33696c635833cd Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Tue, 20 Aug 2024 00:30:16 -0400 Subject: [PATCH 01/25] DRAFT: Go MaD docs first draft (still need to change Select example) --- .../customizing-library-models-for-go.rst | 413 ++++++++++++++++++ 1 file changed, 413 insertions(+) create mode 100644 docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst new file mode 100644 index 000000000000..61861039f03f --- /dev/null +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -0,0 +1,413 @@ +.. _customizing-library-models-for-csharp: + +Customizing library models for Go +================================= + +You can model the methods and callables that control data flow in any framework or library. This is especially useful for custom frameworks or niche libraries, that are not supported by the standard CodeQL libraries. + +.. include:: ../reusables/beta-note-customizing-library-models.rst + +About this article +------------------ + +This article contains reference material about how to define custom models for sources, sinks, and flow summaries for Go dependencies in data extension files. + +About data extensions +--------------------- + +You can customize analysis by defining models (summaries, sinks, and sources) of your code's Go dependencies in data extension files. Each model defines the behavior of one or more elements of your library or framework, such as methods, properties, and callables. When you run dataflow analysis, these models expand the potential sources and sinks tracked by dataflow analysis and improve the precision of results. + +Most of the security queries search for paths from a source of untrusted input to a sink that represents a vulnerability. This is known as taint tracking. 
Each source is a starting point for dataflow analysis to track tainted data and each sink is an end point. + +Taint tracking queries also need to know how data can flow through elements that are not included in the source code. These are modeled as summaries. A summary model enables queries to synthesize the flow behavior through elements in dependency code that is not stored in your repository. + +Syntax used to define an element in an extension file +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Each model of an element is defined using a data extension where each tuple constitutes a model. +A data extension file to extend the standard Go queries included with CodeQL is a YAML file with the form: + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: + data: + - + - + - ... + +Each YAML file may contain one or more top-level extensions. + +- ``addsTo`` defines the CodeQL pack name and extensible predicate that the extension is injected into. +- ``data`` defines one or more rows of tuples that are injected as values into the extensible predicate. The number of columns and their types must match the definition of the extensible predicate. + +Data extensions use union semantics, which means that the tuples of all extensions for a single extensible predicate are combined, duplicates are removed, and all of the remaining tuples are queryable by referencing the extensible predicate. + +Publish data extension files in a CodeQL model pack to share +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can group one or more data extension files into a CodeQL model pack and publish it to the GitHub Container Registry. This makes it easy for anyone to download the model pack and use it to extend their analysis. For more information, see `Creating a CodeQL model pack `__ and `Publishing and using CodeQL packs `__ in the CodeQL CLI documentation. 
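As a rough sketch of what such a model pack could look like, the ``qlpack.yml`` below declares a pack that targets ``codeql/go-all`` and picks up every data extension file under a ``models`` directory. The pack name ``my-org/go-custom-models``, the version, and the directory layout are illustrative assumptions, not values taken from this article. The linked CodeQL CLI documentation remains the authority on the exact keys.

.. code-block:: yaml

   # Hypothetical model pack manifest: name, version, and paths are placeholders.
   name: my-org/go-custom-models
   version: 0.0.1
   library: true
   extensionTargets:
     # The pack whose extensible predicates the data extensions add to.
     codeql/go-all: "*"
   dataExtensions:
     # Glob for the data extension files shipped in this pack.
     - models/**/*.yml

Keeping the data extension files in one directory and matching them with a glob means new model files are picked up without editing the manifest.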
+ +Extensible predicates used to create custom models in Go +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The CodeQL library for Go analysis exposes the following extensible predicates: + +- ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." +- ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable. +- ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. +- ``neutralModel(namespace, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) can be used to override generated summary models (those with a provenance such as ``df-generated``), so that the summary model will be ignored. Other than that, neutral models have no effect. + +The extensible predicates are populated using the models defined in data extension files. + +Examples of custom model definitions +------------------------------------ + +The examples in this section are taken from the standard CodeQL Go query pack published by GitHub. They demonstrate how to add tuples to extend extensible predicates that are used by the standard queries. 
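Because a single file may hold several top-level extensions, one file can contribute rows to more than one extensible predicate. The sketch below assumes the ``database/sql`` sink row and the ``net`` source row that the examples in this article walk through, with the signature column left empty as the later revisions of this draft recommend for Go. It is an illustration of the file layout, not a copy of a file from the Go query pack.

.. code-block:: yaml

   extensions:
     # First extension: one SQL injection sink.
     - addsTo:
         pack: codeql/go-all
         extensible: sinkModel
       data:
         - ["database/sql", "DB", False, "Prepare", "", "", "Argument[0]", "sql-injection", "manual"]
     # Second extension: one remote source.
     - addsTo:
         pack: codeql/go-all
         extensible: sourceModel
       data:
         - ["net", "", False, "Listen", "", "", "ReturnValue", "remote", "manual"]

Each row must have as many columns as its target predicate: nine for ``sinkModel`` and ``sourceModel``, ten for ``summaryModel``.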
+ +Example: Taint sink in the ``database/sql`` package +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This example shows how the Go query pack models the argument of the ``Prepare`` method as a SQL injection sink. +This is the ``Prepare`` method of the ``DB`` type in the ``database/sql`` package which creates a prepared statement. + +.. code-block:: go + + func Tainted(db *sql.DB, name string) { + stmt, err := db.Prepare("SELECT * FROM users WHERE name = " + name) // The argument to this method is a SQL injection sink. + ... + } + +We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: sinkModel + data: + - ["database/sql", "DB", False, "Prepare", "(string)", "", "Argument[0]", "sql-injection", "manual"] + +Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. +The first five values identify the callable (in this case a method) to be modeled as a sink. + +- The first value ``database/sql`` is the package name. +- The second value ``DB`` is the name of the type that the method is associated with. +- The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. +- The fourth value ``Prepare`` is the method name. Constructors are named after the class. +- The fifth value ``(string)`` is the method input type signature. This value is often excluded and is simply set to an empty string since Go does not allow for a given type to have multiple methods with the same type. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the sink. 
+ +- The seventh value ``Argument[0]`` is the ``access path`` to the first argument passed to the method, which means that this is the location of the sink. +- The eighth value ``sql-injection`` is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries. +- The ninth value ``manual`` is the provenance of the sink, which is used to identify the origin of the sink. + +Example: Taint source from the ``net`` package +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Go query pack models the return value from the ``Listen`` method as a ``remote`` source. +This is the ``Listen`` function which is located in the ``net`` package. + +.. code-block:: go + + func Tainted() { + ln, err := net.Listen("tcp", ":8080") // The return value of this method is a remote source. + ... + } + + +We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: sourceModel + data: + - ["net", "", False, "Listen", "(string,string)", "", "ReturnValue", "remote", "manual"] + + +Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. +The first five values identify the callable (in this case a function) to be modeled as a source. + +- The first value ``net`` is the namespace name. +- The second value ``""`` is left blank, since the function is not a method of a type. +- The third value ``False`` is a flag that indicates whether or not the source also applies to all overrides of the method. +- The fourth value ``Listen`` is the function name. +- The fifth value ``(string,string)`` is the method input type signature. + +The sixth value should be left empty and is out of scope for this documentation. 
+The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. + +- The seventh value ``ReturnValue`` is the access path to the return of the method, which means that it is the return value that should be considered a source of tainted input. +- The eighth value ``remote`` is the kind of the source. The source kind is used to define the threat model where the source is in scope. ``remote`` applies to many of the security related queries as it means a remote source of untrusted data. As an example the SQL injection query uses ``remote`` sources. For more information, see ":ref:`Threat models `." +- The ninth value ``manual`` is the provenance of the source, which is used to identify the origin of the source. + +Example: Add flow through the ``Join`` function +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Go query pack models flow through a method for a simple case. +This pattern covers many of the cases where we need to summarize flow through a function that is stored in a library or framework outside the repository. + +.. code-block:: go + + func TaintFlow() { + ss := []string{"Hello", "World"} + sep := " " + t := strings.Join(ss, sep) // There is taint flow from s1 and s2 to t. + ... + } + +We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: summaryModel + data: + - ["strings", "", False, "Join", "", "", "Argument[0]", "ReturnValue", "taint", "manual"] + - ["strings", "", False, "Join", "", "", "Argument[1]", "ReturnValue", "taint", "manual"] + +Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. +Each tuple defines flow from one argument to the return value. 
+The first row defines flow from the first argument (``ss`` in the example) to the return value (``t`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). + +The first five values identify the callable (in this case a function) to be modeled as a summary. +These are the same for both of the rows above as we are adding two summaries for the same function. + +- The first value ``strings`` is the package name. +- The second value ``""`` is left blank, since the function is not a method of a type. +- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The fourth value ``Join`` is the function name. +- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. + +- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument (``ss`` in the example) and ``Argument[1]`` is the access path to the second argument (``sep`` in the example). +- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. +- The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. +- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. + +It would also be possible to merge the two rows into one by using a comma-separated list in the seventh value. This would be useful if the method has many arguments and the flow is the same for all of them. + +.. 
code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: summaryModel + data: + - ["strings", "", False, "Join", "", "", "Argument[0,1]", "ReturnValue", "taint", "manual"] + +This row defines flow from both the first and the second argument to the return value. The seventh value ``Argument[0,1]`` is shorthand for specifying an access path to both ``Argument[0]`` and ``Argument[1]``. + +Example: Add flow through the ``Hostname`` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Go query pack models flow through a method for a simple case. + +.. code-block:: go + + func TaintFlow(u *url.URL) { + host := u.Hostname() // There is taint flow from u to s. + ... + } + +We need to add a tuple to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: summaryModel + data: + - ["net/url", "URL", False, "Hostname", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"] + +Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. +Each tuple defines flow from one argument to the return value. +The first row defines flow from the qualifier of the method call (``u`` in the example) to the return value (``host`` in the example). + +The first five values identify the callable (in this case a method) to be modeled as a summary. +These are the same for both of the rows above as we are adding two summaries for the same method. + +- The first value ``net/url`` is the package name. +- The second value ``URL`` is the receiver type. +- The third value ``True`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The fourth value ``Hostname`` is the method name. +- The fifth value ``()`` is the method input type signature. 
+ +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. + +- The seventh value is the access path to the input (where data flows from). ``Argument[this]`` is the access path to the qualifier (``u`` in the example). +- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. +- The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. +- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. + +Example: Add flow through the ``Select`` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the C# query pack models a more complex flow through a method. +Here we model flow through higher order methods and collection types, as well as how to handle extension methods and generics. + +.. code-block:: csharp + + public static void TaintFlow(IEnumerable stream) { + IEnumerable lines = stream.Select(item => item + "\n"); + ... + } + +We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: + +.. 
code-block:: yaml + + extensions: + - addsTo: + pack: codeql/csharp-all + extensible: summaryModel + data: + - ["System.Linq", "Enumerable", False, "Select", "(System.Collections.Generic.IEnumerable,System.Func)", "", "Argument[0].Element", "Argument[1].Parameter[0]", "value", "manual"] + - ["System.Linq", "Enumerable", False, "Select", "(System.Collections.Generic.IEnumerable,System.Func)", "", "Argument[1].ReturnValue", "ReturnValue.Element", "value", "manual"] + + +Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. +Each tuple defines part of the flow that comprises the total flow through the ``Select`` method. +The first five values identify the callable (in this case a method) to be modeled as a summary. +These are the same for both of the rows above as we are adding two summaries for the same method. + +- The first value ``System.Linq`` is the namespace name. +- The second value ``Enumerable`` is the class (type) name. +- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The fourth value ``Select`` is the method name, along with the type parameters for the method. The names of the generic type parameters provided in the model must match the names of the generic type parameters in the method signature in the source code. +- The fifth value ``(System.Collections.Generic.IEnumerable,System.Func)`` is the method input type signature. The generics in the signature must match the generics in the method signature in the source code. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary definition. + +- The seventh value is the access path to the ``input`` (where data flows from). +- The eighth value is the access path to the ``output`` (where data flows to). 
+ +For the first row: + +- The seventh value is ``Argument[0].Element``, which is the access path to the elements of the qualifier (the elements of the enumerable ``stream`` in the example). +- The eighth value is ``Argument[1].Parameter[0]``, which is the access path to the first parameter of the ``System.Func`` argument of ``Select`` (the lambda parameter ``item`` in the example). + +For the second row: + +- The seventh value is ``Argument[1].ReturnValue``, which is the access path to the return value of the ``System.Func`` argument of ``Select`` (the return value of the lambda in the example). +- The eighth value is ``ReturnValue.Element``, which is the access path to the elements of the return value of ``Select`` (the elements of the enumerable ``lines`` in the example). + +For the remaining values for both rows: + +- The ninth value ``value`` is the kind of the flow. ``value`` means that the value is preserved. +- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. + +That is, the first row specifies that values can flow from the elements of the qualifier enumerable into the first argument of the function provided to ``Select``. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from ``Select``. + +Example: Add a ``neutral`` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how we can model a method as being neutral with respect to flow. We will also cover how to model a property by modeling the getter of the ``Now`` property of the ``DateTime`` class as neutral. +A neutral model is used to define that there is no flow through a method. + +.. code-block:: csharp + + public static void TaintFlow() { + System.DateTime t = System.DateTime.Now; // There is no flow from Now to t. + ... 
+ } + +We need to add a tuple to the ``neutralModel``\(namespace, type, name, signature, kind, provenance) extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/csharp-all + extensible: neutralModel + data: + - ["System", "DateTime", "get_Now", "()", "summary", "manual"] + + +Since we are adding a neutral model, we need to add tuples to the ``neutralModel`` extensible predicate. +The first four values identify the callable (in this case the getter of the ``Now`` property) to be modeled as a neutral, the fifth value is the kind, and the sixth value is the provenance (origin) of the neutral. + +- The first value ``System`` is the namespace name. +- The second value ``DateTime`` is the class (type) name. +- The third value ``get_Now`` is the method name. Getter and setter methods are named ``get_`` and ``set_`` respectively. +- The fourth value ``()`` is the method input type signature. +- The fifth value ``summary`` is the kind of the neutral. +- The sixth value ``manual`` is the provenance of the neutral. + +Example: Accessing the ``Body`` field of an HTTP request +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how we can model a field as a source of tainted data. + +.. code-block:: go + + func TaintFlow(w http.ResponseWriter, r *http.Request) { + body := r.Body // The Body field of an HTTP request is a source of tainted data. + ... + } + +We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: sourceModel + data: + - ["net/http", "Request", True, "Body", "", "", "", "remote", "manual"] + +Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. +The first five values identify the field to be modeled as a source. 
+ +- The first value ``net/http`` is the package name. +- The second value ``Request`` is the name of the type that the field is associated with. +- The third value ``True`` is a flag that indicates whether or not the source also applies to all overrides of the field. +- The fourth value ``Body`` is the field name. +- The fifth value ``""`` is blank since it is a field access and field accesses do not have method signatures in Go. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. + +- The seventh value ``""`` is left blank. Leaving the access path of a source model blank indicates that it is a field access. +- The eighth value ``remote`` is the source kind. This indicates that the source is a remote source of untrusted data. +- The ninth value ``manual`` is the provenance of the source, which is used to identify the origin of the source. + +Package grouping +~~~~~~~~~~~~~~~~ + +Since Go uses URLs for package identifiers, it is possible for packages to be imported with different paths. For example, the ``glog`` package can be imported using both the ``github.com/golang/glog`` and ``gopkg.in/glog`` paths. + +To handle this, the CodeQL Go library uses a mapping from the package path to a name for the package. This mapping can be specified using the ``packageGrouping`` extensible predicate, and then the models for the APIs in the package +will use the group name in place of the package path. The package field in models will be the prefix ``group:`` followed by the group name. + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go + extensible: packageGrouping + data: + - ["glog", "github.com/golang/glog"] + - ["glog", "gopkg.in/glog"] + - addsTo: + pack: codeql/go + extensible: sinkModel + data: + - ["group:glog", "Info", "()", "Argument[0]", "log-injection", "manual"] + +.. 
_threat-models-go: + +Threat models +------------- + +.. include:: ../reusables/threat-model-description.rst From cfa1ad65c8d24c7900ffed90a69e7064c16c0ebc Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:00:32 -0400 Subject: [PATCH 02/25] Consistently replace usage of `namespace` with `package` Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 61861039f03f..c871c240dc7b 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -55,10 +55,10 @@ Extensible predicates used to create custom models in Go The CodeQL library for Go analysis exposes the following extensible predicates: -- ``sourceModel(namespace, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." -- ``sinkModel(namespace, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable. -- ``summaryModel(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. -- ``neutralModel(namespace, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. 
Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) can be used to override generated summary models (those with a provenance such as ``df-generated``), so that the summary model will be ignored. Other than that, neutral models have no effect. +- ``sourceModel(package, type, subtypes, name, signature, ext, output, kind, provenance)``. This is used to model sources of potentially tainted data. The ``kind`` of the sources defined using this predicate determine which threat model they are associated with. Different threat models can be used to customize the sources used in an analysis. For more information, see ":ref:`Threat models `." +- ``sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance)``. This is used to model sinks where tainted data may be used in a way that makes the code vulnerable. +- ``summaryModel(package, type, subtypes, name, signature, ext, input, output, kind, provenance)``. This is used to model flow through elements. +- ``neutralModel(package, type, name, signature, kind, provenance)``. This is similar to a summary model but used to model the flow of values that have only a minor impact on the dataflow analysis. Manual neutral models (those with a provenance such as ``manual`` or ``ai-manual``) can be used to override generated summary models (those with a provenance such as ``df-generated``), so that the summary model will be ignored. Other than that, neutral models have no effect. The extensible predicates are populated using the models defined in data extension files. @@ -135,7 +135,7 @@ We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the callable (in this case a function) to be modeled as a source. -- The first value ``net`` is the namespace name. +- The first value ``net`` is the package name. 
- The second value ``""`` is left blank, since the function is not a method of a type. - The third value ``False`` is a flag that indicates whether or not the source also applies to all overrides of the method. - The fourth value ``Listen`` is the function name. From 211cda390d1212cbf6eb0717bb8e1b92014d5c02 Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:01:45 -0400 Subject: [PATCH 03/25] Method signatures and receiver/qualifier language Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index c871c240dc7b..ac7f6d3cc3e8 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -89,7 +89,7 @@ We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, si pack: codeql/go-all extensible: sinkModel data: - - ["database/sql", "DB", False, "Prepare", "(string)", "", "Argument[0]", "sql-injection", "manual"] + - ["database/sql", "DB", False, "Prepare", "", "", "Argument[0]", "sql-injection", "manual"] Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. The first five values identify the callable (in this case a method) to be modeled as a sink. @@ -98,7 +98,7 @@ The first five values identify the callable (in this case a method) to be modele - The second value ``DB`` is the name of the type that the method is associated with. - The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. - The fourth value ``Prepare`` is the method name. Constructors are named after the class. 
-- The fifth value ``(string)`` is the method input type signature. This value is often excluded and is simply set to an empty string since Go does not allow for a given type to have multiple methods with the same type. +- The fifth value ``""`` is the method input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the sink. @@ -228,7 +228,7 @@ We need to add a tuple to the ``summaryModel``\(namespace, type, subtypes, name, pack: codeql/go-all extensible: summaryModel data: - - ["net/url", "URL", False, "Hostname", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"] + - ["net/url", "URL", False, "Hostname", "", "", "Argument[receiver]", "ReturnValue", "taint", "manual"] Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. @@ -241,12 +241,12 @@ These are the same for both of the rows above as we are adding two summaries for - The second value ``URL`` is the receiver type. - The third value ``True`` is a flag that indicates whether or not the summary also applies to all overrides of the method. - The fourth value ``Hostname`` is the method name. -- The fifth value ``()`` is the method input type signature. +- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. 
-- The seventh value is the access path to the input (where data flows from). ``Argument[this]`` is the access path to the qualifier (``u`` in the example). +- The seventh value is the access path to the input (where data flows from). ``Argument[receiver]`` is the access path to the receiver (``u`` in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. - The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. From 9b92ff7e78c6ccb7ed3629fe866a61915f5d8bd4 Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:02:24 -0400 Subject: [PATCH 04/25] Typos and minor wording Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index ac7f6d3cc3e8..9bd619511a37 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -97,7 +97,7 @@ The first five values identify the callable (in this case a method) to be modele - The first value ``database/sql`` is the package name. - The second value ``DB`` is the name of the type that the method is associated with. - The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. -- The fourth value ``Prepare`` is the method name. Constructors are named after the class. +- The fourth value ``Prepare`` is the method name. 
- The fifth value ``""`` is the method input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. @@ -158,7 +158,7 @@ This pattern covers many of the cases where we need to summarize flow through a func TaintFlow() { ss := []string{"Hello", "World"} sep := " " - t := strings.Join(ss, sep) // There is taint flow from s1 and s2 to t. + t := strings.Join(ss, sep) // There is taint flow from ss and sep to t. ... } @@ -235,7 +235,6 @@ Each tuple defines flow from one argument to the return value. The first row defines flow from the qualifier of the method call (``u`` in the example) to the return value (``host`` in the example). The first five values identify the callable (in this case a method) to be modeled as a summary. -These are the same for both of the rows above as we are adding two summaries for the same method. - The first value ``net/url`` is the package name. - The second value ``URL`` is the receiver type. @@ -346,7 +345,7 @@ The first four values identify the callable (in this case the getter of the ``No Example: Accessing the ``Body`` field of an HTTP request ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This example shows how we can model a field as a source of tainted data. +This example shows how we can model a field read as a source of tainted data. .. code-block:: go @@ -387,7 +386,7 @@ Package grouping Since Go uses URLs for package identifiers, it is possible for packages to be imported with different paths. For example, the ``glog`` package can be imported using both the ``github.com/golang/glog`` and ``gopkg.in/glog`` paths. -To handle this, the CodeQL Go library uses a mapping from the package path to a name for the package. 
This mapping can be specified using the ``packageGrouping`` extensible predicate, and then the models for the APIs in the package +To handle this, the CodeQL Go library uses a mapping from the package path to a group name for the package. This mapping can be specified using the ``packageGrouping`` extensible predicate, and then the models for the APIs in the package will use the group name in place of the package path. The package field in models will be the prefix ``group:`` followed by the group name. .. code-block:: yaml @@ -403,7 +402,7 @@ will use the group name in place of the package path. The package field in model pack: codeql/go extensible: sinkModel data: - - ["group:glog", "Info", "()", "Argument[0]", "log-injection", "manual"] + - ["group:glog", "", False, "Info", "", "", "Argument[0]", "log-injection", "manual"] .. _threat-models-go: From 2bfca21a2ff8c0ee689a9bd9057d05baacf8b3be Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:04:42 -0400 Subject: [PATCH 05/25] Replace `ss` with `elems` --- .../customizing-library-models-for-go.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 9bd619511a37..36bfd6c9d9e8 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -156,9 +156,9 @@ This pattern covers many of the cases where we need to summarize flow through a .. code-block:: go func TaintFlow() { - ss := []string{"Hello", "World"} + elems := []string{"Hello", "World"} sep := " " - t := strings.Join(ss, sep) // There is taint flow from ss and sep to t. + t := strings.Join(elems, sep) // There is taint flow from ss and sep to t. ... 
} @@ -176,7 +176,7 @@ We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. Each tuple defines flow from one argument to the return value. -The first row defines flow from the first argument (``ss`` in the example) to the return value (``t`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). +The first row defines flow from the first argument (``elems`` in the example) to the return value (``t`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). The first five values identify the callable (in this case a method) to be modeled as a summary. These are the same for both of the rows above as we are adding two summaries for the same method. @@ -190,7 +190,7 @@ These are the same for both of the rows above as we are adding two summaries for The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. -- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument (``ss`` in the example) and ``Argument[1]`` is the access path to the second argument (``sep`` in the example). +- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument (``elems`` in the example) and ``Argument[1]`` is the access path to the second argument (``sep`` in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. - The ninth value ``taint`` is the kind of the flow. 
``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. From 27ad882f543ff967225bf1ca29d2d193f966b357 Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:05:33 -0400 Subject: [PATCH 06/25] Usage range pattern instead of comma separation Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 36bfd6c9d9e8..50f832f2ca5f 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -195,7 +195,7 @@ The remaining values are used to define the ``access path``, the ``kind``, and t - The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. -It would also be possible to merge the two rows into one by using a comma-separated list in the seventh value. This would be useful if the method has many arguments and the flow is the same for all of them. +It would also be possible to merge the two rows into one by using ".." to indicate a range in the seventh value. This would be useful if the method has many arguments and the flow is the same for all of them. .. 
code-block:: yaml @@ -204,9 +204,9 @@ It would also be possible to merge the two rows into one by using a comma-separa pack: codeql/go-all extensible: summaryModel data: - - ["strings", "", False, "Join", "", "", "Argument[0,1]", "ReturnValue", "taint", "manual"] + - ["strings", "", False, "Join", "", "", "Argument[0..1]", "ReturnValue", "taint", "manual"] -This row defines flow from both the first and the second argument to the return value. The seventh value ``Argument[0,1]`` is shorthand for specifying an access path to both ``Argument[0]`` and ``Argument[1]``. +This row defines flow from both the first and the second argument to the return value. The seventh value ``Argument[0..1]`` is shorthand for specifying an access path to both ``Argument[0]`` and ``Argument[1]``. Example: Add flow through the ``Hostname`` method ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From e8aac2be9aed061990aaf2276f44f90c920d087b Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:06:58 -0400 Subject: [PATCH 07/25] Remove `neutral` example Go currently does not use `neutralModel`s and they are less relevant for Go than for Java/C#. --- .../customizing-library-models-for-go.rst | 34 ------------------- 1 file changed, 34 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 50f832f2ca5f..2ed3ab3e6591 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -309,40 +309,6 @@ For the remaining values for both rows: That is, the first row specifies that values can flow from the elements of the qualifier enumerable into the first argument of the function provided to ``Select``. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from ``Select``. 
-Example: Add a ``neutral`` method -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This example shows how we can model a method as being neutral with respect to flow. We will also cover how to model a property by modeling the getter of the ``Now`` property of the ``DateTime`` class as neutral. -A neutral model is used to define that there is no flow through a method. - -.. code-block:: csharp - - public static void TaintFlow() { - System.DateTime t = System.DateTime.Now; // There is no flow from Now to t. - ... - } - -We need to add a tuple to the ``neutralModel``\(namespace, type, name, signature, kind, provenance) extensible predicate by updating a data extension file. - -.. code-block:: yaml - - extensions: - - addsTo: - pack: codeql/csharp-all - extensible: neutralModel - data: - - ["System", "DateTime", "get_Now", "()", "summary", "manual"] - - -Since we are adding a neutral model, we need to add tuples to the ``neutralModel`` extensible predicate. -The first four values identify the callable (in this case the getter of the ``Now`` property) to be modeled as a neutral, the fifth value is the kind, and the sixth value is the provenance (origin) of the neutral. - -- The first value ``System`` is the namespace name. -- The second value ``DateTime`` is the class (type) name. -- The third value ``get_Now`` is the method name. Getter and setter methods are named ``get_`` and ``set_`` respectively. -- The fourth value ``()`` is the method input type signature. -- The fifth value ``summary`` is the kind of the neutral. -- The sixth value ``manual`` is the provenance of the neutral. - Example: Accessing the ``Body`` field of an HTTP request ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how we can model a field read as a source of tainted data. From e142818fe5f6c03a166b743bed8d8dc7835cd337 Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:08:50 -0400 Subject: [PATCH 08/25] Remove `Select` example. 
Go does not currently have any equivalent with regards to lambda flow --- .../customizing-library-models-for-go.rst | 59 ------------------- 1 file changed, 59 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 2ed3ab3e6591..d11eda01e2dc 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -250,65 +250,6 @@ The remaining values are used to define the ``access path``, the ``kind``, and t - The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. -Example: Add flow through the ``Select`` method -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This example shows how the C# query pack models a more complex flow through a method. -Here we model flow through higher order methods and collection types, as well as how to handle extension methods and generics. - -.. code-block:: csharp - - public static void TaintFlow(IEnumerable stream) { - IEnumerable lines = stream.Select(item => item + "\n"); - ... - } - -We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: - -.. 
code-block:: yaml - - extensions: - - addsTo: - pack: codeql/csharp-all - extensible: summaryModel - data: - - ["System.Linq", "Enumerable", False, "Select", "(System.Collections.Generic.IEnumerable,System.Func)", "", "Argument[0].Element", "Argument[1].Parameter[0]", "value", "manual"] - - ["System.Linq", "Enumerable", False, "Select", "(System.Collections.Generic.IEnumerable,System.Func)", "", "Argument[1].ReturnValue", "ReturnValue.Element", "value", "manual"] - - -Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. -Each tuple defines part of the flow that comprises the total flow through the ``Select`` method. -The first five values identify the callable (in this case a method) to be modeled as a summary. -These are the same for both of the rows above as we are adding two summaries for the same method. - -- The first value ``System.Linq`` is the namespace name. -- The second value ``Enumerable`` is the class (type) name. -- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. -- The fourth value ``Select`` is the method name, along with the type parameters for the method. The names of the generic type parameters provided in the model must match the names of the generic type parameters in the method signature in the source code. -- The fifth value ``(System.Collections.Generic.IEnumerable,System.Func)`` is the method input type signature. The generics in the signature must match the generics in the method signature in the source code. - -The sixth value should be left empty and is out of scope for this documentation. -The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary definition. - -- The seventh value is the access path to the ``input`` (where data flows from). -- The eighth value is the access path to the ``output`` (where data flows to). 
- -For the first row: - -- The seventh value is ``Argument[0].Element``, which is the access path to the elements of the qualifier (the elements of the enumerable ``stream`` in the example). -- The eight value is ``Argument[1].Parameter[0]``, which is the access path to the first parameter of the ``System.Func`` argument of ``Select`` (the lambda parameter ``item`` in the example). - -For the second row: - -- The seventh value is ``Argument[1].ReturnValue``, which is the access path to the return value of the ``System.Func`` argument of ``Select`` (the return value of the lambda in the example). -- The eighth value is ``ReturnValue.Element``, which is the access path to the elements of the return value of ``Select`` (the elements of the enumerable ``lines`` in the example). - -For the remaining values for both rows: - -- The ninth value ``value`` is the kind of the flow. ``value`` means that the value is preserved. -- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. - -That is, the first row specifies that values can flow from the elements of the qualifier enumerable into the first argument of the function provided to ``Select``. The second row specifies that values can flow from the return value of the function to the elements of the enumerable returned from ``Select``. - Example: Accessing the ``Body`` field of an HTTP request ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how we can model a field read as a source of tainted data. 
From de2f8a15770166ec190fbdf714548162d3eaf054 Mon Sep 17 00:00:00 2001
From: Edward Minnix III
Date: Tue, 20 Aug 2024 17:09:47 -0400
Subject: [PATCH 09/25] Make field consistent with existing model

---
 .../customizing-library-models-for-go.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst
index d11eda01e2dc..f7ad0aebbcaa 100644
--- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst
+++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst
@@ -228,7 +228,7 @@ We need to add a tuple to the ``summaryModel``\(namespace, type, subtypes, name,
          pack: codeql/go-all
          extensible: summaryModel
        data:
-         - ["net/url", "URL", False, "Hostname", "", "", "Argument[receiver]", "ReturnValue", "taint", "manual"]
+         - ["net/url", "URL", True, "Hostname", "", "", "Argument[receiver]", "ReturnValue", "taint", "manual"]
 
 Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate.
 Each tuple defines flow from one argument to the return value.
From a99dd69d87f28445f948a88b76ceb1bc5f618c76 Mon Sep 17 00:00:00 2001
From: Edward Minnix III
Date: Tue, 20 Aug 2024 17:12:07 -0400
Subject: [PATCH 10/25] Remove function signature

---
 .../customizing-library-models-for-go.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst
index f7ad0aebbcaa..4448b8d79687 100644
--- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst
+++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst
@@ -129,7 +129,7 @@ We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name,
          pack: codeql/go-all
          extensible: sourceModel
        data:
-         - ["net", "", False, "Listen", "(string,string)", "", "ReturnValue", "remote", "manual"]
+         - ["net", "", False, "Listen", "", "", "ReturnValue", "remote", "manual"]
 
 
 Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate.
@@ -139,7 +139,7 @@ The first five values identify the callable (in this case a function) to be mode
 - The second value ``""`` is left blank, since the function is not a method of a type.
 - The third value ``False`` is a flag that indicates whether or not the source also applies to all overrides of the method.
 - The fourth value ``Listen`` is the function name.
-- The fifth value ``(string,string)`` is the method input type signature.
+- The fifth value ``""`` is the function input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments.
 
 The sixth value should be left empty and is out of scope for this documentation.
 The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source.
From cc6b09da4898f02b201beafa6315322674a705fc Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 17:16:29 -0400 Subject: [PATCH 11/25] Fix name of section --- .../customizing-library-models-for-go.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 4448b8d79687..3701a41aa02f 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -1,4 +1,4 @@ -.. _customizing-library-models-for-csharp: +.. _customizing-library-models-for-go: Customizing library models for Go ================================= From 107948603277f597ff1ea70ff498a1798fbca0c9 Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Tue, 20 Aug 2024 17:31:20 -0400 Subject: [PATCH 12/25] Mention Go in codeql-for-go toctree --- docs/codeql/codeql-language-guides/codeql-for-go.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/codeql/codeql-language-guides/codeql-for-go.rst b/docs/codeql/codeql-language-guides/codeql-for-go.rst index 0eaefbb59226..040a4e3b6d35 100644 --- a/docs/codeql/codeql-language-guides/codeql-for-go.rst +++ b/docs/codeql/codeql-language-guides/codeql-for-go.rst @@ -12,6 +12,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat codeql-library-for-go abstract-syntax-tree-classes-for-working-with-go-programs modeling-data-flow-in-go-libraries + customizing-library-models-for-go - :doc:`Basic query for Go code `: Learn to write and run a simple CodeQL query. @@ -21,3 +22,5 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat - :doc:`Modeling data flow in Go libraries `: When analyzing a Go program, CodeQL does not examine the source code for external packages. 
To track the flow of untrusted data through a library, you can create a model of the library. + +- :doc:`Customizing library models for Go `: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs. From 8b73d4af869c631daf04085494a70372d660078d Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 20 Aug 2024 21:19:11 -0400 Subject: [PATCH 13/25] Fix typo Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 3701a41aa02f..72f80d4b6061 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -158,7 +158,7 @@ This pattern covers many of the cases where we need to summarize flow through a func TaintFlow() { elems := []string{"Hello", "World"} sep := " " - t := strings.Join(elems, sep) // There is taint flow from ss and sep to t. + t := strings.Join(elems, sep) // There is taint flow from elems and sep to t. ... 
} From 1e1bbe92a3d25ad8ed53994d26c28cd5faa27316 Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Wed, 21 Aug 2024 18:12:40 -0400 Subject: [PATCH 14/25] Wording and typo Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 26 +++++++++---------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 72f80d4b6061..47e98871c0b0 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -3,7 +3,7 @@ Customizing library models for Go ================================= -You can model the methods and callables that control data flow in any framework or library. This is especially useful for custom frameworks or niche libraries, that are not supported by the standard CodeQL libraries. +You can model the methods and functions that control data flow in any framework or library. This is especially useful for custom frameworks or niche libraries, that are not supported by the standard CodeQL libraries. .. include:: ../reusables/beta-note-customizing-library-models.rst @@ -15,7 +15,7 @@ This article contains reference material about how to define custom models for s About data extensions --------------------- -You can customize analysis by defining models (summaries, sinks, and sources) of your code's Go dependencies in data extension files. Each model defines the behavior of one or more elements of your library or framework, such as methods, properties, and callables. When you run dataflow analysis, these models expand the potential sources and sinks tracked by dataflow analysis and improve the precision of results. 
+You can customize analysis by defining models (summaries, sinks, and sources) of your code's Go dependencies in data extension files. Each model defines the behavior of one or more elements of your library or framework, such as functions, methods, and fields. When you run dataflow analysis, these models expand the potential sources and sinks tracked by dataflow analysis and improve the precision of results. Most of the security queries search for paths from a source of untrusted input to a sink that represents a vulnerability. This is known as taint tracking. Each source is a starting point for dataflow analysis to track tainted data and each sink is an end point. @@ -80,7 +80,7 @@ This is the ``Prepare`` method of the ``DB`` type in the ``database/sql`` packag ... } -We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sinkModel``\(package, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate by updating a data extension file. .. code-block:: yaml @@ -92,7 +92,7 @@ We need to add a tuple to the ``sinkModel``\(namespace, type, subtypes, name, si - ["database/sql", "DB", False, "Prepare", "", "", "Argument[0]", "sql-injection", "manual"] Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible predicate. -The first five values identify the callable (in this case a method) to be modeled as a sink. +The first five values identify the function (in this case a method) to be modeled as a sink. - The first value ``database/sql`` is the package name. - The second value ``DB`` is the name of the type that the method is associated with. @@ -120,7 +120,7 @@ This is the ``Listen`` function which is located in the ``net`` package. 
} -We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. +We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. .. code-block:: yaml @@ -133,7 +133,7 @@ We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. -The first five values identify the callable (in this case a function) to be modeled as a source. +The first five values identify the function to be modeled as a source. - The first value ``net`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. @@ -162,7 +162,7 @@ This pattern covers many of the cases where we need to summarize flow through a ... } -We need to add tuples to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add tuples to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: .. code-block:: yaml @@ -178,10 +178,10 @@ Since we are adding flow through a method, we need to add tuples to the ``summar Each tuple defines flow from one argument to the return value. The first row defines flow from the first argument (``elems`` in the example) to the return value (``t`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). -The first five values identify the callable (in this case a method) to be modeled as a summary. +The first five values identify the function to be modeled as a summary. 
These are the same for both of the rows above as we are adding two summaries for the same method. -- The first value ``strings`` is the pacakge name. +- The first value ``strings`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. - The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. - The fourth value ``Join`` is the function name. @@ -219,7 +219,7 @@ This example shows how the Go query pack models flow through a method for a simp ... } -We need to add a tuple to the ``summaryModel``\(namespace, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: +We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: .. code-block:: yaml @@ -234,7 +234,7 @@ Since we are adding flow through a method, we need to add tuples to the ``summar Each tuple defines flow from one argument to the return value. The first row defines flow from the qualifier of the method call (``u`` in the example) to the return value (``host`` in the example). -The first five values identify the callable (in this case a method) to be modeled as a summary. +The first five values identify the function (in this case a method) to be modeled as a summary. - The first value ``net/url`` is the package name. - The second value ``URL`` is the receiver type. @@ -261,7 +261,7 @@ This example shows how we can model a field read as a source of tainted data. ... } -We need to add a tuple to the ``sourceModel``\(namespace, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. 
+We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate by updating a data extension file. .. code-block:: yaml @@ -294,7 +294,7 @@ Package grouping Since Go uses URLs for package identifiers, it is possible for packages to be imported with different paths. For example, the ``glog`` package can be imported using both the ``github.com/golang/glog`` and ``gopkg.in/glog`` paths. To handle this, the CodeQL Go library uses a mapping from the package path to a group name for the package. This mapping can be specified using the ``packageGrouping`` extensible predicate, and then the models for the APIs in the package -will use the group name in place of the package path. The package field in models will be the prefix ``group:`` followed by the group name. +will use the prefix ``group:`` followed by the group name in place of the package path. .. code-block:: yaml From 2757b0ba6e62480fcce8fd0eb34a41002c6a1792 Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Wed, 21 Aug 2024 18:35:19 -0400 Subject: [PATCH 15/25] Change example to net/http Request::FormValue --- .../customizing-library-models-for-go.rst | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 47e98871c0b0..aabe440ade74 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -107,15 +107,15 @@ The remaining values are used to define the ``access path``, the ``kind``, and t - The eighth value ``sql-injection`` is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries.
- The ninth value ``manual`` is the provenance of the sink, which is used to identify the origin of the sink. -Example: Taint source from the ``net`` package -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This example shows how the Go query pack models the return value from the ``Listen`` method as a ``remote`` source. -This is the ``Listen`` function which is located in the ``net`` package. +Example: Taint source from the ``net/http`` package +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This example shows how the Go query pack models the return value from the ``FormValue`` method as a ``remote`` source. +This is the ``FormValue`` method of the ``Request`` struct which is located in the ``net/http`` package. .. code-block:: go - func Tainted() { - ln, err := net.Listen("tcp", ":8080") // The return value of this method is a remote source. + func Tainted(r *http.Request) { + name := r.FormValue("name") // The return value of this method is a source of tainted data. ... } @@ -129,16 +129,16 @@ We need to add a tuple to the ``sourceModel``\(package, type, subtypes, name, si pack: codeql/go-all extensible: sourceModel data: - - ["net", "", False, "Listen", "", "", "ReturnValue", "remote", "manual"] + - ["net/http", "Request", True, "FormValue", "", "", "ReturnValue", "remote", "manual"] Since we are adding a new source, we need to add a tuple to the ``sourceModel`` extensible predicate. The first five values identify the function to be modeled as a source. -- The first value ``net`` is the package name. -- The second value ``""`` is left blank, since the function is not a method of a type. -- The third value ``False`` is a flag that indicates whether or not the source also applies to all overrides of the method. -- The fourth value ``Listen`` is the function name. +- The first value ``net/http`` is the package name. +- The second value ``Request`` is the type name, since the function is a method of the ``Request`` type. 
+- The third value ``True`` is a flag that indicates whether or not the source also applies to all overrides of the method. +- The fourth value ``FormValue`` is the function name. - The fifth value ``""`` is the function input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. From 7e98d02d56dd7ebaad0306f3654521939d15cc1f Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Thu, 22 Aug 2024 08:51:30 -0400 Subject: [PATCH 16/25] Wording Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index aabe440ade74..b55707cbf879 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -110,7 +110,7 @@ The remaining values are used to define the ``access path``, the ``kind``, and t Example: Taint source from the ``net/http`` package ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how the Go query pack models the return value from the ``FormValue`` method as a ``remote`` source. -This is the ``FormValue`` method of the ``Request`` struct which is located in the ``net/http`` package. +This is the ``FormValue`` method of the ``Request`` type which is located in the ``net/http`` package. .. 
code-block:: go From 9b43b4994e10025c7694c79b1dc1e949bc3350fe Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Thu, 22 Aug 2024 08:52:02 -0400 Subject: [PATCH 17/25] `fixed-version:` example Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index b55707cbf879..d79300c522bd 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -288,6 +288,24 @@ The remaining values are used to define the ``access path``, the ``kind``, and t - The eighth value ``remote`` is the source kind. This indicates that the source is a remote source of untrusted data. - The ninth value ``manual`` is the provenance of the source, which is used to identify the origin of the source. +Package versions +~~~~~~~~~~~~~~~~ + +When the major version number is greater than 1 it is included in the package import path. It usually looks like ``/v2`` after the module import path. This is called the major version suffix. We normally want our models to apply to all versions of a package. Rather than having to repeat models with the package column changed to include all available versions, we can just use the package name without the major version suffix and this will be matched to any version. So models with ``github.com/couchbase/gocb`` in the package column will match packages imported from ``github.com/couchbase/gocb`` and ``github.com/couchbase/gocb/v2`` (or any other version). + +Note that packages hosted at ``gopkg.in`` use a slightly different syntax: the major version suffix looks like ``.v2``, and it is present even for version 1. This is also supported. 
So models with ``gopkg.in/yaml`` in the package column will match packages imported from ``gopkg.in/yaml.v1``, ``gopkg.in/yaml.v2`` and ``gopkg.in/yaml.v3``. + +To write models that only apply to ``github.com/couchbase/gocb/v2``, it is sufficient to include the major version suffix (``/v2``) in the package column. To write models that only apply to ``github.com/couchbase/gocb``, you may prefix the package column with ``fixed-version:``. For example, here are two models for a method that has changed name from v1 to v2. + +.. code-block:: yaml + extensions: + - addsTo: + pack: codeql/go-all + extensible: sinkModel + data: + - ["fixed-version:github.com/couchbase/gocb", "Cluster", True, "ExecuteAnalyticsQuery", "", "", "Argument[0]", "nosql-injection", "manual"] + - ["github.com/couchbase/gocb/v2", "Cluster", True, "AnalyticsQuery", "", "", "Argument[0]", "nosql-injection", "manual"] + Package grouping ~~~~~~~~~~~~~~~~ From bf11e2cd0fec58c8f5ae562a39d42f1cf641015a Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Thu, 22 Aug 2024 08:57:54 -0400 Subject: [PATCH 18/25] Fix code block --- .../customizing-library-models-for-go.rst | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index d79300c522bd..bf56447fce15 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -298,13 +298,14 @@ Note that packages hosted at ``gopkg.in`` use a slightly different syntax: the m To write models that only apply to ``github.com/couchbase/gocb/v2``, it is sufficient to include the major version suffix (``/v2``) in the package column. To write models that only apply to ``github.com/couchbase/gocb``, you may prefix the package column with ``fixed-version:``. 
For example, here are two models for a method that has changed name from v1 to v2. .. code-block:: yaml - extensions: - - addsTo: - pack: codeql/go-all - extensible: sinkModel - data: - - ["fixed-version:github.com/couchbase/gocb", "Cluster", True, "ExecuteAnalyticsQuery", "", "", "Argument[0]", "nosql-injection", "manual"] - - ["github.com/couchbase/gocb/v2", "Cluster", True, "AnalyticsQuery", "", "", "Argument[0]", "nosql-injection", "manual"] + + extensions: + - addsTo: + pack: codeql/go-all + extensible: sinkModel + data: + - ["fixed-version:github.com/couchbase/gocb", "Cluster", True, "ExecuteAnalyticsQuery", "", "", "Argument[0]", "nosql-injection", "manual"] + - ["github.com/couchbase/gocb/v2", "Cluster", True, "AnalyticsQuery", "", "", "Argument[0]", "nosql-injection", "manual"] Package grouping ~~~~~~~~~~~~~~~~ From 72107867215ab4254b83dbe41499118128f1152c Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Sun, 24 Nov 2024 21:24:24 -0500 Subject: [PATCH 19/25] Subtypes/overrides documentation Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index bf56447fce15..d5a092d7dcfb 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -89,14 +89,14 @@ We need to add a tuple to the ``sinkModel``\(package, type, subtypes, name, sign pack: codeql/go-all extensible: sinkModel data: - - ["database/sql", "DB", False, "Prepare", "", "", "Argument[0]", "sql-injection", "manual"] + - ["database/sql", "DB", True, "Prepare", "", "", "Argument[0]", "sql-injection", "manual"] Since we want to add a new sink, we need to add a tuple to the ``sinkModel`` extensible 
predicate. The first five values identify the function (in this case a method) to be modeled as a sink. - The first value ``database/sql`` is the package name. - The second value ``DB`` is the name of the type that the method is associated with. -- The third value ``False`` is a flag that indicates whether or not the sink also applies to all overrides of the method. +- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``Prepare`` is the method name. - The fifth value ``""`` is the method input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. @@ -137,7 +137,7 @@ The first five values identify the function to be modeled as a source. - The first value ``net/http`` is the package name. - The second value ``Request`` is the type name, since the function is a method of the ``Request`` type. -- The third value ``True`` is a flag that indicates whether or not the source also applies to all overrides of the method. +- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``FormValue`` is the function name. - The fifth value ``""`` is the function input type signature. For Go it should always be an empty string. 
It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. @@ -183,7 +183,7 @@ These are the same for both of the rows above as we are adding two summaries for - The first value ``strings`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. -- The third value ``False`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. - The fourth value ``Join`` is the function name. - The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. @@ -238,7 +238,7 @@ The first five values identify the function (in this case a method) to be modele - The first value ``net/url`` is the package name. - The second value ``URL`` is the receiver type. -- The third value ``True`` is a flag that indicates whether or not the summary also applies to all overrides of the method. +- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``Hostname`` is the method name. - The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. @@ -277,7 +277,7 @@ The first five values identify the field to be modeled as a source. - The first value ``net/http`` is the package name. - The second value ``Request`` is the name of the type that the field is associated with. 
-- The third value ``True`` is a flag that indicates whether or not the source also applies to all overrides of the field. +- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. For fields this means when the field is accessed as a promoted field in another type. - The fourth value ``Body`` is the field name. - The fifth value ``""`` is blank since it is a field access and field accesses do not have method signatures in Go. From fb04e39935cd1e780a880097302a375d364d5e9c Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Sun, 24 Nov 2024 21:24:53 -0500 Subject: [PATCH 20/25] `ReturnValue[i]` text Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index d5a092d7dcfb..4458613a976f 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -246,7 +246,7 @@ The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[receiver]`` is the access path to the receiver (``u`` in the example). -- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. +- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. 
When there are multiple return values, use `ReturnValue[i]` to refer to the `i`th return value (starting from 0). - The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. From 940a99db3b63bf0ceb773618a6577982d92d7618 Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Sun, 24 Nov 2024 21:25:09 -0500 Subject: [PATCH 21/25] Fix typo Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 4458613a976f..08bc8e29e190 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -215,7 +215,7 @@ This example shows how the Go query pack models flow through a method for a simp .. code-block:: go func TaintFlow(u *url.URL) { - host := u.Hostname() // There is taint flow from u to s. + host := u.Hostname() // There is taint flow from u to host. ... 
} From 460df89f28d58ae853697a3530b83189aebea7ec Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Mon, 25 Nov 2024 21:56:52 -0500 Subject: [PATCH 22/25] Add ``slices.Max`` example --- .../customizing-library-models-for-go.rst | 47 +++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 08bc8e29e190..5e96a2f314d6 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -148,6 +148,53 @@ The remaining values are used to define the ``access path``, the ``kind``, and t - The eighth value ``remote`` is the kind of the source. The source kind is used to define the threat model where the source is in scope. ``remote`` applies to many of the security related queries as it means a remote source of untrusted data. As an example the SQL injection query uses ``remote`` sources. For more information, see ":ref:`Threat models `." - The ninth value ``manual`` is the provenance of the source, which is used to identify the origin of the source. +Example: Add flow through the ``Max`` function +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This example shows how the Go query pack models flow through a function for a simple case. +This pattern covers many of the cases where we need to summarize flow through a function that is stored in a library or framework outside the repository. + +.. code-block:: go + + import "slices" + + func ValueFlow { + a := []int{1, 2, 3} + max := slices.Max(a) // There is taint flow from `a` to `max`. + ... + } + +We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: + +.. 
code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: summaryModel + data: + - ["slices", "", True, "Max", "", "", "Argument[0].ArrayElement", "ReturnValue", "value", "manual"] + +Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. +Each tuple defines flow from one argument to the return value. +The first row defines flow from the first argument (``a`` in the example) to the return value (``max`` in the example). + +The first five values identify the function to be modeled as a summary. +These are the same for both of the rows above as we are adding two summaries for the same method. + +- The first value ``slices`` is the package name. +- The second value ``""`` is left blank, since the function is not a method of a type. +- The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. +- The fourth value ``Max`` is the function name. +- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. + +- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument. +- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. +- The ninth value ``value`` is the kind of the flow. ``value`` flow indicates an entire value is moved, ``taint`` means that taint is propagated through the call. +- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. 
Example: Add flow through the ``Join`` function ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how the Go query pack models flow through a method for a simple case. From 96a796585f849ee863632741da90255f799e5d3d Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Mon, 25 Nov 2024 21:57:09 -0500 Subject: [PATCH 23/25] fix formatting issue --- .../customizing-library-models-for-go.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 5e96a2f314d6..f02c79b397ec 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -293,7 +293,7 @@ The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. - The seventh value is the access path to the input (where data flows from). ``Argument[receiver]`` is the access path to the receiver (``u`` in the example). -- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. When there are multiple return values, use `ReturnValue[i]` to refer to the `i`th return value (starting from 0). +- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. When there are multiple return values, use ``ReturnValue[i]`` to refer to the ``i`` th return value (starting from 0). - The ninth value ``taint`` is the kind of the flow. ``taint`` means that taint is propagated through the call. 
- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. From 8c6e08c94e83e96d7c687a101790600cdb793fc3 Mon Sep 17 00:00:00 2001 From: Ed Minnix Date: Mon, 25 Nov 2024 21:57:24 -0500 Subject: [PATCH 24/25] Add ``slices.Concat`` example --- .../customizing-library-models-for-go.rst | 50 +++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index f02c79b397ec..5a200274f4a4 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -195,6 +195,56 @@ The remaining values are used to define the ``access path``, the ``kind``, and t - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. - The ninth value ``value`` is the kind of the flow. ``value`` flow indicates an entire value is moved, ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. + +Example: Add flow through the ``Concat`` function +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This example shows how the Go query pack models flow through a function for a simple case. +This pattern covers many of the cases where we need to summarize flow through a function that is stored in a library or framework outside the repository. + +.. code-block:: go + + import "slices" + + func ValueFlow { + a := []int{1, 2, 3} + b := []int{4, 5, 6} + c := slices.Concat(a, b) // There is taint flow from `a` and `b` to `c`. + ... 
+ } + +We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate by updating a data extension file: + +.. code-block:: yaml + + extensions: + - addsTo: + pack: codeql/go-all + extensible: summaryModel + data: + - ["slices", "", True, "Concat", "", "", "Argument[0].ArrayElement.ArrayElement", "ReturnValue.ArrayElement", "value", "manual"] + +Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. +Each tuple defines flow from one argument to the return value. +The first row defines flow from the arguments (``a`` and ``b`` in the example) to the return value (``c`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). + +The first five values identify the function to be modeled as a summary. +These are the same for both of the rows above as we are adding two summaries for the same method. + +- The first value ``slices`` is the package name. +- The second value ``""`` is left blank, since the function is not a method of a type. +- The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. +- The fourth value ``Max`` is the function name. +- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. + +The sixth value should be left empty and is out of scope for this documentation. +The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. + +- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument. 
+- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. +- The ninth value ``value`` is the kind of the flow. ``value`` flow indicates an entire value is moved, ``taint`` means that taint is propagated through the call. +- The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. + Example: Add flow through the ``Join`` function ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This example shows how the Go query pack models flow through a method for a simple case. From 86c7a492645a2786f15055ebfdee24793e7f229c Mon Sep 17 00:00:00 2001 From: Edward Minnix III Date: Tue, 26 Nov 2024 13:12:16 -0500 Subject: [PATCH 25/25] Apply suggestions from code review Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com> --- .../customizing-library-models-for-go.rst | 36 ++++++++----------- 1 file changed, 14 insertions(+), 22 deletions(-) diff --git a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst index 5a200274f4a4..c5b74ccd73ae 100644 --- a/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst +++ b/docs/codeql/codeql-language-guides/customizing-library-models-for-go.rst @@ -98,7 +98,7 @@ The first five values identify the function (in this case a method) to be modele - The second value ``DB`` is the name of the type that the method is associated with. - The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``Prepare`` is the method name. 
-- The fifth value ``""`` is the method input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the sink. @@ -139,7 +139,7 @@ The first five values identify the function to be modeled as a source. - The second value ``Request`` is the type name, since the function is a method of the ``Request`` type. - The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``FormValue`` is the function name. -- The fifth value ``""`` is the function input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions or methods may have the same name and they need to be distinguished by the number and types of the arguments. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. 
The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source. @@ -156,11 +156,9 @@ This pattern covers many of the cases where we need to summarize flow through a .. code-block:: go - import "slices" - func ValueFlow { a := []int{1, 2, 3} - max := slices.Max(a) // There is taint flow from `a` to `max`. + max := slices.Max(a) // There is value flow from the elements of `a` to `max`. ... } @@ -173,25 +171,23 @@ We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, s pack: codeql/go-all extensible: summaryModel data: - - ["slices", "", True, "Max", "", "", "Argument[0].ArrayElement", "ReturnValue", "value", "manual"] + - ["slices", "", False, "Max", "", "", "Argument[0].ArrayElement", "ReturnValue", "value", "manual"] Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. -Each tuple defines flow from one argument to the return value. The first row defines flow from the first argument (``a`` in the example) to the return value (``max`` in the example). The first five values identify the function to be modeled as a summary. -These are the same for both of the rows above as we are adding two summaries for the same method. - The first value ``slices`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. - The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. - The fourth value ``Max`` is the function name. -- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. 
It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. -- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument. +- The seventh value is the access path to the input (where data flows from). ``Argument[0].ArrayElement`` is the access path to the array elements of the first argument (the elements of the slice in the example). - The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. - The ninth value ``value`` is the kind of the flow. ``value`` flow indicates an entire value is moved, ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. @@ -204,8 +200,6 @@ This pattern covers many of the cases where we need to summarize flow through a .. code-block:: go - import "slices" - func ValueFlow { a := []int{1, 2, 3} b := []int{4, 5, 6} @@ -222,26 +216,24 @@ We need to add a tuple to the ``summaryModel``\(package, type, subtypes, name, s pack: codeql/go-all extensible: summaryModel data: - - ["slices", "", True, "Concat", "", "", "Argument[0].ArrayElement.ArrayElement", "ReturnValue.ArrayElement", "value", "manual"] + - ["slices", "", False, "Concat", "", "", "Argument[0].ArrayElement.ArrayElement", "ReturnValue.ArrayElement", "value", "manual"] Since we are adding flow through a method, we need to add tuples to the ``summaryModel`` extensible predicate. -Each tuple defines flow from one argument to the return value. 
-The first row defines flow from the arguments (``a`` and ``b`` in the example) to the return value (``c`` in the example) and the second row defines flow from the second argument (``sep`` in the example) to the return value (``t`` in the example). +The first row defines flow from the arguments (``a`` and ``b`` in the example) to the return value (``c`` in the example). The first five values identify the function to be modeled as a summary. -These are the same for both of the rows above as we are adding two summaries for the same method. - The first value ``slices`` is the package name. - The second value ``""`` is left blank, since the function is not a method of a type. - The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. - The fourth value ``Max`` is the function name. -- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. -- The seventh value is the access path to the input (where data flows from). ``Argument[0]`` is the access path to the first argument. -- The eighth value ``ReturnValue`` is the access path to the output (where data flows to), in this case ``ReturnValue``, which means that the input flows to the return value. +- The seventh value is the access path to the input (where data flows from). 
``Argument[0].ArrayElement.ArrayElement`` is the access path to the array elements of the array elements of the first argument. Note that a variadic parameter of type ``...T`` is treated as if it has type ``[]T`` and arguments corresponding to the variadic parameter are accessed as elements of this slice. +- The eighth value ``ReturnValue.ArrayElement`` is the access path to the output (where data flows to), in this case ``ReturnValue.ArrayElement``, which means that the input flows to the array elements of the return value. - The ninth value ``value`` is the kind of the flow. ``value`` flow indicates an entire value is moved, ``taint`` means that taint is propagated through the call. - The tenth value ``manual`` is the provenance of the summary, which is used to identify the origin of the summary. @@ -282,7 +274,7 @@ These are the same for both of the rows above as we are adding two summaries for - The second value ``""`` is left blank, since the function is not a method of a type. - The third value ``False`` is a flag that indicates whether or not the sink also applies to subtypes. This has no effect for non-method functions. - The fourth value ``Join`` is the function name. -- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. @@ -337,7 +329,7 @@ The first five values identify the function (in this case a method) to be modele - The second value ``URL`` is the receiver type.
- The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. This includes when the subtype embeds the given type, so that the method or field is promoted to be a method or field of the subtype. For interface methods it also includes types which implement the interface type. - The fourth value ``Hostname`` is the method name. -- The fifth value ``""`` is left blank, since specifying the signature is optional and Go does not allow multiple signature overloads for the same function. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the summary. @@ -376,7 +368,7 @@ The first five values identify the field to be modeled as a source. - The second value ``Request`` is the name of the type that the field is associated with. - The third value ``True`` is a flag that indicates whether or not the sink also applies to subtypes. For fields this means when the field is accessed as a promoted field in another type. - The fourth value ``Body`` is the field name. -- The fifth value ``""`` is blank since it is a field access and field accesses do not have method signatures in Go. +- The fifth value ``""`` is the input type signature. For Go it should always be an empty string. It is needed for other languages where multiple functions may have the same name and they need to be distinguished by the number and types of the arguments. The sixth value should be left empty and is out of scope for this documentation. The remaining values are used to define the ``access path``, the ``kind``, and the ``provenance`` (origin) of the source.