refactor: accept provenance data in artifact pipeline check (#872)
This PR renames `mcn_infer_artifact_pipeline_1` to `mcn_find_artifact_pipeline_1`. This check can support all the package registries now. When a verifiable provenance is found for an artifact, we use it to obtain the pipeline trigger. Otherwise, we use heuristics to find the triggering pipeline.

Signed-off-by: behnazh-w <[email protected]>
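
The lookup order described in the commit message (verified provenance first, heuristics otherwise) can be sketched roughly as follows. This is only an illustration and not Macaron's implementation: `PipelineRun`, `find_artifact_pipeline`, the `run_url` key and the `heuristic_lookup` callback are all hypothetical names, and falling back to heuristics when a verified provenance carries no trigger is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PipelineRun:
    """Hypothetical record for the CI run that published an artifact."""

    url: str
    source: str  # Either "provenance" or "heuristics".


def find_artifact_pipeline(
    provenance: Optional[dict],
    provenance_verified: bool,
    heuristic_lookup: Callable[[], Optional[str]],
) -> Optional[PipelineRun]:
    """Prefer the trigger recorded in a verified provenance, else use heuristics."""
    if provenance is not None and provenance_verified:
        # "run_url" is a made-up key for this sketch, not a real provenance field.
        run_url = provenance.get("run_url")
        if isinstance(run_url, str):
            return PipelineRun(url=run_url, source="provenance")

    # No verifiable provenance (or no trigger in it): fall back to heuristics.
    run_url = heuristic_lookup()
    if run_url is None:
        return None
    return PipelineRun(url=run_url, source="heuristics")
```

Passing the heuristic search as a callback keeps the potentially expensive lookup from running when the provenance already answers the question.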
behnazh-w authored Nov 18, 2024
1 parent b65f0db commit 4235041
Showing 88 changed files with 1,780 additions and 314 deletions.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -77,7 +77,7 @@ the requirements that are currently supported by Macaron.
* - ``mcn_build_as_code_1``
- **Build as code** - If a trusted builder is not present, this requirement determines that the build definition and configuration executed by the build service is verifiably derived from text file definitions stored in a version control system.
- Identify and validate the CI service(s) used to build and deploy/publish an artifact.
* - ``mcn_infer_artifact_pipeline_1``
* - ``mcn_find_artifact_pipeline_1``
- **Infer artifact publish pipeline** - When a provenance is not available, checks whether a CI workflow run has automatically published the artifact.
- Identify a workflow run that has triggered the deploy step determined by the ``Build as code`` check.
* - ``mcn_provenance_level_three_1``
48 changes: 24 additions & 24 deletions docs/source/pages/tutorials/detect_malicious_java_dep.rst
@@ -1,11 +1,11 @@
.. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
.. _detect-malicious-java-dep:
.. _detect-manual-upload-java-dep:

------------------------------------------------------------------------
Detecting a malicious Java dependency uploaded manually to Maven Central
------------------------------------------------------------------------
--------------------------------------------------------------
Detecting Java dependencies manually uploaded to Maven Central
--------------------------------------------------------------

In this tutorial we show how Macaron can determine whether the dependencies of a Java project are built
and published via transparent CI workflows or manually uploaded to Maven Central. You can also
@@ -24,12 +24,12 @@ dependencies:

* - Artifact name
- `Package URL (PURL) <https://github.com/package-url/purl-spec>`_
* - `guava <https://central.sonatype.com/artifact/com.google.guava/guava>`_
- ``pkg:maven/com.google.guava/guava@32.1.2-jre?type=jar``
* - `log4j-core <https://central.sonatype.com/artifact/org.apache.logging.log4j/log4j-core>`_
- ``pkg:maven/org.apache.logging.log4j/log4j-core@3.0.0-beta2?type=jar``
* - `jackson-databind <https://central.sonatype.com/artifact/io.github.behnazh-w.demo/jackson-databind>`_
- ``pkg:maven/io.github.behnazh-w.demo/jackson-databind@1.0?type=jar``

While the ``guava`` dependency follows best practices to publish artifacts automatically with minimal human
While the ``log4j-core`` dependency follows best practices to publish artifacts automatically with minimal human
intervention, ``jackson-databind`` is a malicious dependency that pretends to provide data-binding functionalities
like `the official jackson-databind <https://github.com/FasterXML/jackson-databind>`_ library (note that
this artifact is created for demonstration purposes and is not actually malicious).
@@ -70,7 +70,7 @@ First, we need to run the ``analyze`` command of Macaron to run a number of :ref

.. code-block:: shell
./run_macaron.sh analyze -purl pkg:maven/io.github.behnazh-w.demo/example-maven-app@1.0?type=jar -rp https://github.com/behnazh-w/example-maven-app
./run_macaron.sh analyze -purl pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar -rp https://github.com/behnazh-w/example-maven-app --deps-depth=1
.. note:: By default, Macaron clones the repositories and creates output files under the ``output`` directory. To understand the structure of this directory please see :ref:`Output Files Guide <output_files_guide>`.

@@ -96,7 +96,7 @@ As you can see, some of the checks are passing and some are failing. In summary,
* is not producing any :term:`SLSA` or :term:`Witness` provenances (``mcn_provenance_available_1``)
* is using GitHub Actions to build and test using ``mvnw`` (``mcn_build_service_1``)
* but it is not deploying any artifacts automatically (``mcn_build_as_code_1``)
* and no CI workflow runs are detected that automatically publish artifacts (``mcn_infer_artifact_pipeline_1``)
* and no CI workflow runs are detected that automatically publish artifacts (``mcn_find_artifact_pipeline_1``)

As you scroll down in the HTML report, you will see a section for the dependencies that were automatically identified:

@@ -110,25 +110,25 @@ As you scroll down in the HTML report, you will see a section for the dependenci
| Macaron has found the two dependencies as expected:
* ``io.github.behnazh-w.demo:jackson-databind:1.0``
* ``com.google.guava:guava:32.1.2-jre``
* ``org.apache.logging.log4j:log4j-core:3.0.0-beta2``

When we open the reports for each dependency, we see that ``mcn_infer_artifact_pipeline_1`` is passed for ``com.google.guava:guava:32.1.2-jre``
and a GitHub Actions workflow run is found for publishing version ``32.1.2-jre``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``.
When we open the reports for each dependency, we see that ``mcn_find_artifact_pipeline_1`` is passed for ``org.apache.logging.log4j:log4j-core:3.0.0-beta2``
and a GitHub Actions workflow run is found for publishing version ``3.0.0-beta2``. However, this check is failing for ``io.github.behnazh-w.demo:jackson-databind:1.0``.
This means that ``io.github.behnazh-w.demo:jackson-databind:1.0`` could have been built and published manually to Maven Central
and could potentially be malicious.

.. _fig_infer_artifact_pipeline_guava:
.. _fig_find_artifact_pipeline_log4j:

.. figure:: ../../_static/images/tutorial_guava_infer_pipeline.png
:alt: mcn_infer_artifact_pipeline_1 for com.google.guava:guava:32.1.2-jre
.. figure:: ../../_static/images/tutorial_log4j_find_pipeline.png
:alt: mcn_find_artifact_pipeline_1 for org.apache.logging.log4j:log4j-core:3.0.0-beta2
:align: center

``com.google.guava:guava:32.1.2-jre``
``org.apache.logging.log4j:log4j-core:3.0.0-beta2``

.. _fig_infer_artifact_pipeline_bh_jackson_databind:

.. figure:: ../../_static/images/tutorial_bh_jackson_databind_infer_pipeline.png
:alt: mcn_infer_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0
:alt: mcn_find_artifact_pipeline_1 for io.github.behnazh-w.demo:jackson-databind:1.0
:align: center

``io.github.behnazh-w.demo:jackson-databind:1.0``
@@ -154,7 +154,7 @@ The security requirement in this tutorial is to mandate dependencies of our proj
transparent artifact publish CI workflows. To write a policy for this requirement, first we need to
revisit the checks shown in the HTML report in the previous :ref:`step <fig_example-maven-app>`.
The result of each of the checks can be queried by the check ID in the first column. For the policy in this tutorial,
we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks:
we are interested in the ``mcn_find_artifact_pipeline_1`` and ``mcn_provenance_level_three_1`` checks:

.. code-block:: prolog
@@ -167,7 +167,7 @@ we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_
.decl violating_dependencies(parent: number)
violating_dependencies(parent) :-
transitive_dependency(parent, dependency),
!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
!check_passed(dependency, "mcn_find_artifact_pipeline_1"),
!check_passed(dependency, "mcn_provenance_level_three_1").
apply_policy_to("detect-malicious-upload", component_id) :-
@@ -176,8 +176,8 @@ we are interested in the ``mcn_infer_artifact_pipeline_1`` and ``mcn_provenance_
This policy requires that all the dependencies
of repository ``github.com/behnazh-w/example-maven-app`` either pass the ``mcn_provenance_level_three_1`` (have non-forgeable
:term:`SLSA` provenances) or ``mcn_infer_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced
by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_infer_artifact_pipeline_1`` needs to pass
:term:`SLSA` provenances) or ``mcn_find_artifact_pipeline_1`` check. Note that if an artifact already has a non-forgeable provenance, it means it is produced
by a hosted build platform, such as GitHub Actions CI workflows. So, the ``mcn_find_artifact_pipeline_1`` needs to pass
only if ``mcn_provenance_level_three_1`` fails.

Let's take a closer look at this policy to understand what each line means.
@@ -219,12 +219,12 @@ This rule populates the ``Policy`` relation if ``component_id`` exists in the da
.decl violating_dependencies(parent: number)
violating_dependencies(parent) :-
transitive_dependency(parent, dependency),
!check_passed(dependency, "mcn_infer_artifact_pipeline_1"),
!check_passed(dependency, "mcn_find_artifact_pipeline_1"),
!check_passed(dependency, "mcn_provenance_level_three_1").
This is the rule that the user needs to design to detect dependencies that violate a security requirement.
Here we declare a relation called ``violating_dependencies`` and populate it if the dependencies in the
``transitive_dependency`` relation do not pass any of the ``mcn_infer_artifact_pipeline_1`` and
``transitive_dependency`` relation do not pass any of the ``mcn_find_artifact_pipeline_1`` and
``mcn_provenance_level_three_1`` checks.
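
For readers less familiar with Datalog, the rule above can be approximated in Python as shown below. This is only an illustrative sketch: the two relations are modelled as plain sets of tuples, and the numeric component IDs in the usage lines are made up.

```python
REQUIRED_CHECKS = ("mcn_find_artifact_pipeline_1", "mcn_provenance_level_three_1")


def violating_dependencies(
    transitive_dependency: set[tuple[int, int]],
    check_passed: set[tuple[int, str]],
) -> set[int]:
    """Return the parents that have at least one dependency failing both checks."""
    return {
        parent
        for parent, dependency in transitive_dependency
        # Both negated atoms must hold, i.e. neither check passed for the dependency.
        if all((dependency, check) not in check_passed for check in REQUIRED_CHECKS)
    }


# Made-up IDs: component 1 depends on components 2 and 3.
deps = {(1, 2), (1, 3)}
# Only dependency 2 passes one of the two checks, so dependency 3 violates the policy.
passed = {(2, "mcn_find_artifact_pipeline_1")}
violations = violating_dependencies(deps, passed)
```

A dependency is reported only when it fails both checks, so passing either one is enough to satisfy the requirement.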

.. code-block:: prolog
@@ -253,7 +253,7 @@ printed to the console will look like the following:
failed_policies
['detect-malicious-upload']
component_violates_policy
['1', 'pkg:github.com/behnazh-w/example-maven-app@34c06e8ae3811885c57f8bd42db61f37ac57eb6c', 'detect-malicious-upload']
['1', 'pkg:maven/io.github.behnazh-w.demo/example-maven-app@2.0?type=jar', 'detect-malicious-upload']
As you can see, the policy has failed because the ``io.github.behnazh-w.demo:jackson-databind:1.0``
dependency is manually uploaded to Maven Central and does not meet the security requirement.
2 changes: 1 addition & 1 deletion docs/source/pages/tutorials/exclude_include_checks.rst
@@ -24,7 +24,7 @@ This tutorial will show how you can configure Macaron to:
Prerequisites
-------------

* You are expected to have gone through :ref:`this tutorial <detect-malicious-java-dep>`.
* You are expected to have gone through :ref:`this tutorial <detect-manual-upload-java-dep>`.
* This tutorial requires a high-level understanding of checks in Macaron and how they depend on each other. Please see this :ref:`page <macaron-developer-guide>` for more information.

------------------
18 changes: 15 additions & 3 deletions src/macaron/config/defaults.ini
@@ -146,7 +146,12 @@ wrapper_files =
mvnw

[builder.maven.ci.build]
github_actions = actions/setup-java
github_actions =
actions/setup-java
# Parent project used in Maven-based projects of the Apache Logging Services.
apache/logging-parent/.github/workflows/build-reusable.yaml
# This action can be used to deploy artifacts to a JFrog artifactory server.
spring-io/artifactory-deploy-action
travis_ci = jdk
circle_ci =
gitlab_ci =
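
The new multi-line `github_actions` value, with comment lines between entries, can be read with Python's standard `configparser` as sketched below. The `read_list` helper is hypothetical, the sample restores the indentation that continuation lines need (it is lost in this view of the diff), and whether Macaron parses the file exactly this way is an assumption.

```python
import configparser

# Sample mirroring the new [builder.maven.ci.build] section; continuation lines are indented.
SAMPLE = """\
[builder.maven.ci.build]
github_actions =
    actions/setup-java
    # Parent project used in Maven-based projects of the Apache Logging Services.
    apache/logging-parent/.github/workflows/build-reusable.yaml
    # This action can be used to deploy artifacts to a JFrog artifactory server.
    spring-io/artifactory-deploy-action
travis_ci = jdk
"""


def read_list(parser: configparser.ConfigParser, section: str, option: str) -> list[str]:
    """Split a multi-line option into its non-empty, stripped entries."""
    raw = parser.get(section, option, fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
actions = read_list(parser, "builder.maven.ci.build", "github_actions")
```

Full-line comments are skipped by `configparser` even inside a multi-line value, so `actions` holds only the three action or workflow references.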
@@ -159,6 +164,8 @@ jenkins =

[builder.maven.ci.deploy]
github_actions =
# Parent project used in Maven-based projects of the Apache Logging Services.
apache/logging-parent/.github/workflows/deploy-release-reusable.yaml
travis_ci =
gpg:sign-and-deploy-file
deploy:deploy
@@ -237,6 +244,8 @@ jenkins =

[builder.gradle.ci.deploy]
github_actions =
# This action can be used to deploy artifacts to a JFrog artifactory server.
spring-io/artifactory-deploy-action
travis_ci =
artifactoryPublish
./gradlew publish
@@ -495,7 +504,7 @@ artifact_extensions =
# Package registries.
[package_registry]
# The allowed time range (in seconds) from a deploy workflow run start time to publish time.
publish_time_range = 3600
publish_time_range = 7200

# [package_registry.jfrog.maven]
# In this example, the Maven repo can be accessed at `https://internal.registry.org/repo-name`.
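
To illustrate what widening `publish_time_range` from 3600 to 7200 seconds means in practice, here is a minimal sketch of such a window check. The function name and the choice of inclusive bounds are assumptions made for illustration, not Macaron's actual logic.

```python
from datetime import datetime, timedelta, timezone


def published_within_range(
    run_started_at: datetime,
    published_at: datetime,
    publish_time_range: int = 7200,
) -> bool:
    """Return True if publication happened at most `publish_time_range` seconds after the run started."""
    elapsed = published_at - run_started_at
    # Assumption: a publish time that precedes the run start never matches.
    return timedelta(0) <= elapsed <= timedelta(seconds=publish_time_range)


# A deploy run that publishes its artifact 90 minutes after it started.
started = datetime(2024, 11, 18, 10, 0, tzinfo=timezone.utc)
published = started + timedelta(minutes=90)
matches_old_default = published_within_range(started, published, publish_time_range=3600)
matches_new_default = published_within_range(started, published, publish_time_range=7200)
```

In this example the run falls outside the old one-hour window but inside the new two-hour window.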
@@ -505,9 +514,12 @@ publish_time_range = 3600

[package_registry.maven_central]
# Maven Central host name.
hostname = search.maven.org
search_netloc = search.maven.org
search_scheme = https
# The search REST API. See https://central.sonatype.org/search/rest-api-guide/
search_endpoint = solrsearch/select
registry_url_netloc = repo1.maven.org/maven2
registry_url_scheme = https
request_timeout = 20

[package_registry.npm]
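
The split of the Maven Central settings into scheme, netloc and endpoint suggests that URLs are assembled from these parts. A minimal sketch using only `urllib.parse` is shown below. The `build_url` helper and the query parameters are hypothetical, and how Macaron actually combines these keys is an assumption.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode, urlunsplit

# Values copied from the [package_registry.maven_central] section above.
SEARCH_SCHEME = "https"
SEARCH_NETLOC = "search.maven.org"
SEARCH_ENDPOINT = "solrsearch/select"
REGISTRY_URL_SCHEME = "https"
REGISTRY_URL_NETLOC = "repo1.maven.org/maven2"


def build_url(scheme: str, netloc: str, path: str, query: Optional[Mapping[str, str]] = None) -> str:
    """Assemble a URL from its parts, percent-encoding the query parameters."""
    return urlunsplit((scheme, netloc, path, urlencode(query or {}), ""))


# Hypothetical search query and artifact path, for illustration only.
search_url = build_url(SEARCH_SCHEME, SEARCH_NETLOC, SEARCH_ENDPOINT, {"q": "a:log4j-core", "rows": "1"})
registry_url = build_url(REGISTRY_URL_SCHEME, REGISTRY_URL_NETLOC, "org/apache/logging/log4j/log4j-core/")
```

Note that `registry_url_netloc` already contains the `/maven2` path prefix, so the artifact path is simply appended after it.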
29 changes: 14 additions & 15 deletions src/macaron/json_tools.py
@@ -31,28 +31,27 @@ def json_extract(entry: dict | list, keys: Sequence[str | int], type_: type[T])
T | None:
The found value as the type of the type parameter.
"""
target: JsonType = entry
for key in keys:
if isinstance(target, dict) and isinstance(key, str):
if key not in target:
logger.debug("JSON key '%s' not found in dict target.", key)
if isinstance(entry, dict) and isinstance(key, str):
if key not in entry:
logger.debug("JSON key '%s' not found in dict entry.", key)
return None
elif isinstance(target, list) and isinstance(key, int):
if key < 0 or key >= len(target):
logger.debug("JSON list index '%s' is outside of list bounds %s.", key, len(target))
elif isinstance(entry, list) and isinstance(key, int):
if key < 0 or key >= len(entry):
logger.debug("JSON list index '%s' is outside of list bounds %s.", key, len(entry))
return None
else:
logger.debug("Cannot index '%s' (type: %s) in target (type: %s).", key, type(key), type(target))
logger.debug("Cannot index '%s' (type: %s) in entry (type: %s).", key, type(key), type(entry))
return None

# If statement required for mypy to not complain. The else case can never happen because of the above if block.
if isinstance(target, dict) and isinstance(key, str):
target = target[key]
elif isinstance(target, list) and isinstance(key, int):
target = target[key]
if isinstance(entry, dict) and isinstance(key, str):
entry = entry[key]
elif isinstance(entry, list) and isinstance(key, int):
entry = entry[key]

if isinstance(target, type_):
return target
if isinstance(entry, type_):
return entry

logger.debug("Found value of incorrect type: %s instead of %s.", type(target), type(type_))
logger.debug("Found value of incorrect type: %s instead of %s.", type(entry), type(type_))
return None
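
Because the added and removed lines are interleaved above, the following condensed sketch shows how the post-change traversal might behave. It is a simplified stand-in written for illustration, not the exact contents of `json_tools.py`: the debug logging is omitted, the two-step indexing is merged, and the sample payload is made up.

```python
from collections.abc import Sequence
from typing import Any, Optional, Type, TypeVar, Union

T = TypeVar("T")


def json_extract(entry: Any, keys: Sequence[Union[str, int]], type_: Type[T]) -> Optional[T]:
    """Follow `keys` through nested dicts and lists; return the value only if it has type `type_`."""
    for key in keys:
        if isinstance(entry, dict) and isinstance(key, str):
            if key not in entry:
                return None
            entry = entry[key]
        elif isinstance(entry, list) and isinstance(key, int):
            if key < 0 or key >= len(entry):
                return None
            entry = entry[key]
        else:
            # The key type does not match the container type, so the path cannot be followed.
            return None
    return entry if isinstance(entry, type_) else None


# Made-up payload shaped loosely like a provenance document.
payload = {"predicate": {"materials": [{"uri": "git+https://example.org/repo"}]}}
uri = json_extract(payload, ["predicate", "materials", 0, "uri"], str)
missing = json_extract(payload, ["predicate", "materials", 1, "uri"], str)
wrong_type = json_extract(payload, ["predicate", "materials"], dict)
```

As in the diff, the `entry` parameter is rebound while walking the path instead of using a separate `target` variable; the caller's object is not modified because only the local name changes.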