From 17c9c76314fa487c8a9b3082779fabe6c554ea03 Mon Sep 17 00:00:00 2001
From: Bas Zalmstra
Date: Thu, 23 Nov 2023 14:57:52 +0100
Subject: [PATCH 1/2] purls CEP

---
 cep-purls.md | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)
 create mode 100644 cep-purls.md

diff --git a/cep-purls.md b/cep-purls.md
new file mode 100644
index 00000000..16df2c17
--- /dev/null
+++ b/cep-purls.md
@@ -0,0 +1,100 @@
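+<table>
+<tr><td> Title </td><td> Add package-urls to PackageRecord </td></tr>
+<tr><td> Status </td><td> Draft </td></tr>
+<tr><td> Author(s) </td><td> Bas Zalmstra &lt;bas@prefix.dev&gt;</td></tr>
+<tr><td> Created </td><td> Nov 23, 2023 </td></tr>
+<tr><td> Updated </td><td> Nov 23, 2023 </td></tr>
+<tr><td> Discussion </td><td> NA </td></tr>
+<tr><td> Implementation </td><td> NA </td></tr>
+</table>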
+
+## Abstract
+
+This CEP describes a change to the `PackageRecord` format and the corresponding `repodata.json` file to include `purls` (Package Urls) of repackaged packages to identify packages across multiple ecosystems.
+
+## Specification
+
+We propose to add the optional `purls: [string]` field to `PackageRecord`.
+To identify the repackaged package we use [PURL](https://github.com/package-url/purl-spec/) (Package URL), which implements a scheme for identifying packages that is meant to be portable across packaging ecosystems.
+
+```json
+{
+  ...
+  "pinject-0.14.1-pyhd8ed1ab_0.tar.bz2": {
+    "name": "pinject",
+    "version": "0.14.1",
+    "purls": ["pkg:pypi/pinject@0.14.1"],
+    ...
+  }
+  ...
+}
+```
+
+PURL is already supported by dependency-related tooling like SPDX (see [External Repository Identifiers in the SPDX 2.3 spec](https://spdx.github.io/spdx-spec/v2.3/external-repository-identifiers/#f35-purl)), the [Open Source Vulnerability format](https://ossf.github.io/osv-schema/#affectedpackage-field), and the [Sonatype OSS Index](https://ossindex.sonatype.org/doc/coordinates); not having to wait years before support in such tooling arrives is valuable.
+
+## Motivation
+
+Conda packages can repackage packages from other ecosystems.
+Conda-forge and other channels famously repackages a lot of pypi packages.
+However, without actually downloading the conda package and inspecting its contents there is no reliable way to know whether a certain conda package is a repackaged package.
+
+Pixi and conda-lock are both tools that try to combine the conda and pypi package ecosystem but this is hard to do because conda package names and pypi package names do not necesarily match up.
+
+Its hard to use open-source vulnerability databases because they often do not contain conda packages.
+Using PURL allows us to link vulnerabilities from other ecosystems to conda package.
+
+## Rationale
+
+Adding the information to the `repodata.json` file has some advantages:
+
+* We can keep this information close to the conda package description.
+* We can incrementally add `purls` through repodata patches.
+
+The downside is that the (already large) repodata.json file will grow.
+
+The `purls` field is an array because:
+
+* A package might exist in multiple ecosystems
+* A single conda package might repackage multiple other packages.
+
+## Alternatives
+
+Some work has been done to try and map conda package names to pypi package names through the grayskull mapping:
+
+https://raw.githubusercontent.com/regro/cf-graph-countyfair/master/mappings/pypi/grayskull_pypi_mapping.yaml
+
+This file is generated automatically from the recipes in conda-forge feedstocks.
+
+However, this approach has some serious drawbacks:
+
+* It only works for packages from conda-forge.
+* Its a heuristic based on source urls.
+* The implementation is based on the recipes instead of the actual package files.
+* The implementation does not work with multi-output recipes.
+* Its maintained as a seperate file that is hard to discover
+
+## Backwards Compatibility
+
+Since the `purls` field is an addition (and optional) there should be no breaking changes.
+
+
+
+## Copyright
+
+All CEPs are explicitly [CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/).

From f9d65ff7ca40ca13273cf355c44bcd4fc22ea0e7 Mon Sep 17 00:00:00 2001
From: Wolf Vollprecht
Date: Fri, 24 Nov 2023 09:01:57 +0100
Subject: [PATCH 2/2] tiny improvements to spelling and sentences

---
 cep-purls.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/cep-purls.md b/cep-purls.md
index 16df2c17..007cff65 100644
--- a/cep-purls.md
+++ b/cep-purls.md
@@ -34,14 +34,14 @@ PURL is already supported by dependency-related tooling like SPDX (see [External

 ## Motivation

-Conda packages can repackage packages from other ecosystems.
-Conda-forge and other channels famously repackages a lot of pypi packages.
+Conda packages can repackage packages from other ecosystems.
+Conda-forge and other channels famously repackages a lot of PyPI packages.
 However, without actually downloading the conda package and inspecting its contents there is no reliable way to know whether a certain conda package is a repackaged package.

-Pixi and conda-lock are both tools that try to combine the conda and pypi package ecosystem but this is hard to do because conda package names and pypi package names do not necesarily match up.
+Pixi and conda-lock are both tools that try to combine the conda and PyPI package ecosystem but this is hard to do because conda package names and PyPI package names do not necessarily match up.

 Its hard to use open-source vulnerability databases because they often do not contain conda packages.
-Using PURL allows us to link vulnerabilities from other ecosystems to conda package.
+Using the PURL standard allows us to link vulnerabilities from other ecosystems to conda package.

 ## Rationale

@@ -59,7 +59,7 @@ The `purls` field is an array because:

 ## Alternatives

-Some work has been done to try and map conda package names to PyPI package names through the grayskull mapping:
+Some work has been done to try and map conda package names to PyPI package names through the grayskull mapping:

 https://raw.githubusercontent.com/regro/cf-graph-countyfair/master/mappings/pypi/grayskull_pypi_mapping.yaml

@@ -71,7 +71,7 @@ However, this approach has some serious drawbacks:
 * Its a heuristic based on source urls.
 * The implementation is based on the recipes instead of the actual package files.
 * The implementation does not work with multi-output recipes.
-* Its maintained as a seperate file that is hard to discover
+* Its maintained as a separate file that is hard to discover

 ## Backwards Compatibility
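
Illustrative note on the proposed `purls` field: the Python sketch below shows one way a consumer such as a lock-file tool might read the field to map PyPI names to conda package names. It is only a sketch under stated assumptions, not part of the CEP and not taken from any existing tool. The helper names `parse_simple_purl` and `pypi_to_conda` are hypothetical, the `pytorch` record is invented for illustration, and the parser handles only the simple `pkg:type/name@version` form (no namespaces, qualifiers, subpaths, or percent-encoding from the full PURL spec).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Records keyed by file name, shaped like the example in the Specification.
# The "pytorch" record is hypothetical and only illustrates a name mismatch;
# the last record shows that `purls` is optional.
RECORDS: Dict[str, dict] = {
    "pinject-0.14.1-pyhd8ed1ab_0.tar.bz2": {
        "name": "pinject",
        "version": "0.14.1",
        "purls": ["pkg:pypi/pinject@0.14.1"],
    },
    "pytorch-2.1.0-py311_0.tar.bz2": {  # hypothetical record
        "name": "pytorch",
        "version": "2.1.0",
        "purls": ["pkg:pypi/torch@2.1.0"],
    },
    "example-1.0-0.tar.bz2": {  # hypothetical record without purls
        "name": "example",
        "version": "1.0",
    },
}


def parse_simple_purl(purl: str) -> Tuple[str, str, Optional[str]]:
    """Split a purl of the simple form pkg:type/name@version.

    Deliberately simplified: a real consumer should use a complete
    PURL parser instead of this helper.
    """
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or "/" not in rest:
        raise ValueError(f"not a simple purl: {purl!r}")
    pkg_type, _, name_version = rest.partition("/")
    name, _, version = name_version.partition("@")
    return pkg_type, name, version or None


def pypi_to_conda(records: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each PyPI name found in `purls` to the conda names that repackage it."""
    mapping = defaultdict(set)
    for record in records.values():
        # `purls` is optional, so records without it are simply skipped.
        for purl in record.get("purls", []):
            pkg_type, name, _version = parse_simple_purl(purl)
            if pkg_type == "pypi":
                mapping[name].add(record["name"])
    return {name: sorted(conda_names) for name, conda_names in mapping.items()}


if __name__ == "__main__":
    print(pypi_to_conda(RECORDS))
```

Because the field is a list, a record that repackages several upstream packages contributes one mapping entry per purl, and a record with no `purls` contributes none.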