Structured Metadata Identity #471

Closed
wants to merge 8 commits into from

orbisvicis

Backport of python/cpython#108585. The first commit parameterizes the test example with `parameterized` rather than Hypothesis, and omits the Hypothesis strategy. The second commit switches back to Hypothesis and includes the identity strategy. With this commit, both PRs are equivalent.

Add `PackageMetadata.authors` and `PackageMetadata.maintainers` to the
`importlib.metadata` module. These unify and provide minimal parsing for
the respective core metadata fields ("Author", "Author-email"), and
("Maintainer", "Maintainer-email").
A hypothesis strategy for generating structured core metadata and
equivalent unstructured text. Ensures that parsing the text using
PackageMetadata results in the same structure - a roundtrip test.
@orbisvicis orbisvicis changed the title Structured.identity Structured Metadata Identity Sep 19, 2023
* Document `PackageMetadata.authors` and `PackageMetadata.maintainers`
  in the usage guide.
* Export `Ident` so it can be documented but also used to build custom
  parsing strategies.
Run the `vermin` tool to identify and fix code incompatible with Python
>= 3.8.
Properties `PackageMetadata.authors` and `PackageMetadata.maintainers`
are now lists with repeated elements removed, rather than sets.
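
For readers unfamiliar with the pattern, "lists with repeated elements removed" is commonly implemented by de-duplicating while keeping first-seen order. The following is a minimal sketch of that idea, not the PR's actual code; the helper name `unique_in_order` is hypothetical, and it assumes the elements are hashable (as a named tuple of strings would be).

```python
def unique_in_order(items):
    """Drop repeated elements while keeping the order of first appearance.

    Hypothetical helper for illustration; relies on dicts preserving
    insertion order (guaranteed since Python 3.7).
    """
    return list(dict.fromkeys(items))


# Plain tuples stand in for (name, email) identities here.
authors = [
    ("Alice Example", "alice@example.org"),
    ("Bob Example", None),
    ("Alice Example", "alice@example.org"),  # repeated entry
]
deduplicated = unique_in_order(authors)
```

Unlike a set, the result keeps the order in which identities were declared, which is presumably why a list is preferable for author metadata.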
@jaraco
Member

jaraco commented Apr 19, 2024

I've been learning more about how metadata works, and I now have a chance to review this with a better understanding.

In jaraco/jaraco.packaging#17, I learned about how the switch from setup.py/setup.cfg style in Setuptools projects to pyproject.toml causes the metadata to be rendered differently. On setup.cfg, the "author" and "email" are solicited separately and stored in the metadata separately. Consider the author fields in cheroot 10.0.0, for example.

These fields produce the following metadata:

 ~ @ pip-run cheroot==10 git+https://github.com/python/importlib_metadata@refs/pull/471/head -- -q
>>> import importlib_metadata as im
>>> md = im.metadata('cheroot')
>>> md['Author']
'CherryPy Team'
>>> md['author-email']
'[email protected]'

And unfortunately, this patch renders that declaration as two separate individuals:

>>> md.authors
[Ident(name='CherryPy Team', email=None), Ident(name=None, email='[email protected]')]

Similarly, have a look at pytest-ignore-flaky, which declares two people.

 ~ @ pip-run pytest-ignore-flaky==2.2 git+https://github.com/python/importlib_metadata@refs/pull/471/head -- -q
  WARNING: Did not find branch or tag 'refs/pull/471/head', assuming revision or ref.
>>> import importlib_metadata as im
>>> im.metadata('pytest-ignore-flaky').authors
[Ident(name='Eduardo Naufel Schettino', email=None), Ident(name='Marcos Alfredo Camargo Leal Pinto', email=None), Ident(name=None, email='[email protected]'), Ident(name=None, email='[email protected]')]

When I migrate these projects to pyproject.toml, they get the new format, with both the name and the email stored in the *-Email field. Until then, .authors and .maintainers are not producing what I'd expect (names matched up with their emails and combined into single entries).
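
The expected behavior described above (matching names with emails and combining them) could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Ident` here is a local stand-in with only the two fields visible in the output above, `combine_idents` is a hypothetical helper, the email addresses are placeholders, and the legacy "Author" field is assumed to be a comma-separated list of names given in the same order as the addresses.

```python
from email.utils import getaddresses
from typing import List, NamedTuple, Optional


class Ident(NamedTuple):
    # Local stand-in for the PR's Ident; not imported from importlib_metadata.
    name: Optional[str]
    email: Optional[str]


def combine_idents(names_field: Optional[str],
                   emails_field: Optional[str]) -> List[Ident]:
    """Pair entries of a legacy name field with entries of the email field.

    Assumption: names are comma-separated and positionally aligned with
    the addresses. Names that themselves contain commas would break this.
    """
    names = [n.strip() for n in (names_field or "").split(",") if n.strip()]
    # getaddresses yields (display_name, address) pairs and also handles the
    # newer 'Name <email>' form stored in the *-Email field.
    parsed = getaddresses([emails_field]) if emails_field else []

    remaining_names = iter(names)
    idents = []
    for display_name, address in parsed:
        if not display_name:
            # Legacy style: borrow the next unclaimed name, if any.
            display_name = next(remaining_names, None)
        idents.append(Ident(display_name or None, address or None))
    # Names with no matching address are kept as name-only entries.
    idents.extend(Ident(name, None) for name in remaining_names)
    return idents


single = combine_idents("CherryPy Team", "team@example.org")
pair = combine_idents("Alice Example, Bob Example",
                      "alice@example.org, bob@example.org")
```

Positional pairing is only a heuristic: nothing in the legacy fields guarantees that the nth name belongs to the nth address, so a robust version would need a policy for mismatched counts.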

@jaraco
Member

jaraco commented Apr 19, 2024

Another issue I encountered was that my name, presumably because it contains a period, gets quoted:

 ~ @ pip-run git+https://github.com/jaraco/calendra git+https://github.com/python/importlib_metadata@refs/pull/471/head -- -q
  WARNING: Did not find branch or tag 'refs/pull/471/head', assuming revision or ref.
>>> import importlib_metadata as im
>>> im.metadata('calendra').authors
[Ident(name='"Jason R. Coombs"', email='[email protected]')]

Notice the excess quoting of the name. The solution here should remove those quotes as they're not part of the declared metadata.
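
One way to drop the surrounding quotes is the standard library's `email.utils`, sketched below. The quotes are presumably the RFC 822 quoting applied to a display name containing special characters such as a period; that cause is an assumption, and the address used is a placeholder. This is not the fix proposed in the PR, only an illustration of the existing stdlib behavior.

```python
from email.utils import parseaddr, unquote

# Option 1: strip the quoting from a name that was already split out.
raw_name = '"Jason R. Coombs"'
clean_name = unquote(raw_name)

# Option 2: parse the combined 'Name <email>' value; parseaddr returns the
# display name without its surrounding quotes.
name, address = parseaddr('"Jason R. Coombs" <jrc@example.org>')

# Names without quotes pass through unquote unchanged.
untouched = unquote("CherryPy Team")
```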

@jaraco
Member

jaraco commented Apr 19, 2024

If we can address the above two concerns, I'd like to get this merged and available for use.

@jaraco
Member

jaraco commented Jun 23, 2024

@orbisvicis Do you have plans to address the aforementioned issues?

@jaraco
Member

jaraco commented Aug 20, 2024

In the jaraco.packaging.metadata module, I've implemented a quick and dirty routine to extract authors and emails from metadata, and that's what I've been using in my projects. I'd prefer to have something robust and tested in importlib metadata itself, but the current approach isn't acceptable. I'm going to close this for now, but I welcome a revival of the effort in the future. Just say the word and we can re-open this pull request, or feel free to file a new one addressing the aforementioned concerns.
