RFC fetcher can't extract RFC title from some RFCs #43

Open
jschauma opened this issue May 11, 2023 · 3 comments

Comments

@jschauma

jschauma commented May 11, 2023

It looks like some RFCs use different markup from others, which causes the RFC fetcher to fail to extract the title.

For example, https://datatracker.ietf.org/doc/html/rfc3986 uses

<span class="h1">Uniform Resource Identifier (URI): Generic Syntax</span>  

and is successfully matched by

title = xml.xpath('string(//span[@class="h1"])')

but e.g., https://datatracker.ietf.org/doc/html/rfc9110 uses

<h1 id="title">HTTP Semantics</h1>

and is not matched.

It's possible that adding

if not title:
    title = xml.xpath('string(//h1[@id="title"])')

to fetch_rfc in https://github.com/sipb/chiron/blob/master/chiron_bot/fetchers.py might help, but I haven't tested this.
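
The fallback idea can be generalized to an ordered list of XPath expressions. The following is an untested sketch, not the chiron implementation: `extract_title` and `TITLE_XPATHS` are hypothetical names, and `tree` is assumed to be any parsed document with an lxml-style `.xpath()` method that returns a string for `string(...)` expressions.

```python
# Hypothetical helper: try each known title location in order.
# The two expressions are the ones quoted in this issue.
TITLE_XPATHS = [
    'string(//span[@class="h1"])',  # older markup, e.g. rfc3986
    'string(//h1[@id="title"])',    # newer markup, e.g. rfc9110
]


def extract_title(tree, xpaths=TITLE_XPATHS):
    """Return the first non-empty title found, or None if nothing matches.

    `tree` is assumed to expose an lxml-style .xpath() method.
    """
    for expr in xpaths:
        # string(...) evaluates to "" when no node matches
        title = tree.xpath(expr).strip()
        if title:
            return title
    return None
```

With lxml, `tree` could come from `lxml.html.fromstring(page_source)`. Supporting another markup variant would then only mean appending one more expression to the list.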

@richsalz

If you use https://www.rfc-editor.org/info/rfcXXX as the URL, that has consistent formatting for all RFC metadata.

@richsalz

(And the IETF considers rfc-editor.org the canonical site for published RFCs)
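
A minimal sketch of building that URL from an RFC number, assuming the `rfcXXX` placeholder stands for `rfc` followed by the decimal number. `rfc_info_url` is a hypothetical helper, not something that exists in chiron:

```python
RFC_INFO_URL = "https://www.rfc-editor.org/info/rfc{number}"


def rfc_info_url(number):
    """Return the rfc-editor.org info-page URL for an RFC number."""
    number = int(number)  # accepts 9110 or "9110"; raises ValueError otherwise
    if number <= 0:
        raise ValueError("RFC numbers are positive integers")
    return RFC_INFO_URL.format(number=number)
```

For example, `rfc_info_url(9110)` gives `https://www.rfc-editor.org/info/rfc9110`. How the title is marked up on that page is not covered here and would need to be checked against the page itself.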

@jschauma
Author

Ah, yes, I thought we had talked about that before. :-)

#42
