Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[next major] remove **extra from Creator.config_metadata #205

Open
rgaudin opened this issue Oct 19, 2024 · 2 comments · May be fixed by #221
Open

[next major] remove **extra from Creator.config_metadata #205

rgaudin opened this issue Oct 19, 2024 · 2 comments · May be fixed by #221
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@rgaudin
Copy link
Member

rgaudin commented Oct 19, 2024

In kiwix/operations#286 we had two misspelled yet undetected metadata: tags and scraper.

I think accepting extra metadata in this method defeats the purpose of having them all exposed. I also think it's use is marginal and that additional metadata can still be added by other means.

@benoit74 Can we get rid of this?

@rgaudin rgaudin added enhancement New feature or request good first issue Good for newcomers labels Oct 19, 2024
@benoit74 benoit74 changed the title [next major] remove **extra from Creator.config_indexing [next major] remove **extra from Creator.config_metadata Oct 21, 2024
@benoit74
Copy link
Collaborator

I would consider to split it in two: config_std_metadata (to be used by default) and config_extra_metadata (for those scraper like warc2zim who want to add custom metadata). This seems important to me so that both method can still benefit from same logic (currently we remove control characters for instance, but we might add more logic in the future). I recommend to even force config_extra_metadata to force the X- prefix we used in warc2zim for X-ContentDate, so that we limit even further the risks of strange metadata. WDYT?

And obviously we need to keep config_indexing till the next major.

@rgaudin
Copy link
Member Author

rgaudin commented Oct 21, 2024

I recommend to even force config_extra_metadata to force the X- prefix we used in warc2zim for X-ContentDate, so that we limit even further the risks of strange metadata. WDYT?

Works for me, as long as there's still the possibility to add non-prefixed metadata (via add_metadata()).

@benoit74 benoit74 added this to the 5.0.0 milestone Nov 14, 2024
@benoit74 benoit74 self-assigned this Nov 14, 2024
@benoit74 benoit74 removed the good first issue Good for newcomers label Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
2 participants