Remove « .html » extension #166

Open
kelson42 opened this issue Jan 25, 2023 · 8 comments

Comments

@kelson42
Contributor

These « .html » extensions, for example here https://library.kiwix.org/content/gutenberg_fr_all/A/Les%20Fleurs%20du%20Mal_cover.6099.html, were necessary at the time we were using zimwriterfs. Zimwriterfs needed them to identify the HTML content that should be indexed. This is not necessary anymore, so the extensions should be removed for cleaner URLs and a smaller ZIM size.

@benoit74
Collaborator

It is not that simple, unless I am missing something: this extension is necessary to distinguish between the various file formats in the archive.

For instance, for book ID 18812 we currently have these three files:

Douze ans de séjour dans la Haute-Éthiopie.18812.epub
Douze ans de séjour dans la Haute-Éthiopie.18812.html
Douze ans de séjour dans la Haute-Éthiopie_cover.18812.html

@kelson42
Contributor Author

@benoit74 Removing « .html » for books in HTML should not create a conflict. In any case, IMO this topic will disappear if we implement #95.

@benoit74
Collaborator

OK, I didn't get this: all files would have an extension except for the HTML version.
Makes sense to me.

@rgaudin
Member

rgaudin commented Jan 26, 2023

This topic will anyway disappear IMO if we implement #95.

No, we'd still need the cover page so it won't be affected.

@benoit74 besides the chrome URLs (Home.html), the most important ones are the cover page and, yes, the HTML-format version when it's included.

To avoid conflicts yet keep decent-looking URLs I'd propose the following:

/18812/Douze ans de séjour dans la Haute-Éthiopie  # Cover page
/18812/Douze ans de séjour dans la Haute-Éthiopie.epub
/18812/Douze ans de séjour dans la Haute-Éthiopie.pdf
/18812/Douze ans de séjour dans la Haute-Éthiopie.html

I am fine with the HTML format being named .html because it's a formatted book and a single file that can be saved as well; and I like consistency.

@kelson42 if you don't like it, please suggest another pattern, keeping in mind that:

  • extensions are very important for files that can be saved to disk/phone;
  • we need the book ID somewhere because titles can be duplicated.
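For illustration, the proposed layout can be sketched as a small Python helper. This is a hypothetical sketch, not gutenberg2zim code: the function name `book_paths` and the format list are assumptions. It only shows that the cover page (no extension) and each downloadable format (with extension) get distinct paths, and that the book ID keeps duplicate titles apart.

```python
def book_paths(book_id: int, title: str, formats: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each entry kind to a path following the proposal.

    The "cover" key holds the cover page path, which has no extension;
    every other key is a file format whose path keeps its extension.
    """
    # Book ID first, so two books with the same title never collide
    base = f"/{book_id}/{title}"
    paths = {"cover": base}  # cover page: extension-less, cleaner URL
    for fmt in formats:
        # Downloadable formats keep their extension so saved files open correctly
        paths[fmt] = f"{base}.{fmt}"
    return paths


for kind, path in book_paths(
    18812, "Douze ans de séjour dans la Haute-Éthiopie", ["epub", "pdf", "html"]
).items():
    print(kind, path)
```

Because only the cover page lacks an extension, it cannot clash with the `.html` format version, which keeps its extension for consistency with the other downloadable files.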

@kelson42
Contributor Author

@rgaudin Agree with your proposal.

@eshellman
Collaborator

If it helps, I have code that will make a safe, title-based, GitHub-safe filename slug for any book in PG.
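As a rough idea of what such a slug function might do, here is a minimal Python sketch. It is not eshellman's code: the name `safe_slug`, the allowed character set, and the length cap are all assumptions. It strips accents, collapses unsafe characters into hyphens, and trims the result.

```python
import re
import unicodedata


def safe_slug(title: str, max_len: int = 100) -> str:
    """Hypothetical slugifier: ASCII-only, filesystem/URL-safe title."""
    # Decompose accented letters (é -> e + combining accent), then drop
    # everything that is not ASCII, which removes the combining accents
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Replace each run of characters outside [A-Za-z0-9._-] with one hyphen
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", ascii_only)
    # Trim separators at both ends, cap the length, and trim again
    return slug.strip("-.")[:max_len].rstrip("-.")


print(safe_slug("Douze ans de séjour dans la Haute-Éthiopie"))
```

Note that this approach trades readability for safety: accented titles lose their diacritics, whereas the proposal above keeps the original title in the path and relies on the book ID for uniqueness.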

@prathamkumarjha

Hi, may I help by removing the « .html » extensions?

@rgaudin
Member

rgaudin commented Apr 22, 2023

@prathamkumarjha yes, you can submit a PR, as per my comment above.


5 participants