libzim provides a way for scrapers to supply content for indexing that differs from the content actually stored.
This allows better indexing when much of the stored content is not relevant to the subject of the page itself.
mwoffliner should parse the HTML content and extract only the relevant information (that is, remove things such as menus, footers, user information, links to other questions...)
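To make the proposal concrete, here is a minimal sketch of the extraction step, written in Python with only the standard library. It is an illustration under stated assumptions, not mwoffliner's or libzim's code: the `SKIPPED_TAGS` set, the `IndexTextExtractor` class and the `extract_index_text` function are hypothetical names, and handing the resulting text to libzim as index data is not shown.

```python
from html.parser import HTMLParser

# Hypothetical list of elements treated as non-content; a real scraper
# would tune this to its own page templates.
SKIPPED_TAGS = {"nav", "footer", "header", "aside", "script", "style"}


class IndexTextExtractor(HTMLParser):
    """Collects text that lies outside any skipped element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_index_text(html):
    """Return the text of `html` with menus, footers, etc. removed."""
    parser = IndexTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `extract_index_text(page_html)` on a rendered page would give the text to index, while the full HTML is still stored unchanged.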
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Improvement would be marginal I think because we don't include much non-content text in the HTML.
A side effect would be parsing all our output using an in-scraper HTML parser versus letting libzim do it.
See comments in openzim/libzim#653