IPCC Glossary: ToC creation when using multiple HTML files #1236
Comments
Apologies, I'm not giving enough context about the project, and I also need to break down my ToC questions; a simple pointer to your support docs will give me all the answers, I'm sure. The publishing project is run by a volunteer group whose goal is to make a semantic index of all IPCC Reports. A first-level project is to semantify the IPCC Glossary. We have met with the IPCC and other UN agencies, and they are receptive to this being done. Part of the project would be to create outputs from the semantic source, one of these being a Hyperbook with user enhancements (Wikipedia/data entries etc.). See the IPCC source https://apps.ipcc.ch/glossary/ and an example Vivliostyle output: https://vivliostyle.vercel.app/#src=https://raw.githubusercontent.com/semanticClimate/glossary-demo/main/html/index.html |
My questions are about generating ToCs and using multiple HTML files, taking into account that we think we want to use Vivliostyle.js and Vivliostyle CLI. We want to use the CLI for PDF bookmarks, PoD preparation, and other CLI features.
|
Re: Question 1. It seems from your documentation that a 'Web publication manifest' is the best route. Do you recommend the W3C or the Readium version? Readium seems more convenient due to its documentation and examples, but I'm happy to use either: https://docs.vivliostyle.org/#/vivliostyle-viewer#web-publications-multi-html-documents |
I had a very basic go at using a Manifest example, just to get things going: W3C Publication Manifest https://semanticclimate.github.io/glossary-sandbox/ipccglossary.jsonld (Render). Tomorrow I'll work on building up a W3C Manifest properly. It would be nice if you had a pointer to a good W3C Manifest example that's suitable for copying and building on. |
Yes, you can use a Publication Manifest to organize multiple HTML documents into one publication (we use W3C standards unless there is a particular reason not to). Vivliostyle.js recognizes a ToC that is specified in the publication manifest. See the following sections in Publication Manifest:
A simple example of a publication manifest that includes a ToC resource is below:
{
"@context": [
"https://schema.org",
"https://www.w3.org/ns/pub-context"
],
"conformsTo": "https://www.w3.org/TR/pub-manifest/",
"type": "Book",
"name": "IPCC Glossary",
"author": "IPCC",
"inLanguage": "en",
"readingOrder": [
{
"url": "index.html",
"rel": "contents"
},
"glossary.html",
"acronyms.html"
]
}
In this example, "index.html" is the ToC file. The table of contents in the ToC file is displayed in the ToC panel of Vivliostyle Viewer. Note that when no ToC resource (an item with "rel": "contents") is found, Vivliostyle.js uses the first item of "readingOrder" as the ToC resource if it contains ToC-like elements. For example,
"readingOrder": [
"glossary.html",
"acronyms.html"
]
is treated as if "rel": "contents" were specified in the "glossary.html" item, and the table of contents of the glossary is displayed in the Vivliostyle Viewer's ToC panel. However, it is better to specify "rel": "contents" explicitly when you use a Publication Manifest. You can also just use the ToC file without a publication manifest (this idea is from http://glazman.org/e0/webbook.html). See the Vivliostyle Viewer documentation: https://docs.vivliostyle.org/#/vivliostyle-viewer#table-of-contents-in-html
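Since the HTML files in this project are generated automatically, a manifest like the one above could be generated the same way. Here is a minimal Python sketch, assuming the file names from the example; `build_manifest` is a hypothetical helper written for illustration and is not part of Vivliostyle or any W3C tooling.

```python
import json


def build_manifest(name, author, toc_url, other_urls, language="en"):
    """Return a publication manifest dict with the ToC item marked explicitly.

    Hypothetical helper: it only mirrors the fields shown in the example above.
    """
    # The ToC file comes first and carries "rel": "contents".
    reading_order = [{"url": toc_url, "rel": "contents"}]
    reading_order.extend(other_urls)
    return {
        "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
        "conformsTo": "https://www.w3.org/TR/pub-manifest/",
        "type": "Book",
        "name": name,
        "author": author,
        "inLanguage": language,
        "readingOrder": reading_order,
    }


manifest = build_manifest(
    "IPCC Glossary", "IPCC", "index.html", ["glossary.html", "acronyms.html"]
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON can be saved next to the HTML files (for example as publication.json) and opened in the Viewer in the same way as a hand-written manifest.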
There are a few advantages of using a publication manifest:
About ToC generation
There is a simple ToC auto-generation option in Vivliostyle CLI (see the Vivliostyle CLI documentation). However, this feature is very limited: it generates only one ToC link item per HTML document. There has been a feature request to extend it to include every (or selected) heading in the HTML documents. |
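Until the CLI can list individual headings, a ToC file with one link per heading can be generated outside Vivliostyle. This is a rough sketch using only the Python standard library; it assumes that headings are h1/h2 elements and that their `id` attributes can serve as link targets. `HeadingCollector` and `build_toc` are hypothetical names, and the flat list with `level-N` classes is just one possible layout.

```python
from html import escape
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every h1/h2 element in one HTML document."""

    LEVELS = {"h1": 1, "h2": 2}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.LEVELS and self._current is None:
            self._current = [self.LEVELS[tag], dict(attrs).get("id"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and self.LEVELS.get(tag) == self._current[0]:
            level, anchor, parts = self._current
            # Collapse runs of whitespace inside the heading text.
            self.headings.append((level, anchor, " ".join("".join(parts).split())))
            self._current = None


def build_toc(documents):
    """Build a <nav role="doc-toc"> string from a mapping of URL -> HTML source."""
    items = []
    for url, source in documents.items():
        collector = HeadingCollector()
        collector.feed(source)
        for level, anchor, text in collector.headings:
            href = f"{url}#{anchor}" if anchor else url
            items.append(
                f'    <li class="level-{level}">'
                f'<a href="{escape(href)}">{escape(text)}</a></li>'
            )
    return '<nav role="doc-toc">\n  <ul>\n' + "\n".join(items) + "\n  </ul>\n</nav>"


docs = {
    "glossary.html": '<h1 id="g">Glossary</h1><h2 id="a">A</h2>',
    "acronyms.html": "<h1>Acronyms</h1>",
}
toc_html = build_toc(docs)
print(toc_html)
```

The resulting nav element would still need to be wrapped in a full HTML page and referenced as the "contents" item of the manifest.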
Thank you so much for your assistance here, wonderful. Apologies for my slow reply, but I got ill last week and am only now back to 'full power', as well as catching up on my 'day job' work :-) Your answers about the ToC functions and using Vivlio CLI are exactly what I needed right now: semanticClimate volunteer colleagues want to prepare a working publication for delegates to use at next week's COP meeting https://unfccc.int/ UNFCCC produce the legal agreements behind COP; they have 200 such docs available only as PDF. We convert them to Scholarly HTML, then semantically structure them. While colleagues continue to structure the HTML, I can create a publication containing all the content using a manifest and Vivlio CLI, by the looks of it. I'll keep you posted. And again, thank you; we'll give Vivlio a big credit :-) |
BTW I got the Manifest working on the IPCC Glossary in a very basic way, and will improve it: https://vivliostyle.vercel.app/#src=https://raw.githubusercontent.com/semanticClimate/glossary-demo/main/ipccglossary.jsonld And now I'll start on the COP docs https://github.com/semanticClimate/unfccc |
I wanted to ask about using CSS styles when I have lots of HTML files to bring together in a publication; at present it's 26, but it may rise to 200. Currently I've used the CSS override in Vivlio, which works (excuse the style, the HTML and CSS are all mixed up at the moment).
Thanks |
No, the CSS stylesheets need to be specified in each HTML document.
Vivliostyle.js uses CSS stylesheets specified in HTML documents, and does not use the CSS resources in the publication manifest. The CSS resources in the publication manifest are meaningless for Vivliostyle.js. |
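With dozens or hundreds of generated files, adding the stylesheet link to each document can be scripted. This is a minimal sketch, assuming every file has a closing `</head>` tag and that one shared CSS file lives under the publication root. Both function names are hypothetical.

```python
import os
import re
from pathlib import Path


def add_stylesheet_link(html_source, css_href):
    """Insert a <link rel="stylesheet"> before </head> unless it is already there."""
    link = f'<link rel="stylesheet" href="{css_href}">'
    if link in html_source:
        return html_source  # already linked, leave the document unchanged
    # Insert before the first closing head tag, matched case-insensitively.
    updated, count = re.subn(
        r"</head>",
        lambda match: link + "\n" + match.group(0),
        html_source,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("document has no </head> to attach the stylesheet to")
    return updated


def link_stylesheet_in_tree(root, css_path):
    """Rewrite every *.html file under root so that it links root/css_path."""
    root = Path(root)
    css = root / css_path
    for page in sorted(root.rglob("*.html")):
        # Use a relative href so files in subdirectories still resolve the CSS.
        href = Path(os.path.relpath(css, page.parent)).as_posix()
        source = page.read_text(encoding="utf-8")
        page.write_text(add_stylesheet_link(source, href), encoding="utf-8")


sample = "<html><head><title>t</title></head><body></body></html>"
linked = add_stylesheet_link(sample, "style.css")
print(linked)
```

Running `add_stylesheet_link` a second time on its own output leaves the document unchanged, so the script can be re-run after each regeneration of the HTML.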
Thank you @MurakamiShinyu, really appreciated. Things are moving along well now with the manifest. I've been wanting to move to the manifest approach for a really long time, so I'm happy to be able to use it at last; there's no going back now :-) For the moment I'll add the CSS through the Vivlio viewer, as we are automatically generating the HTML files from a PDF extraction pipeline. I could have the CSS linked automatically there, but I'll do that later once we're out of this development round. Eventually there will be about 200 HTML files linked into the publication, the higher-level ones in the ToC via the manifest, and the others rendered on the page in a main ToC and then in section sub-ToCs. We'll of course generate these ToC and nav files automatically from here: https://github.com/petermr/pyamihtml/tree/main/test/resources/unfccc/unfcccdocuments1 |
Hi @MurakamiShinyu, we've been progressing well with the project. I had a question about ToCs generated from the Publication Manifest when using Vivliostyle. My main ToC is rendering at the end of the publication, where I don't want it to be. I wondered if you could help solve the problem? Here is the sample publication.json rendered in Vivliostyle Canary. This is the directory in the repository where the publication is created: https://github.com/semanticClimate/cma3-test/tree/main/CMA_3 I have looked at Vivlio's multi-file examples, the W3C docs, and the Vivlio docs, but I can't see a solution. Thanks, Simon |
Your publication.json has "toc_ses_dec_res.html" in the "readingOrder" and "toc_toplevel_sum_ses_dec_res.html" in the "resources":
"readingOrder": [
"front_cover.html",
"imprint.html",
"toc_ses_dec_res.html",
"LEAD/split.html",
"Decision_1_CMA_3/split.html",
"Decision_2_CMA_3/split.html",
"Decision_3_CMA_3/split.html",
"Decision_4_CMA_3/split.html",
"back_cover.html"
],
"resources": [
{
"type": "LinkedResource",
"url": "toc_toplevel_sum_ses_dec_res.html",
"rel": "contents"
},
Unfortunately, Vivliostyle has a limitation: it cannot hide HTML documents listed in the "resources" from the output. If you mark "toc_ses_dec_res.html" in the "readingOrder" as "contents", you can avoid this problem:
"readingOrder": [
"front_cover.html",
"imprint.html",
{
"url": "toc_ses_dec_res.html",
"rel": "contents"
},
"LEAD/split.html",
"Decision_1_CMA_3/split.html",
"Decision_2_CMA_3/split.html",
"Decision_3_CMA_3/split.html",
"Decision_4_CMA_3/split.html",
"back_cover.html"
], |
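For auto-generated manifests, the same change can be applied programmatically. The sketch below assumes the manifest has already been loaded into a Python dict. `set_contents_in_reading_order` is a hypothetical helper. Dropping the old "contents" entry from "resources" is an assumption about the intended end state and is not stated in the reply above.

```python
def set_contents_in_reading_order(manifest, toc_url):
    """Return a copy of manifest with toc_url marked as the ToC in readingOrder."""

    def url_of(item):
        # readingOrder items may be plain URL strings or objects with a "url".
        return item if isinstance(item, str) else item.get("url")

    def has_contents_rel(item):
        rel = item.get("rel", [])
        rels = [rel] if isinstance(rel, str) else list(rel)
        return "contents" in rels

    reading_order = []
    found = False
    for item in manifest.get("readingOrder", []):
        if url_of(item) == toc_url:
            entry = {"url": toc_url} if isinstance(item, str) else dict(item)
            entry["rel"] = "contents"
            reading_order.append(entry)
            found = True
        else:
            reading_order.append(item)
    if not found:
        raise ValueError(f"{toc_url} is not in readingOrder")

    # Assumption: remove any "contents" entry from resources so that only
    # the readingOrder item declares the ToC.
    resources = [
        r for r in manifest.get("resources", [])
        if isinstance(r, str) or not has_contents_rel(r)
    ]
    return {**manifest, "readingOrder": reading_order, "resources": resources}


original = {
    "readingOrder": ["front_cover.html", "toc_ses_dec_res.html", "back_cover.html"],
    "resources": [
        {
            "type": "LinkedResource",
            "url": "toc_toplevel_sum_ses_dec_res.html",
            "rel": "contents",
        }
    ],
}
fixed = set_contents_in_reading_order(original, "toc_ses_dec_res.html")
print(fixed["readingOrder"])
```

The function returns a new dict and leaves the input untouched, so the original manifest can still be compared against the result.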
Ah great, thank you. Much appreciated, I'll have a go at this now :-) I'm just writing instructions for my colleague @petermr to auto-generate manifests and ToCs from the Text and Data Mining software Py4ami, so I'm trying to get things done properly on what will be a first trial. |
We've progressed well and will soon, probably next week, be cleaning things up and adding the CSS and modifications to the HTML we generate, to at least make a proof-of-concept presentation to the UN Climate people. I wanted to ask a quick question about an issue we have with the ToC reading in Vivliostyle. Apologies in advance, as I think this is us messing up our HTML, but before continuing to troubleshoot I wondered if you could take a quick look, as your more knowledgeable eyes will do better than ours and it might be very obvious what we're getting wrong. Essentially, we're getting the whole ToC doc showing up in the Vivlio menu. Thank you |
Unfortunately, the current TOC handling in Vivliostyle.js does not work well with your HTML structure. Your HTML structure is like this:
<body>
<div id="sessionpre">
<img src="../images/UNlogo.jpg" alt="UN logo" id="unlogo">
<div class="sessionCode">/PA/CMA/2021/10/Add.1</div>
…
<div class="contents">
<div><span>Contents</span></div>
<div><span>Decisions adopted by the Conference of …</span></div>
<!-- TOC -->
<div class="toc">
<div>
<span>Decision</span><span>Page</span></a>
</div>
<nav role="doc-toc">
<ul>
<li>
<a href="../Decision_1_CMA_3/split.html"><span
class="descres-code">1/CMA.3</span><span
class="descres-title">Glasgow Climate Pact</span></a>
</li>
…
</ul>
</nav>
</div>
</div>
</div>
</body>
Vivliostyle.js generates the TOC box (displayed in the TOC panel in the Viewer) from the HTML document, skipping elements that are children of BODY and do not contain a TOC element. See the code: vivliostyle.js/packages/core/src/vivliostyle/toc.ts, lines 98 to 107 in 1747a92.
In your HTML, the BODY has only one child element (<div id="sessionpre">), and it contains the TOC element, so nothing is skipped and the whole document shows up in the TOC box. Also note that stylesheets are ignored in the TOC box. If you change the HTML structure as below, the TOC box will be generated better (but still not very good, because of the lack of style):
<body>
<div id="sessionpre">
<img src="../images/UNlogo.jpg" alt="UN logo" id="unlogo">
<div class="sessionCode">/PA/CMA/2021/10/Add.1</div>
…
</div>
<div class="contents">
<div><span>Contents</span></div>
<div><span>Decisions adopted by the Conference of …</span></div>
<!-- TOC -->
<div class="toc">
<div>
<span>Decision</span><span>Page</span></a>
</div>
<nav role="doc-toc">
<ul>
<li>
<a href="../Decision_1_CMA_3/split.html"><span
class="descres-code">1/CMA.3</span><span
class="descres-title">Glasgow Climate Pact</span></a>
</li>
…
</ul>
</nav>
</div>
</div>
</body> |
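Because this HTML comes out of an automated pipeline, it may help to check each ToC file for this situation before rendering. The following is a simplified Python model of the rule described above, not a port of toc.ts: it lists the direct children of BODY and reports which of them contain an element with role="doc-toc". Treating only role="doc-toc" as a TOC element is an assumption, and `BodyChildScanner` and `scan_body_children` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BodyChildScanner(HTMLParser):
    """Record each direct child of <body> and whether it contains a doc-toc."""

    def __init__(self):
        super().__init__()
        self.children = []   # [label, contains_toc] per direct child of <body>
        self._stack = None   # open elements inside <body>; None outside it

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self._stack = []
            return
        if self._stack is None:
            return
        if not self._stack:  # nothing open, so this is a direct child of <body>
            label = tag
            classes = (attrs.get("class") or "").split()
            if attrs.get("id"):
                label += "#" + attrs["id"]
            elif classes:
                label += "." + classes[0]
            self.children.append([label, False])
        if attrs.get("role") == "doc-toc":
            self.children[-1][1] = True
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack is None:
            return
        if tag == "body":
            self._stack = None
        elif tag in self._stack:  # stray end tags are ignored
            while self._stack.pop() != tag:
                pass


def scan_body_children(source):
    scanner = BodyChildScanner()
    scanner.feed(source)
    scanner.close()
    return [tuple(child) for child in scanner.children]


nested = ('<body><div id="sessionpre"><img src="x.jpg"><div class="contents">'
          '<nav role="doc-toc"><ul></ul></nav></div></div></body>')
split = ('<body><div id="sessionpre"><img src="x.jpg"></div><div class="contents">'
         '<nav role="doc-toc"><ul></ul></nav></div></body>')
print(scan_body_children(nested))
print(scan_body_children(split))
```

If a file yields a single BODY child that contains the TOC, as in the first sample, that would be a signal to split the wrapper as in the second structure above.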
I am going to fix Vivliostyle.js on these problems:
|
Amazing @MurakamiShinyu, I appreciate you looking at this :-) Our HTML is the output of a Text and Data Mining process which converts PDF to HTML, running a series of regex normalisation processes tailored to a specific corpus. In this case it is the UNFCCC treaty agreements: the Kyoto Protocol, the Paris Agreement, and then all the subsequent COP meetings which are based on these treaties. So our exercise here is to come up with recommended fixes to the PDF-to-HTML conversion that will allow the HTML to work in Vivlio and create Publication Manifests automagically. We are nearly done with this prototype, and then we want to present it to UNFCCC and get them to organise their documents using the process going forward. So, a big thank you. For demo purposes I'll clean up the HTML in the way you suggest for now. |
Amazing. Thank you so much :-) We were working on a workaround on Friday to create further DIV children, but your fix makes it all work. I'll read up on the details etc. We can now proceed to demo the doc to the UN people, and then, when we get the time, integrate it into the TDM pipeline. We have a couple of weeks of hackathon coming up in India, so this will come in really useful with IPCC content too. |
Is your feature request related to a problem? Please describe.
Creating ToCs when using multiple HTML files - looking for support pages.
Describe the solution you'd like
See a pointer to the project we're working on which is to typeset a Linked Open Data copy of the IPCC Glossary - see semanticClimate/glossary-sandbox#1
Additional context
There are a few related ToC issues: how to make the main ToC file; how to relate CSS styles to the different HTML files; how to get ToC items to appear in the Vivlio navigator; and how to get the ToCs from the different HTML files into the front ToC on the page. Sorry, a lot here. I will list them clearly over on our site: semanticClimate/glossary-sandbox#1