Implements font styles in the output XML #936
base: master
Hi @lfoppiano! I think this branch will require quite a few tests (I suspect it will cause problems in some of the GROBID modules, and I need to check consistency with Pub2TEI), so I pushed its release to version 0.8.0.

One thing related to "document structure" versus "narrative style" is the bold style for section titles. I think it's like the italic/bold for the reference markers: the logical "section title" structure is already captured by the <head> element. For example, in the attached PDF, the style should be ignored here:

<div xmlns="http://www.tei-c.org/ns/1.0">
<head n="1"><hi rend="bold">Introduction</hi></head>

In contrast, the style here should be kept because it corresponds to a highlight within the flow of the paragraph text:

<p>12. <hi rend="bold">Average tf-idf similarity between citance and title of the cited paper (F12):</hi> We calculate the similarity of each citance with the title of the cited paper and take an average of it.</p>
<p>13. <hi rend="bold">Maximum tf-idf similarity between citance and title of the cited paper (F13):</hi> We take the maximum of similarity of the citances with the title of the cited paper.</p>

Does that make sense?
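The rule proposed above (drop the emphasis when the enclosing TEI element already carries the logical meaning, keep it inside running paragraph text) could be sketched as follows. This is a minimal illustration, not GROBID's actual TEIFormatter code: the class `StylePolicy`, the `Container` enum and the `render` method are hypothetical, and the input text is assumed to be XML-escaped already.

```java
import java.util.Set;

/**
 * Hypothetical sketch: decide whether a font style is emitted as
 * <hi rend="..."> depending on the TEI element the text falls in.
 */
final class StylePolicy {

    /** Hypothetical enclosing structures; not GROBID's real label set. */
    enum Container { HEAD, PARAGRAPH }

    /** Containers whose logical role already conveys the emphasis, so the style is redundant. */
    private static final Set<Container> STYLE_REDUNDANT = Set.of(Container.HEAD);

    private StylePolicy() {}

    /** Returns true if a <hi> element should be emitted for text in this container. */
    static boolean keepStyle(Container container) {
        return !STYLE_REDUNDANT.contains(container);
    }

    /**
     * Wraps already-escaped text in <hi rend="..."> unless there is no style
     * or the container makes the style redundant (e.g. a section title).
     */
    static String render(String escapedText, String rend, Container container) {
        if (rend == null || !keepStyle(container)) {
            return escapedText;
        }
        return "<hi rend=\"" + rend + "\">" + escapedText + "</hi>";
    }
}
```

With this policy, a bold "Introduction" inside a section title is emitted as plain text, while a bold phrase inside a paragraph keeps its `<hi rend="bold">` wrapper. Keeping the redundant containers in a set makes it easy to extend the rule later (for instance to reference markers) without touching the rendering logic.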
@kermitt2 Yes, no problem pushing it further. OK with the change you propose.
The crazy part was merging master back into this branch 😅
I've made the change and now the text within the
I'm not sure what you mean in this case 🙂
# Conflicts:
#	grobid-core/src/main/java/org/grobid/core/document/TEIFormatter.java
#	grobid-core/src/test/java/org/grobid/core/document/TEIFormatterTest.java
This PR implements the italic, bold, superscript and subscript styles in the output XML.
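A minimal sketch of how the four styles might be serialized: consecutive tokens that share a style are merged into a single `<hi>` element, so a bold phrase yields one element rather than one per word. The `StyledRuns`, `Style` and `Token` names are hypothetical, each token is assumed to carry exactly one style, and the `rend` values for italic, superscript and subscript are assumptions (only `rend="bold"` appears in the examples in this thread).

```java
import java.util.List;

/** Hypothetical sketch: serialize styled text tokens as TEI <hi rend="..."> runs. */
final class StyledRuns {

    /** Font styles covered by this PR; the rend values are assumed, not taken from GROBID. */
    enum Style {
        NONE(null), ITALIC("italic"), BOLD("bold"),
        SUPERSCRIPT("superscript"), SUBSCRIPT("subscript");

        final String rend;

        Style(String rend) { this.rend = rend; }
    }

    /** A piece of text (including any whitespace it carries) with a single style. */
    static final class Token {
        final String text;
        final Style style;

        Token(String text, Style style) {
            this.text = text;
            this.style = style;
        }
    }

    private StyledRuns() {}

    /** Escapes the characters that are special in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Groups consecutive tokens with the same style and wraps each styled group in one <hi>. */
    static String toTei(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < tokens.size()) {
            Style style = tokens.get(i).style;
            StringBuilder run = new StringBuilder();
            // Extend the current run while the style stays the same.
            while (i < tokens.size() && tokens.get(i).style == style) {
                run.append(tokens.get(i).text);
                i++;
            }
            String text = escape(run.toString());
            if (style.rend == null) {
                out.append(text); // unstyled text is emitted as-is
            } else {
                out.append("<hi rend=\"").append(style.rend).append("\">")
                   .append(text).append("</hi>");
            }
        }
        return out.toString();
    }
}
```

For example, the tokens "12. " (plain), "Average " and "similarity" (bold), " x" (plain) and "2" (superscript) would serialize to `12. <hi rend="bold">Average similarity</hi> x<hi rend="superscript">2</hi>`. Merging runs before emitting keeps the output close to the examples discussed above, where a whole bold phrase sits in a single `<hi rend="bold">`.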
See information at #160