semanticate: i inside a tag is not a Roman numeral.
gvtulder authored and acabal committed Sep 23, 2024
1 parent 99f028a commit bdce342
Showing 3 changed files with 7 additions and 1 deletion.
2 changes: 1 addition & 1 deletion se/formatting.py
@@ -145,7 +145,7 @@ def semanticate(xhtml: str) -> str:
xhtml = regex.sub(r"""([^\p{Letter}>\"])([vxVX])(\b[^\-]|st\b|nd\b|rd\b|th\b)""", r"""\1<span epub:type="z3998:roman">\2</span>\3""", xhtml)

# We can assume a lowercase i is always a Roman numeral unless followed by ’
-xhtml = regex.sub(r"""([^\p{Letter}<>/\"])i\b(?!’)""", r"""\1<span epub:type="z3998:roman">i</span>""", xhtml)
+xhtml = regex.sub(r"""([^\p{Letter}<>/\"])i\b(?!’)(?![^<>]+>)""", r"""\1<span epub:type="z3998:roman">i</span>""", xhtml)

# Fix obscured names starting with I, V, or X
xhtml = regex.sub(fr"""<span epub:type="z3998:roman">([IVX])</span>{se.WORD_JOINER}⸺""", fr"""\1{se.WORD_JOINER}⸺""", xhtml)
@@ -168,6 +168,9 @@
<p>I gave him an <abbr epub:type="z3998:initialism" class="eoc">I.O.U.</abbr></p>
<p>The picture was almost <abbr epub:type="z3998:initialism" class="eoc">3D</abbr>.</p>
<p>His name was abbreviated <abbr epub:type="z3998:given-name" class="eoc">Chas.</abbr></p>
+<!-- roman / not roman -->
+<p>Edition <span epub:type="z3998:roman">i</span>. Pages <span epub:type="z3998:roman">i</span> and <span epub:type="z3998:roman">ii</span>. Number i’.</p>
+<p>See <a href="appendix-i.xhtml">Appendix</a>.</p>
</section>
</body>
</html>
3 changes: 3 additions & 0 deletions tests/draft_commands/semanticate/test-1/in/semanticate.xhtml
@@ -168,6 +168,9 @@
<p>I gave him an <abbr epub:type="z3998:initialism" class="eoc">I.O.U.</abbr></p>
<p>The picture was almost <abbr epub:type="z3998:initialism" class="eoc">3D</abbr>.</p>
<p>His name was abbreviated <abbr epub:type="z3998:given-name" class="eoc">Chas.</abbr></p>
+<!-- roman / not roman -->
+<p>Edition i. Pages i and ii. Number i’.</p>
+<p>See <a href="appendix-i.xhtml">Appendix</a>.</p>
</section>
</body>
</html>
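
Note on the change: the added lookahead `(?![^<>]+>)` rejects an `i` when the text after it reaches a `>` without first passing a `<` or another `>`, which is what happens when the `i` sits inside a tag, such as in an attribute value. The sketch below is a minimal illustration, not the code in se/formatting.py. It assumes the standard-library `re` module instead of `regex`, so `[^\p{Letter}<>/\"]` is approximated by `(?![<>/"])[\W\d_]`. The names `NOT_LETTER`, `OLD`, `NEW`, `REPLACEMENT`, `prose` and `link` are hypothetical. It runs only this one substitution, so `ii` is left alone here; the golden file shows it wrapped as well, which is not produced by the rule changed in this commit.

```python
import re

# Stand-in for [^\p{Letter}<>/\"] from the commit: one character that is not
# a letter (approximated as non-word, digit, or underscore) and is not one of
# < > / ". This approximation is an assumption made for the sketch.
NOT_LETTER = r'((?![<>/"])[\W\d_])'

# Pattern before the commit: a lone "i" not followed by a right single quote.
OLD = re.compile(NOT_LETTER + r"i\b(?!’)")
# Pattern after the commit: additionally, the "i" must not be inside a tag.
NEW = re.compile(NOT_LETTER + r"i\b(?!’)(?![^<>]+>)")

REPLACEMENT = r'\1<span epub:type="z3998:roman">i</span>'

# The two lines added to the test input in this commit.
prose = "<p>Edition i. Pages i and ii. Number i’.</p>"
link = '<p>See <a href="appendix-i.xhtml">Appendix</a>.</p>'

for label, pattern in (("old", OLD), ("new", NEW)):
    print(label, pattern.sub(REPLACEMENT, prose))
    print(label, pattern.sub(REPLACEMENT, link))
```

With the old pattern the `i` in `appendix-i.xhtml` is wrapped in a span inside the `href` value, which corrupts the markup. With the new pattern the link line comes back unchanged, and the prose line is handled the same way by both patterns.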
