Avoid trailing whitespace in kobo spans #174

Shiandow · 2024-10-05T16:54:00Z

Implement long overdue TODO

Description of change

This seems like a small change but has huge implications for the way text is rendered with optimizeLegibilty and geometricPrecision. Really all it does is move the trailing whitespace to the front of the next group, just like the TODO note suggested ought to be done.

I'm guessing at the exact mechanism, but for unclear reasons whatever justification program Kobo method uses it apparently does not expand or contract the last space in a kobo span. (unless using optimizeSpeed, maybe?) This can causes weird looking text, and in the worst case may even expand the space between letters just to stretch the text far enough. If this is correct it would fit the behaviour observed by jackie_w on the mobileRead forums

This change may create a few pure-whitespace kobo spans when a sentence has some trailing whitespace, which I think could be ignored.

Test results

This seems to fix most issues with optimizeLegibilty. I'm still testing but the improvements I'm seeing are substantial enough that I think it might be best if other people try this out as well. Even if we can't fix everything it improves things a lot.

Only niggle is that the regexp only works for proper sentences, there are still a couple of issues with things like emphasis where a span might end up with trailing whitespace. These occur when the text ends in whitespace mid sentence.

This seems like a small change but has huge implications for the way text is rendered with optimizeLegibilty and geometricPrecision. Signed-off-by: Shiandow <[email protected]>

deepsource-io · 2024-10-05T16:54:09Z

Here's the code health analysis summary for commits 327f809..d63e6b4. View details on DeepSource ↗.

Analysis Summary

Analyzer	Status	Summary	Link
Shell	✅ Success		View Check ↗
Python	❌ Failure	❗ 5 occurences introduced	View Check ↗

💡 If you’re a repository administrator, you can configure the quality gates from the settings.

Signed-off-by: Shiandow <[email protected]>

Shiandow · 2024-10-05T19:33:29Z

I've made a small change to also avoid trailing whitespace when the text cuts off midsentence (as it does when a  tag or other span is in the middle).

I think this works but I still have some things I'm not entirely sure about. Even though I don't think any would affect the result much.

Why limit it to only one quote after a period? It might be incorrect but I do occasionally see "Quotes like 'this.'"
Maybe 'split' is not the most logical option, though it does work and ensure that nothing is skipped.
Does whitespace need its own span? A sentence with emphasis like this, would start A sentence with . I've already confirmed that leaving the whitespace at the end results in problems, but I don't know if we need to wrap it in a span.

jgoguen · 2024-10-07T21:51:31Z

Thanks for this!

For whitespace-only spans, let's not wrap only whitespace in a span.

You can ignore the Lint tasks, those look like something I have to fix. But please fix the tests (test_container.TestContainer.test_spans_added is failing).

Shiandow · 2024-10-08T09:35:48Z

In that case it's worth trying to put all leading and trailing whitespace outside of the spans. That will look more logical in the html and if one works I don't see why the other wouldn't.

I'll have to see if the result works equally well in kobo. If it does it shouldn't be too hard to fix the tests (may be worth adding one to check spans don't still end in whitespace somehow).

BTW, I think the lint is just one line that is too long, I'll add the extra newlines so it'll stop complaining.

Some of these were just replicating the implementation, which seemed less than useful. I've tried to test the important properties. The PR#106 test needs to be weakened a bit but the weakened test should still catch the bug.

Shiandow · 2024-10-08T21:43:29Z

I haven't managed to get the tests running locally, but I think they should work now. The new version also seems to work at least equally well on an actual kobo, and I like both the result and the way it works better. Can't quite figure out how to make all linter(s?) happy though, I think the code is about as readable as I can make it.

Let me know if you want me to cleanup the commit history.

jgoguen · 2024-10-08T23:40:06Z

A lot of what I've set up for tests assumes Linux, though macOS should work just as well. If you're on Windows, I wish you the best of luck getting things running locally without moving into WSL2 territory.

Don't worry about the lint tests, vale seems to not work with no changes made and I can easily clean up black later. The CI tests are the important ones and those look like they're probably just whitespace-related changes.

I'm not fussy about commit history. It's a view of what changed, I don't really care if a PR is one squashed commit or a series of commits.

The githup_106 test is a bit odd, because I'm not sure what changed. Whitespace should be identical though, even without the lstrip. Which is easy enough to check.

Shiandow · 2024-10-09T00:36:56Z

Yeah I was worried that might be the case. I don't have a unix system handy right now, and while I could reverse engineer it to work for windows I hope to get through it with limited trial and error. Ironically I'm more confident the actual changes are correct than I am about the tests.

Though if the github_106 keeps failing I'll have to figure out what's going on with the non-significant whitespace, which is probably not toing to be possible without the ability to actually test the tests.

jgoguen · 2024-10-13T23:58:33Z

I had a chance to take a look at the code in more depth and the test failures are related to how text splitting has changed with this code. I'll fix that up, otherwise this looks good.

Shiandow · 2024-10-14T09:32:57Z

Ok, thanks for that, that helps a lot.

By the way, should you wish to change anything in the future, the current version should be able to handle changes to the regexp. Anything captured will be put in a span and anything not captured will be left outside.

callahad · 2024-11-27T15:36:57Z

I suspect this has caused a regression in #175

Shiandow · 2024-11-27T21:05:03Z

I see. Well if that is a problem you could probably just capture the leading whitespace as well. Changing the regexp to

TEXT_SPLIT_RE = re.compile(
    r'(.*?(?:[\.\!\?\:][\'"\u201c\u201d\u2018\u2019\u2026]*(?=\s)|(?=\s*$)))',
    re.UNICODE | re.MULTILINE,
)

ought to do it.

Under no circumstances should you try to capture the trailing whitespace however. The kepub text rendering does not like that.

Avoid trailing whitespace in kobo spans

4c88c17

This seems like a small change but has huge implications for the way text is rendered with optimizeLegibilty and geometricPrecision. Signed-off-by: Shiandow <[email protected]>

Also stop sentence on last non-whitespace character before end of line.

4e1a432

Signed-off-by: Shiandow <[email protected]>

Shiandow marked this pull request as ready for review October 5, 2024 19:34

Shiandow added 2 commits October 8, 2024 11:33

Stop linter nagging

452236e

Put trailing/leading whitespace outside of spans

78cd242

Shiandow added 2 commits October 8, 2024 19:06

Attempt to fix tests

5147596

Some of these were just replicating the implementation, which seemed less than useful. I've tried to test the important properties. The PR#106 test needs to be weakened a bit but the weakened test should still catch the bug.

Fix name

d46894f

Attempt to pass tests

d63e6b4

The githup_106 test is a bit odd, because I'm not sure what changed. Whitespace should be identical though, even without the lstrip. Which is easy enough to check.

jgoguen merged commit 296783a into jgoguen:main Oct 13, 2024
1 of 2 checks passed

Shiandow deleted the patch-1 branch October 14, 2024 09:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid trailing whitespace in kobo spans #174

Avoid trailing whitespace in kobo spans #174

Shiandow commented Oct 5, 2024 •

edited

Loading

deepsource-io bot commented Oct 5, 2024 •

edited

Loading

Analysis Summary

Shiandow commented Oct 5, 2024 •

edited

Loading

jgoguen commented Oct 7, 2024

Shiandow commented Oct 8, 2024

Shiandow commented Oct 8, 2024

jgoguen commented Oct 8, 2024

Shiandow commented Oct 9, 2024

jgoguen commented Oct 13, 2024

Shiandow commented Oct 14, 2024

callahad commented Nov 27, 2024

Shiandow commented Nov 27, 2024 •

edited

Loading

Avoid trailing whitespace in kobo spans #174

Avoid trailing whitespace in kobo spans #174

Conversation

Shiandow commented Oct 5, 2024 • edited Loading

Implement long overdue TODO

Description of change

Test results

deepsource-io bot commented Oct 5, 2024 • edited Loading

Analysis Summary

Shiandow commented Oct 5, 2024 • edited Loading

jgoguen commented Oct 7, 2024

Shiandow commented Oct 8, 2024

Shiandow commented Oct 8, 2024

jgoguen commented Oct 8, 2024

Shiandow commented Oct 9, 2024

jgoguen commented Oct 13, 2024

Shiandow commented Oct 14, 2024

callahad commented Nov 27, 2024

Shiandow commented Nov 27, 2024 • edited Loading

Shiandow commented Oct 5, 2024 •

edited

Loading

deepsource-io bot commented Oct 5, 2024 •

edited

Loading

Shiandow commented Oct 5, 2024 •

edited

Loading

Shiandow commented Nov 27, 2024 •

edited

Loading