-
-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid trailing whitespace in kobo spans #174
Conversation
This seems like a small change but has huge implications for the way text is rendered with optimizeLegibilty and geometricPrecision. Signed-off-by: Shiandow <[email protected]>
Here's the code health analysis summary for commits Analysis Summary
|
Signed-off-by: Shiandow <[email protected]>
I've made a small change to also avoid trailing whitespace when the text cuts off midsentence (as it does when a I think this works but I still have some things I'm not entirely sure about. Even though I don't think any would affect the result much.
|
Thanks for this! For whitespace-only spans, let's not wrap only whitespace in a span. You can ignore the Lint tasks, those look like something I have to fix. But please fix the tests ( |
In that case it's worth trying to put all leading and trailing whitespace outside of the spans. That will look more logical in the html and if one works I don't see why the other wouldn't. I'll have to see if the result works equally well in kobo. If it does it shouldn't be too hard to fix the tests (may be worth adding one to check spans don't still end in whitespace somehow). BTW, I think the lint is just one line that is too long, I'll add the extra newlines so it'll stop complaining. |
Some of these were just replicating the implementation, which seemed less than useful. I've tried to test the important properties. The PR#106 test needs to be weakened a bit but the weakened test should still catch the bug.
I haven't managed to get the tests running locally, but I think they should work now. The new version also seems to work at least equally well on an actual kobo, and I like both the result and the way it works better. Can't quite figure out how to make all linter(s?) happy though, I think the code is about as readable as I can make it. Let me know if you want me to cleanup the commit history. |
A lot of what I've set up for tests assumes Linux, though macOS should work just as well. If you're on Windows, I wish you the best of luck getting things running locally without moving into WSL2 territory. Don't worry about the lint tests, vale seems to not work with no changes made and I can easily clean up black later. The CI tests are the important ones and those look like they're probably just whitespace-related changes. I'm not fussy about commit history. It's a view of what changed, I don't really care if a PR is one squashed commit or a series of commits. |
The githup_106 test is a bit odd, because I'm not sure what changed. Whitespace should be identical though, even without the lstrip. Which is easy enough to check.
Yeah I was worried that might be the case. I don't have a unix system handy right now, and while I could reverse engineer it to work for windows I hope to get through it with limited trial and error. Ironically I'm more confident the actual changes are correct than I am about the tests. Though if the github_106 keeps failing I'll have to figure out what's going on with the non-significant whitespace, which is probably not toing to be possible without the ability to actually test the tests. |
I had a chance to take a look at the code in more depth and the test failures are related to how text splitting has changed with this code. I'll fix that up, otherwise this looks good. |
Ok, thanks for that, that helps a lot. By the way, should you wish to change anything in the future, the current version should be able to handle changes to the regexp. Anything captured will be put in a span and anything not captured will be left outside. |
I suspect this has caused a regression in #175 |
I see. Well if that is a problem you could probably just capture the leading whitespace as well. Changing the regexp to TEXT_SPLIT_RE = re.compile(
r'(.*?(?:[\.\!\?\:][\'"\u201c\u201d\u2018\u2019\u2026]*(?=\s)|(?=\s*$)))',
re.UNICODE | re.MULTILINE,
) ought to do it. Under no circumstances should you try to capture the trailing whitespace however. The kepub text rendering does not like that. |
Implement long overdue TODO
Description of change
This seems like a small change but has huge implications for the way text is rendered with optimizeLegibilty and geometricPrecision. Really all it does is move the trailing whitespace to the front of the next group, just like the TODO note suggested ought to be done.
I'm guessing at the exact mechanism, but for unclear reasons whatever justification program Kobo method uses it apparently does not expand or contract the last space in a kobo span. (unless using optimizeSpeed, maybe?) This can causes weird looking text, and in the worst case may even expand the space between letters just to stretch the text far enough. If this is correct it would fit the behaviour observed by jackie_w on the mobileRead forums
This change may create a few pure-whitespace kobo spans when a sentence has some trailing whitespace, which I think could be ignored.
Test results
This seems to fix most issues with optimizeLegibilty. I'm still testing but the improvements I'm seeing are substantial enough that I think it might be best if other people try this out as well. Even if we can't fix everything it improves things a lot.
Only niggle is that the regexp only works for proper sentences, there are still a couple of issues with things like emphasis where a span might end up with trailing whitespace. These occur when the text ends in whitespace mid sentence.