-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make word separators and splitters more flexible #402
base: master
Are you sure you want to change the base?
Conversation
This patch adds the find_word_ranges method to the WordSeparator struct that makes it possible to find words within a string without using textwrap’s Word type. This is especially useful when using custom types for strings.
This patch adds the word_splitters::Fragments struct that yields the fragments for a word. This makes it easier to reason over the lifetimes of the generated iterator and allows us to make it generic over the word type.
This patch introduces the new word_splitters::Splittable trait and makes word_splitters::Fragments generic over that trait. This allows library users to use their own fragment types and not only core::Word.
#[derive(Debug)] | ||
pub struct Fragments<W: Splittable, I: Iterator<Item = usize>> { | ||
word: W, | ||
split_points: I, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type parameter is unnecessary, we can directly use std::vec::IntoIter<usize>
.
Hey @robinkrahl, I just wanted to say that I think this looks really nice! One thought: would it make sense to combine So far, my thinking has been that fragments contain just the necessary information for us to be able to wrap them nicely with the two wrapping algorithms. That is, a fragment has a width and whitespce/penalty following it. However, since the proposed wrapping "pipeline" also involves finding and splitting the fragments ( Just a quick idea... I'll give it a proper look tomorrow. |
Thanks for the update!
One thought: would it make sense to combine `Splittable` with
`Fragment` and thus say that any type which a `Fragment` should have a
way of splitting itself? The trait could have a default implementation
which doesn't do any splitting.
Good point. I had a similar thought. The reason why I didn’t want to
combine these two is that the width computation is not required for the
word splitting, it might be expensive and might require more information
than just the string and the style. Therefore, users might want to have
separate types for a splittable word and a fragment with width
calculations that can also be cached.
(Even the current implementation for core::Word currently performs
unnecessary width calculations for words that are split up later, though
the cost is probably negligible.)
In my use case, I have to look up the glyphs in the font data to
calculate their widths. Therefore, my pipeline currently is: separate
and split words using string slices (→ StyledWord), calculate their
widths and copy the strings (→ StyledFragment), cache the result and
then run the wrap algorithm.
Of course this could also be implemented with a single struct, but it
would have to keep some state.
These are no big issues, so if you prefer a single trait, I could live
with that too.
|
Configure required features for style example
src/word_splitters.rs
Outdated
}, | ||
penalty: if keep_ending { | ||
self.word.penalty | ||
} else if word.ends_with('-') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition should be negated – fixed in the next commit.
Yes, I can see why that could be useful for users of the library. On the other hand, I wrote a script to download every reverse dependency, and I cannot find any public usages of Basically, we seem to have pretty free hands in how we structure the internals, while still making a 0.15.0 release a drop-in replacement for everybody. If two traits makes your life easier, than I'm happy to support it.
You're right and I remember toying with the idea of having an
No, let's go with two since it seems to better model the information necessary at different stages. Would it perhaps make sense to have
I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think! |
Sounds good! |
Okay, cool — let's do that independently of this PR. |
let len = self.word.as_ref().len(); | ||
if self.prev < len || self.prev == 0 { | ||
let w = self.word.split(self.prev..len, true); | ||
// TODO: shouldn’t this be just len? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have added a comment to explain this: with just
self.prev = len;
you get an infinite loop when len == 0
since self.prev == 0
stays true after every call to next
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you remove the +1
on master
, you'lll see that cargo test
hangs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks. Will remove the TODO.
} | ||
} | ||
|
||
impl textwrap::word_splitters::Splittable for StyledWord<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit of a shame that the example doesn't show why there are two traits :-) I love the example in itself, it's super great at demonstrating the concept of wrapping not-just-plain-text. However, it would be nice if it would exploit the two traits better.
Will you be having different structs in genpdf
, one for Fragment
and another for Splittable
? if not, then I would prefer to keep the number of concepts low and add a split
method to Fragment
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, you already explained that you have a StyledWord
struct for the unmeasured case and a StyledFragment
for the pre-split and measured words.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, the distinction is especially relevant if the width computation is non-trivial, which is typically the case for scenarios other than the terminal. I could add an example that produces a PDF file, but I think that would be too complex to be useful as an example for textwrap
. Maybe we can have an example that produces an SVG image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right that a full-blown PDF seems unnecessary — could you instead pretend that you need two structs in the style
example? I believe you're using len()
on the strings, which is cheating ever so slightly :-)
It might look a bit arbitrary, but for educational purposes, I think we're allowed to exaggerate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can do that.
This name better matches the new `SplittableFragment` trait in #402.
This PR makes the word separators and splitters more flexible to allow users to use their own words types. It also adds an example that shows how to use textwrap with custom word types, namely with styled strings.
As this is only a draft, I did not add much documentation.
The basic changes are:
Fragments
iterator struct that yields the fragments for a word. It turned out to be easier to introduce aSplittable
trait than to use a closure to perform the splitting, but this could also be changed. I used a new struct for the iterator because that makes it much easier to keep track of the life times.