Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make word separators and splitters more flexible #402

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

robinkrahl
Copy link
Contributor

This PR makes the word separators and splitters more flexible to allow users to use their own words types. It also adds an example that shows how to use textwrap with custom word types, namely with styled strings.

As this is only a draft, I did not add much documentation.

The basic changes are:

  • For word separation, introduce a new method that just returns the range of the detected words. Users can then apply their own logic for creating words from these ranges.
  • For word splitting, introduce a new Fragments iterator struct that yields the fragments for a word. It turned out to be easier to introduce a Splittable trait than to use a closure to perform the splitting, but this could also be changed. I used a new struct for the iterator because that makes it much easier to keep track of the life times.

This patch adds the find_word_ranges method to the WordSeparator struct
that makes it possible to find words within a string without using
textwrap’s Word type.  This is especially useful when using custom types
for strings.
This patch adds the word_splitters::Fragments struct that yields the
fragments for a word.  This makes it easier to reason over the lifetimes
of the generated iterator and allows us to make it generic over the word
type.
This patch introduces the new word_splitters::Splittable trait and makes
word_splitters::Fragments generic over that trait.  This allows library
users to use their own fragment types and not only core::Word.
#[derive(Debug)]
pub struct Fragments<W: Splittable, I: Iterator<Item = usize>> {
word: W,
split_points: I,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type parameter is unnecessary, we can directly use std::vec::IntoIter<usize>.

@mgeisler
Copy link
Owner

mgeisler commented Jul 3, 2021

Hey @robinkrahl, I just wanted to say that I think this looks really nice!

One thought: would it make sense to combine Splittable with Fragment and thus say that any type which a Fragment should have a way of splitting itself? The trait could have a default implementation which doesn't do any splitting.

So far, my thinking has been that fragments contain just the necessary information for us to be able to wrap them nicely with the two wrapping algorithms. That is, a fragment has a width and whitespce/penalty following it. However, since the proposed wrapping "pipeline" also involves finding and splitting the fragments (Words), perhaps it would be much nicer to combine the whole thing into one.

Just a quick idea... I'll give it a proper look tomorrow.

@robinkrahl
Copy link
Contributor Author

robinkrahl commented Jul 4, 2021 via email

Configure required features for style example
},
penalty: if keep_ending {
self.word.penalty
} else if word.ends_with('-') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition should be negated – fixed in the next commit.

@mgeisler
Copy link
Owner

mgeisler commented Jul 4, 2021

Therefore, users might want to have separate types for a splittable word and a fragment with width calculations that can also be cached.

Yes, I can see why that could be useful for users of the library. On the other hand, I wrote a script to download every reverse dependency, and I cannot find any public usages of textwrap::core, except for Clap which uses textwrap::core::display_width. The more advanced functions seems unused except by my own examples — and perhaps genpdf at some point 😆

Basically, we seem to have pretty free hands in how we structure the internals, while still making a 0.15.0 release a drop-in replacement for everybody. If two traits makes your life easier, than I'm happy to support it.

(Even the current implementation for core::Word currently performs unnecessary width calculations for words that are split up later, though the cost is probably negligible.)

You're right and I remember toying with the idea of having an UnmeasuredWord struct back in the day. In the end I figured it was enough with Word and then measure the text twice. The difference in performance is about 7% on a quick test I just did — but given that it takes around 260 microseconds to wrap a 4800 character long line on my laptop, I think the performance is good enough, even for realtime wrapping in a GUI or similar.

These are no big issues, so if you prefer a single trait, I could live with that too.

No, let's go with two since it seems to better model the information necessary at different stages. Would it perhaps make sense to have

  • SplittableFragment: an abstract piece of text from which you can get a sub-fragment (your Splittable trait)
  • MeasuredFragment: an abstract piece of text which known how big it is and which we can word wrap (the existing Fragment trait)

I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think!

@robinkrahl
Copy link
Contributor Author

robinkrahl commented Jul 4, 2021

  • SplittableFragment: an abstract piece of text from which you can get a sub-fragment (your Splittable trait)

  • MeasuredFragment: an abstract piece of text which known how big it is and which we can word wrap (the existing Fragment trait)

I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think!

Sounds good!

@mgeisler
Copy link
Owner

mgeisler commented Jul 4, 2021

  • SplittableFragment: an abstract piece of text from which you can get a sub-fragment (your Splittable trait)
  • MeasuredFragment: an abstract piece of text which known how big it is and which we can word wrap (the existing Fragment trait)

I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think!

Sounds good!

Okay, cool — let's do that independently of this PR.

let len = self.word.as_ref().len();
if self.prev < len || self.prev == 0 {
let w = self.word.split(self.prev..len, true);
// TODO: shouldn’t this be just len?
Copy link
Owner

@mgeisler mgeisler Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have added a comment to explain this: with just

self.prev = len;

you get an infinite loop when len == 0 since self.prev == 0 stays true after every call to next.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove the +1 on master, you'lll see that cargo test hangs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. Will remove the TODO.

}
}

impl textwrap::word_splitters::Splittable for StyledWord<'_> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of a shame that the example doesn't show why there are two traits :-) I love the example in itself, it's super great at demonstrating the concept of wrapping not-just-plain-text. However, it would be nice if it would exploit the two traits better.

Will you be having different structs in genpdf, one for Fragment and another for Splittable? if not, then I would prefer to keep the number of concepts low and add a split method to Fragment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, you already explained that you have a StyledWord struct for the unmeasured case and a StyledFragment for the pre-split and measured words.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, the distinction is especially relevant if the width computation is non-trivial, which is typically the case for scenarios other than the terminal. I could add an example that produces a PDF file, but I think that would be too complex to be useful as an example for textwrap. Maybe we can have an example that produces an SVG image?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that a full-blown PDF seems unnecessary — could you instead pretend that you need two structs in the style example? I believe you're using len() on the strings, which is cheating ever so slightly :-)

It might look a bit arbitrary, but for educational purposes, I think we're allowed to exaggerate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can do that.

mgeisler added a commit that referenced this pull request Jul 6, 2021
This name better matches the new `SplittableFragment` trait in #402.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants