Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tab_width support #490

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mtoohey31
Copy link

This merge request is an updated version of #429. It allows the width of a tab to be configured for a variety of this crate's functions, instead of the width being fixed at zero, as it currently is.

src/columns.rs Outdated Show resolved Hide resolved
src/columns.rs Outdated Show resolved Hide resolved
src/columns.rs Outdated Show resolved Hide resolved
src/core.rs Show resolved Hide resolved
@@ -118,6 +122,26 @@ impl<'a> Options<'a> {
}
}

/// Change [`self.tab_width`]. The tab width is how wide
/// a tab character should be considered. By default, a tab
/// character to be consiered to have a width of 0.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually thinking that we should use a more traditional width of 8. If we go through the effort of adding support for this, people should see it work out of the box — if people discover that they have unexpected TAB characters in their strings, they can either filter them out or set the width back to zero.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with that; it's just a default anyways, and changing it is simple. How do you feel about this given #490 (comment) though? If we set it to anything greater than 1, we loose the fast in wrap_single_line by default, at least with my current fix.

src/options.rs Outdated Show resolved Hide resolved
src/word_splitters.rs Outdated Show resolved Hide resolved
@mgeisler mgeisler mentioned this pull request Nov 9, 2022
@mgeisler
Copy link
Owner

The fuzz tests need some small fixes, something like this:

modified   fuzz/fuzz_targets/wrap_first_fit.rs
@@ -14,12 +14,12 @@ struct Word {
 #[rustfmt::skip]
 impl core::Fragment for Word {
     fn width(&self) -> f64 { self.width }
-    fn whitespace_width(&self) -> f64 { self.whitespace_width }
+    fn whitespace_width(&self, _: u8) -> f64 { self.whitespace_width }
     fn penalty_width(&self) -> f64 { self.penalty_width }
 }
 
 fuzz_target!(|input: (f64, Vec<Word>)| {
     let width = input.0;
     let words = input.1;
-    let _ = wrap_first_fit(&words, &[width]);
+    let _ = wrap_first_fit(&words, &[width], 0);
 });
modified   fuzz/fuzz_targets/wrap_optimal_fit.rs
@@ -35,7 +35,7 @@ struct Word {
 #[rustfmt::skip]
 impl core::Fragment for Word {
     fn width(&self) -> f64 { self.width }
-    fn whitespace_width(&self) -> f64 { self.whitespace_width }
+    fn whitespace_width(&self, _: u8) -> f64 { self.whitespace_width }
     fn penalty_width(&self) -> f64 { self.penalty_width }
 }
 
@@ -57,5 +57,5 @@ fuzz_target!(|input: (usize, Vec<Word>, Penalties)| {
         }
     }
 
-    let _ = wrap_optimal_fit(&words, &[width as f64], &penalties);
+    let _ = wrap_optimal_fit(&words, &[width as f64], 0, &penalties);
 });
modified   fuzz/fuzz_targets/wrap_optimal_fit_usize.rs
@@ -35,7 +35,7 @@ struct Word {
 #[rustfmt::skip]
 impl core::Fragment for Word {
     fn width(&self) -> f64 { self.width as f64 }
-    fn whitespace_width(&self) -> f64 { self.whitespace_width as f64 }
+    fn whitespace_width(&self, _: u8) -> f64 { self.whitespace_width as f64 }
     fn penalty_width(&self) -> f64 { self.penalty_width as f64 }
 }
 
@@ -45,5 +45,5 @@ fuzz_target!(|input: (usize, Vec<Word>, Penalties)| {
     let width = input.0;
     let words = input.1;
     let penalties = input.2.into();
-    let _ = wrap_optimal_fit(&words, &[width as f64], &penalties);
+    let _ = wrap_optimal_fit(&words, &[width as f64], 0, &penalties);
 });

With that, a problem will appear: running cargo fuzz run wrap_fast_path fails on input like

    let input = (String::from("foo \t"), 20);

You'll get

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `["foo \t"]`,
 right: `["foo "]`', fuzz_targets/wrap_fast_path.rs:20:5

The problem is that we end up with two words instead of just one:

[src/textwrap/src/word_separators.rs:293] &opportunities = IntoIter(
    [
        (
            4,
            Allowed,
        ),
        (
            5,
            Mandatory,
        ),
    ],
)
[src/wrap.rs:231] &words = [
    Word {
        word: "foo",
        whitespace: " ",
        penalty: "",
        width: 3,
    },
    Word {
        word: "",
        whitespace: "\t",
        penalty: "",
        width: 0,
    },
]

The fast-path optimization assumes that every word matters, but here the second word should actually have been merged with the first word (with whitespace: " \t",). So when the slow path runs, it will add "foo" (word), then " " (whitespace) and finally "" (word) but no whitespace from the second word since it's the final word of the line.

I think we might be able to fix this if we run trim_end_matches(&[' ', '\t']) before finding words. We don't output the trailing whitespace any longer, so it should be correct to do that. This seems to work from some light testing:

modified   src/core.rs
@@ -243,7 +243,7 @@ impl<'a> Word<'a> {
     /// All trailing whitespace is automatically taken to be the whitespace part
     /// of the word.
     pub fn from(word: &str) -> Word<'_> {
-        let trimmed = word.trim_end();
+        let trimmed = word.trim_end_matches(&[' ', '\t']);
         Word {
             word: trimmed,
             // trimmed shouldn't contain whitespace, so we don't need to pass
modified   src/wrap.rs
@@ -203,7 +203,7 @@ pub(crate) fn wrap_single_line<'a>(
         options.subsequent_indent
     };
     if line.len() < options.width && options.tab_width <= 1 && indent.is_empty() {
-        lines.push(Cow::from(line.trim_end_matches(' ')));
+        lines.push(Cow::from(line.trim_end_matches(&[' ', '\t'])));
     } else {
         wrap_single_line_slow_path(line, options, lines)
     }
@@ -217,6 +217,8 @@ pub(crate) fn wrap_single_line_slow_path<'a>(
     options: &Options<'_>,
     lines: &mut Vec<Cow<'a, str>>,
 ) {
+    let line = line.trim_end_matches(&[' ', '\t']);
+
     let initial_width = options
         .width
         .saturating_sub(display_width(options.initial_indent, options.tab_width));

@mtoohey31
Copy link
Author

Fuzz tests should 🤞 be all fixed in 8b2a9f1, and the wasm demo should be fixed in f1352c8. I'm just ignoring the tab_width parameter in the wasm demo, because web_sys::CanvasRenderingContext2d::measure_text should already register tab widths correctly, right?

@mgeisler
Copy link
Owner

mgeisler commented Feb 9, 2023

Hey @mtoohey31! Happy new year... 😄

Can I get you to squash your fixes into the commits which introduced the problems in the first place? That will make the commits here easier to review. It would also be great if you can rebase on top of the latest master so we can get fresh test run.

@mtoohey31
Copy link
Author

Hey @mtoohey31! Happy new year... 😄

Haha, thanks 🙂 Same to you!

Can I get you to squash your fixes into the commits which introduced the problems in the first place? That will make the commits here easier to review. It would also be great if you can rebase on top of the latest master so we can get fresh test run.

Sure, should be rebased now! Pretty much all of the extra commits were fixes for problems, so I've squashed everything into one commit. Hope that's ok; if there was another way you'd like me to split things up though, let me know.

@mgeisler
Copy link
Owner

Pretty much all of the extra commits were fixes for problems, so I've squashed everything into one commit. Hope that's ok; if there was another way you'd like me to split things up though, let me know.

That is precisely what I was hoping for — fixes to problems found in PR discussions are not useful for the long-term history, so they should ideally be squashed away before merging. Thank you!

@callum-oakley
Copy link

Hi! What's the status of this? I'd be interested in helping to get it over the finish line if there's any more work required (I'm keen to get helix-editor/helix#3622 fixed).

Also thanks for the excellent crate. :)

@mtoohey31
Copy link
Author

@mgeisler I see there are some conflicts now, I would still like to get this fixed at some point though. If I rebase the changes, would you possibly have time to look at the merge request again? Or is there an issue with the general approach that you'd like to resolve before we move forward? I don't mind starting the PR from scratch with a new approach if that's what has to happen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants