Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: unified use of rspack_glob #7556

Closed
wants to merge 15 commits into from
Closed

feat: unified use of rspack_glob #7556

wants to merge 15 commits into from

Conversation

SyMind
Copy link
Member

@SyMind SyMind commented Aug 13, 2024

Summary

Background

In this PR, we unified the glob crate used in Rspack into a new crate named rspack_glob, which affects the SideEffectsFlagPlugin and CopyRspackPlugin.

The code is copied from the glob crate with some modifications, some modifications following the approach from biome:
https://github.com/biomejs/biome/blob/main/crates/biome_service/src/matcher/pattern.rs

  1. Added support for the {a,b} syntax.
  2. Removed the syntax validation for ** with preceding and trailing /. Instead of throwing an error, it now replaces ** with *.

Result

We do not need to unify the implementation of glob within Rspack:

  1. In SideEffectsFlagPlugin, the glob library is used to match the expressions configured in sideEffects with the import filenames. There is no need to interact with the file system I/O. The requirements are fixed, such as case sensitivity and the necessity of delimiters.

  2. In CopyRspackPlugin, the glob library uses user-configured expressions to traverse and find files matching the specified directory. To achieve this, it minimizes the I/O process as much as possible. In the JS ecosystem, fast-glob determines whether a glob expression is dynamic or static and performs partial matching of the glob. The Rust glob crate splits the glob based on delimiters.

  3. In CopyRspackPlugin, it is expected to have more configuration options for glob matching, and its requirements are not fixed.

In summary, to ensure the best performance for sideEffects matching and to satisfy the flexible needs of CopyRspackPlugin, unification is unnecessary.

Checklist

  • Tests updated (or not required).
  • Documentation updated (or not required).

@github-actions github-actions bot added release: feature release: feature related release(mr only) team The issue/pr is created by the member of Rspack. labels Aug 13, 2024
Copy link

netlify bot commented Aug 13, 2024

Deploy Preview for rspack canceled.

Name Link
🔨 Latest commit 2962094
🔍 Latest deploy log https://app.netlify.com/sites/rspack/deploys/66c2e97090643e00085911e8

@SyMind SyMind marked this pull request as ready for review August 13, 2024 09:56
@SyMind
Copy link
Member Author

SyMind commented Aug 13, 2024

!bench

@rspack-bot
Copy link

rspack-bot commented Aug 13, 2024

📝 Benchmark detail: Open

Name Base (2024-08-13 bea8415) Current Change
10000_development-mode + exec 2.3 s ± 17 ms 2.33 s ± 28 ms +1.27 %
10000_development-mode_hmr + exec 701 ms ± 11 ms 703 ms ± 4.6 ms +0.25 %
10000_production-mode + exec 2.85 s ± 35 ms 2.87 s ± 21 ms +0.54 %
arco-pro_development-mode + exec 1.91 s ± 85 ms 1.92 s ± 77 ms +0.67 %
arco-pro_development-mode_hmr + exec 433 ms ± 2.2 ms 435 ms ± 0.69 ms +0.27 %
arco-pro_production-mode + exec 3.46 s ± 88 ms 3.43 s ± 71 ms -0.87 %
arco-pro_production-mode_generate-package-json-webpack-plugin + exec 3.56 s ± 104 ms 3.49 s ± 79 ms -2.06 %
threejs_development-mode_10x + exec 1.7 s ± 20 ms 1.68 s ± 14 ms -1.29 %
threejs_development-mode_10x_hmr + exec 820 ms ± 13 ms 804 ms ± 11 ms -1.98 %
threejs_production-mode_10x + exec 5.49 s ± 19 ms 5.48 s ± 28 ms -0.19 %

LingyuCoder
LingyuCoder previously approved these changes Aug 13, 2024
@shulaoda
Copy link
Collaborator

Perhaps we should directly compare their matching speed.

use codspeed_criterion_compat::{criterion_group, criterion_main, Criterion};

const PATH: &str = "some/a/bigger/path/to/the/crazy/needle.txt";
const GLOB: &str = "some/**/needle.txt";

fn fast_glob_crate(b: &mut Criterion) {
  b.bench_function("fast_glob", |b| {
    b.iter(|| fast_glob::glob_match(GLOB, PATH))
  });
}

fn rspack_glob_crate(b: &mut Criterion) {
  b.bench_function("rspack_glob", |b| {
    let pat = rspack_glob::Pattern::new(GLOB).unwrap(); // Break out of loop bench
    b.iter(|| pat.matches(PATH))
  });
}

criterion_group!(benches, fast_glob_crate, rspack_glob_crate);
criterion_main!(benches);

LingyuCoder
LingyuCoder previously approved these changes Aug 13, 2024
@SyMind
Copy link
Member Author

SyMind commented Aug 13, 2024

@shulaoda I made some mistakes. I habitually thought that fast_glob and glob_match had the same performance. I only tested the performance of glob_match running once, which is slower than the matches method of glob.
But fast_glob is faster than the matches method of glob, it seems that I need to look at this PR again.

But unfortunately, fast_glob is missing the following configuration. Can you add it?

  • case_sensitive
  • require_literal_leading_dot

@shulaoda
Copy link
Collaborator

But unfortunately, fast_glob is missing the following configuration. Can you add it?

  • case_sensitive
  • require_literal_leading_dot

It's my pleasure.

@shulaoda
Copy link
Collaborator

I'm sorry, due to some reasons, I may not be able to complete this task until this weekend. @SyMind

@chenjiahan
Copy link
Member

That's totally fine, take your time 😄

@SyMind SyMind closed this Aug 20, 2024
@SyMind
Copy link
Member Author

SyMind commented Aug 20, 2024

@shulaoda In the end, we found we should not let fast-glob support case_sensitive and require_literal_leading_dot feature. It will reduce performance.

Another point is that the performance of fast-glob can also be optimized. In Rspack, the glob_match_with_brace method is used. This method preprocesses the glob expression and then analyzes the existing optional branches, which are the sub-items of the brace expressions.

When this method matches a glob without internal braces, its performance is even lower than glob_match. I think this preprocessing can be canceled and start only when { is encountered.

Here is my attempt: https://github.com/SyMind/glob-match/blob/main/src/lib.rs#L236

image

@SyMind SyMind deleted the rspack-glob branch August 20, 2024 03:32
@shulaoda
Copy link
Collaborator

When this method matches a glob without internal braces, its performance is even lower than glob_match. I think this preprocessing can be canceled and start only when { is encountered.

Here is my attempt: https://github.com/SyMind/glob-match/blob/main/src/lib.rs#L236

I will try to further optimize, thank you for your attempt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release: feature release: feature related release(mr only) team The issue/pr is created by the member of Rspack.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants