du --exclude='[^/]*' / behave differently #3628

Closed
sylvestre opened this issue Jun 13, 2022 · 7 comments · Fixed by #3754

@sylvestre
Contributor

$ ./target/debug/coreutils du --exclude='[^/]*'  / 
$ du --exclude='[^/]*'  / 
4       /

Similarly:

$ ./target/debug/coreutils du --exclude='[^/tmp/]*'  /tmp
$ du --exclude='[^/tmp/]*'  /tmp 
[...]
4       /tmp/tmpydz9rgeb
19156   /tmp

Tested by
https://github.com/coreutils/coreutils/blob/master/tests/du/slash.sh

@ackerleytng
Contributor

Let me take this up!

@sylvestre
Contributor Author

sylvestre commented Jun 28, 2022 via email

@ackerleytng
Contributor

I found that Rust's glob crate doesn't handle ^ as negation within brackets ([^...]). Instead of ^, glob expects !, as in [!...].

Should we build a wrapper? Perhaps do some kind of pattern replacement? I feel that that could be difficult to do correctly.

As an example, the following works:

$ ./target/debug/coreutils du --exclude='[!/]*'  /
4       /
$ du --exclude='[!/]*'  /
4       /
$
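
The pattern-replacement idea raised in this comment could be sketched roughly as follows. This is only an illustration: `caret_to_bang` is a hypothetical helper name, not part of glob, globset, or uutils, and it deliberately ignores backslash escapes and POSIX character classes such as [[:alpha:]], which is part of why doing this correctly is harder than it looks.

```rust
/// Hypothetical helper (not part of any crate): rewrite a leading `^` in each
/// bracket expression to `!`, so that `[^/]*` becomes `[!/]*`.
/// Assumptions: no backslash escapes, no `[[:class:]]` expressions.
fn caret_to_bang(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();
    let mut in_class = false;

    while let Some(c) = chars.next() {
        out.push(c);
        if in_class {
            // The first unprotected `]` closes the bracket expression.
            if c == ']' {
                in_class = false;
            }
        } else if c == '[' {
            in_class = true;
            // A `^` or `!` directly after `[` is the negation marker;
            // always emit the `!` form.
            if let Some(&next) = chars.peek() {
                if next == '^' || next == '!' {
                    chars.next();
                    out.push('!');
                }
            }
            // A `]` in first position (after optional negation) is literal.
            if chars.peek() == Some(&']') {
                chars.next();
                out.push(']');
            }
        }
    }
    out
}
```

Under these assumptions, `caret_to_bang("[^/]*")` yields `[!/]*`, while a `^` outside brackets or in a non-leading position inside them is left untouched.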

@ackerleytng
Contributor

Both glob and globset use the ! syntax for negation

@ackerleytng
Contributor

ackerleytng commented Jul 12, 2022

This issue extends to other tools:

$ ls
a  b  c  d  e
$ ~/projects/coreutils/target/debug/coreutils ls
a  b  c  d  e
$ ls --ignore '[a]'
b  c  d  e
$ ~/projects/coreutils/target/debug/coreutils ls --ignore '[a]'
b  c  d  e
$ ls --ignore '[!a]'
a
$ ls --ignore '[^a]'
a
$ ~/projects/coreutils/target/debug/coreutils ls --ignore '[!a]'
a
$ ~/projects/coreutils/target/debug/coreutils ls --ignore '[^a]'
b  c  d  e
$

GNU coreutils uses fnmatch for both of these matches.

I'm thinking of creating an FnmatchPattern that extends the rust crate glob's Pattern. In FnmatchPattern, I would first call Pattern's new(), and then amend the vector of parsed tokens if AnyWithin(^) is the first token in a list of AnyWithins.

Where's a good place to put the code for FnmatchPattern? Would it be best placed in uucore so that it can be shared?

I've also created an issue upstream: rust-lang/glob#116

@ackerleytng
Contributor

Explored all the places where fnmatch is used in GNU coreutils.

Only 3 utilities use fnmatch, which are du, ls, and dircolors.

  • ls uses fnmatch with fnmatch's third parameter (flags) set to either 0 or FNM_PERIOD
  • dircolors uses fnmatch with flags set to 0
  • du uses fnmatch through the concept of excludes

The exclude stuff calls fnmatch through this code path:

  1. du.c first uses add_exclude with EXCLUDE_WILDCARDS to add its command line parameters from --exclude to an exclude struct.
  2. Later when doing the matching, du.c calls file_pattern_matches on each filename
  3. file_pattern_matches calls exclude_patopts, which calls exclude_fnmatch,
  4. Which then calls Gnulib's fnmatch, with flags set to options (which is EXCLUDE_WILDCARDS)
  5. Gnulib's fnmatch calls internal_fnmatch after adjusting for some locale stuff.

I believe internal_fnmatch is glibc's (or whichever libc's) implementation of fnmatch. I'm not completely sure, because grepping for internal_fnmatch in both coreutils and Gnulib returns no function definition for it.

The libc fnmatch uses the following flags:

FNM_PATHNAME, FNM_NOESCAPE, FNM_PERIOD, FNM_LEADING_DIR, FNM_CASEFOLD

which are the lowest 5 bits.

When Gnulib's fnmatch is called with flags set to options, the options are from exclude.h, which are

EXCLUDE_ANCHORED, EXCLUDE_INCLUDE, EXCLUDE_WILDCARDS, EXCLUDE_REGEX, EXCLUDE_ALLOC,

and are bits 26 to 30 inclusive.

Hence du effectively uses fnmatch with flags set to 0, which matches the way fnmatch is used in dircolors.
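
The bit arithmetic behind that conclusion can be checked with a small sketch. Only the bit ranges (lowest 5 bits, and bits 26 to 30) come from the comment above; the individual per-flag values below are assumptions based on glibc's fnmatch.h and Gnulib's exclude.h and should be verified against the actual headers. `effective_fnm_flags` is a hypothetical helper, and the sketch assumes fnmatch simply ignores bits it does not define.

```rust
// libc fnmatch flags: assumed to occupy the lowest 5 bits (values as in glibc).
const FNM_PATHNAME: u32 = 1 << 0;
const FNM_NOESCAPE: u32 = 1 << 1;
const FNM_PERIOD: u32 = 1 << 2;
const FNM_LEADING_DIR: u32 = 1 << 3;
const FNM_CASEFOLD: u32 = 1 << 4;

// Gnulib exclude options: assumed to occupy bits 26 to 30 inclusive.
const EXCLUDE_ALLOC: u32 = 1 << 26;
const EXCLUDE_REGEX: u32 = 1 << 27;
const EXCLUDE_WILDCARDS: u32 = 1 << 28;
const EXCLUDE_INCLUDE: u32 = 1 << 29;
const EXCLUDE_ANCHORED: u32 = 1 << 30;

const FNM_MASK: u32 =
    FNM_PATHNAME | FNM_NOESCAPE | FNM_PERIOD | FNM_LEADING_DIR | FNM_CASEFOLD;
const EXCLUDE_MASK: u32 =
    EXCLUDE_ALLOC | EXCLUDE_REGEX | EXCLUDE_WILDCARDS | EXCLUDE_INCLUDE | EXCLUDE_ANCHORED;

/// Hypothetical helper: the part of `options` that fnmatch would interpret
/// as its own flags, assuming it ignores unknown high bits.
fn effective_fnm_flags(options: u32) -> u32 {
    options & FNM_MASK
}
```

Because the two masks share no bits, passing EXCLUDE_WILDCARDS as the flags argument sets none of the FNM_* flags, which is the sense in which du ends up calling fnmatch with flags equal to 0.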

TLDR:

  • Based on the current usage of fnmatch in GNU coreutils, we only need to re-implement fnmatch for flags = 0 and flags = FNM_PERIOD.
  • Only 3 utils (du, dircolors and ls) will use this implementation.
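
To give a feel for the size of such a re-implementation, here is a minimal sketch of matching with flags = 0. It is not the uutils implementation and not a port of glibc's code: `fnmatch0`, `match_from` and `class_matches` are hypothetical names, the matcher is a naive backtracking one, and it assumes no POSIX character classes ([[:alpha:]]), no locale handling, and no backslash escapes inside brackets. FNM_PERIOD is left out for brevity.

```rust
/// Hypothetical sketch of fnmatch(pattern, name, 0): supports `*`, `?`,
/// bracket expressions with `!` or `^` negation and ranges, and `\` escapes
/// outside brackets. With flags = 0, wildcards also match `/` and a leading `.`.
fn fnmatch0(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    match_from(&p, &n)
}

fn match_from(p: &[char], n: &[char]) -> bool {
    let pc = match p.first() {
        Some(&pc) => pc,
        None => return n.is_empty(),
    };
    match pc {
        // `*` matches any run of characters, including the empty one.
        '*' => (0..=n.len()).any(|i| match_from(&p[1..], &n[i..])),
        '?' => !n.is_empty() && match_from(&p[1..], &n[1..]),
        '[' => {
            let c = match n.first() {
                Some(&c) => c,
                None => return false,
            };
            match class_matches(&p[1..], c) {
                Some((ok, used)) => ok && match_from(&p[1 + used..], &n[1..]),
                // No closing `]`: treat `[` as an ordinary character.
                None => c == '[' && match_from(&p[1..], &n[1..]),
            }
        }
        '\\' if p.len() > 1 => n.first() == Some(&p[1]) && match_from(&p[2..], &n[1..]),
        _ => n.first() == Some(&pc) && match_from(&p[1..], &n[1..]),
    }
}

/// `p` is the pattern just after `[`. Returns whether `c` matches the bracket
/// expression and how many pattern characters it spans (including `]`),
/// or None if the expression is never closed.
fn class_matches(p: &[char], c: char) -> Option<(bool, usize)> {
    let negated = matches!(p.first(), Some('!') | Some('^'));
    let mut i = if negated { 1 } else { 0 };
    let mut found = false;
    let mut first = true;
    loop {
        let lo = *p.get(i)?;
        if lo == ']' && !first {
            return Some((found != negated, i + 1));
        }
        first = false;
        // `lo-hi` is a range unless the `-` is the last character before `]`.
        if p.get(i + 1) == Some(&'-') && p.get(i + 2).map_or(false, |&h| h != ']') {
            let hi = p[i + 2];
            if lo <= c && c <= hi {
                found = true;
            }
            i += 3;
        } else {
            if lo == c {
                found = true;
            }
            i += 1;
        }
    }
}
```

With this sketch, `fnmatch0("[^/]*", "tmp")` and `fnmatch0("[!/]*", "tmp")` both return true, so `^` and `!` behave identically. The recursion on `*` is exponential in the worst case, so a real implementation would want an iterative matcher.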

@ackerleytng
Contributor

Another implementation difference in ls: (uutils's coreutils does not implement FNM_PERIOD, which is used in ls)

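When FNM_PERIOD is specified, wildcards will not match a leading . in a filename; it has to be matched explicitly by a . in the pattern.

In ls, wildcards should therefore not match a leading . either: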

$ ls -al
total 8
drwxr-xr-x  2 ackerleytng ackerleytng 4096 Jul 25 08:30 .
drwx------ 75 ackerleytng ackerleytng 4096 Jul 25 08:31 ..
-rw-r--r--  1 ackerleytng ackerleytng    0 Jul 25 08:30 .hidden.yml
$ ls -al --ignore '*ml'
total 8
drwxr-xr-x  2 ackerleytng ackerleytng 4096 Jul 25 08:30 .
drwx------ 75 ackerleytng ackerleytng 4096 Jul 25 08:31 ..
-rw-r--r--  1 ackerleytng ackerleytng    0 Jul 25 08:30 .hidden.yml
$ ~/projects/coreutils/target/debug/coreutils ls -al --ignore '*ml'
total 8
drwxr-xr-x  2 ackerleytng ackerleytng 4096 Jul 25 08:30 .
drwx------ 75 ackerleytng ackerleytng 4096 Jul 25 08:31 ..
$ ls -al --ignore '?hidden.yml'
total 8
drwxr-xr-x  2 ackerleytng ackerleytng 4096 Jul 25 08:30 .
drwx------ 75 ackerleytng ackerleytng 4096 Jul 25 08:36 ..
-rw-r--r--  1 ackerleytng ackerleytng    0 Jul 25 08:30 .hidden.yml
$ ~/projects/coreutils/target/debug/coreutils ls -al --ignore '?hidden.yml'
total 8
drwxr-xr-x  2 ackerleytng ackerleytng 4096 Jul 25 08:30 .
drwx------ 75 ackerleytng ackerleytng 4096 Jul 25 08:35 ..
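
The leading-period rule shown in these transcripts could be modelled as a pre-check placed in front of whatever matcher is used. This is a sketch under assumptions: `period_blocks_match` is a hypothetical helper, it only models FNM_PERIOD without FNM_PATHNAME (so only the very first character of the name matters), and it treats a backslash-escaped period at the start of the pattern as an explicit match.

```rust
/// Hypothetical pre-check modelling FNM_PERIOD (without FNM_PATHNAME):
/// returns true when `name` starts with `.` but `pattern` does not match
/// that period explicitly, in which case the overall match must fail.
fn period_blocks_match(pattern: &str, name: &str) -> bool {
    if !name.starts_with('.') {
        return false;
    }
    let mut chars = pattern.chars();
    match chars.next() {
        // A literal leading `.` in the pattern matches the period explicitly.
        Some('.') => false,
        // So does an escaped one (`\.`).
        Some('\\') => chars.next() != Some('.'),
        // Anything else (`*`, `?`, `[...]`, other literals, empty pattern)
        // cannot match a leading period.
        _ => true,
    }
}
```

A caller implementing `ls --ignore` would then ignore an entry only if this check returns false and the pattern matches. For the transcripts above, both `*ml` and `?hidden.yml` are blocked for `.hidden.yml`, so the file stays listed, as in GNU ls.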
