feat(handler): add multi-part gzip handler. #689

Merged: 2 commits into main from multi-gzip-handler, Jan 4, 2024

@qkaiser (Contributor) commented Dec 16, 2023

It's possible to create a multi-part gzip with split, which produces multiple gzip compressed files with 'aa', 'ab', 'ac', ... suffixes.

We match on .gz.aa in a directory, collect all the files with the same name but a different suffix, sort them, and feed them to 7z.
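
For illustration, the "match the first part, collect the siblings, sort them" step could be sketched as below. This is not the handler code from this PR: `collect_gzip_parts` is a hypothetical helper, and it assumes the default two-letter lowercase suffixes that split produces.

```python
import re
from pathlib import Path

# Assumption: parts use split's default two-letter suffixes (aa, ab, ac, ...).
PART_SUFFIX = re.compile(r"[a-z]{2}")


def collect_gzip_parts(first_part: Path) -> list[Path]:
    """Hypothetical helper: return all parts belonging to `first_part`, in suffix order."""
    if not first_part.name.endswith(".gz.aa"):
        raise ValueError(f"not the first part of a multi-part gzip: {first_part}")

    # "one.txt.gz.aa" -> "one.txt.gz." (shared prefix of every part)
    base = first_part.name[: -len("aa")]

    parts = [
        path
        for path in first_part.parent.iterdir()
        if path.is_file()
        and path.name.startswith(base)
        and PART_SUFFIX.fullmatch(path.name[len(base):])
    ]
    # Lexicographic order matches the order in which split wrote the parts.
    return sorted(parts, key=lambda path: path.name)
```

Only the first entry of the returned list would need to be handed to 7z if it detects the remaining volumes on its own, as discussed further down in this thread.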

This is very close to what we were already doing with multi-part 7zip archives.

@qkaiser added the enhancement and format:compression labels Dec 16, 2023
@qkaiser self-assigned this Dec 16, 2023
@qkaiser force-pushed the multi-gzip-handler branch 2 times, most recently from a29cedd to 59e1c99, January 2, 2024 17:40
@qkaiser force-pushed the multi-gzip-handler branch from 59e1c99 to 5ece896, January 3, 2024 15:41
@qkaiser requested a review from e3krisztian January 3, 2024 15:41
@qkaiser force-pushed the multi-gzip-handler branch from 5ece896 to 9725c7c, January 3, 2024 16:26
@qkaiser (Contributor, Author) commented Jan 4, 2024

Copying the exchange I had with @e3krisztian outside GitHub:

@e3krisztian:

I saw the gzip changes, let's talk about it tomorrow.
The problem is with these output files:
tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz.aa_extract/one.txt.aa
tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz.aa_extract/one.txt.ab
...
They should not look like this: extracted this way, it is not handled as a multifile.
The expected output would be tests/integration/compression/gzip/__output__/multi-volume-split-then-gzip.tar_extract/one.txt.gz_extract/one.txt

@qkaiser:

I looked at it and here's the problem:

  • We provide only the first path to 7z through MultiFileCommand, because 7z is smart enough to detect multi-volume archives or compressed streams on its own.
  • If we wanted to decompress "split then compressed" multi-volumes, we could pass a wildcard to 7z by adapting MultiFileCommand so that it runs something like this: 7z x -p -y 'mv.7z*' -o/tmp/out
  • The problem with the wildcard approach is that it stops 7z from working with legit multi-volume archives: given a wildcard, it treats each matching file independently.

So we can't have both. I think a split-then-compressed multi-volume is an edge case and should only be handled once we observe one. Until then it will still be handled, but each file will be decompressed independently, without causing issues.
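
To make the trade-off concrete, here is a rough sketch of the two invocations. `seven_zip_argv` is a hypothetical function, not MultiFileCommand's real implementation; the flags are the ones from the command quoted above, and the wildcard is built on the assumption that all parts share the first part's name minus its last suffix.

```python
from pathlib import Path


def seven_zip_argv(first_part: Path, outdir: Path, *, wildcard: bool = False) -> list[str]:
    """Hypothetical sketch: build the 7z argument list for either approach."""
    if wildcard:
        # "mv.7z.001" -> "mv.7z*": every matching file is passed to 7z,
        # which, per the discussion above, then treats each one independently.
        target = str(first_part.with_suffix("")) + "*"
    else:
        # Only the first part: 7z is left to detect the other volumes itself.
        target = str(first_part)
    return ["7z", "x", "-p", "-y", target, f"-o{outdir}"]


# First-path form, relying on 7z's multi-volume detection.
print(seven_zip_argv(Path("mv.7z.001"), Path("out")))
# Wildcard form, which would be needed for "split then compressed" parts.
print(seven_zip_argv(Path("mv.7z.001"), Path("out"), wildcard=True))
```

The two forms cannot be combined in one invocation, which is why a single command-building path has to pick one behaviour.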

@qkaiser force-pushed the multi-gzip-handler branch from 9725c7c to a9e7598, January 4, 2024 13:37
qkaiser and others added 2 commits January 4, 2024 15:57
It's possible to create multi-part gzip with 'split', which will create
multiple gzip compressed files with a 'aa', 'ab', 'ac', .. suffix.

We match on '.gz.aa' in a directory, get all the files with same name
but different suffix, order them and feed them to 7z.

This is very close to what we were already doing with multi-part 7zip
archives.

Co-authored-by: Krisztián Fekete <[email protected]>
@qkaiser force-pushed the multi-gzip-handler branch from a9e7598 to 47b2fac, January 4, 2024 14:57
@qkaiser enabled auto-merge January 4, 2024 14:58
@qkaiser merged commit b9c5b38 into main Jan 4, 2024
15 checks passed
@qkaiser deleted the multi-gzip-handler branch January 4, 2024 15:00