Report carve dir #1017
Nice work. I would maybe add some tests where we set the ExtractionConfig to use non-standard `carve_suffix` and `extract_suffix`? See if we can spot abnormal behavior that would arise from assumptions made about the default suffix values.
e89ad23 to 2767e86
@e3krisztian cool! I'll wait for the tests to land before I do a final review round.
@e3krisztian two small comments. I think we can rebase and merge once they are cleared.
Current devenv config has problems on at least Ubuntu 22.04. A `.envrc.user` with the content below enables work without devenv:

    layout_poetry() {
        # create venv if it doesn't exist
        poetry run true

        export VIRTUAL_ENV=$(poetry env info --path)
        export POETRY_ACTIVE=1
        PATH_add "$VIRTUAL_ENV/bin"
    }

    layout_poetry
    export SKIP=nixpkgs-fmt
    export UNBLOB_USE_DEVENV=false
This removes the burden of carving from the already complex function `_extract_chunks` and also allows for some better variable names.
Carve directories were hard to explain, as they look like extraction directories and there was no public information to tell them apart. Adding this report makes the purpose of the directory visible.
`_FileTask.carve_dir` was initially used for both extraction and carving. The naming of the directories can now differ, so it is no longer used apart from an existence check, which would terminate this branch of the extraction. This output directory existence check is now present in both the carving and extraction paths, and the report is renamed to accommodate both types of output directories: `ExtractDirectoryExistsReport` was generalized to `OutputDirectoryExistsReport` instead of introducing yet another `Report` type, `CarveDirectoryExistsReport`.
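A minimal sketch of what a shared existence check with a single report type could look like. Only the class name `OutputDirectoryExistsReport` comes from the commit message; the `path` field and the `check_output_dir` helper are hypothetical and are not taken from unblob's code.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class OutputDirectoryExistsReport:
    # Hypothetical field: the output directory that was already present.
    path: Path


def check_output_dir(
    out_dir: Path, reports: List[OutputDirectoryExistsReport]
) -> bool:
    """Return True if processing may continue with this output directory.

    If the directory already exists, record one report and return False,
    regardless of whether it would have been a carve or an extract directory.
    """
    if out_dir.exists():
        reports.append(OutputDirectoryExistsReport(path=out_dir))
        return False
    return True
```

Because the helper only looks at the path, the carving and the extraction code paths can both call it without needing separate report types.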
Chunk statistics require a division by the total chunk size, which can be 0 in certain rare cases. This makes chunk-related output conditional, and not part of the summary. An example command line sequence which leads to a silent failure:

    (echo a; gzip < README.md ; echo b) > fw
    unblob fw
    # the next command would silently fail:
    unblob fw
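The division guard described in this commit message could be sketched as follows. The function name `chunk_percentages` and its dictionary input are hypothetical, chosen only for illustration; they do not reflect unblob's actual reporting code.

```python
from typing import Dict, Optional


def chunk_percentages(sizes_by_handler: Dict[str, int]) -> Optional[Dict[str, float]]:
    """Return each handler's share of the total chunk size, in percent.

    Returns None when the total size is 0, so the caller can skip the
    chunk-related output instead of dividing by zero.
    """
    total = sum(sizes_by_handler.values())
    if total == 0:
        return None
    return {name: size / total * 100 for name, size in sizes_by_handler.items()}
```

A caller would print the chunk statistics only when the result is not `None`, which keeps the rest of the summary unaffected by the empty case.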
With the separation of carve and extract directories, the output directory becomes dependent on the *content* of the input file: if it has multiple chunks because it is not covered by a single handler, the output directory will be generated as a *carve* directory, otherwise as an *extract* directory.
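The content-dependent choice can be illustrated with a small sketch. The `Chunk` class and the `output_dir_for` function are hypothetical stand-ins, and the rule "name of the input plus a suffix" is an assumption inferred from the directory names in the test script (for example `chunks_carve` and `0-160.gzip_extract`). The sketch also assumes at least one chunk was found.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Chunk:
    # Hypothetical stand-in: byte range [start_offset, end_offset) of a chunk.
    start_offset: int
    end_offset: int


def output_dir_for(
    path: Path,
    file_size: int,
    chunks: List[Chunk],
    carve_suffix: str,
    extract_suffix: str,
) -> Path:
    """Pick the output directory for an input file based on its content."""
    # A single chunk spanning the whole file means one handler covers it,
    # so the file is extracted directly.
    covered_by_one_handler = (
        len(chunks) == 1
        and chunks[0].start_offset == 0
        and chunks[0].end_offset == file_size
    )
    suffix = extract_suffix if covered_by_one_handler else carve_suffix
    return path.with_name(path.name + suffix)
```

With `carve_suffix="_carve"` and `extract_suffix="_extract"`, a file `fw` that is a single gzip stream would map to `fw_extract`, while the same name holding two concatenated streams would map to `fw_carve`.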
The output path is printed as of the previous commit, so callers no longer need to rely on looking at well-known paths.
The test files were created with this script:

    # cd tests/files/suffixes

    # clean
    rm -rf chunks_carve/ extractions/ collisions.zip __input__ __outputs__

    # reproduce output
    mkdir __input__ __outputs__
    seq 100 | gzip > 0-160.gzip
    seq 128 | gzip > 160-375.gzip
    dd if=/dev/zero of=375-512.padding bs=1 count=137
    cat 0-160.gzip 160-375.gzip 375-512.padding > __input__/chunks
    unblob --carve-suffix _carve chunks
    cp 0-160.gzip chunks_carve/
    echo something else > chunks_carve/0-160.gzip_extract/gzip.uncompressed
    zip __input__/collisions.zip chunks chunks_carve/0-160.gzip chunks_carve/0-160.gzip_extract/gzip.uncompressed
    rm 0-160.gzip 160-375.gzip 375-512.padding
    rm -rf chunks_carve
    for input in collisions.zip chunks
    do
        unblob \
            -e __outputs__/$input/defaults/ __input__/$input
        unblob --carve-suffix _carve \
            -e __outputs__/$input/_carve_extract/ __input__/$input
        unblob --carve-suffix _c --extract-suffix _e \
            -e __outputs__/$input/_c_e/ __input__/$input
    done
711e3d2 to 4f1bd85
Rebased and resolved the problems raised in tests.
Reworked #891 to report carve dirs instead of the carved files, as well as support different suffixes for carve and extraction directories.