Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report extraction problems from handlers/extractors #633

Merged
merged 9 commits into from
Aug 11, 2023
Merged

Conversation

e3krisztian
Copy link
Contributor

@e3krisztian e3krisztian commented Jul 28, 2023

Main changes are:

  • Extractor.extract can optionally return an ExtractionResult which has a reports attribute.
  • introduction of Sandbox, which prevents out of extraction directory file system object creation/reference and records any such attempts
  • support of problem reporting for tar, romfs, yaffs, ipkg, hdr, with most of them converted to use Sandbox

@e3krisztian e3krisztian force-pushed the extract-result branch 4 times, most recently from b3d0b91 to 3b5b2b3 Compare July 30, 2023 17:41
@e3krisztian e3krisztian requested a review from qkaiser July 30, 2023 17:41
@qkaiser
Copy link
Contributor

qkaiser commented Jul 31, 2023

I like the approach of Sandbox very much ! It makes writing secure unblob extractors easier for newcomers. Maybe use the name SandboxFS rather than Sandbox to convey the fact that it's about files rather than a generic sandbox with limits on execution / privileges / network.

Are there extractors other than tar, romfs, yaffs, ipkg, and hdr that could benefit from it ?

I'll do a deeper pass for review today.

@e3krisztian
Copy link
Contributor Author

Thanks, I hoped it will be useful later as well :)

This was the full evolution of the class name: ExtractionFS -> SandboxFS -> FSSandbox -> DirectorySandbox -> Sandbox.

I did not really like the FS suffix/prefix. I feel, that DirectorySandbox was the most descriptive of the bunch, but Sandbox was much shorter and won for me in the end as unique enough inside unblob.

I would prefer to keep the current Sandbox name, while the implementation is not something complex and generic, it is inside the helper module unblob.file_utils to show its scope, and if a more specific name is desired, it can be renamed on import.

Most of the Handlers use Command as extractor, either directly, or indirectly, or the extractor creates only a single output file with a deterministic name, thus not worth converting, so I think the list of converted extractors are exhaustive at the moment.

@e3krisztian e3krisztian force-pushed the extract-result branch 2 times, most recently from 2a749b3 to dfa6097 Compare July 31, 2023 17:47
Copy link
Contributor

@qkaiser qkaiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we all agree on the API of Sandbox, can you update the documentation as part of this PR ?

I would put it under "Writing extractors" in docs/development.md

unblob/handlers/filesystem/romfs.py Outdated Show resolved Hide resolved
unblob/handlers/filesystem/yaffs.py Outdated Show resolved Hide resolved
@martonilles
Copy link
Contributor

I like the Sandbox very much, not sure though about the name.

The other thing I was wondering about, what if we pass the Sandbox instance to the extractor instead of just the outdir? It would require api change, but might decrease boilerplate (not sure about CommandExtractor).

Also I was wondering if for the Sandbox apis do we want to pass absolute path or just a path within the sandbox? Maybe it would be easier if the extractor logic would not need to know about the outdir

@e3krisztian e3krisztian force-pushed the extract-result branch 3 times, most recently from 7931e03 to 55eb73a Compare August 3, 2023 10:08
@qkaiser qkaiser requested a review from martonilles August 3, 2023 13:49
@qkaiser
Copy link
Contributor

qkaiser commented Aug 3, 2023

The other thing I was wondering about, what if we pass the Sandbox instance to the extractor instead of just the outdir? It would require api change, but might decrease boilerplate (not sure about CommandExtractor).

I was not sure about this in the beginning since Sandbox would only be effective when the extractor does the extraction itself without relying on external tools or libraries. However, I like the idea of changing the API so that extractors receive a Sandbox instance, making it the only way for someone writing an extractor to interact with the extraction directory. Going into that direction, we need to define what is public/private within the Sandbox so that whenever someone does sandbox.outdir.whatever it's picked up and reported by pyright.

Of course CommandExtractor would be an exception to that since it would require access to outdir.

Also I was wondering if for the Sandbox apis do we want to pass absolute path or just a path within the sandbox? Maybe it would be easier if the extractor logic would not need to know about the outdir

We could do things so that all paths provided to the Sandbox APIs are considered relative to the extraction directory. If an absolute path is provided, its root should be considered to be the outdir.

What do you think ?

@qkaiser
Copy link
Contributor

qkaiser commented Aug 3, 2023

Just a side note, don't mind me. In the future, this Sandbox might be the place where we can sandbox external extractors using Landlock (#597). Instead of a general landlock rule, we could set one limited to outdir for the specific extraction command we run.

unblob/file_utils.py Outdated Show resolved Hide resolved
@e3krisztian
Copy link
Contributor Author

Thanks for the comments @qkaiser @martonilles and @vlaci.
I agree with all of your comments, and they will be implemented in this MR:

  • rename Sandbox to FileSystem
  • accept only relative paths (in contrast to the current absolute paths), that will be interpreted as paths relative to FileSystem.root

These proposals/ideas are nice, and will be/should be implemented later:

  • changing the Extractor.extract(self, inpath: Path, outdir: Path) signature
  • sandboxing third party extractors with Landlock

@e3krisztian e3krisztian marked this pull request as draft August 4, 2023 10:23
@e3krisztian e3krisztian force-pushed the extract-result branch 3 times, most recently from 3d8052a to 38764d0 Compare August 8, 2023 18:23
@e3krisztian e3krisztian marked this pull request as ready for review August 8, 2023 18:29
unblob/handlers/filesystem/romfs.py Outdated Show resolved Hide resolved
unblob/handlers/filesystem/romfs.py Outdated Show resolved Hide resolved
unblob/file_utils.py Show resolved Hide resolved
unblob/file_utils.py Show resolved Hide resolved
@qkaiser
Copy link
Contributor

qkaiser commented Aug 9, 2023

@e3krisztian would be nice to follow atomic commit message structure using feat, ci, doc, chore, fix, refactor to indicate the type of operation and the affected module (docs, handler, processing, ...) between parenthesis. This greatly helps when looking for specific changes later on.

@e3krisztian e3krisztian force-pushed the extract-result branch 2 times, most recently from 128610f to 3138c8b Compare August 10, 2023 15:05
@qkaiser qkaiser self-requested a review August 11, 2023 07:29
unblob/file_utils.py Show resolved Hide resolved
tests/test_file_utils.py Show resolved Hide resolved
vlaci and others added 9 commits August 11, 2023 15:19
…late in Extractors

For every handler every file system operation needs to be checked
for potential path traversal, attempt to create devices, and the ones
that could be problematic should be prevented and reported.

This was managed so far individually in ever handler, which resulted
in reporting the same problems somewhat differently and allowed for
potential bugs in implementation.

The FileSystem helper class introduced here should take care of these
repetitive tasks and hopefully could make for more succinct, yet safer
Handler/Extractor implementations.
In the newer version absolute symlinks are no longer dropped as path
traversals, but changed to point within the extraction directory.
@qkaiser qkaiser enabled auto-merge August 11, 2023 13:20
@qkaiser qkaiser merged commit f2dd8e7 into main Aug 11, 2023
@qkaiser qkaiser deleted the extract-result branch August 11, 2023 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants