-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
extract: Proper parent directory modes #74
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This PR depends on #75. |
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
from
June 22, 2023 04:44
2f54624
to
dccc993
Compare
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
from
June 22, 2023 10:38
dccc993
to
a38c2a2
Compare
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
from
August 29, 2023 16:04
a38c2a2
to
716a26f
Compare
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
2 times, most recently
from
September 19, 2023 06:07
ae67d09
to
2874f12
Compare
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
9 times, most recently
from
October 9, 2023 19:15
bc77cf5
to
08d8d56
Compare
Currently we only test with the embedded base-files files package. This quite limits what we can test. But since commit a3315e3 ("testutil/pkgdata: Create deb programatically") we have the ability to define our own custom package content. Add support for testing with custom packages. These can be defined in each test case per archive name. The content can be defined directly in test case definition with the help of testutil.MustMakeDeb(). The package content is wrapped in the testPackage structure. This introduces one level of nesting in each test case package definition. In future, we will add package metadata to the package definition, namely version. So wrapping the package content in the structure now and later just adding a new field to it the structure later will make the future diff much smaller.
At some point we will need to get information about (deb) packages from (Go) packages other than archive, namely from the slicer package. For instance: - Package entries in Chisel DB will include version, digest and architecture. - Multi-archive package selection will need to know versions of a package in all configured archives. Add Info() method to the Archive interface that returns a new PackageInfo interface that contains accessors for the underlying control.Section of the package. Currently these accessors are Name(), Version(), Arch() and SHA256() methods. The reason to introduce PackageInfo interface instead of returing control.Section directly is keep the archive abstraction and not leak implementation details.
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
from
October 12, 2023 16:40
08d8d56
to
993ebb3
Compare
From the code: These functions exists because we work with slash terminated directory paths that come from deb package tarballs but standard library path functions treat slash terminated paths as unclean. We use ad-hoc code to make filepath.Dir() slash terminated in two places right now. For example this code targetDir := filepath.Dir(strings.TrimRight(targetPath, "/")) + "/" is not strictly correct as "/a/b/.///" will be turned into "/a/b/./" which is still "/a/b". Since we deal with slash terminated directory paths everywhere, create and use dedicated helper functions to process them.
The purpose of parent directory handling in deb/extract.go is to create parent directories of requested paths with the same attributes (only mode as of now) that appear in the package tarball. However, the current implementation is not correct when glob paths are requested. In what follows parent directory refers to a directory path that is not explicitly requested for extraction, but that is the parent of other paths that are requested for extraction, and so it is assumed to be implicitly requested for extraction. Currently, whether a package path should be extracted is determined by the shouldExtract() function that iterates over requested paths and for each checks whether it matches the package path if it's glob, or if it's non-glob, whether it equals the package path or whether some of its target paths have the package path as the parent. There are two problems with this implementation: 1) It only checks whether a package path is the parent of any target path of a requested non-glob path. It does not, and probably even cannot, check whether it is the parent of a requested glob path. 2) It iterates over the map of requested paths for every package path, even though for requested non-glob paths, it can match by directory lookup. And in each iteration, it checks whether a requested path is a glob by searching for wildcards in it. This commit addresses mainly the first problem, but it touches the second one as well. Track modes of directories as we encounter them in the tarball. Then, when creating a target path, create its missing parent directories with modes with which they were recorded in the tarball, or 0755 if they were not recorded yet. In the latter case, the directory mode is also tracked so that the directory can be recreated with the proper mode if we find it in the tarball later. This algorithm works for both glob and non-glob requested paths. Since the matching that was previously done in the shouldExtract() function is not used anymore, the function was removed. As part of this change, the requested glob paths are recorded before the extraction begins into the dedicated list which is scanned only when requested non-glob path lookup fails. We still match requested non-glob and glob paths separately. Ideally, we would use some kind of pattern matching on trees, something like a radix tree which also supports wildcards, but this commit is humble. One consequence of this change is that when an optional path that doesn't exist in the tarball is requested, its parent directories are not created (if they are not the parents of other requested paths). But this new behavior is arguably more natural than the old behavior where we created parent directories for non-existent paths, which seems to have been just an artifact of the implementation. Therefore, one test had to be changed for this behavior change. Since we do not allow optional paths to be defined in slices, this change is visible only to callers of the deb.Extract() function. In chisel, these callers are extract test and slicer. The latter depended on the old behavior to create parent directories for non-extracted content paths by including their direct parent directories in the requested paths. The behavior of chisel is preserved by changing slicer to include all, and not just direct, parent directories of non-extracted content paths in the requested paths.
woky
force-pushed
the
pub/extract-proper-parent-modes
branch
from
October 20, 2023 08:08
993ebb3
to
7569606
Compare
Per our offline discussion, I think our best way forward is to close this PR and discuss with @rebornplusplus to see if it is a priority right now, and how to amend the code here to make it compatible with the latest changes. We will take that discussion offline and we can always re-open the PRs or create new ones as we see fit. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The purpose of parent directory handling in deb/extract.go is to create
parent directories of requested paths with the same attributes (only
mode as of now) that appear in the package tarball. However, the current
implementation is not correct when glob paths are requested.
In what follows parent directory refers to a directory path that is not
explicitly requested for extraction, but that is the parent of other
paths that are requested for extraction, and so it is assumed to be
implicitly requested for extraction.
Currently, whether a package path should be extracted is determined by
the shouldExtract() function that iterates over requested paths and for
each checks whether it matches the package path if it's glob, or if it's
non-glob, whether it equals the package path or whether some of its
target paths have the package path as the parent.
There are two problems with this implementation:
It only checks whether a package path is the parent of any target
path of a requested non-glob path. It does not, and probably even
cannot, check whether it is the parent of a requested glob path.
It iterates over the map of requested paths for every package path,
even though for requested non-glob paths, it can match by directory
lookup. And in each iteration, it checks whether a requested path
is a glob by searching for wildcards in it.
This commit addresses mainly the first problem, but it touches the
second one as well.
Track modes of directories as we encounter them in the tarball. Then,
when creating a target path, create its missing parent directories with
modes with which they were recorded in the tarball, or 0755 if they were
not recorded yet. In the latter case, the directory mode is also tracked
so that the directory can be recreated with the proper mode if we find
it in the tarball later. This algorithm works for both glob and non-glob
requested paths.
Since the matching that was previously done in the shouldExtract()
function is not used anymore, the function was removed. As part of this
change, the requested glob paths are recorded before the extraction
begins into the dedicated list which is scanned only when requested
non-glob path lookup fails.
We still match requested non-glob and glob paths separately. Ideally, we
would use some kind of pattern matching on trees, something like a radix
tree which also supports wildcards, but this commit is humble.
One consequence of this change is that when an optional path that
doesn't exist in the tarball is requested, its parent directories are
not created (if they are not the parents of other requested paths). But
this new behavior is arguably more natural than the old behavior where
we created parent directories for non-existent paths, which seems to
have been just an artifact of the implementation. Therefore, one test
had to be changed for this behavior change.
Since we do not allow optional paths to be defined in slices, this
change is visible only to callers of the deb.Extract() function. In
chisel, these callers are extract test and slicer. The latter depended
on the old behavior to create parent directories for non-extracted
content paths by including their direct parent directories in the
requested paths. The behavior of chisel is preserved by changing slicer
to include all, and not just direct, parent directories of non-extracted
content paths in the requested paths.