Skip to content

Commit

Permalink
extract: Proper parent directory modes
Browse files Browse the repository at this point in the history
The purpose of parent directory handling in deb/extract.go is to create
parent directories of requested paths with the same attributes (only
mode as of now) that appear in the package tarball. However, the current
implementation is not correct when glob paths are requested.

In what follows parent directory refers to a directory path that is not
explicitly requested for extraction, but that is the parent of other
paths that are requested for extraction, and so it is assumed to be
implicitly requested for extraction.

Currently, whether a package path should be extracted is determined by
the shouldExtract() function that iterates over requested paths and for
each checks whether it matches the package path if it's glob, or if it's
non-glob, whether it equals the package path or whether some of its
target paths have the package path as the parent.

There are two problems with this implementation:

  1) It only checks whether a package path is the parent of any target
     path of a requested non-glob path. It does not, and probably even
     cannot, check whether it is the parent of a requested glob path.

  2) It iterates over the map of requested paths for every package path,
     even though for requested non-glob paths, it can match by directory
     lookup. And in each iteration, it checks whether a requested path
     is a glob by searching for wildcards in it.

This commit addresses mainly the first problem, but it touches the
second one as well.

Track modes of directories as we encounter them in the tarball. Then,
when creating a target path, create its missing parent directories with
modes with which they were recorded in the tarball, or 0755 if they were
not recorded yet. In the latter case, the directory mode is also tracked
so that the directory can be recreated with the proper mode if we find
it in the tarball later. This algorithm works for both glob and non-glob
requested paths.

Since the matching that was previously done in the shouldExtract()
function is not used anymore, the function was removed. As part of this
change, the requested glob paths are recorded before the extraction
begins into the dedicated list which is scanned only when requested
non-glob path lookup fails.

We still match requested non-glob and glob paths separately. Ideally, we
would use some kind of pattern matching on trees, something like a radix
tree which also supports wildcards, but this commit is humble.

One consequence of this change is that when an optional path that
doesn't exist in the tarball is requested, its parent directories are
not created (if they are not the parents of other requested paths). But
this new behavior is arguably more natural than the old behavior where
we created parent directories for non-existent paths, which seems to
have been just an artifact of the implementation. Therefore, one test
had to be changed for this behavior change.

Since we do not allow optional paths to be defined in slices, this
change is visible only to callers of the deb.Extract() function. In
chisel, these callers are extract test and slicer. The latter depended
on the old behavior to create parent directories for non-extracted
content paths by including their direct parent directories in the
requested paths. The behavior of chisel is preserved by changing slicer
to include all, and not just direct, parent directories of non-extracted
content paths in the requested paths.
  • Loading branch information
woky committed Jun 21, 2023
1 parent 67c8f68 commit 2f54624
Show file tree
Hide file tree
Showing 3 changed files with 239 additions and 68 deletions.
153 changes: 96 additions & 57 deletions internal/deb/extract.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (
"compress/gzip"
"fmt"
"io"
"io/fs"
"io/ioutil"
"os"
"path/filepath"
Expand Down Expand Up @@ -110,43 +111,35 @@ func extractData(dataReader io.Reader, options *ExtractOptions) error {
syscall.Umask(oldUmask)
}()

shouldExtract := func(pkgPath string) (globPath string, ok bool) {
if pkgPath == "" {
return "", false
}
pkgPathIsDir := pkgPath[len(pkgPath)-1] == '/'
for extractPath, extractInfos := range options.Extract {
if extractPath == "" {
continue
}
switch {
case strings.ContainsAny(extractPath, "*?"):
if strdist.GlobPath(extractPath, pkgPath) {
return extractPath, true
}
case extractPath == pkgPath:
return "", true
case pkgPathIsDir:
for _, extractInfo := range extractInfos {
if strings.HasPrefix(extractInfo.Path, pkgPath) {
return "", true
}
}
}
}
return "", false
}

pendingPaths := make(map[string]bool)
var globs []string
for extractPath, extractInfos := range options.Extract {
for _, extractInfo := range extractInfos {
if !extractInfo.Optional {
pendingPaths[extractPath] = true
break
}
}
if strings.ContainsAny(extractPath, "*?") {
globs = append(globs, extractPath)
}
}

// dirInfo represents a directory that we
// a) encountered in the tarball,
// b) created, or
// c) both a) and c).
type dirInfo struct {
// mode of the directory with which we
// a) encountered it in the tarball, or
// b) created it
mode fs.FileMode
// whether we created this directory
created bool
}
// directories we encountered and/or created
dirInfos := make(map[string]dirInfo)

tarReader := tar.NewReader(dataReader)
for {
tarHeader, err := tarReader.Next()
Expand All @@ -162,44 +155,83 @@ func extractData(dataReader io.Reader, options *ExtractOptions) error {
continue
}
sourcePath = sourcePath[1:]
globPath, ok := shouldExtract(sourcePath)
if !ok {
continue
sourceMode := tarHeader.FileInfo().Mode()

globPath := ""
extractInfos := options.Extract[sourcePath]

if extractInfos == nil {
for _, glob := range globs {
if strdist.GlobPath(glob, sourcePath) {
globPath = glob
extractInfos = []ExtractInfo{{Path: glob}}
break
}
}
}

sourceIsDir := sourcePath[len(sourcePath)-1] == '/'
// Is this a directory path that was not requested?
if extractInfos == nil && sourceMode.IsDir() {
if info := dirInfos[sourcePath]; info.mode != sourceMode {
// We have not seen this directory yet, or we
// have seen or created it with a different mode
// before. Record the source path mode.
info.mode = sourceMode
dirInfos[sourcePath] = info
if info.created {
// We have created this directory before
// with a different mode. Create it
// again with the proper mode.
extractInfos = []ExtractInfo{{Path: sourcePath}}
}
}
}

//debugf("Extracting header: %#v", tarHeader)
if extractInfos == nil {
continue
}

var extractInfos []ExtractInfo
if globPath != "" {
extractInfos = options.Extract[globPath]
delete(pendingPaths, globPath)
if options.Globbed != nil {
options.Globbed[globPath] = append(options.Globbed[globPath], sourcePath)
}
delete(pendingPaths, globPath)
} else {
extractInfos, ok = options.Extract[sourcePath]
if ok {
delete(pendingPaths, sourcePath)
} else {
// Base directory for extracted content. Relevant mainly to preserve
// the metadata, since the extracted content itself will also create
// any missing directories unaccounted for in the options.
err := fsutil.Create(&fsutil.CreateOptions{
Path: filepath.Join(options.TargetDir, sourcePath),
Mode: tarHeader.FileInfo().Mode(),
Dirs: true,
})
if err != nil {
return err
}
continue
delete(pendingPaths, sourcePath)
}

// createParents creates missing parent directories of the path
// with modes with which they were encountered in the tarball or
// 0755 if they were not encountered yet.
var createParents func(path string) error
createParents = func(path string) error {
dir := fsutil.Dir(path)
if dir == "/" {
return nil
}
info := dirInfos[dir]
if info.created {
return nil
} else if info.mode == 0 {
info.mode = fs.ModeDir | 0755
}
if err := createParents(dir); err != nil {
return err
}
create := fsutil.CreateOptions{
Path: filepath.Join(options.TargetDir, dir),
Mode: info.mode,
}
if err := fsutil.Create(&create); err != nil {
return err
}
info.created = true
dirInfos[dir] = info
return nil
}

var contentCache []byte
var contentIsCached = len(extractInfos) > 1 && !sourceIsDir && globPath == ""
var contentIsCached = len(extractInfos) > 1 && sourceMode.IsRegular() && globPath == ""
if contentIsCached {
// Read and cache the content so it may be reused.
// As an alternative, to avoid having an entire file in
Expand All @@ -220,23 +252,30 @@ func extractData(dataReader io.Reader, options *ExtractOptions) error {
}
var targetPath string
if globPath == "" {
targetPath = filepath.Join(options.TargetDir, extractInfo.Path)
targetPath = extractInfo.Path
} else {
targetPath = filepath.Join(options.TargetDir, sourcePath)
targetPath = sourcePath
}
if err := createParents(targetPath); err != nil {
return err
}
if extractInfo.Mode != 0 {
tarHeader.Mode = int64(extractInfo.Mode)
}
fsMode := tarHeader.FileInfo().Mode()
err := fsutil.Create(&fsutil.CreateOptions{
Path: targetPath,
Mode: tarHeader.FileInfo().Mode(),
Path: filepath.Join(options.TargetDir, targetPath),
Mode: fsMode,
Data: pathReader,
Link: tarHeader.Linkname,
Dirs: true,
})
if err != nil {
return err
}
if fsMode.IsDir() {
// Record the target directory mode.
dirInfos[targetPath] = dirInfo{fsMode, true}
}
if globPath != "" {
break
}
Expand Down
141 changes: 137 additions & 4 deletions internal/deb/extract_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,12 @@ import (
"github.com/canonical/chisel/internal/testutil"
)

var (
REG = testutil.REG
DIR = testutil.DIR
LNK = testutil.LNK
)

type extractTest struct {
summary string
pkgdata []byte
Expand Down Expand Up @@ -260,10 +266,7 @@ var extractTests = []extractTest{{
},
},
result: map[string]string{
"/etc/": "dir 0755",
"/usr/": "dir 0755",
"/usr/bin/": "dir 0755",
"/tmp/": "dir 01777",
"/etc/": "dir 0755",
},
}, {
summary: "Optional entries mixed in cannot be missing",
Expand All @@ -280,6 +283,136 @@ var extractTests = []extractTest{{
},
},
error: `cannot extract from package "base-files": no content at /usr/bin/hallo`,
}, {
summary: "Implicit parent directories",
pkgdata: testutil.MakeTestDeb([]testutil.TarEntry{
DIR(0701, "./a/"),
DIR(0702, "./a/b/"),
REG(0601, "./a/b/c", ""),
}),
options: deb.ExtractOptions{
Extract: map[string][]deb.ExtractInfo{
"/a/b/c": []deb.ExtractInfo{{Path: "/a/b/c"}},
},
},
result: map[string]string{
"/a/": "dir 0701",
"/a/b/": "dir 0702",
"/a/b/c": "file 0601 empty",
},
}, {
summary: "Implicit parent directories with different target path",
pkgdata: testutil.MakeTestDeb([]testutil.TarEntry{
DIR(0701, "./a/"),
DIR(0702, "./b/"),
REG(0601, "./b/x", "shark"),
DIR(0703, "./c/"),
REG(0602, "./c/y", "octopus"),
DIR(0704, "./d/"),
}),
options: deb.ExtractOptions{
Extract: map[string][]deb.ExtractInfo{
"/b/x": []deb.ExtractInfo{{Path: "/a/x"}},
"/c/y": []deb.ExtractInfo{{Path: "/d/y"}},
},
},
result: map[string]string{
"/a/": "dir 0701",
"/a/x": "file 0601 31fc1594",
"/d/": "dir 0704",
"/d/y": "file 0602 5633c9b8",
},
}, {
summary: "Implicit parent directories with a glob",
pkgdata: testutil.MakeTestDeb([]testutil.TarEntry{
DIR(0701, "./a/"),
DIR(0702, "./a/aa/"),
DIR(0703, "./a/aa/aaa/"),
REG(0601, "./a/aa/aaa/ffff", ""),
}),
options: deb.ExtractOptions{
Extract: map[string][]deb.ExtractInfo{
"/a/aa/a**": []deb.ExtractInfo{{
Path: "/a/aa/a**",
}},
},
},
result: map[string]string{
"/a/": "dir 0701",
"/a/aa/": "dir 0702",
"/a/aa/aaa/": "dir 0703",
"/a/aa/aaa/ffff": "file 0601 empty",
},
}, {
summary: "Implicit parent directories with a glob and non-sorted tarball",
pkgdata: testutil.MakeTestDeb([]testutil.TarEntry{
REG(0601, "./a/b/c/d", ""),
DIR(0702, "./a/b/"),
DIR(0703, "./a/b/c/"),
DIR(0701, "./a/"),
}),
options: deb.ExtractOptions{
Extract: map[string][]deb.ExtractInfo{
"/a/b/c/*": []deb.ExtractInfo{{
Path: "/a/b/c/*",
}},
},
},
result: map[string]string{
"/a/": "dir 0701",
"/a/b/": "dir 0702",
"/a/b/c/": "dir 0703",
"/a/b/c/d": "file 0601 empty",
},
}, {
summary: "Implicit parent directories with a glob and some parents missing in the tarball",
pkgdata: testutil.MakeTestDeb([]testutil.TarEntry{
REG(0601, "./a/b/c/d", ""),
DIR(0702, "./a/b/"),
}),
options: deb.ExtractOptions{
Extract: map[string][]deb.ExtractInfo{
"/a/b/c/*": []deb.ExtractInfo{{
Path: "/a/b/c/*",
}},
},
},
result: map[string]string{
"/a/": "dir 0755",
"/a/b/": "dir 0702",
"/a/b/c/": "dir 0755",
"/a/b/c/d": "file 0601 empty",
},
}, {
summary: "Implicit parent directories with copied dirs and different modes",
pkgdata: testutil.MakeTestDeb([]testutil.TarEntry{
DIR(0701, "./a/"),
DIR(0702, "./a/b/"),
DIR(0703, "./a/b/c/"),
REG(0601, "./a/b/c/d", ""),
DIR(0704, "./e/"),
DIR(0705, "./e/f/"),
}),
options: deb.ExtractOptions{
Extract: map[string][]deb.ExtractInfo{
"/a/b/**": []deb.ExtractInfo{{
Path: "/a/b/**",
}},
"/e/f/": []deb.ExtractInfo{{
Path: "/a/",
}},
"/e/": []deb.ExtractInfo{{
Path: "/a/b/c/",
Mode: 0706,
}},
},
},
result: map[string]string{
"/a/": "dir 0705",
"/a/b/": "dir 0702",
"/a/b/c/": "dir 0706",
"/a/b/c/d": "file 0601 empty",
},
}}

func (s *S) TestExtract(c *C) {
Expand Down
13 changes: 6 additions & 7 deletions internal/slicer/slicer.go
Original file line number Diff line number Diff line change
Expand Up @@ -107,14 +107,13 @@ func Run(options *RunOptions) error {
hasCopyright = true
}
} else {
targetDir := fsutil.Dir(targetPath)
if targetDir == "" || targetDir == "/" {
continue
parent := fsutil.Dir(targetPath)
for ; parent != "/"; parent = fsutil.Dir(parent) {
extractPackage[parent] = append(extractPackage[parent], deb.ExtractInfo{
Path: parent,
Optional: true,
})
}
extractPackage[targetDir] = append(extractPackage[targetDir], deb.ExtractInfo{
Path: targetDir,
Optional: true,
})
}
}
if !hasCopyright {
Expand Down

0 comments on commit 2f54624

Please sign in to comment.