Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support symlinks when constructing a Merkle tree. #229

Merged
merged 14 commits into from
Nov 18, 2020
Merged

Support symlinks when constructing a Merkle tree. #229

merged 14 commits into from
Nov 18, 2020

Conversation

k-ye
Copy link
Contributor

@k-ye k-ye commented Nov 5, 2020

Added TreeSymlinkOpts to control the behavior of tree construction.

To be backward compatible, symlink is preserved only when TreeSymlinkOpts.Preserved is explicitly set to true. By default, symlinks are converted into files.

Issue: #146

@google-cla google-cla bot added the cla: yes The author signed a CLA label Nov 5, 2020
go/pkg/tree/tree.go Outdated Show resolved Hide resolved
go/pkg/tree/tree.go Outdated Show resolved Hide resolved
// be distinguished from the normal files.
DistinguishSymlinkFromFile bool
// If true, the dangling symlinks will not be treated as an error.
AllowDanglingSymlink bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to remove this and allow dangling symlink when symlink is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, because the default client behavior is to treat this as an error, and we want backward compatibility?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling symlink is new behavior, so I think there is no previous behavior we want to keep compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these two are independent (i.e. it's fine if one enables AllowDanglingSymlink but not EnableSymlink). There is a check to return error for dangling symlinks:

if smd := meta.Symlink; smd != nil && smd.IsDangling {
return nil
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but I still think it is better to make it small where user can configure. Adding AllowDanglingSymlink later is easy if we found it is necessary, but removing is hard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to the functional pattern. Meanwhile I also removed the config for the dangling symlink

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From #188 (comment) I think the SDK team wants the handling of symlink and allowing dangling symlinks to be separate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd prefer to have these as separate options, to preserve backward compatibility. Also, make sure that the default options are clear and initialized correctly (AllowDanglingSymlink=true).
Also, I'd actually prefer the introduction of the options and the implementation to be in the same PR, it's easier to review that way IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: now that tree is a part of Client after a recent PR, it would make a lot of sense to me to just make both of these Client Opts instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

go/pkg/tree/tree.go Outdated Show resolved Hide resolved
@k-ye k-ye changed the title Add tree.ConstructTreeConfig to control how symlinks can be handled Add tree.ConstructOption to control if symlinks is distinguished from normal files Nov 6, 2020
@k-ye k-ye changed the title Add tree.ConstructOption to control if symlinks is distinguished from normal files Add tree.ConstructOption to control how symlinks are handled Nov 6, 2020
@k-ye k-ye requested a review from atetubou November 6, 2020 02:31
@atetubou
Copy link
Collaborator

atetubou commented Nov 6, 2020

Could you add code using the config as well, at least for upload part? I think it is better to have one PR for that.

go/pkg/tree/tree.go Outdated Show resolved Hide resolved
go/pkg/tree/tree.go Outdated Show resolved Hide resolved
@@ -89,6 +110,16 @@ func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs
},
}
return nil
} else if t == command.SymlinkInputType {
fs[path] = &fileSysNode{
// TODO: How to handle a target with absolute path?
Copy link
Contributor Author

@k-ye k-ye Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Here says that an absolute path target can be supported. Do you think we still need to check if the target is within exec root or not?
  2. Also, I made the symlink handling "dumb" in the sense that we don't recursively invoke loadFiles on its target. Instead, if users want the target to be included, they need to be explicit about that in the input spec. I think this simplifies the implementation, especially if the target is an absolute path out of the exec root. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to keep symlink as-is here, as I don't remember we have the case handling in tree absolute path symlink.
I'm fine with 2 too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can error out on absolute path symlinks for now, and support them later if needed. However, traversing them is necessary. The users specify an input spec such as foo/,bar/, and it's our responsibility to decide what exactly we should upload based on that. If foo/bla is a directory symlink to /baz, we need to traverse the target to load all the files there into the Merkle tree, otherwise the symlink will be dangling on the bot.
Note that relative path symlinks that escape the input root are an error as well.

@k-ye k-ye changed the title Add tree.ConstructOption to control how symlinks are handled Support symlinks when constructing a Merkle tree. Nov 10, 2020
@k-ye k-ye requested a review from gkousik November 10, 2020 05:54
@k-ye k-ye requested a review from atetubou November 10, 2020 11:30
UnifiedCASOps UnifiedCASOps
UnifiedCASOps UnifiedCASOps
// TreeSymlinkOpts controls how symlinks are handled when constructing a tree.
TreeSymlinkOpts TreeSymlinkOpts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compatibility nit: we keep/pass all structs in this repo by pointer, not value (with the noted exception of Digest).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah i see, done.

@@ -170,6 +172,12 @@ func (s UnifiedCASOps) Apply(c *Client) {
c.UnifiedCASOps = s
}

func (o *TreeSymlinkOpts) Apply(c *Client) {
if o != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check needs to be removed once c.TreeSymlinkOpts is a pointer and nil is a legal value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -65,17 +80,27 @@ func shouldIgnore(inp string, t command.InputType, excl []*command.InputExclusio

// loadFiles reads all files specified by the given InputSpec (descending into subdirectories
// recursively), and loads their contents into the provided map.
func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs map[string]*fileSysNode, chunkSize int, cache filemetadata.Cache) error {
func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs map[string]*fileSysNode, chunkSize int, cache filemetadata.Cache, opts TreeSymlinkOpts) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, pass by pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -89,6 +110,16 @@ func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs
},
}
return nil
} else if t == command.SymlinkInputType {
fs[path] = &fileSysNode{
// TODO: How to handle a target with absolute path?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can error out on absolute path symlinks for now, and support them later if needed. However, traversing them is necessary. The users specify an input spec such as foo/,bar/, and it's our responsibility to decide what exactly we should upload based on that. If foo/bla is a directory symlink to /baz, we need to traverse the target to load all the files there into the Merkle tree, otherwise the symlink will be dangling on the bot.
Note that relative path symlinks that escape the input root are an error as well.

Copy link
Contributor Author

@k-ye k-ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can error out on absolute path symlinks for now, and support them later if needed. However, traversing them is necessary... (#229 (comment))

Ack, I've decided to handle this once and for all. I will check if target is under execRoot, regardless if it's an absolute path or not (see getTargetRelPath() and preprocessSymlink()):

  • Escapes execRoot: report this as an error.
  • Inside execRoot: we will traverse the target recursively, provided that it is not dangling.

It's not very easy to set up the test for this check in TestComputeMerkleTree, because of the combination of absolute target path, dangling symlinks, etc. So I added a tree_whitelist_test.go just to verify getTargetRelPath().

More about the tests: Since it is possible now to accept a symlink to an absolute path target, I had to add an execRootPlaceholder and substitute it with the actual root path when defining the symlink nodes in repb.Directory. Let me know if you think this is a bit ad-hoc..

// under execRoot, so this should never fail.
relTarget, err := getTargetRelPath(execRoot, path, meta.Symlink.Target)
if err != nil {
panic(fmt.Sprintf("Invalid target=%q err=%v", relTarget, err))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used panic here instead of returning err, because I think this is an internal invariant. If we ever reach here, it means there's a bug in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow the guildeline of "don't panic" in Go code. In particular, it makes sense here because this is a library, and never want to crash the calling executable. If we have a bug, we should surface an error message that will make it easy to catch and fix it.

go/pkg/client/tree.go Outdated Show resolved Hide resolved
go/pkg/client/tree.go Outdated Show resolved Hide resolved
@@ -44,6 +48,16 @@ func mustMarshal(p proto.Message) []byte {
return b
}

func prependExecRootPlaceholder(target string) string {
return fmt.Sprintf("%s/%s", execRootPlaceholder, target)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use filepath.Join or path.Join?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used in SymlinkNode, where the API says the target path is always separated by /: https://github.com/bazelbuild/remote-apis/blob/aa8e718768c2973274c2a23e9dcc27b42afea794/build/bazel/remote/execution/v2/remote_execution.proto#L798-L807

That said, I noticed that the existing tests's inputPath are also constructed with /, instead of filepath.Join. I wonder why this is not a problem on Windows..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this repo doesn't have Windows coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i thought cas already worked on Win?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repo works on Windows -- there are some tests that are Windows or Linux specific (see e.g. cache_posix_test.go https://github.com/bazelbuild/remote-apis-sdks/blob/master/go/pkg/filemetadata/BUILD.bazel), but this test should work on both.

tests := []struct {
desc string
target string
isError bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think wantErr is more common in Go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, done

@k-ye
Copy link
Contributor Author

k-ye commented Nov 17, 2020

rebased

}

// NewTreeSymlinkOpts returns a default NewTreeSymlinkOpts object.
func NewTreeSymlinkOpts() *TreeSymlinkOpts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DefaultTreeSymlinkOpts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

go/pkg/client/tree.go Show resolved Hide resolved
// under execRoot, so this should never fail.
relTarget, err := getTargetRelPath(execRoot, path, meta.Symlink.Target)
if err != nil {
panic(fmt.Sprintf("Invalid target=%q err=%v", relTarget, err))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We follow the guildeline of "don't panic" in Go code. In particular, it makes sense here because this is a library, and never want to crash the calling executable. If we have a bug, we should surface an error message that will make it easy to catch and fix it.

go/pkg/client/tree.go Show resolved Hide resolved
@@ -44,6 +48,16 @@ func mustMarshal(p proto.Message) []byte {
return b
}

func prependExecRootPlaceholder(target string) string {
return fmt.Sprintf("%s/%s", execRootPlaceholder, target)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repo works on Windows -- there are some tests that are Windows or Linux specific (see e.g. cache_posix_test.go https://github.com/bazelbuild/remote-apis-sdks/blob/master/go/pkg/filemetadata/BUILD.bazel), but this test should work on both.

@@ -89,6 +164,20 @@ func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs
},
}
return nil
} else if t == command.SymlinkInputType {
fs[path] = &fileSysNode{
symlink: &symlinkNode{target: meta.Symlink.Target},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a problem here -- if the symlink target is absolute (and we have Preserve=true), then even when it is actually under the exec root we cannot preserve it as is, because that would fail on the bot (because the bot will map the exec root to a different absolute path).
In this edge case, I think that we will need to convert the symlink target to a relative path instead.

Alternatively, we could disallow absolute symlink targets in general when Preserve=true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I suspect that when you fix this, you will no longer need the exec root placeholder workaround in tests.
But if you do want to test absolute symlinks, I think the IsAbsolute attribute of inputPath already does what you want, you don't need the exec root placeholder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. Yeah the test is simplified..

I think the IsAbsolute attribute of inputPath already does what you want, you don't need the exec root placeholder.

Note that IsAbsolute only affects how the test data are constructed in the filesystem, but doesn't apply to the SymlinkNode protos. Anyway, the placeholder is removed now.

InputDirectories: 2,
InputFiles: 1,
InputSymlinks: 1,
TotalInputBytes: fooDg.Size + fooDirDg.Size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems low... maybe not counting the symlink nodes? Ditto below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you mean the TotalInputBytes should count the bytes for symlinks? I didn't do that because 1) the symlink node itself isn't associated with a Digest, and 2) will counting in both the symlink and the target result in double counting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I meant that it should count the size of the Directory protos correctly; since these include new SymlinkNodes, the size should change. However, I now looked at the code and I fail to understand why did I write it the way it is to begin with; now that I think about it, it would make a lot more sense to make TotalInputBytes simply equal to the sum of all the digest sizes in all the uploaded chunkers.
So please leave it alone, I'll fix it myself in a later CL. Thank you!

go/pkg/client/tree.go Show resolved Hide resolved
InputDirectories: 2,
InputFiles: 1,
InputSymlinks: 1,
TotalInputBytes: fooDg.Size + fooDirDg.Size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I meant that it should count the size of the Directory protos correctly; since these include new SymlinkNodes, the size should change. However, I now looked at the code and I fail to understand why did I write it the way it is to begin with; now that I think about it, it would make a lot more sense to make TotalInputBytes simply equal to the sum of all the digest sizes in all the uploaded chunkers.
So please leave it alone, I'll fix it myself in a later CL. Thank you!

if err != nil {
return err
}
fs[path] = &fileSysNode{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs[normPath] in order to not reintroduce the bug fixed by #225.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

symlink: &symlinkNode{target: relTarget},
}
if meta.Symlink.IsDangling || !opts.FollowsTarget {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one thing I wanted to check: is it possible to create dangling symlinks on Windows? I seem to recall there were some problems with that.

Copy link
Contributor Author

@k-ye k-ye Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i don't have a Win at this moment to test this. But i think the test case has been creating dangling symlinks for quite a while. Have you noticed that it has failed on Win?

if err := os.Symlink(target, path); err != nil {
return err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right -- I guess that does work and I misremembered. Thank you!

@@ -570,6 +687,72 @@ func TestComputeMerkleTree(t *testing.T) {
TotalInputBytes: fooDg.Size + fooDirDg.Size + barDg.Size + barDirDg.Size,
},
},
{
desc: "Directory absoluate symlink (preserved)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: absolute

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

symlink: &symlinkNode{target: relTarget},
}
if meta.Symlink.IsDangling || !opts.FollowsTarget {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right -- I guess that does work and I misremembered. Thank you!

@ola-rozenfeld ola-rozenfeld merged commit b51fd53 into bazelbuild:master Nov 18, 2020
@k-ye k-ye deleted the tree-config branch November 18, 2020 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes The author signed a CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants