-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support symlinks when constructing a Merkle tree. #229
Conversation
go/pkg/tree/tree.go
Outdated
// be distinguished from the normal files. | ||
DistinguishSymlinkFromFile bool | ||
// If true, the dangling symlinks will not be treated as an error. | ||
AllowDanglingSymlink bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to remove this and allow dangling symlink when symlink is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, because the default client behavior is to treat this as an error, and we want backward compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling symlink is new behavior, so I think there is no previous behavior we want to keep compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these two are independent (i.e. it's fine if one enables AllowDanglingSymlink
but not EnableSymlink
). There is a check to return error for dangling symlinks:
remote-apis-sdks/go/pkg/tree/tree.go
Lines 68 to 70 in af8c2ee
if smd := meta.Symlink; smd != nil && smd.IsDangling { | |
return nil | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but I still think it is better to make it small where user can configure. Adding AllowDanglingSymlink
later is easy if we found it is necessary, but removing is hard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched to the functional pattern. Meanwhile I also removed the config for the dangling symlink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From #188 (comment) I think the SDK team wants the handling of symlink and allowing dangling symlinks to be separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd prefer to have these as separate options, to preserve backward compatibility. Also, make sure that the default options are clear and initialized correctly (AllowDanglingSymlink=true).
Also, I'd actually prefer the introduction of the options and the implementation to be in the same PR, it's easier to review that way IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: now that tree is a part of Client after a recent PR, it would make a lot of sense to me to just make both of these Client Opts instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack.
Could you add code using the config as well, at least for upload part? I think it is better to have one PR for that. |
go/pkg/client/tree.go
Outdated
@@ -89,6 +110,16 @@ func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs | |||
}, | |||
} | |||
return nil | |||
} else if t == command.SymlinkInputType { | |||
fs[path] = &fileSysNode{ | |||
// TODO: How to handle a target with absolute path? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Here says that an absolute path target can be supported. Do you think we still need to check if the target is within exec root or not?
- Also, I made the symlink handling "dumb" in the sense that we don't recursively invoke
loadFiles
on its target. Instead, if users want the target to be included, they need to be explicit about that in the input spec. I think this simplifies the implementation, especially if the target is an absolute path out of the exec root. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine to keep symlink as-is here, as I don't remember we have the case handling in tree absolute path symlink.
I'm fine with 2 too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can error out on absolute path symlinks for now, and support them later if needed. However, traversing them is necessary. The users specify an input spec such as foo/,bar/, and it's our responsibility to decide what exactly we should upload based on that. If foo/bla is a directory symlink to /baz, we need to traverse the target to load all the files there into the Merkle tree, otherwise the symlink will be dangling on the bot.
Note that relative path symlinks that escape the input root are an error as well.
go/pkg/client/client.go
Outdated
UnifiedCASOps UnifiedCASOps | ||
UnifiedCASOps UnifiedCASOps | ||
// TreeSymlinkOpts controls how symlinks are handled when constructing a tree. | ||
TreeSymlinkOpts TreeSymlinkOpts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compatibility nit: we keep/pass all structs in this repo by pointer, not value (with the noted exception of Digest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see, done.
go/pkg/client/client.go
Outdated
@@ -170,6 +172,12 @@ func (s UnifiedCASOps) Apply(c *Client) { | |||
c.UnifiedCASOps = s | |||
} | |||
|
|||
func (o *TreeSymlinkOpts) Apply(c *Client) { | |||
if o != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check needs to be removed once c.TreeSymlinkOpts is a pointer and nil is a legal value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
go/pkg/client/tree.go
Outdated
@@ -65,17 +80,27 @@ func shouldIgnore(inp string, t command.InputType, excl []*command.InputExclusio | |||
|
|||
// loadFiles reads all files specified by the given InputSpec (descending into subdirectories | |||
// recursively), and loads their contents into the provided map. | |||
func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs map[string]*fileSysNode, chunkSize int, cache filemetadata.Cache) error { | |||
func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs map[string]*fileSysNode, chunkSize int, cache filemetadata.Cache, opts TreeSymlinkOpts) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, pass by pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
go/pkg/client/tree.go
Outdated
@@ -89,6 +110,16 @@ func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs | |||
}, | |||
} | |||
return nil | |||
} else if t == command.SymlinkInputType { | |||
fs[path] = &fileSysNode{ | |||
// TODO: How to handle a target with absolute path? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can error out on absolute path symlinks for now, and support them later if needed. However, traversing them is necessary. The users specify an input spec such as foo/,bar/, and it's our responsibility to decide what exactly we should upload based on that. If foo/bla is a directory symlink to /baz, we need to traverse the target to load all the files there into the Merkle tree, otherwise the symlink will be dangling on the bot.
Note that relative path symlinks that escape the input root are an error as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can error out on absolute path symlinks for now, and support them later if needed. However, traversing them is necessary... (#229 (comment))
Ack, I've decided to handle this once and for all. I will check if target
is under execRoot
, regardless if it's an absolute path or not (see getTargetRelPath()
and preprocessSymlink()
):
- Escapes
execRoot
: report this as an error. - Inside
execRoot
: we will traverse the target recursively, provided that it is not dangling.
It's not very easy to set up the test for this check in TestComputeMerkleTree
, because of the combination of absolute target path, dangling symlinks, etc. So I added a tree_whitelist_test.go
just to verify getTargetRelPath()
.
More about the tests: Since it is possible now to accept a symlink to an absolute path target, I had to add an execRootPlaceholder
and substitute it with the actual root
path when defining the symlink nodes in repb.Directory
. Let me know if you think this is a bit ad-hoc..
go/pkg/client/tree.go
Outdated
// under execRoot, so this should never fail. | ||
relTarget, err := getTargetRelPath(execRoot, path, meta.Symlink.Target) | ||
if err != nil { | ||
panic(fmt.Sprintf("Invalid target=%q err=%v", relTarget, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used panic
here instead of returning err
, because I think this is an internal invariant. If we ever reach here, it means there's a bug in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We follow the guildeline of "don't panic" in Go code. In particular, it makes sense here because this is a library, and never want to crash the calling executable. If we have a bug, we should surface an error message that will make it easy to catch and fix it.
go/pkg/client/tree_test.go
Outdated
@@ -44,6 +48,16 @@ func mustMarshal(p proto.Message) []byte { | |||
return b | |||
} | |||
|
|||
func prependExecRootPlaceholder(target string) string { | |||
return fmt.Sprintf("%s/%s", execRootPlaceholder, target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use filepath.Join
or path.Join
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used in SymlinkNode
, where the API says the target path is always separated by /
: https://github.com/bazelbuild/remote-apis/blob/aa8e718768c2973274c2a23e9dcc27b42afea794/build/bazel/remote/execution/v2/remote_execution.proto#L798-L807
That said, I noticed that the existing tests's inputPath
are also constructed with /
, instead of filepath.Join
. I wonder why this is not a problem on Windows..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this repo doesn't have Windows coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i thought cas
already worked on Win?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repo works on Windows -- there are some tests that are Windows or Linux specific (see e.g. cache_posix_test.go https://github.com/bazelbuild/remote-apis-sdks/blob/master/go/pkg/filemetadata/BUILD.bazel), but this test should work on both.
go/pkg/client/tree_whitebox_test.go
Outdated
tests := []struct { | ||
desc string | ||
target string | ||
isError bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think wantErr
is more common in Go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, done
rebased |
go/pkg/client/tree.go
Outdated
} | ||
|
||
// NewTreeSymlinkOpts returns a default NewTreeSymlinkOpts object. | ||
func NewTreeSymlinkOpts() *TreeSymlinkOpts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: DefaultTreeSymlinkOpts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
go/pkg/client/tree.go
Outdated
// under execRoot, so this should never fail. | ||
relTarget, err := getTargetRelPath(execRoot, path, meta.Symlink.Target) | ||
if err != nil { | ||
panic(fmt.Sprintf("Invalid target=%q err=%v", relTarget, err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We follow the guildeline of "don't panic" in Go code. In particular, it makes sense here because this is a library, and never want to crash the calling executable. If we have a bug, we should surface an error message that will make it easy to catch and fix it.
go/pkg/client/tree_test.go
Outdated
@@ -44,6 +48,16 @@ func mustMarshal(p proto.Message) []byte { | |||
return b | |||
} | |||
|
|||
func prependExecRootPlaceholder(target string) string { | |||
return fmt.Sprintf("%s/%s", execRootPlaceholder, target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This repo works on Windows -- there are some tests that are Windows or Linux specific (see e.g. cache_posix_test.go https://github.com/bazelbuild/remote-apis-sdks/blob/master/go/pkg/filemetadata/BUILD.bazel), but this test should work on both.
go/pkg/client/tree.go
Outdated
@@ -89,6 +164,20 @@ func loadFiles(execRoot string, excl []*command.InputExclusion, path string, fs | |||
}, | |||
} | |||
return nil | |||
} else if t == command.SymlinkInputType { | |||
fs[path] = &fileSysNode{ | |||
symlink: &symlinkNode{target: meta.Symlink.Target}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a problem here -- if the symlink target is absolute (and we have Preserve=true), then even when it is actually under the exec root we cannot preserve it as is, because that would fail on the bot (because the bot will map the exec root to a different absolute path).
In this edge case, I think that we will need to convert the symlink target to a relative path instead.
Alternatively, we could disallow absolute symlink targets in general when Preserve=true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I suspect that when you fix this, you will no longer need the exec root placeholder workaround in tests.
But if you do want to test absolute symlinks, I think the IsAbsolute attribute of inputPath already does what you want, you don't need the exec root placeholder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. Yeah the test is simplified..
I think the IsAbsolute attribute of inputPath already does what you want, you don't need the exec root placeholder.
Note that IsAbsolute
only affects how the test data are constructed in the filesystem, but doesn't apply to the SymlinkNode
protos. Anyway, the placeholder is removed now.
InputDirectories: 2, | ||
InputFiles: 1, | ||
InputSymlinks: 1, | ||
TotalInputBytes: fooDg.Size + fooDirDg.Size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems low... maybe not counting the symlink nodes? Ditto below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you mean the TotalInputBytes
should count the bytes for symlinks? I didn't do that because 1) the symlink node itself isn't associated with a Digest
, and 2) will counting in both the symlink and the target result in double counting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I meant that it should count the size of the Directory protos correctly; since these include new SymlinkNodes, the size should change. However, I now looked at the code and I fail to understand why did I write it the way it is to begin with; now that I think about it, it would make a lot more sense to make TotalInputBytes simply equal to the sum of all the digest sizes in all the uploaded chunkers.
So please leave it alone, I'll fix it myself in a later CL. Thank you!
InputDirectories: 2, | ||
InputFiles: 1, | ||
InputSymlinks: 1, | ||
TotalInputBytes: fooDg.Size + fooDirDg.Size, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I meant that it should count the size of the Directory protos correctly; since these include new SymlinkNodes, the size should change. However, I now looked at the code and I fail to understand why did I write it the way it is to begin with; now that I think about it, it would make a lot more sense to make TotalInputBytes simply equal to the sum of all the digest sizes in all the uploaded chunkers.
So please leave it alone, I'll fix it myself in a later CL. Thank you!
go/pkg/client/tree.go
Outdated
if err != nil { | ||
return err | ||
} | ||
fs[path] = &fileSysNode{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fs[normPath] in order to not reintroduce the bug fixed by #225.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
symlink: &symlinkNode{target: relTarget}, | ||
} | ||
if meta.Symlink.IsDangling || !opts.FollowsTarget { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, one thing I wanted to check: is it possible to create dangling symlinks on Windows? I seem to recall there were some problems with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i don't have a Win at this moment to test this. But i think the test case has been creating dangling symlinks for quite a while. Have you noticed that it has failed on Win?
remote-apis-sdks/go/pkg/client/tree_test.go
Lines 65 to 67 in 47f7841
if err := os.Symlink(target, path); err != nil { | |
return err | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right -- I guess that does work and I misremembered. Thank you!
go/pkg/client/tree_test.go
Outdated
@@ -570,6 +687,72 @@ func TestComputeMerkleTree(t *testing.T) { | |||
TotalInputBytes: fooDg.Size + fooDirDg.Size + barDg.Size + barDirDg.Size, | |||
}, | |||
}, | |||
{ | |||
desc: "Directory absoluate symlink (preserved)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: absolute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
symlink: &symlinkNode{target: relTarget}, | ||
} | ||
if meta.Symlink.IsDangling || !opts.FollowsTarget { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right -- I guess that does work and I misremembered. Thank you!
Added
TreeSymlinkOpts
to control the behavior of tree construction.To be backward compatible, symlink is preserved only when
TreeSymlinkOpts.Preserved
is explicitly set totrue
. By default, symlinks are converted into files.Issue: #146