Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git: allow scp-style urls without username #5293

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jedevc
Copy link
Member

@jedevc jedevc commented Sep 2, 2024

According to the git docs:

Or you can use the shorter scp-like syntax for the SSH protocol:

git clone [user@]server:project.git

(note that in practice, the .git suffix is not generally necessary).

Previously, we were doing this potentially wrong - we were requiring the presence of a username for all scp-style URLs. This could result in disparity from the git CLI which successfully manages to parse github.com:moby/buildkit.git (though cloning will generally fail, with an ambiguous username).

However, we aim keep the behavior of git ref as before - this is because making changes to this can effect the parsing of dockerfiles: where before "foo:bar" would be detected as a local source, after, it would be detected as a git repo (despite the fact it correctly parses as one). To avoid this, we simply keep this behavior as before.

Note

This isn't a regression thankfully - this was the behavior defined in the tests in #1901.

if errors.Is(err, ErrUnknownProtocol) {
return nil, err
}
remote, err = ParseURL(ref, NoImplicitSCPUsername())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should protect this with a LLB capability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a cap for this, we add the new pb.CapSourceGitImplicitUsername capability if we get an scp-style url that has no username associated. This means we can't have a new frontend send a username without a username that an old server won't understand.

"net/url"
"regexp"
)

var gitSSHRegex = regexp.MustCompile("^([a-zA-Z0-9-_]+)@([a-zA-Z0-9-.]+):(.*?)(?:#(.*))?$")
var gitSSHRegex = regexp.MustCompile("^(?:([a-zA-Z0-9-_]+)@)?([a-zA-Z0-9-.]+):(.*?)(?:#(.*))?$")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can cause false positives.

Possibilities:

  • What if we require . in hostname
  • Would requiring .git in repo be too awful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, it definitely can cause false positives.

The problem with introducing those additional constraints, is this would be a potentially breaking change. We don't currently require .git, or require a hostname with a . in it. We could make this change using a capability, but the .git change definitely feels like it could break things.

However, because of the NoImplicitSCPUsername option which gets called for ParseGitRef, this change won't affect anything that's dockerfile-facing. It's only in the LLB layer, when we've passed a URL to llb.Git, and so have already decided it's a git url (so a custom frontend that has a way of indicating that this is a git url, e.g. dag.Git("<url>") in dagger can handle these URL types).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the dockerfile case we would already decide if we call llb.HTTP or llb.Git. If we choose git then it can't be anything else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - I think the regexp can potentially cause false positives, but I don't see the issue: buildkit behavior continues working as before (except some previously invalid calls to llb.Git are now valid), and the regexp now expands to match "the spec" (and more importantly the git cli).

According to the git docs:

> Or you can use the shorter scp-like syntax for the SSH protocol:
>
> 	$ git clone [user@]server:project.git

Previously, we were doing this potentially wrong - we were requiring the
presence of a username for all scp-style URLs. This could result in
disparity from the git CLI which successfully manages to parse
`github.com:moby/buildkit.git` (though cloning will generally fail, with
an ambiguous username).

However, we aim keep the behavior of git ref as before - this is because
making changes to this can effect the parsing of dockerfiles: where
before "foo:bar" would be detected as a local source, after, it would be
detected as a git repo (despite the fact it correctly parses as one). To
avoid this, we simply keep this behavior as before.

Signed-off-by: Justin Chadwell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants