Add apt credentials parser #60

Merged
merged 23 commits into from
Aug 11, 2023
304 changes: 304 additions & 0 deletions internal/archive/credentials.go
@@ -0,0 +1,304 @@
package archive

import (
"bufio"
"errors"
"fmt"
"io"
"net/url"
"os"
"path/filepath"
"sort"
"strings"
)

// credentials contains matched non-empty Username and Password.
// Username is left empty if the search is unsuccessful.
type credentials struct {
Username string
Password string
}

// Empty checks whether c represents an unsuccessful search.
func (c credentials) Empty() bool {
return c.Username == ""
}

// credentialsQuery contains parsed input URL data used for search.
type credentialsQuery struct {
scheme string
host string
port string
path string
needScheme bool
}

// parseRepoURL parses repoURL into credentialsQuery and fills provided
// credentials with username and password if they are specified in repoURL.
func parseRepoURL(repoURL string) (creds credentials, query *credentialsQuery, err error) {
u, err := url.Parse(repoURL)
if err != nil {
return
}

creds.Username = u.User.Username()
creds.Password, _ = u.User.Password()

if !creds.Empty() {
return
}

host := u.Host
port := u.Port()
if port != "" {
// u.Hostname() would remove brackets from IPv6 address but we
// need it verbatim for string search in netrc file. This is
// also faster because both u.Port() and u.Hostname() parse
// u.Host into port and hostname.
host = u.Host[0 : len(u.Host)-len(port)-1]
}

query = &credentialsQuery{
scheme: u.Scheme,
host: host,
port: port,
path: u.Path,
// If the input URL specifies unencrypted scheme, the scheme in
// machine declarations in netrc file is not optional and must
// also match.
needScheme: u.Scheme != "https" && u.Scheme != "tor+https",
}

return
}
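
A minimal sketch of the host handling in parseRepoURL above, assuming a bracketed IPv6 address in the archive URL. The helper name splitHostVerbatim and the sample URLs are made up for illustration; only the slicing expression is taken from the diff.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitHostVerbatim is a hypothetical helper that mirrors the slicing in
// parseRepoURL: it keeps the host exactly as written in the URL (brackets
// included) and returns the port separately.
func splitHostVerbatim(rawURL string) (host, port string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", err
	}
	host = u.Host
	port = u.Port()
	if port != "" {
		// Drop the trailing ":port" but keep any IPv6 brackets.
		host = u.Host[:len(u.Host)-len(port)-1]
	}
	return host, port, nil
}

func main() {
	for _, raw := range []string{"http://[::1]:8080/ubuntu", "https://acme.com/ubuntu"} {
		host, port, err := splitHostVerbatim(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("url=%s host=%q port=%q\n", raw, host, port)
	}
}
```

Keeping the brackets matters here because the machine token in the netrc file is later compared as a plain string prefix.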

// findCredentials searches for credentials for repoURL in the configuration
// files in the directory named by the CHISEL_AUTH_DIR environment variable if
// it is non-empty, or in /etc/apt/auth.conf.d otherwise.
func findCredentials(repoURL string) (credentials, error) {
credsDir := "/etc/apt/auth.conf.d"
Contributor

Per the other comment in the fake function, this should be a global:

var credentialsDir = "/etc/apt/auth.conf.d"

This will simplify the logic below.

Contributor Author

If I simplify the logic below, then the test will work differently when CHISEL_AUTH_DIR env var is set. To make this change I need another fakeEnv call in the test.

Contributor Author

I have just removed the fake function and test FindCredentialsInDir directly.

Contributor

Per earlier comments, the default path should be a global variable, or a constant, according to your preference, and the logic must be tested according to these notes too. This function, and in particular the use of CHISEL_AUTH_DIR, looks untested?

Contributor Author

I've moved the constant to the global scope and added FindCredentials test. @niemeyer Please resolve this thread if appropriate.

if v := os.Getenv("CHISEL_AUTH_DIR"); v != "" {
credsDir = v
}
return findCredentialsInDir(repoURL, credsDir)
}

// findCredentialsInDir searches for credentials for repoURL in configuration
// files in the credsDir directory. If the directory does not exist, an empty
// credentials structure is returned with a nil err.
// Only files whose names do not begin with a dot and that have either no
// extension or the ".conf" extension are searched. The files are searched in
// ascending lexicographic order. The first file that contains a machine
// declaration matching repoURL ends the search. If no file contains a matching
// machine declaration, an empty credentials structure is returned with a nil err.
func findCredentialsInDir(repoURL string, credsDir string) (creds credentials, err error) {
contents, err := os.ReadDir(credsDir)
if err != nil {
if os.IsNotExist(err) {

Just an observation: here we are concluding that no credentials exist, which could be reached if CHISEL_AUTH_DIR is set, but pointing to a non-existing directory. This may be by design, but just pointing out that this does not mean that /etc/apt/auth.conf.d does not contain credentials.

Contributor Author

@woky woky Jul 24, 2023

It is indeed by my design. :-) We did not specify this, so I did what felt intuitive to me. Warning when CHISEL_AUTH_DIR doesn't exist and ignoring non-existent /etc/apt/auth.conf.d feels inconsistent to me. Would you like a different behavior?

Example from elsewhere: mpv looks up its configs in $MPV_HOME or in ~/.config/mpv/ (among other locations). It doesn't complain if either of those doesn't exist. Similarly, git doesn't complain when $GIT_CONFIG_SYSTEM or /etc/gitconfig doesn't exist but it reads it if it does. In both cases, if the variable is set (or non-empty?), the default location is skipped.

Collaborator

I see both of your points. I think it would help to simply have a debug log message to say that CHISEL_AUTH_DIR's path does not exist, regardless of its path.

It does make sense to me, the naming of the environment variable suggests an overriding behaviour, which if abused, must yield consequences :)

Contributor Author

@cjdcordeiro @flotter I've added a debug message when the credentials directory does not exist, regardless of whether it's the default location or overridden by the variable. See commit 482ff15. Please resolve if you think it addresses your comment.

Contributor

As a good rule of thumb, any function that is asked to perform a given task must either succeed at that task or return an error saying that it could not. That's even more true when the function actually returns an error, as the caller will assume that, unless there's an error, its request was fulfilled. The task of this function is to find credentials in a directory, so it must either succeed at that, or return an error informing that it cannot perform its requested task.

The call site, though, may choose to ignore a given scenario. For handling some of those cases we have error results that can be more easily verified, either directly or via a function.

Contributor Author

@woky woky Aug 8, 2023

Introduced ErrCredentialsNotFound error. The function now returns this error when no credentials are found. @niemeyer Please resolve the thread if appropriate.

err = nil
} else {
err = fmt.Errorf("cannot open credentials directory: %w", err)
}
return
}

creds, query, err := parseRepoURL(repoURL)
if err != nil {
err = fmt.Errorf("cannot parse archive URL: %w", err)
return
}
if query == nil { // creds.Empty() == false
Contributor

This needs to be improved a bit so that the intention is more clear. If the point here is that we already have credentials, we should check creds explicitly instead of checking query and assuming credentials are set in a comment.

Contributor Author

@woky woky Aug 8, 2023

Replaced with !creds.Empty(). I can't rely on err != nil here as that alone does not tell me whether the URL contained credentials. @niemeyer Please resolve the thread if appropriate.

return
}

errs := make([]error, 0)
Contributor

What's the intention here? Why is this not just var errs []error?

Contributor Author

@woky woky Aug 8, 2023

As part of the change to return ErrCredentialsNotFound when no credentials are found, non-fatal errors are now logged instead so the function no longer collects them, and this variable was removed. @niemeyer Please resolve the thread if appropriate.


confFiles := make([]string, 0, len(contents))
for _, entry := range contents {
name := entry.Name()
if strings.HasPrefix(name, ".") {
continue
}
if ext := filepath.Ext(name); ext != "" && ext != ".conf" {
continue
}
info, err := entry.Info()
if err != nil {
errs = append(errs, fmt.Errorf("cannot stat credentials file: %w", err))
continue
}
if !info.Mode().IsRegular() {
continue
}
confFiles = append(confFiles, name)
}
if len(confFiles) == 0 {
err = errors.Join(errs...)
return
Contributor

Similar to the earlier point on errors here: the loop above may skip all the way through, and we end up with no credentials and no error either.

Contributor Author

@woky woky Aug 8, 2023

As I commented above, I've introduced ErrCredentialsNotFound error. The function now returns this error when no credentials are found. @niemeyer Please resolve the thread if appropriate.
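
The filtering loop discussed in this thread can be isolated as a pure function, which makes the case where every entry is skipped easy to see. This is a sketch only: selectConfFiles is a hypothetical name, it works on file names alone, and it leaves out the regular-file check that the diff performs with entry.Info().

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// selectConfFiles keeps names that do not start with a dot and that have
// either no extension or the ".conf" extension, sorted in ascending
// lexicographic order.
func selectConfFiles(names []string) []string {
	var selected []string
	for _, name := range names {
		if strings.HasPrefix(name, ".") {
			continue
		}
		if ext := filepath.Ext(name); ext != "" && ext != ".conf" {
			continue
		}
		selected = append(selected, name)
	}
	sort.Strings(selected)
	return selected
}

func main() {
	names := []string{"90b.conf", ".hidden", "10a", "notes.txt", "backup.conf~"}
	selected := selectConfFiles(names)
	if len(selected) == 0 {
		// This is the situation the review points at: nothing to search.
		fmt.Println("no candidate files")
		return
	}
	fmt.Println("candidate files:", selected)
}
```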

}
sort.Strings(confFiles)

for _, file := range confFiles {
fpath := filepath.Join(credsDir, file)
f, err := os.Open(fpath)
if err != nil {
errs = append(errs, fmt.Errorf("cannot read credentials file %s: %w", fpath, err))
continue
}

if err = findCredsInFile(query, f, &creds); err != nil {
errs = append(errs, fmt.Errorf("cannot parse credentials file %s: %w", fpath, err))
} else if !creds.Empty() {
Contributor

Let's please have a more clear interface on findCredsInFile, making it return the credentials when it finds them, as usual in Go, instead of having C-style output parameters unnecessarily. We should also not have to test creds for being empty. Per earlier notes, if the error is nil, the intent of the function must have been fulfilled.

Contributor Author

@woky woky Aug 8, 2023

As I commented above, I've introduced ErrCredentialsNotFound error. The function findCredsInFile was renamed to findCredentialsInternal and it now returns either a non-nil pointer to credentials or a non-nil error when an error occurs or credentials are not found. @niemeyer Please resolve the thread if appropriate.

break
}
}

err = errors.Join(errs...)
return
}

type netrcParser struct {
query *credentialsQuery
scanner *bufio.Scanner
creds *credentials
}

// findCredsInFile searches the netrc file for credentials matching query and
// fills creds with the matched credentials if there is a match. The first
// match ends the search.
//
// The format of the netrc file is described in [1]. The parser is adapted from
// the Apt parser (see [2]). When the parser is looking for a matching machine
// declaration it disregards the current context and only considers the input
// token. For example when given the following netrc file
//
// machine http://acme.com/foo login u1 password machine
// machine http://acme.com/bar login u2 password p2
//
// and http://acme.com/bar input URL, the second line won't match, because the
// second "machine" will be treated as start of machine declaration. This also
// means unknown tokens are ignored, so comments are not treated specially.
//
// When a matching machine declaration is found, the search stops at the next
// machine token or at the end of the file. This means that an arbitrary number
// of login and password declarations (or in fact, any tokens) can follow a
// machine declaration. The last username and password declarations override
// the previous ones. For example, when given the following netrc file
//
// machine http://acme.com login a foo login b password c bar login d password e
//
// and the input URL is http://acme.com, the matched username and password will
// be "d" and "e" respectively. Tokens foo and bar will be ignored.
//
// This parser diverges from the Apt parser in the following ways:
// 1. The port specification in machine declaration is optional whether or
// not a path is specified. While the Apt documentation[1] implies the
// same behavior, the code adheres to it only when the machine declaration
// does not specify a path, see line 96 in [2].
// 2. When the input URL has unencrypted scheme and the machine declaration
// does not specify a scheme, it is skipped silently. The Apt parser warns
// the user about it, see line 113 in [2].
//
// References:
//
// [1] https://manpages.debian.org/testing/apt/apt_auth.conf.5.en.html
// [2] https://salsa.debian.org/apt-team/apt/-/blob/d9039b24/apt-pkg/contrib/netrc.cc
// [3] https://salsa.debian.org/apt-team/apt/-/blob/4e04cbaf/methods/aptmethod.h#L560
// [4] https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html
// [5] https://daniel.haxx.se/blog/2022/05/31/netrc-pains/
Contributor

While we're still here, thanks for the extensive documentation and references above.

func findCredsInFile(query *credentialsQuery, netrc io.Reader, creds *credentials) error {
Contributor

Per note above, which of these is input and which of these is output? Let's please clean up the function prototype to reflect that, and have the error result reflect intent, per notes above.

Contributor Author

Commented about this function in the above thread. @niemeyer Please resolve the thread if appropriate.

s := bufio.NewScanner(netrc)
s.Split(bufio.ScanWords)
p := netrcParser{
query: query,
scanner: s,
creds: creds,
}
var err error
for state := netrcStart; state != nil; {
state, err = state(&p)
Contributor

Feels like this error needs checking, as it'd be surprising and error prone for a state to report a problem and it just be ignored.

Contributor Author

@woky woky Jun 28, 2023

When err != nil, state == nil, so the loop breaks and the error is returned afterwards. The exception is when the scanner itself failed: in that case the parser error is only a consequence of the scanner error, so the scanner error is returned instead. It would be a bug for a state to return both state and err non-nil. Does that make sense?

Collaborator

that logic is true @woky.

Some questions though:

  • if err != nil and state == nil, then do we need to continue to line 221? If not, better check the err here and just return
  • as you say, the above condition is by design and It would be a bug for a state to return both state and err non-nil.. So isn't that the check you should be doing? I.e. if state != nil && err != nil?

Contributor Author

@woky woky Jul 18, 2023

> if err != nil and state == nil, then do we need to continue to line 221? If not, better check the err here and just return

We need to go to 221 because states return an error when scanner.Scan() returns false, which can happen on EOF or on an error condition. States treat the false return as EOF, so we still need to check whether it was actually an I/O error.

> as you say, the above condition is by design and It would be a bug for a state to return both state and err non-nil.. So isn't that the check you should be doing? I.e. if state != nil && err != nil?

Those are two different things. The state != nil check is the algorithm. The state != nil && err != nil check is the algorithm plus an assertion that catches bugs. I've added the assertion.
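
The loop and the assertion discussed in this thread can be sketched on a much smaller state machine. Everything below (pairParser, expectKey, expectValue, parsePairs) is hypothetical and unrelated to the netrc grammar; it only illustrates the control flow: run until the state is nil, assert that no state returns both a next state and an error, and prefer the scanner error over the parser error.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

type pairParser struct {
	scanner *bufio.Scanner
	pairs   map[string]string
	key     string
}

// stateFn processes some input and returns the next state, or nil to stop.
type stateFn func(*pairParser) (stateFn, error)

func expectKey(p *pairParser) (stateFn, error) {
	if !p.scanner.Scan() {
		return nil, nil // no more tokens: a clean stop, unless the scanner failed
	}
	p.key = p.scanner.Text()
	return expectValue, nil
}

func expectValue(p *pairParser) (stateFn, error) {
	if !p.scanner.Scan() {
		return nil, errors.New("syntax error: reached end of input while expecting value")
	}
	p.pairs[p.key] = p.scanner.Text()
	return expectKey, nil
}

func parsePairs(input string) (map[string]string, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(bufio.ScanWords)
	p := &pairParser{scanner: s, pairs: map[string]string{}}
	var err error
	for state := stateFn(expectKey); state != nil; {
		state, err = state(p)
		if state != nil && err != nil {
			panic("internal error: state returned both a next state and an error")
		}
	}
	// Scan() returning false may mean an I/O error rather than end of input,
	// so the scanner error takes precedence over the parser error.
	if serr := s.Err(); serr != nil {
		return nil, serr
	}
	if err != nil {
		return nil, err
	}
	return p.pairs, nil
}

func main() {
	pairs, err := parsePairs("login alice password secret")
	fmt.Println(pairs, err)
	_, err = parsePairs("login alice password")
	fmt.Println(err)
}
```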

}
if err := p.scanner.Err(); err != nil {
return err
}
return err
}

type netrcState func(*netrcParser) (netrcState, error)

var netrcStart = netrcInvalid

func netrcInvalid(p *netrcParser) (netrcState, error) {
for p.scanner.Scan() {
if p.scanner.Text() == "machine" {
return netrcMachine, nil
}
}
return nil, nil
}

func netrcMachine(p *netrcParser) (netrcState, error) {
if !p.scanner.Scan() {
return nil, errors.New("syntax error: reached end of file while expecting machine text")
}
token := p.scanner.Text()
if i := strings.Index(token, "://"); i != -1 {
if token[0:i] != p.query.scheme {
return netrcInvalid, nil
Contributor

So this gets back to the start state? Again, the idea of the parser being in an invalid state and just going on is a bit confusing, and it seems to relate to the bug mentioned earlier as it just ignores the current context while looking for special keywords. We'll probably need to sync on this with more bandwidth.

Contributor Author

@woky woky Jun 28, 2023

This is how the format is defined and parsed by Apt. It deliberately ignores current context. We're not at liberty to interpret the file differently.

}
token = token[i+3:]
} else if p.query.needScheme {
return netrcInvalid, nil
}
if !strings.HasPrefix(token, p.query.host) {
return netrcInvalid, nil
}
token = token[len(p.query.host):]
if len(token) > 0 {
if token[0] == ':' {
if p.query.port == "" {
return netrcInvalid, nil
}
token = token[1:]
if !strings.HasPrefix(token, p.query.port) {
return netrcInvalid, nil
}
token = token[len(p.query.port):]
Contributor

This looks like it'll match the port ":90" when looking at ":9000".

Contributor Author

The check below mitigates that

               if !strings.HasPrefix(p.query.path, token) {
                       return netrcInvalid, nil
               }
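
To make the ":90" versus ":9000" exchange concrete, the matching steps from netrcMachine can be lifted into a pure function and tried on a few tokens. This is a sketch: the type query and the function matchMachine are hypothetical names, the body copies the comparisons from the diff, and the sample tokens are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// query mirrors the fields of credentialsQuery in the diff.
type query struct {
	scheme, host, port, path string
	needScheme               bool
}

// matchMachine reports whether a netrc machine token matches q, following
// the same sequence of prefix checks as netrcMachine.
func matchMachine(q query, token string) bool {
	if i := strings.Index(token, "://"); i != -1 {
		if token[:i] != q.scheme {
			return false
		}
		token = token[i+3:]
	} else if q.needScheme {
		return false
	}
	if !strings.HasPrefix(token, q.host) {
		return false
	}
	token = token[len(q.host):]
	if len(token) > 0 {
		if token[0] == ':' {
			if q.port == "" {
				return false
			}
			token = token[1:]
			if !strings.HasPrefix(token, q.port) {
				return false
			}
			token = token[len(q.port):]
		}
		// Whatever remains must be a prefix of the URL path. A leftover such
		// as "00" from ":9000" fails here because the path starts with "/".
		if !strings.HasPrefix(q.path, token) {
			return false
		}
	}
	return true
}

func main() {
	q := query{scheme: "http", host: "acme.com", port: "90", path: "/foo", needScheme: true}
	tokens := []string{
		"http://acme.com:90/foo",
		"http://acme.com:9000",
		"http://acme.com",
		"acme.com:90/foo",
	}
	for _, token := range tokens {
		fmt.Printf("%-24s matches: %v\n", token, matchMachine(q, token))
	}
}
```

For the inputs tried here, the trailing path check rejects the longer port, which is the mitigation described in the reply above.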

}
if !strings.HasPrefix(p.query.path, token) {
return netrcInvalid, nil
}
}
return netrcGoodMachine, nil
}

func netrcGoodMachine(p *netrcParser) (netrcState, error) {
loop:
for p.scanner.Scan() {
switch p.scanner.Text() {
case "login":
return netrcUsername, nil
case "password":
return netrcPassword, nil
case "machine":
break loop
}
}
return nil, nil
}

func netrcUsername(p *netrcParser) (netrcState, error) {
if !p.scanner.Scan() {
return nil, errors.New("syntax error: reached end of file while expecting username text")
}
p.creds.Username = p.scanner.Text()
return netrcGoodMachine, nil
}

func netrcPassword(p *netrcParser) (netrcState, error) {
if !p.scanner.Scan() {
return nil, errors.New("syntax error: reached end of file while expecting password text")
}
p.creds.Password = p.scanner.Text()
return netrcGoodMachine, nil
}