-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add find command #99
Conversation
This commit adds the ``find`` command which can be used to query the chisel releases for available slices. --- Usage: chisel find [find-OPTIONS] [<query>...] The find command queries the chisel releases for available slices. With the --release flag, it queries for slices in a particular branch of chisel-releases repository[1] or a particular directory. If left unspecified, it queries with the release info found in /etc/lsb-release. [1] https://github.com/canonical/chisel-releases [find command options] --release=<branch|dir> Chisel release branch or directory
This commit changes the logic of matching a query against a slice. Following the new behaviour, a slice name will be listed as a match if the query is a sub-string of the slice name, in pkg_slice format. Or, it will be listed if the query is atmost 1 (One) Levenshtein distance [1] away from the slice name, in pkg_slice format. [1] https://en.wikipedia.org/wiki/Levenshtein_distance
3ab898e introduced new changes in behaviour. Fix the tests accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't find anything to nitpick on. LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just left a couple of nit comments/suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is looking very good, just left some minor nitpicks and items than can be addressed in other PRs. Some general comment:
I think we do a pretty good job in the help message by directing the user to the chisel-releases repository and telling them that releases match branches. However, I think that we do not do such a great job with error messages. For example:
$ ./chisel find libc6 --release ubuntu
error: invalid release reference: "ubuntu"
There is no mention of the expected format (e.g. name-number
), and even with the correct format the error message can be improved, see:
$ ./chisel find libc6 --release ubuntu-11
error: no information for ubuntu-11 release
We could say above what the available releases are or point the user to the repository so he can see the branch names. Because the error messages were not introduced as part of this PR, I think it is best to create an issue and address it in the future, do you agree?
I agree. I will create an issue for this after this PR is closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good when re-reviewing it. I left a couple of nitpicks. The only thing that I think could be improved in general lines is how we fuzzy match the string but I seem to recall from months ago that we would treat that as a separate PR.
cmd/chisel/cmd_find.go
Outdated
return err | ||
} | ||
if slices == nil { | ||
fmt.Fprintf(Stdout, "No matching slices for \"%s\"\n", query) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use fmt.Printf
for simplicity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit sceptical about doing that since the existing codes make use of the Stdout
variable for testing purposes and there are no occurrences of directly using fmt.Print*
:
$ grep -rni "fmt.Print" ./
./internal/jsonwall/jsonwall.go:28:// fmt.Println(app.Name, "version:", app.Version)
./internal/jsonwall/jsonwall.go:37:// fmt.Println(app.Name, "version:", app.Version)
$ grep -rni "fmt.FPrint.*Stdout" ./
./cmd/chisel/cmd_version.go:31: fmt.Fprintf(Stdout, "%s\n", cmd.Version)
./cmd/chisel/cmd_help.go:177: fmt.Fprintln(Stdout, longChiselDescription)
./cmd/chisel/cmd_help.go:178: fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:179: fmt.Fprintln(Stdout, chiselUsage)
./cmd/chisel/cmd_help.go:180: fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:181: fmt.Fprintln(Stdout, chiselHelpCategoriesIntro)
./cmd/chisel/cmd_help.go:185: fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:186: fmt.Fprintln(Stdout, chiselHelpAllFooter)
./cmd/chisel/cmd_help.go:191: fmt.Fprintln(Stdout, chiselHelpFooter)
./cmd/chisel/cmd_help.go:197: fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:205: fmt.Fprintf(Stdout, "%*s: %s\n", maxLen+2, categ.Label, strings.Join(categ.Commands, ", "))
./cmd/chisel/cmd_help.go:230: fmt.Fprintln(Stdout)
./cmd/chisel/cmd_help.go:231: fmt.Fprintf(Stdout, " %s (%s):\n", categ.Label, categ.Description)
./cmd/chisel/cmd_help.go:237: fmt.Fprintf(Stdout, " %*s %s\n", -maxLen, name, cmd.ShortDescription)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's keep it like this if it is the convention. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out stdout
is used in the tests like we discussed weeks ago. This comment can be resolved.
Had a high level discussion with @niemeyer about the search functionality. We agreed that for the first iteration we should try to keep things simple but also make the UX feel natural to the user. To achieve that we are going to use a string distance of 1 to see if two strings match. Additionally, we are going to take each term passed to the CLI (i.e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking reasonable, thanks. Only a few nitpicks.
cmd/chisel/cmd_find.go
Outdated
return err | ||
} | ||
if len(slices) == 0 { | ||
fmt.Fprintf(Stdout, "No matching slices for \"%s\"\n", strings.Join(cmd.Positional.Query, " ")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Stdout/Stderr/ -- It's good practice not to send warnings, errors, and prose in general to Stdout when what is expected is structured output. Another option is to do the exact same output, but present an empty list. What do we do in such cases with snapd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Snap uses stderr, I will change it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd/chisel/cmd_find.go
Outdated
w := tabWriter() | ||
fmt.Fprintf(w, "Slice\tPackage\tRelease\n") | ||
for _, s := range slices { | ||
fmt.Fprintf(w, "%s\t%s\t%s\n", s, s.Package, releaseLabel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use the typical package_slice syntax I think?
Also, why are we presenting Release in every line when this was an input parameter and will be the same on every line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is conforming to the output format defined in the spec. Example taken from the doc:
Slice Package Release
apache2_bins apache2 ubuntu-22.04
apache2_libs apache2 ubuntu-22.04
....
Let's discuss it and potentially change it.
cmd/chisel/cmd_find.go
Outdated
func match(slice *setup.Slice, query string) bool { | ||
const maxStrDist = 1 | ||
fuzzyMatch := func(str, query string) bool { | ||
return strdist.Distance(str, query, strdist.StandardCost, maxStrDist+1) <= maxStrDist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this should be a normal function instead of a closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer applicable with the latest changes.
cmd/chisel/cmd_find.go
Outdated
} | ||
|
||
// findSlices goes through the release searching for any slices that match | ||
// the query string. It returns a list of slices that match the query. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// findSlices returns slices from the provided release that contain all of the query strings (AND).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1186a98.
cmd/chisel/cmd_find.go
Outdated
|
||
// findSlices goes through the release searching for any slices that match | ||
// the query string. It returns a list of slices that match the query. | ||
func findSlices(release *setup.Release, queries []string) (slices []*setup.Slice, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/queries/query/ (it's one query with several required strings, not several independent queries, which would hint at an OR relationship)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 1186a98.
cmd/chisel/cmd_find.go
Outdated
fuzzyMatch := func(str, query string) bool { | ||
return strdist.Distance(str, query, strdist.StandardCost, maxStrDist+1) <= maxStrDist | ||
} | ||
return strings.Contains(slice.String(), query) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't look right. For example, why are we happy to match "ab" against "abcdefghij", but then do a fuzzy match between them and request that the delta is only 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, snap find
looks for substrings as well, if you look for "python" you get:
Name Version Publisher Notes Summary
python3-alt 1.0.0 muqtxdir-m - python3 snap packaging
python-ai-toolkit 0.1.5.4 mauringo - Collection of common python packages for Data Analysis and AI
python0 0.9.1 zygoon classic Ancient version of Python for programming archeologists
python36-jamesh 3.6.4 jamesh - Python language interpreter and standard library
python2-jamesh 2.7.14 jamesh - Python language interpreter and standard library
python38 3.8.0 om26er - Python 3.8
python36-arun 3.6.4 arun.prasad - Python language interpreter and standard library
The difference is that they do not use fuzzy matching at all. Do we want to go that route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, we will take search for "" for each term in the query. That will allow us to get the substring behavior for free. We will also allow *
and ?
inside the query.
@niemeyer The only thing I am not sure about is whether we still want to allow a distance of 1 with other characters. In previous versions python3.1x
would have matched python3.10
but now that we can write python3.1?
I am not sure there is a need for that anymore (in the current PR the distance ONLY takes into account globs).
cmd/chisel/helpers.go
Outdated
Label: label, | ||
Version: version, | ||
}) | ||
releaseLabel = label + "-" + version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we return a releaseLabel that disagrees with release.Label? At the very least, we should not call this the release label, but maybe we should not return it at all? How's that information used, and what's the best approach considering that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I have removed it now that we do not use it for the output so there is no need to think of a clearer name.
cmd/chisel/helpers.go
Outdated
} | ||
|
||
// readOrFetchRelease takes a release branch name or a release directory path. | ||
// It fetches or reads the chisel-release depending on the nature of input and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/readOrFetch/obtain/?
We always read, maybe local, maybe after fetching, but we don't have to try to convey all these details via the function name.
Then, the documentation needs to follow usual Go conventions, and should describe what it does more clearly. For example:
// obtainRelease returns the Chisel release information matching the provided string,
// fetching it if necessary. The provided string should either be a "<name>-<version>" pair,
// or the path to a directory containing a previously fetched release.
... or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 99b9496.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, Alberto. Probably final pass.
cmd/chisel/cmd_cut.go
Outdated
` | ||
|
||
var cutDescs = map[string]string{ | ||
"release": "Chisel release directory", | ||
"release": "Chisel release directory or Ubuntu version", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Chisel release name or directory (e.g. ubuntu-22.04)"
The directory case is quite special. We don't need to spend much time on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in d124e25.
cmd/chisel/cmd_cut.go
Outdated
@@ -18,10 +13,13 @@ var shortCutHelp = "Cut a tree with selected slices" | |||
var longCutHelp = ` | |||
The cut command uses the provided selection of package slices | |||
to create a new filesystem tree in the root location. | |||
|
|||
By default it fetches the slices for the latest Ubuntu | |||
version, unless the --release flag is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that true? Isn't to actually to fetch slices for the same Ubuntu release as the current host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are completely right, changed in d124e25.
cmd/chisel/cmd_find.go
Outdated
Globs (* and ?) are allowed in the query. | ||
|
||
By default it fetches the slices for the latest Ubuntu | ||
version, unless the --release flag is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are completely right, changed in d124e25.
cmd/chisel/cmd_find.go
Outdated
` | ||
|
||
var findDescs = map[string]string{ | ||
"release": "Chisel release directory or Ubuntu version", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in d124e25.
internal/strdist/strdist.go
Outdated
// ? - Any one character | ||
// * - Any zero or more characters | ||
// ⁑ - Any zero or more characters | ||
func StandardGlobCost(ar, br rune) Cost { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yet, this standard is not what we use in GlobPath. Any reason why we're not just using GlobPath for our match? We won't have slashes in those strings, so it won't affect anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not comfortable having some logic that was treating "/" differently. For example: python3.1x matches python3.10 where x is any character except when that character is a "/".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is a slash supposed to be seen at all, in either of these strings? These are package and slice names.
In either case, the "Standard" in the StandardCost function comes from the fact that this is how one would get a classic Levenshtein distance from the books. There's nothing standard about this function. It's not even what we use ourselves in our own glob path matching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This PR adds the
find
command which can be used to query the chisel releases for available slices.Examples
$ chisel find --release ubuntu-20.04 python3 2>/dev/null Slice Package Release libpython3.8-minimal_config libpython3.8-minimal ubuntu-20.04 libpython3.8-minimal_libs libpython3.8-minimal ubuntu-20.04 libpython3.8-stdlib_all-os libpython3.8-stdlib ubuntu-20.04 ... ... (trimmed) ... ... python3.8_core python3.8 ubuntu-20.04 python3.8_standard python3.8 ubuntu-20.04 python3.8_utils python3.8 ubuntu-20.04
$ chisel find libc6_libs 2>/dev/null Slice Package Release libc6_libs libc6 ubuntu-22.04
Please let me know what you think of the changes. I think I could use some suggestions with naming overall, and particularly the value of
maxStrDist
constant and logic inmatchSlice
in/cmd/chisel/cmd_find.go
.