Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bbin ls is unnecessarily slow #62

Closed
rads opened this issue Apr 8, 2023 · 11 comments
Closed

bbin ls is unnecessarily slow #62

rads opened this issue Apr 8, 2023 · 11 comments

Comments

@rads
Copy link
Collaborator

rads commented Apr 8, 2023

I haven't done any profiling yet but it's likely slow because we're reading and parsing all the scripts in bbin bin every time to generate the list. This could be improved with #34 but there may be easier fixes outside of that too.

@teodorlu
Copy link
Contributor

Perhaps reading the first 1 MB for each script is a simpler solution?

bbin ls is quite slow on my system too -- right now, it takes 10 seconds to run. I've been assuming that it reads all the binaries on my system to find bbin binaries. Some of those are quite large; so I was optimistically hoping we could cut 99 % of that time by exiting faster on the huge binaries.

@borkdude
Copy link
Contributor

Perhaps it's better to save data as edn in a predictable location so you don't actually have to search anything, but the edn describes where scripts live?

@teodorlu
Copy link
Contributor

An EDN file could possibly get out of sync with the index, for example if a user moves or deletes a bbin-installed script.

What's the right thing to do then? Assume it won't happen, and crash? Silently delete references from the index to scripts that don't exist, and stop tracking them? Support some kind of "reindexing" logic that derives the index from the user's system?

Keeping metadata and scripts apart has advantages, as outlined by @borkdude in #34. I do like the idea of being able to track an EDN file in Git, run a command to have all the scripts available.

@rads
Copy link
Collaborator Author

rads commented Oct 1, 2023

For now I'm going to start with reading the first 5kb of each file, which is an easy change to make. On my system, this improves the time for bbin ls from about 4 seconds to about 1/10th of a second. I'll go ahead and release this after I have some tests written.

https://github.com/babashka/bbin/tree/issue-62

@borkdude
Copy link
Contributor

borkdude commented Oct 1, 2023

An EDN file could possibly get out of sync with the index, for example if a user moves or deletes a bbin-installed script.

I think this is unusual behavior. A user should just not tamper with those scripts but should use bbin to manage those. This argument could be used for any package manager: moving programs that were installed with brew screws the index of that package manager or moving dependencies that were added by maven screws with clj.

@borkdude
Copy link
Contributor

borkdude commented Oct 7, 2023

clojure -Ttools install creates .edn files per installed script in ~/.clojure/tools. I think that could be a good approach for bbin too.

@teodorlu
Copy link
Contributor

teodorlu commented Oct 8, 2023

For now I'm going to start with reading the first 5kb of each file, which is an easy change to make

For my use use, this seems like a free performance improvement, which I like!

EDIT: The following comment has been moved to #34 (comment).

After some hammock time, I think @borkdude's proposal of having one EDN file per installed script and an operation in bbin to install a script for each EDN file is a way to go.

Why: because implementation complexity and User can track EDN files in Git, and have bbin ensure the scripts are installed on all computers are important, and bbin uninstall safety/atomicity can still be solved in the future.


I tried to map this out with a Rich Hickey style decision matrix (as advocated for in Design in Practice). Screenshot:

image

Full document: https://docs.google.com/spreadsheets/d/1AWiYHYteuTtTGCaHWg0DNuX2rTXXJTzZzl0pbOzFsWU/edit#gid=0

(send me a Google account if you want write access, eg to add a new column for a different approach, or to add a new row for a different criterium).

I think "metadata by script install name" is the best place to start, then consider "metadata in transaction log" or "metadata by script hexdigest" later if safe/atomic uninstall is desired.


@rads I don't want to step on your toes here! I feel like I perhaps went a little overboard with design, I've been wanting to try out some of the things Rich proposed in his latest talk for a while. You know the bbin implementation in more detail than I do, and I might be missing something. Please correct me if that's the case!

@rads
Copy link
Collaborator Author

rads commented Oct 8, 2023

I'm enjoying the discussion about the script metadata, but I think we should continue in #34 since my short-term fix will address the speed issue with bbin ls.

@rads
Copy link
Collaborator Author

rads commented Oct 8, 2023

P.S. @teodorlu, you aren't stepping on my toes. 😄 I appreciate that you've spent time thinking about this!

@borkdude
Copy link
Contributor

borkdude commented Oct 8, 2023

Agree with @rads, @teodorlu amazing table :) Perhaps post it at the other issue then?

@teodorlu
Copy link
Contributor

teodorlu commented Oct 9, 2023

[...] but I think we should continue in #34 since my short-term fix will address the speed issue with bbin ls.

[...] Perhaps post it at the other issue then?

Agreed. Moved to #34 (comment).

@rads rads closed this as completed in dda48c5 Oct 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants