add multiprocessing and author canonicalization #104

Open: thehesiod wants to merge 15 commits into main

@thehesiod thehesiod commented Nov 20, 2024

time to use those idle CPUs :)

Given that the majority of the time is spent in the `git blame` executable, the speedup is nearly linear in the number of processes for quite a while. Running on a 16-core machine with a fast SSD yields a pretty tremendous speed increase. I tried to be as frugal as possible with communication between the processes.
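
For readers following along, here is a minimal sketch of the general idea, not the code in this PR. It assumes each worker runs `git blame --line-porcelain` on one file and sends back only a small per-author tally, so little data crosses the process boundary. The names `count_authors`, `blame_file`, and `blame_all` are hypothetical.

```python
import subprocess
from collections import Counter
from multiprocessing import Pool


def count_authors(porcelain: str) -> Counter:
    """Count blamed lines per author in `git blame --line-porcelain` output."""
    counts = Counter()
    for line in porcelain.split("\n"):
        # Header lines look like "author Jane Doe"; file content lines
        # always start with a tab, so they never match this prefix.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_file(path: str) -> Counter:
    """Worker (hypothetical): blame one file, return only the small tally."""
    out = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", path],
        capture_output=True, text=True, errors="replace", check=False,
    ).stdout
    return count_authors(out)


def blame_all(paths, processes=None) -> Counter:
    """Fan files out to a process pool and merge the per-file tallies."""
    total = Counter()
    with Pool(processes) as pool:
        # chunksize batches several paths per task to cut IPC round trips.
        for counts in pool.imap_unordered(blame_file, paths, chunksize=8):
            total.update(counts)
    return total
```

Only file paths go into the pool and only `Counter` objects come out; the raw blame text never leaves the worker.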

@thehesiod changed the title from "add multiprocessing support" to "add multiprocessing and author canonicalization" on Nov 21, 2024
@@ -368,7 +507,6 @@ def run(args):
    if isinstance(args.gitdir, str):
        args.gitdir = [args.gitdir]
    # strip `/`, `.git`
    gitdirs = [i.rstrip(os.sep) for i in args.gitdir]
Author

This is a no-op, as detected by PyCharm.


for auth, stats in getattr(old, 'iteritems', old.items)():
    i = auth_stats.setdefault(auth2em[auth],
                              {"loc": 0, "files": set(), "commits": 0, "ctimes": []})
    auth_email = list(auth2em[auth])[0]  # TODO: count most used email?
Author

I think all of the emails should be returned.
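
One way to reconcile the TODO with returning every address, sketched under assumptions: tally each author's emails with a `Counter`, then report the most used one alongside the full list. The helper names `collect_emails` and `summarize` are hypothetical, as is the lowercasing of addresses.

```python
from collections import Counter, defaultdict


def collect_emails(commits):
    """Map each author name to a Counter of the emails they committed with.

    `commits` is an iterable of (author, email) pairs (assumed input shape).
    """
    emails = defaultdict(Counter)
    for author, email in commits:
        # Assumption: addresses differing only by case are the same address.
        emails[author][email.lower()] += 1
    return emails


def summarize(emails):
    """Return {author: (most_used_email, all_emails_sorted)}."""
    return {
        author: (counts.most_common(1)[0][0], sorted(counts))
        for author, counts in emails.items()
    }
```

Callers that want a single canonical address take the first element; callers that want everything take the second.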

Comment on lines +258 to +266
# if since:
#     # Strip boundary messages,
#     # preventing user with nearest commit to boundary owning the LOC
#     blame_out = RE_BLAME_BOUNDS.sub('', blame_out)
#
# if until:
#     # Strip boundary messages,
#     # preventing user with nearest commit to boundary owning the LOC
#     blame_out = RE_BLAME_BOUNDS.sub('', blame_out)
Author

I could use some help here.
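
A possible alternative to the regex substitution, offered only as a sketch: if the blame output is in `--line-porcelain` format, each blamed line is preceded by header lines, and lines attributed to a boundary commit carry a bare `boundary` header. Skipping those blocks while counting keeps the author nearest the boundary from owning the LOC. This is an assumption about the output format in use, and `count_non_boundary` is a hypothetical name, not something in the codebase.

```python
from collections import Counter


def count_non_boundary(porcelain: str) -> Counter:
    """Count blamed lines per author, ignoring boundary-commit lines."""
    counts = Counter()
    author = None
    boundary = False
    for line in porcelain.split("\n"):
        if line.startswith("\t"):
            # A tab-prefixed content line closes one header block.
            if author is not None and not boundary:
                counts[author] += 1
            author, boundary = None, False
        elif line.startswith("author "):
            author = line[len("author "):]
        elif line == "boundary":
            # Header marking this line as attributed to a boundary commit.
            boundary = True
    return counts
```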

@thehesiod thehesiod marked this pull request as ready for review November 22, 2024 03:24
@thehesiod thehesiod marked this pull request as draft November 22, 2024 03:39
@thehesiod thehesiod marked this pull request as ready for review November 22, 2024 06:31
Owner

@casperdcl casperdcl left a comment

What about using threads (e.g. concurrent.futures) instead of processes?

@thehesiod
Author

What about using threads (e.g. concurrent.futures) instead of processes?

Actually, yes, since the work is already being done by a separate process. I'll work on swapping it over.
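
A rough sketch of what the thread-based variant could look like, assuming the heavy lifting stays in child processes: each thread just blocks on `subprocess.run`, so a `ThreadPoolExecutor` is enough to keep several children running at once. `run_command` and `run_commands` are hypothetical helpers, and the demo uses stand-in commands rather than real `git blame` calls.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


def run_commands(cmds, max_workers=None):
    """Run commands concurrently in threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, cmds))


if __name__ == "__main__":
    # Stand-in commands; a real run would pass ["git", "blame", ...] per file.
    cmds = [[sys.executable, "-c", f"print({n} * 2)"] for n in range(4)]
    print(run_commands(cmds))
```

Compared with a process pool, nothing has to be pickled between workers, since all threads share the parent's memory.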
