Only run semgrep for codemods with initial results #107

Merged
merged 1 commit into from
Oct 27, 2023


drdavella
Member

@drdavella drdavella commented Oct 27, 2023

Overview

Only run semgrep for codemods with initial results

Description

  • This optimization is intended to reduce runtime when many or all codemods are requested
  • The idea is to run semgrep once up front to gather the list of applicable codemods, and then to run semgrep only for the codemods identified in that initial pass
  • Further optimizations are available here, but this is a simple change intended to improve performance for this particular case
  • Unfortunately, this doubles the semgrep cost of running a single codemod. We will address this later, as it is not currently the performance bottleneck.
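
For readers skimming the description, the two-pass idea can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `run_semgrep` is a hypothetical callable standing in for a semgrep invocation, and `select_applicable`, `run_codemods`, and `counting_runner` are names made up for the example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for one semgrep invocation: takes codemod/rule IDs and
# returns a mapping from ID to the findings reported for that ID.
SemgrepRunner = Callable[[List[str]], Dict[str, List[str]]]


def select_applicable(requested: Iterable[str], run_semgrep: SemgrepRunner) -> List[str]:
    """Run one combined up-front pass; keep only codemods with initial results."""
    requested = list(requested)
    initial = run_semgrep(requested)
    return [name for name in requested if initial.get(name)]


def run_codemods(requested: Iterable[str], run_semgrep: SemgrepRunner) -> Dict[str, List[str]]:
    """Invoke semgrep per codemod, but only for those found in the initial pass."""
    results = {}
    for name in select_applicable(requested, run_semgrep):
        results[name] = run_semgrep([name]).get(name, [])
    return results


def counting_runner(findings: Dict[str, List[str]]) -> Tuple[SemgrepRunner, List[List[str]]]:
    """Fake runner for demonstration; records every invocation it receives."""
    calls: List[List[str]] = []

    def runner(rule_ids: List[str]) -> Dict[str, List[str]]:
        calls.append(list(rule_ids))
        return {rule_id: findings.get(rule_id, []) for rule_id in rule_ids}

    return runner, calls
```

With five requested codemods of which two have findings, this sketch makes three invocations (one initial pass plus two targeted runs) instead of five. With a single requested codemod it makes two invocations instead of one, which is the doubling noted in the last bullet.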


codecov bot commented Oct 27, 2023

Codecov Report

Merging #107 (deb18c6) into main (9b80e96) will increase coverage by 0.16%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   95.71%   95.87%   +0.16%     
==========================================
  Files          62       62              
  Lines        2523     2524       +1     
==========================================
+ Hits         2415     2420       +5     
+ Misses        108      104       -4     
Files                          Coverage Δ
src/codemodder/codemodder.py   96.80% <100.00%> (+0.18%) ⬆️
src/codemodder/sarifs.py       96.55% <100.00%> (+10.83%) ⬆️
src/codemodder/semgrep.py      94.73% <100.00%> (ø)

@drdavella drdavella marked this pull request as ready for review October 27, 2023 16:03
Contributor

@andrecsilva andrecsilva left a comment


I'm very skeptical this will have any actual impact on performance; otherwise it looks fine.

@drdavella drdavella merged commit a58f737 into main Oct 27, 2023
11 checks passed
@drdavella drdavella deleted the optimize-semgrep branch October 27, 2023 16:41
@drdavella
Member Author

@andrecsilva I gathered performance data that indicates that our runtime is completely dominated by invoking semgrep in certain environments. Anything we do to reduce the number of times we invoke semgrep is going to have a huge impact.
