-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass include/exclude args to semgrep run #80
Conversation
Codecov Report
@@ Coverage Diff @@
## main #80 +/- ##
==========================================
+ Coverage 95.70% 95.72% +0.01%
==========================================
Files 48 48
Lines 1957 1964 +7
==========================================
+ Hits 1873 1880 +7
Misses 84 84
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but just as a sanity check how does it behave when you pass a line include/exclude with a line number (e.g. path/to/code.py:42
)
good catch, for some reason I thought they were separate flags, need to test that out. |
cc783bb
to
791a81b
Compare
if not pat.startswith("*") | ||
else parent_path + pat | ||
for pat in patterns | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to combine and avoid doing multiple list comp operations but it was honestly easier to iterate and easier to understand separating it. We can refactor later on with a generator or something.
Also, I don't feel super confident with this if not pat.startswith("*")
logic, but tests show it works right now. happy to revisit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
I think you can use a combination of glob
and pathlib
to solve issues with globs and relative paths
full_path = (Path(parent_path) / Path(pat)).resolve()
files = glob.glob(str(full_path))
There are two caveats: (1) resolve will also make it absolute, we may need to handle the current directory carefully, (2) glob will return actual files in the filesystem which is slower than fnmatch
files = match_files( | ||
dir_structure, exclude_paths=["**/samples/**", "**/more_samples/**"] | ||
) | ||
self._assert_expected(files, expected) | ||
|
||
def test_include_test_overridden_by_default_excludes(self, mocker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are using parent_dir as "." which does not work as expected. We will revisit
@andrecsilva @drdavella fyi I've request re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I'm just becoming more confused by this. I thought that each pattern should just be joined to the parent path and used that way consistently both internally and when passed to semgrep. If that's not working for some reason, I think we need to revisit the design.
|
||
# TODO: handle case when parent path is "." | ||
patterns = [ | ||
str(Path(parent_path) / Path(pat)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably faster with os.path.join
# TODO: handle case when parent path is "." | ||
patterns = [ | ||
str(Path(parent_path) / Path(pat)) | ||
if not pat.startswith("*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition doesn't make sense to me. Do tests fail without it? This means that the behavior is going to be entirely dependent on the presence or absence of a trailing slash in parent_path
, which I don't think is what we want. If my pattern is *.py
I want it to mean /path/to/foo/*.py
and not /path/to/foo*.py
.
I'm wondering whether removing this also correctly handles the .
case, although that would also be fixed by normalizing all of the given paths using os.path.abspath
.
command.extend( | ||
itertools.chain.from_iterable( | ||
map( | ||
lambda f: ["--exclude", f"{execution_context.directory}{f}"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like there should be a path separator here.
) | ||
) | ||
if execution_context.path_include: | ||
# Note: parent path is not passed with --include |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this should be the case.
@clavedeluna I appreciate your work on this and am sorry that it became a bit of a tar bit. However, I think we're going to pursue a slightly different optimization here so I'm closing this PR (for now). |
Overview
Like codemodder, semgrep also has include/exclude cli args that we are now going to pass along to improve performance
Description
--include=foo.* --include=bar.*'
, just like we do with--config
new scope
while working on this PR new scope was added and some ticketed:
codemodder tests/samples
then a correct includes path should be--path-include=unverified_request.py
NOT--path-include=tests/samples/unverified_request.py
, for examplefnmatch
behavior, if a pattern is like"**.py"
we concat without a "/"Follow up work:
codemodder . ...args
. We don't correctly handle this type of relative path right now so I will ticket it.