-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup process_uc_wildcards
#193
Merged
Merged
Changes from all commits
Commits
Show all changes
24 commits
Select commit
Hold shift + click to select a range
ef30184
Remove rows with duplicate query cols and cleanup handling of TFM var…
SamRWest b892397
made output parsing regex more robust
SamRWest 0eb5bdd
fixed parse_result
SamRWest 158ce33
Add loguru, poe and poe shortcuts
SamRWest 783cb88
support merging tables (as VEDA appears to) where come columns are op…
SamRWest df82670
Fixed indentation
SamRWest 6cfe42e
WIP prototype of more efficient uc_wildcards transform
SamRWest ed82d3d
Working prototype, ~10-20x speedup
SamRWest e12dfe6
Switched to ireland for unit test data
SamRWest dc07fae
formatting
SamRWest d199eeb
cleanup
SamRWest 5cdda5f
extra --debug logic
SamRWest f5da625
Merge branch 'main' into feature/wildcard_speedup
SamRWest a4d43fa
post merge fixes
SamRWest e388130
fix import
SamRWest 33fb41f
Corrected debug logic
SamRWest fcd11cd
remove shell=True in dd_to_csv run on non-windows OSes
SamRWest ff268dc
addressed review comments from @olejandro
SamRWest a2cbeb2
switched to lru_cache
SamRWest f015daa
logging tweaks
SamRWest b00e68e
Merged with main
SamRWest c7a1ab7
Merge branch 'main' into feature/wildcard_speedup
SamRWest ab76d8d
added extra check in matchers
SamRWest 8dea49f
Merge remote-tracking branch 'origin/feature/wildcard_speedup' into f…
SamRWest File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,3 +22,4 @@ docs/api/ | |
.coverage | ||
/out.txt | ||
*.log | ||
/profile.* |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Binary file not shown.
Binary file not shown.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,10 +14,11 @@ | |
import git | ||
import pandas as pd | ||
import yaml | ||
from loguru import logger | ||
from tabulate import tabulate | ||
|
||
from dd_to_csv import main | ||
from xl2times import utils | ||
from xl2times.__main__ import parse_args, run | ||
from xl2times.utils import max_workers | ||
|
||
logger = utils.get_logger() | ||
|
@@ -146,7 +147,8 @@ def run_benchmark( | |
# First convert ground truth DD to csv | ||
if not skip_csv: | ||
shutil.rmtree(csv_folder, ignore_errors=True) | ||
if os.name != "nt": | ||
if not debug: | ||
# run as subprocess if not in --debug mode | ||
res = subprocess.run( | ||
[ | ||
"python", | ||
|
@@ -157,6 +159,7 @@ def run_benchmark( | |
stdout=subprocess.PIPE, | ||
stderr=subprocess.STDOUT, | ||
text=True, | ||
shell=True if os.name == "nt" else False, | ||
) | ||
if res.returncode != 0: | ||
# Remove partial outputs | ||
|
@@ -166,9 +169,12 @@ def run_benchmark( | |
sys.exit(5) | ||
else: | ||
# If debug option is set, run as a function call to allow stepping with a debugger. | ||
from dd_to_csv import main | ||
|
||
main([dd_folder, csv_folder]) | ||
try: | ||
main([dd_folder, csv_folder]) | ||
except Exception: | ||
logger.exception(f"dd_to_csv failed on {benchmark['name']}") | ||
shutil.rmtree(csv_folder, ignore_errors=True) | ||
sys.exit(5) | ||
|
||
elif not path.exists(csv_folder): | ||
logger.error(f"--skip_csv is true but {csv_folder} does not exist") | ||
|
@@ -189,22 +195,12 @@ def run_benchmark( | |
else: | ||
args.append(xl_folder) | ||
start = time.time() | ||
res = None | ||
if not debug: | ||
res = subprocess.run( | ||
["xl2times"] + args, | ||
stdout=subprocess.PIPE, | ||
stderr=subprocess.STDOUT, | ||
text=True, | ||
) | ||
else: | ||
# If debug option is set, run as a function call to allow stepping with a debugger. | ||
from xl2times.__main__ import run, parse_args | ||
|
||
summary = run(parse_args(args)) | ||
# Call the conversion function directly | ||
summary = run(parse_args(args)) | ||
|
||
# pack the results into a namedtuple pretending to be a return value from a subprocess call (as above). | ||
res = namedtuple("stdout", ["stdout", "stderr", "returncode"])(summary, "", 0) | ||
# pack the results into a namedtuple pretending to be a return value from a subprocess call (as above). | ||
res = namedtuple("stdout", ["stdout", "stderr", "returncode"])(summary, "", 0) | ||
|
||
runtime = time.time() - start | ||
|
||
|
@@ -255,8 +251,13 @@ def run_all_benchmarks( | |
debug=debug, | ||
) | ||
|
||
with ProcessPoolExecutor(max_workers=max_workers) as executor: | ||
results = list(executor.map(run_a_benchmark, benchmarks)) | ||
if debug: | ||
# bypass process pool and call benchmarks directly if --debug is set. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would love to have this documented in the CLI help for |
||
results = [run_a_benchmark(b) for b in benchmarks] | ||
else: | ||
with ProcessPoolExecutor(max_workers=max_workers) as executor: | ||
results = list(executor.map(run_a_benchmark, benchmarks)) | ||
|
||
logger.info("\n\n" + tabulate(results, headers, floatfmt=".1f") + "\n") | ||
|
||
if skip_regression: | ||
|
@@ -302,9 +303,10 @@ def run_all_benchmarks( | |
) | ||
sys.exit(8) | ||
|
||
# Re-run benchmarks on main | ||
# Re-run benchmarks on main - check it out and pull | ||
repo.heads.main.checkout() | ||
logger.info("Running benchmarks on main", end="", flush=True) | ||
origin.pull("main") # if main already exists, make sure it's up to date | ||
logger.info("Running benchmarks on main") | ||
run_a_benchmark = partial( | ||
run_benchmark, | ||
benchmarks_folder=benchmarks_folder, | ||
|
@@ -441,19 +443,20 @@ def run_all_benchmarks( | |
"--debug", | ||
action="store_true", | ||
default=False, | ||
help="Run each benchmark as a function call to allow a debugger to stop at breakpoints in benchmark runs.", | ||
help="Run each benchmark as a direct function call (disables subprocesses) to allow a debugger to stop at breakpoints " | ||
"in benchmark runs.", | ||
) | ||
args = args_parser.parse_args() | ||
|
||
spec = yaml.safe_load(open(args.benchmarks_yaml)) | ||
benchmarks_folder = spec["benchmarks_folder"] | ||
benchmark_names = [b["name"] for b in spec["benchmarks"]] | ||
if len(set(benchmark_names)) != len(benchmark_names): | ||
logger.error(f"Found duplicate name in benchmarks YAML file") | ||
logger.error("Found duplicate name in benchmarks YAML file") | ||
sys.exit(11) | ||
|
||
if args.dd and args.times_dir is None: | ||
logger.error(f"--times_dir is required when using --dd") | ||
logger.error("--times_dir is required when using --dd") | ||
sys.exit(12) | ||
|
||
if args.run is not None: | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, thinking about it again, perhaps there is one reason to use subprocess -- at least in the CI should we check that
xl2times
works as expected from the command line? But on the other hand,run(parse_args(
is pretty much the same as the CLI invocation, and CI is probably faster without subprocess...I'm undecided, so would love your thoughts. And we can leave it as is in this PR and discuss in an issue, maybe?