Replaces #72 since similar changes have gotten pulled into
other PRs and I don't want merge conflicts.
Benchmark:
time cat data/huwikisource-latest-pages-meta-current.xml | mycommand > /dev/null
System: MacBook Pro (mid-2014), 2.2 GHz Core i7
Times: real / CPU / sys
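For repeat runs, the measurement above could be scripted roughly as follows. This is only a sketch, not what the PR itself uses: the `benchmark` helper is a hypothetical name, it feeds the file to the command on stdin instead of going through `cat`, and it hashes the output in-process rather than discarding it, so the real time includes a little hashing overhead. It relies on the Unix-only `resource` module, and the CPU figures are only meaningful if no other child process is reaped while it runs.

```python
import hashlib
import resource
import subprocess
import time


def benchmark(command, input_path):
    """Feed input_path to command on stdin.

    Returns (real, user, sys, md5_hex): wall-clock seconds, child user and
    system CPU seconds, and the MD5 hex digest of the command's stdout.
    `benchmark` is a hypothetical helper, not part of this repository.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    digest = hashlib.md5()
    with open(input_path, "rb") as src:
        proc = subprocess.Popen(command, stdin=src, stdout=subprocess.PIPE)
        # Stream the output in 1 MiB chunks so a large dump never sits in memory.
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            digest.update(chunk)
        proc.stdout.close()
        returncode = proc.wait()
    real = time.perf_counter() - start
    # RUSAGE_CHILDREN only counts children that have been waited for,
    # so it must be read after proc.wait().
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    return real, user, sys_time, digest.hexdigest()
```

Calling `benchmark(["mycommand"], "data/huwikisource-latest-pages-meta-current.xml")` would then give the three times plus a digest that can be compared between optimization steps.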
Bash optimization history:
0. Baseline: 30.472s / 34.201s / 0.953s - output md5sum: 8ed673317ddae1eca0f0f76bb4b4605e
Need to reconcile that difference here; it was not that way before.
Note that on a Mac we don't get parallel sort here. If I run the second one in an ubuntu:16.04 docker machine on the same host.
Bash TODOs:
Python optimizations: