"Enhancements: Batch Processing, Crosswalk Table, and Matching Toggle" #73
Description
This pull request introduces enhancements to the Harmony Python library by addressing two key improvements:
Batch Processing in matcher.py (Task 1):
Implemented batch processing to handle large datasets more efficiently.
Introduced an environment variable, BATCH_SIZE, which allows configurable batch sizes for processing.
Unit Tests for Batch Processing (Task 2):
Added comprehensive unit tests to ensure the batch processing functionality works as expected under various scenarios (e.g., empty lists, single-item batches, large datasets).
Verified edge cases to maintain reliability and backward compatibility.
No new third-party dependencies were added to requirements.txt or pyproject.toml.
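For readers who want a concrete picture of the batching described above, here is a minimal sketch. It is not the code from matcher.py: the function names `get_batch_size` and `split_into_batches` and the fallback value of 1000 are hypothetical, chosen only for illustration. The one detail taken from this description is that the batch size is read from the `BATCH_SIZE` environment variable.

```python
import os
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical fallback used when BATCH_SIZE is unset or invalid;
# the real default in matcher.py may differ.
DEFAULT_BATCH_SIZE = 1000


def get_batch_size(default: int = DEFAULT_BATCH_SIZE) -> int:
    """Read BATCH_SIZE from the environment, falling back to `default`."""
    raw = os.environ.get("BATCH_SIZE", "")
    try:
        size = int(raw)
    except ValueError:
        # Unset or non-numeric value: use the fallback.
        return default
    # Zero or negative sizes make no sense for batching.
    return size if size > 0 else default


def split_into_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])
```

For example, `list(split_into_batches(list(range(5)), 2))` gives `[[0, 1], [2, 3], [4]]`, and an empty input yields no batches at all. Falling back to a default on an invalid value, rather than raising, is one possible design choice that keeps existing callers working when the variable is not set.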
Fixes # (issue)
Type of change
Testing
The following tests were conducted to validate the changes:
Unit Tests:
Verified that items are correctly divided into batches based on the specified BATCH_SIZE.
Tested various edge cases such as empty lists, single-item inputs, and batch sizes larger than the dataset.
Ensured the batch processing logic integrates smoothly with existing functionality.
Manual Verification:
Ran unit tests on the Harmony Python library to confirm no regressions or unintended behavior.
Validated that the Harmony API continues to function correctly with these changes.
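The edge cases listed under Unit Tests could be exercised along the following lines. This is a sketch under stated assumptions, not the test file from this pull request: `split_into_batches` is a hypothetical stand-in defined inline so the example is self-contained, and the test names are illustrative.

```python
import unittest
from typing import List, Sequence


def split_into_batches(items: Sequence, batch_size: int) -> List[list]:
    """Hypothetical stand-in for the batching helper under test."""
    return [
        list(items[start:start + batch_size])
        for start in range(0, len(items), batch_size)
    ]


class BatchSplitTests(unittest.TestCase):
    def test_empty_list_yields_no_batches(self):
        self.assertEqual(split_into_batches([], 4), [])

    def test_single_item_yields_one_batch(self):
        self.assertEqual(split_into_batches(["q1"], 4), [["q1"]])

    def test_batch_size_larger_than_dataset(self):
        items = ["q1", "q2", "q3"]
        # Everything fits in a single, partially filled batch.
        self.assertEqual(split_into_batches(items, 100), [items])

    def test_large_dataset_preserves_order_and_count(self):
        items = list(range(10_000))
        batch_size = 256
        batches = split_into_batches(items, batch_size)
        # Ceiling division gives the expected number of batches.
        self.assertEqual(len(batches), -(-len(items) // batch_size))
        self.assertTrue(all(len(b) <= batch_size for b in batches))
        # Flattening the batches must reproduce the input exactly.
        self.assertEqual([x for b in batches for x in b], items)
```

Saved as a test module, this can be run with `python -m unittest`. The last test checks order and completeness rather than specific batch contents, so it stays valid if the batch size changes.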
Since the Harmony Python package is used by the Harmony API (which is itself used by the R library and the web app), we need to avoid making any changes that break the Harmony API. Please also run the Harmony API unit tests and check that the API still runs with your changes to the Python package: https://github.com/harmonydata/harmonyapi
Test Configuration
Checklist
Optionally: feel free to paste your Discord username into your pull request description in this format:
discordapp.com/users/yourID
so that we know to tag you in the Harmony Discord server when we announce the PR.