Commit
CU-8693wtx4b: Add code and notebook to compare two models (#13)
* CU-8693wtx4b: Add code and notebook to compare two models
* CU-8693wtx4: Add working output (synthetic data) to notebook
* CU-8693wtx4: Show cat data in raw format
* CU-8693wtx4: Add documentation to notebook
* CU-8693wtx4: Add per-cui output to code and notebook
* CU-8693wtx4: Add test to GHA workflow
* CU-8693wtx4: Remove old comments/commented code
* CU-8693wtx4: Add option to filter tally results by CUI; along with relevant tests.
* CU-8693wtx4: Add option to filter by CUI when comparing
* CU-8693wtx4: Add option to filter by CUI when comparing
* CU-8693wtx4: Fix typing issue
* CU-8693wtx4: Fix typing issue [#2]
* CU-8693wtx4: Add typing temp-issue to non-related files
* CU-8693wtx4: Add typing temp-issue to non-related files [#2]
* CU-8693wtx4b: Remove unnecessary whitespace
* CU-8693wtx4b: Add option to iterate over annotation pairs
* CU-8693wtx4b: Add tests for iteration over annotation pairs
* CU-8693wtx4b: Add option to iterate over a subset of annotation pairs (per document ids)
* CU-8693wtx4b: Add some documentation for annotation pair iteration
* CU-8693wtx4b: Add annotation pair iteration to notebook
* CU-8693wtx4b: Add option (defaulted to True) to omit identical to annotation pair iteration
* CU-8693wtx4b: Update notebook to omit identical in annotation pair iteration
* CU-8693wtx4b: Update notebook with some extra documentation
* CU-8693wtx4b: Add further assertion to iteration with filter when testing
* CU-8693wtx4b: Fix non-overlapping annotations
* CU-8693wtx4b: Add more comprehensive test for non-overlapping annotations
* CU-8693wtx4b: Allow none-valued dicts in dict comparison for output
* CU-8693wtx4b: Add tests for none-valued dicts in output
* CU-8693wtx4b: Add better handling of empty dicts in comparison in output
* CU-8693wtx4b: Remove debug output
* CU-8693wtx4b: Add more useful nulled dicts along with tests for them
* CU-8693wtx4b: Update notebook with recent changes
* CU-8693wtx4b: Remove debug output
* CU-8693wtx4b: Fix typing issues
* CU-8693wtx4b: Remove ignore file from mct_analysis
* CU-8693wtx4b: Add option to recognise children and grandchildren as
* CU-8693wtx4b: Fix typo
* CU-8693wtx4b: Fix typo (#2)
* CU-8693wtx4b: Avoid annotating twice over the dataset
* CU-8693wtx4b: Fix issue with raw annotations being empty at tally time
* CU-869475m5q: Allow differentiating CUIs that one of the two models does not include
* CU-869475m5q: Update notebook with latest output
* CU-869475h56: Allow CUI filter to be specified by a file with a list of CUIs; Also make sure CUI filtering is used during annotation
* CU-869475h56: Update notebook with file based CUI filter information
* CU-869475m5q: Fix tests: add necessary per model CUI sets where applicable
* CU-869475h38: Allow including children for per-cui view
* CU-869475h38: Make per-cui names easier to read
* CU-869475h38: Update notebook
* CU-86948wv58: Add method to get CSV output of annotations
* CU-86948wv58: Improve documentation
* CU-86948wv58: Update model comparison with CSV output part
* CU-86948wv58: Fix CSV output relative start and end for annotations
* CU-86948wv58: Fix typing for annotations when creating CSV
* CU-86948wv58: Fix typing when creating sub-text for CSV
* CU-869498jf4: Add option to add children to CUI filter along with tests for getting said filter
* CU-869498jf4: Update notebook with notes regarding children in filter
* CU-8693wtx4b: Fix whitespace
* CU-869498jf4: Add more output regarding filter after adding children
* CU-869498jf4: Add children from both models if possible
* CU-869475h0e: Add option to compare to a base model + supervised training
* CU-869475h0e: Add tests for base model + supervised training; Add fake test data
* CU-869475h0e: Add note for supervised training-based comparison
* CU-869475h0e: Add missing notes for supervised training-based comparison
* CU-869475h78: Add markdown to dict comparison
* CU-869475h78: Fix notebook output for per-annotation differences
* CU-869475h78: Automatically use markdown when notebook output required
* CU-869475h78: Automatically detect when in notebook (by default)
* CU-869475h78: Update notebook with tabular / markdown output
* CU-8694ezhrn: Load documents as iterator instead of holding them in memory
* CU-8694ezhrn: Add option to omit raw text
* CU-8694ezhrn: Add tests when omitting raw text
* CU-8694ezhrn: Propagate keeping raw option properly
* CU-8694ezhrn: Propagate keeping raw option properly (2)
* CU-8694ezhrn: Do not store copies of existing dicts in annotation pairs
* CU-8693wtx4: Improve memory usage by saving the annotation differences on disk (by default) and reading them 100 at a time
* CU-8693wtx4: Allow filtering by comparison type when converting to CSV
* CU-8693wtx4: Add tests for filtering by comparison type when iterating to save as CSV
* CU-8693wtx4: Fix model type in case of using DB for annotation pairs
* CU-8694ggtgm: Improve initial documentation for model input
* CU-8694ggtgm: Add more accurate differences for two approaches in initial documentation
* CU-8694ggtgm: Add code for supervised training comparison (but commented)
* CU-8694ggtgm: Add widget input
* CU-8694ggtgm: Small documentation fix
* CU-8694ggtgm: Add new dependencies to requirements file
* CU-8694ggtgm: Add input data validation
* CU-8694ggtgm: Move input data validation to its own module
* CU-8694ggtgm: Add further validation for medcat models
* CU-8694ggtgm: Add further validation for documents file
* CU-8694ggtgm: Add wildcard MCT export validation
* CU-8694ggtgm: Replace use of 3.9+ exclusive string method
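
Note on the final item ("Replace use of 3.9+ exclusive string method"): the commit message does not say which method was replaced. The only `str` methods added in Python 3.9 are `str.removeprefix` and `str.removesuffix`, so a change like this presumably swaps one of them for a slicing-based equivalent. A minimal sketch of such a replacement follows; the helper names `remove_prefix` and `remove_suffix` are hypothetical and not taken from the repository.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Equivalent of str.removeprefix that also runs on Python < 3.9.

    Hypothetical helper for illustration; not the repository's code.
    """
    # Only strip when the prefix is non-empty and actually present.
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text: str, suffix: str) -> str:
    """Equivalent of str.removesuffix that also runs on Python < 3.9."""
    # The non-empty check matters: text[:-0] would return an empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text
```

Unlike `str.lstrip` and `str.rstrip`, which strip any run of the given characters, these helpers remove the exact substring at most once, matching the 3.9 methods.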