Releases: nedap/deidentify
v0.5.1
Merged pull requests:
- Fix version and tag creation in release script #31 (jantrienes)
v0.5.0
Merged pull requests:
- Update dependencies in environment.yml #30 (jantrienes)
- Remove upper bound on torch version #29 (jantrienes)
- Fix whitespace token issue with newer flair versions #28 (jantrienes)
- Fix call to run_deduce with "ons" corpus #27 (jantrienes)
v0.4.0
Merged pull requests:
- Invalidate semaphore cache when conda env changed #26 (jantrienes)
- Add dateinfer and nameparser to setup.py #25 (jantrienes)
- Add surrogate generation demo to README #24 (jantrienes)
- Add autopep8 and isort to dev requirements and remove pylint bound #23 (jantrienes)
- Gracefully handle surrogate replacement for shuffle without choices #22 (jantrienes)
- Add utility API to generate surrogates for a set of annotated documents #21 (jantrienes)
v0.3.3
Closed issues:
- Question about hardware required for model training #14
- Semaphore 2 : missing auto_cancel parameter in semaphore.yml #12
Merged pull requests:
- Fix BiLSTM-CRF with consecutive whitespace tokens #20 (jantrienes)
- Pin torch version in environment.yml #19 (jantrienes)
- Add efficiency benchmark #18 (jantrienes)
- Add requirements-dev.txt to list explicit dev requirements #17 (jantrienes)
- Add example on available tags per tagger #16 (jantrienes)
- Add brat annotation config #15 (jantrienes)
- Add semaphore auto_cancel for branches other than master #13 (jantrienes)
v0.3.2
v0.3.1
v0.3.0
Merged pull requests:
- Remove non-PyPI dependencies #9 (jantrienes)
- Add utility function to mask sensitive PHI with placeholder #8 (jantrienes)
- Add tagger parameter to configure pipeline verbosity #7 (jantrienes)
- Update example text in README #6 (jantrienes)
- Add a default cache path to store downloaded models #5 (jantrienes)
- Add LICENSE #4 (jantrienes)
- Clarify docs regarding experiment environment #3 (jantrienes)
- Add documentation #2 (jantrienes)
- Add model download script #1 (jantrienes)
model_crf_ons_tuned-v0.1.0
Evaluation:
python deidentify/evaluation/evaluate_run.py nl data/corpus/ons/test/ data/corpus/ons/test/ output/predictions/ons/crf_regularization_rs/test/
entity_level tp: 2817 - fp: 247 - fn: 820 - tn: 0 - precision: 0.9194 - recall: 0.7745 - accuracy: 0.7253 - f1-score: 0.8408
Address tp: 118 - fp: 31 - fn: 38 - tn: 0 - precision: 0.7919 - recall: 0.7564 - accuracy: 0.6310 - f1-score: 0.7737
Age tp: 15 - fp: 10 - fn: 26 - tn: 0 - precision: 0.6000 - recall: 0.3659 - accuracy: 0.2941 - f1-score: 0.4546
Care_Institute tp: 129 - fp: 26 - fn: 87 - tn: 0 - precision: 0.8323 - recall: 0.5972 - accuracy: 0.5331 - f1-score: 0.6954
Date tp: 711 - fp: 54 - fn: 92 - tn: 0 - precision: 0.9294 - recall: 0.8854 - accuracy: 0.8296 - f1-score: 0.9069
Email tp: 10 - fp: 1 - fn: 0 - tn: 0 - precision: 0.9091 - recall: 1.0000 - accuracy: 0.9091 - f1-score: 0.9524
Hospital tp: 6 - fp: 1 - fn: 4 - tn: 0 - precision: 0.8571 - recall: 0.6000 - accuracy: 0.5455 - f1-score: 0.7059
ID tp: 9 - fp: 3 - fn: 16 - tn: 0 - precision: 0.7500 - recall: 0.3600 - accuracy: 0.3214 - f1-score: 0.4865
Initials tp: 96 - fp: 16 - fn: 82 - tn: 0 - precision: 0.8571 - recall: 0.5393 - accuracy: 0.4948 - f1-score: 0.6620
Internal_Location tp: 19 - fp: 6 - fn: 36 - tn: 0 - precision: 0.7600 - recall: 0.3455 - accuracy: 0.3115 - f1-score: 0.4750
Name tp: 1623 - fp: 85 - fn: 318 - tn: 0 - precision: 0.9502 - recall: 0.8362 - accuracy: 0.8011 - f1-score: 0.8896
Organization_Company tp: 57 - fp: 14 - fn: 79 - tn: 0 - precision: 0.8028 - recall: 0.4191 - accuracy: 0.3800 - f1-score: 0.5507
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 16 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - accuracy: 1.0000 - f1-score: 1.0000
Profession tp: 5 - fp: 0 - fn: 37 - tn: 0 - precision: 1.0000 - recall: 0.1190 - accuracy: 0.1190 - f1-score: 0.2127
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token_level tp: 4497 - fp: 281 - fn: 898 - tn: 1811034 - precision: 0.9412 - recall: 0.8335 - accuracy: 0.7923 - f1-score: 0.8841
Address tp: 193 - fp: 35 - fn: 53 - tn: 120833 - precision: 0.8465 - recall: 0.7846 - accuracy: 0.6868 - f1-score: 0.8144
Age tp: 28 - fp: 13 - fn: 33 - tn: 121040 - precision: 0.6829 - recall: 0.4590 - accuracy: 0.3784 - f1-score: 0.5490
Care_Institute tp: 248 - fp: 35 - fn: 99 - tn: 120732 - precision: 0.8763 - recall: 0.7147 - accuracy: 0.6492 - f1-score: 0.7873
Date tp: 1809 - fp: 69 - fn: 62 - tn: 119174 - precision: 0.9633 - recall: 0.9669 - accuracy: 0.9325 - f1-score: 0.9651
Email tp: 10 - fp: 1 - fn: 0 - tn: 121103 - precision: 0.9091 - recall: 1.0000 - accuracy: 0.9091 - f1-score: 0.9524
Hospital tp: 10 - fp: 2 - fn: 4 - tn: 121098 - precision: 0.8333 - recall: 0.7143 - accuracy: 0.6250 - f1-score: 0.7692
ID tp: 9 - fp: 3 - fn: 16 - tn: 121086 - precision: 0.7500 - recall: 0.3600 - accuracy: 0.3214 - f1-score: 0.4865
Initials tp: 99 - fp: 13 - fn: 86 - tn: 120916 - precision: 0.8839 - recall: 0.5351 - accuracy: 0.5000 - f1-score: 0.6666
Internal_Location tp: 33 - fp: 6 - fn: 59 - tn: 121016 - precision: 0.8462 - recall: 0.3587 - accuracy: 0.3367 - f1-score: 0.5038
Name tp: 1891 - fp: 81 - fn: 324 - tn: 118818 - precision: 0.9589 - recall: 0.8537 - accuracy: 0.8236 - f1-score: 0.9032
Organization_Company tp: 104 - fp: 23 - fn: 104 - tn: 120883 - precision: 0.8189 - recall: 0.5000 - accuracy: 0.4502 - f1-score: 0.6209
Other tp: 0 - fp: 0 - fn: 5 - tn: 121109 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 38 - fp: 0 - fn: 0 - tn: 121076 - precision: 1.0000 - recall: 1.0000 - accuracy: 1.0000 - f1-score: 1.0000
Profession tp: 22 - fp: 0 - fn: 52 - tn: 121040 - precision: 1.0000 - recall: 0.2973 - accuracy: 0.2973 - f1-score: 0.4583
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 121110 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token_blind tp: 4602 - fp: 176 - fn: 793 - tn: 115543 - precision: 0.9632 - recall: 0.8530 - accuracy: 0.8261 - f1-score: 0.9048
ENT tp: 4602 - fp: 176 - fn: 793 - tn: 115543 - precision: 0.9632 - recall: 0.8530 - accuracy: 0.8261 - f1-score: 0.9048
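The scores in each row can be recomputed from the tp/fp/fn counts alone. The sketch below is inferred from the numbers in this log, not taken from the evaluation script: it assumes that "accuracy" is tp / (tp + fp + fn) with tn ignored, and that the F1 score is computed from the precision and recall after they have been rounded to four decimals (this is what reproduces 0.4546 for the entity-level Age row, where the unrounded value would be 0.4545). The function name `row_metrics` is hypothetical.

```python
def row_metrics(tp, fp, fn):
    """Recompute the four scores of one log row from its raw counts.

    Assumptions inferred from the logged numbers (not from the script):
    tn is not used, and F1 is derived from the rounded precision/recall.
    """
    precision = round(tp / (tp + fp), 4) if tp + fp > 0 else 0.0
    recall = round(tp / (tp + fn), 4) if tp + fn > 0 else 0.0
    # "accuracy" in the log matches tp / (tp + fp + fn), even when tn > 0.
    total = tp + fp + fn
    accuracy = round(tp / total, 4) if total > 0 else 0.0
    # Harmonic mean of the already-rounded precision and recall.
    if precision + recall > 0:
        f1 = round(2 * precision * recall / (precision + recall), 4)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Entity-level "Age" row of model_crf_ons_tuned-v0.1.0: tp 15, fp 10, fn 26.
age = row_metrics(15, 10, 26)
print(age)
```

Because tn does not enter any of the four scores under these assumptions, the entity-level rows (tn: 0) and token-level rows (large tn) are computed in the same way.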
model_bilstmcrf_ons_large-v0.1.0
Evaluation:
python deidentify/evaluation/evaluate_run.py nl data/corpus/ons/test/ data/corpus/ons/test/ output/predictions/ons/bilstmcrf_flair-0.4.3/test/
entity_level tp: 3184 - fp: 255 - fn: 453 - tn: 0 - precision: 0.9259 - recall: 0.8754 - accuracy: 0.8181 - f1-score: 0.8999
Address tp: 136 - fp: 15 - fn: 20 - tn: 0 - precision: 0.9007 - recall: 0.8718 - accuracy: 0.7953 - f1-score: 0.8860
Age tp: 30 - fp: 9 - fn: 11 - tn: 0 - precision: 0.7692 - recall: 0.7317 - accuracy: 0.6000 - f1-score: 0.7500
Care_Institute tp: 145 - fp: 50 - fn: 71 - tn: 0 - precision: 0.7436 - recall: 0.6713 - accuracy: 0.5451 - f1-score: 0.7056
Date tp: 731 - fp: 52 - fn: 72 - tn: 0 - precision: 0.9336 - recall: 0.9103 - accuracy: 0.8550 - f1-score: 0.9218
Email tp: 10 - fp: 1 - fn: 0 - tn: 0 - precision: 0.9091 - recall: 1.0000 - accuracy: 0.9091 - f1-score: 0.9524
Hospital tp: 7 - fp: 2 - fn: 3 - tn: 0 - precision: 0.7778 - recall: 0.7000 - accuracy: 0.5833 - f1-score: 0.7369
ID tp: 11 - fp: 5 - fn: 14 - tn: 0 - precision: 0.6875 - recall: 0.4400 - accuracy: 0.3667 - f1-score: 0.5366
Initials tp: 111 - fp: 14 - fn: 67 - tn: 0 - precision: 0.8880 - recall: 0.6236 - accuracy: 0.5781 - f1-score: 0.7327
Internal_Location tp: 31 - fp: 10 - fn: 24 - tn: 0 - precision: 0.7561 - recall: 0.5636 - accuracy: 0.4769 - f1-score: 0.6458
Name tp: 1864 - fp: 69 - fn: 77 - tn: 0 - precision: 0.9643 - recall: 0.9603 - accuracy: 0.9274 - f1-score: 0.9623
Organization_Company tp: 78 - fp: 26 - fn: 58 - tn: 0 - precision: 0.7500 - recall: 0.5735 - accuracy: 0.4815 - f1-score: 0.6500
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 16 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - accuracy: 1.0000 - f1-score: 1.0000
Profession tp: 11 - fp: 1 - fn: 31 - tn: 0 - precision: 0.9167 - recall: 0.2619 - accuracy: 0.2558 - f1-score: 0.4074
SSN tp: 0 - fp: 1 - fn: 0 - tn: 0 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token_level tp: 4930 - fp: 269 - fn: 464 - tn: 1811032 - precision: 0.9483 - recall: 0.9140 - accuracy: 0.8706 - f1-score: 0.9308
Address tp: 222 - fp: 10 - fn: 24 - tn: 120857 - precision: 0.9569 - recall: 0.9024 - accuracy: 0.8672 - f1-score: 0.9289
Age tp: 48 - fp: 12 - fn: 13 - tn: 121040 - precision: 0.8000 - recall: 0.7869 - accuracy: 0.6575 - f1-score: 0.7934
Care_Institute tp: 269 - fp: 58 - fn: 78 - tn: 120708 - precision: 0.8226 - recall: 0.7752 - accuracy: 0.6642 - f1-score: 0.7982
Date tp: 1829 - fp: 53 - fn: 42 - tn: 119189 - precision: 0.9718 - recall: 0.9776 - accuracy: 0.9506 - f1-score: 0.9747
Email tp: 10 - fp: 1 - fn: 0 - tn: 121102 - precision: 0.9091 - recall: 1.0000 - accuracy: 0.9091 - f1-score: 0.9524
Hospital tp: 11 - fp: 2 - fn: 3 - tn: 121097 - precision: 0.8462 - recall: 0.7857 - accuracy: 0.6875 - f1-score: 0.8148
ID tp: 11 - fp: 5 - fn: 13 - tn: 121084 - precision: 0.6875 - recall: 0.4583 - accuracy: 0.3793 - f1-score: 0.5500
Initials tp: 113 - fp: 12 - fn: 72 - tn: 120916 - precision: 0.9040 - recall: 0.6108 - accuracy: 0.5736 - f1-score: 0.7290
Internal_Location tp: 50 - fp: 11 - fn: 42 - tn: 121010 - precision: 0.8197 - recall: 0.5435 - accuracy: 0.4854 - f1-score: 0.6536
Name tp: 2156 - fp: 76 - fn: 59 - tn: 118822 - precision: 0.9659 - recall: 0.9734 - accuracy: 0.9411 - f1-score: 0.9696
Organization_Company tp: 130 - fp: 29 - fn: 78 - tn: 120876 - precision: 0.8176 - recall: 0.6250 - accuracy: 0.5485 - f1-score: 0.7084
Other tp: 0 - fp: 0 - fn: 5 - tn: 121108 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 38 - fp: 0 - fn: 0 - tn: 121075 - precision: 1.0000 - recall: 1.0000 - accuracy: 1.0000 - f1-score: 1.0000
Profession tp: 40 - fp: 0 - fn: 34 - tn: 121039 - precision: 1.0000 - recall: 0.5405 - accuracy: 0.5405 - f1-score: 0.7017
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 121109 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token_blind tp: 5035 - fp: 165 - fn: 360 - tn: 115554 - precision: 0.9683 - recall: 0.9333 - accuracy: 0.9056 - f1-score: 0.9505
ENT tp: 5035 - fp: 165 - fn: 360 - tn: 115554 - precision: 0.9683 - recall: 0.9333 - accuracy: 0.9056 - f1-score: 0.9505
model_bilstmcrf_ons_fast-v0.1.0
Evaluation:
python deidentify/evaluation/evaluate_run.py nl data/corpus/ons/test/ data/corpus/ons/test/ output/predictions/ons/bilstmcrf_dutch-fast-flair-embeddings/test/
entity_level tp: 3158 - fp: 319 - fn: 479 - tn: 0 - precision: 0.9083 - recall: 0.8683 - accuracy: 0.7983 - f1-score: 0.8878
Address tp: 132 - fp: 27 - fn: 24 - tn: 0 - precision: 0.8302 - recall: 0.8462 - accuracy: 0.7213 - f1-score: 0.8381
Age tp: 29 - fp: 8 - fn: 12 - tn: 0 - precision: 0.7838 - recall: 0.7073 - accuracy: 0.5918 - f1-score: 0.7436
Care_Institute tp: 132 - fp: 60 - fn: 84 - tn: 0 - precision: 0.6875 - recall: 0.6111 - accuracy: 0.4783 - f1-score: 0.6471
Date tp: 731 - fp: 60 - fn: 72 - tn: 0 - precision: 0.9241 - recall: 0.9103 - accuracy: 0.8470 - f1-score: 0.9171
Email tp: 10 - fp: 0 - fn: 0 - tn: 0 - precision: 1.0000 - recall: 1.0000 - accuracy: 1.0000 - f1-score: 1.0000
Hospital tp: 6 - fp: 1 - fn: 4 - tn: 0 - precision: 0.8571 - recall: 0.6000 - accuracy: 0.5455 - f1-score: 0.7059
ID tp: 10 - fp: 6 - fn: 15 - tn: 0 - precision: 0.6250 - recall: 0.4000 - accuracy: 0.3226 - f1-score: 0.4878
Initials tp: 115 - fp: 23 - fn: 63 - tn: 0 - precision: 0.8333 - recall: 0.6461 - accuracy: 0.5721 - f1-score: 0.7279
Internal_Location tp: 25 - fp: 10 - fn: 30 - tn: 0 - precision: 0.7143 - recall: 0.4545 - accuracy: 0.3846 - f1-score: 0.5555
Name tp: 1862 - fp: 89 - fn: 79 - tn: 0 - precision: 0.9544 - recall: 0.9593 - accuracy: 0.9172 - f1-score: 0.9568
Organization_Company tp: 72 - fp: 29 - fn: 64 - tn: 0 - precision: 0.7129 - recall: 0.5294 - accuracy: 0.4364 - f1-score: 0.6076
Other tp: 0 - fp: 0 - fn: 4 - tn: 0 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 16 - fp: 3 - fn: 0 - tn: 0 - precision: 0.8421 - recall: 1.0000 - accuracy: 0.8421 - f1-score: 0.9143
Profession tp: 15 - fp: 3 - fn: 27 - tn: 0 - precision: 0.8333 - recall: 0.3571 - accuracy: 0.3333 - f1-score: 0.5000
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 0 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token_level tp: 4897 - fp: 354 - fn: 498 - tn: 1810961 - precision: 0.9326 - recall: 0.9077 - accuracy: 0.8518 - f1-score: 0.9200
Address tp: 216 - fp: 25 - fn: 30 - tn: 120843 - precision: 0.8963 - recall: 0.8780 - accuracy: 0.7970 - f1-score: 0.8871
Age tp: 47 - fp: 11 - fn: 14 - tn: 121042 - precision: 0.8103 - recall: 0.7705 - accuracy: 0.6528 - f1-score: 0.7899
Care_Institute tp: 251 - fp: 71 - fn: 96 - tn: 120696 - precision: 0.7795 - recall: 0.7233 - accuracy: 0.6005 - f1-score: 0.7503
Date tp: 1827 - fp: 69 - fn: 44 - tn: 119174 - precision: 0.9636 - recall: 0.9765 - accuracy: 0.9418 - f1-score: 0.9700
Email tp: 10 - fp: 0 - fn: 0 - tn: 121104 - precision: 1.0000 - recall: 1.0000 - accuracy: 1.0000 - f1-score: 1.0000
Hospital tp: 10 - fp: 2 - fn: 4 - tn: 121098 - precision: 0.8333 - recall: 0.7143 - accuracy: 0.6250 - f1-score: 0.7692
ID tp: 10 - fp: 6 - fn: 15 - tn: 121083 - precision: 0.6250 - recall: 0.4000 - accuracy: 0.3226 - f1-score: 0.4878
Initials tp: 126 - fp: 13 - fn: 59 - tn: 120916 - precision: 0.9065 - recall: 0.6811 - accuracy: 0.6364 - f1-score: 0.7778
Internal_Location tp: 43 - fp: 15 - fn: 49 - tn: 121007 - precision: 0.7414 - recall: 0.4674 - accuracy: 0.4019 - f1-score: 0.5733
Name tp: 2154 - fp: 94 - fn: 61 - tn: 118805 - precision: 0.9582 - recall: 0.9725 - accuracy: 0.9329 - f1-score: 0.9653
Organization_Company tp: 117 - fp: 38 - fn: 91 - tn: 120868 - precision: 0.7548 - recall: 0.5625 - accuracy: 0.4756 - f1-score: 0.6446
Other tp: 0 - fp: 0 - fn: 5 - tn: 121109 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
Phone_fax tp: 38 - fp: 3 - fn: 0 - tn: 121073 - precision: 0.9268 - recall: 1.0000 - accuracy: 0.9268 - f1-score: 0.9620
Profession tp: 45 - fp: 7 - fn: 29 - tn: 121033 - precision: 0.8654 - recall: 0.6081 - accuracy: 0.5556 - f1-score: 0.7143
URL_IP tp: 3 - fp: 0 - fn: 1 - tn: 121110 - precision: 1.0000 - recall: 0.7500 - accuracy: 0.7500 - f1-score: 0.8571
token_blind tp: 5036 - fp: 215 - fn: 359 - tn: 115504 - precision: 0.9591 - recall: 0.9335 - accuracy: 0.8977 - f1-score: 0.9461
ENT tp: 5036 - fp: 215 - fn: 359 - tn: 115504 - precision: 0.9591 - recall: 0.9335 - accuracy: 0.8977 - f1-score: 0.9461
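The entity_level and token_level summary rows appear to be micro-averages: their tp, fp and fn counts equal the sums of the per-class counts listed under them. The sketch below checks this for the entity-level rows of model_bilstmcrf_ons_fast-v0.1.0. The rows are copied from the log above and trimmed to the counts; the regular expression and the helper names `parse_counts` and `micro_totals` are illustrative and not part of the deidentify package.

```python
import re

# Entity-level per-class rows of model_bilstmcrf_ons_fast-v0.1.0,
# copied from the log and trimmed to the tp/fp/fn counts.
ROWS = """\
Address tp: 132 - fp: 27 - fn: 24
Age tp: 29 - fp: 8 - fn: 12
Care_Institute tp: 132 - fp: 60 - fn: 84
Date tp: 731 - fp: 60 - fn: 72
Email tp: 10 - fp: 0 - fn: 0
Hospital tp: 6 - fp: 1 - fn: 4
ID tp: 10 - fp: 6 - fn: 15
Initials tp: 115 - fp: 23 - fn: 63
Internal_Location tp: 25 - fp: 10 - fn: 30
Name tp: 1862 - fp: 89 - fn: 79
Organization_Company tp: 72 - fp: 29 - fn: 64
Other tp: 0 - fp: 0 - fn: 4
Phone_fax tp: 16 - fp: 3 - fn: 0
Profession tp: 15 - fp: 3 - fn: 27
URL_IP tp: 3 - fp: 0 - fn: 1
"""

# Hypothetical pattern for one log row: "<class> tp: N - fp: N - fn: N".
ROW_PATTERN = re.compile(r"^(\S+) tp: (\d+) - fp: (\d+) - fn: (\d+)")


def parse_counts(text):
    """Map each class name to its (tp, fp, fn) counts."""
    counts = {}
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            name, tp, fp, fn = match.groups()
            counts[name] = (int(tp), int(fp), int(fn))
    return counts


def micro_totals(counts):
    """Sum tp, fp and fn over all classes."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return tp, fp, fn


totals = micro_totals(parse_counts(ROWS))
print(totals)  # matches the entity_level row: tp 3158, fp 319, fn 479
```

One consequence of micro-averaging is that frequent classes such as Name and Date dominate the summary scores, so weak results on rare classes (for example Other or ID) barely move the overall F1.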