ER-panama-papers

This repository contains four possible approaches to entity resolution on one of the ICIJ datasets, which holds personal information about people involved in the Panama Papers leak.

The four proposed methods are Fuzzy 1 feat, TFIDF 1 feat, TFIDF 4 feats, and TFIDF 4 feats+FUZZY 1 feat (Hybrid method). They are explained in the report inside the repository, which also contains the notebooks needed to run these experiments.
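To give a rough idea of the two families of methods named above, here is a minimal sketch that compares a fuzzy string score with a TF-IDF cosine similarity on a single feature (a name). It is not taken from the report or the notebooks: the function names, the use of character trigrams, the smoothed IDF formula, and the sample names are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] (hypothetical helper, stdlib difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams; trigrams are an assumption here."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(records: list[str]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector (n-gram -> weight) per record."""
    grams = [Counter(char_ngrams(r)) for r in records]
    # Document frequency: in how many records each n-gram appears.
    doc_freq = Counter(g for counts in grams for g in counts)
    n_docs = len(records)
    return [
        {
            g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0)
            for g, tf in counts.items()
        }
        for counts in grams
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up names, not records from the dataset.
names = ["John A. Smith", "Jon Smith", "Maria Gonzalez"]
vecs = tfidf_vectors(names)

print(fuzzy_score(names[0], names[1]), cosine(vecs[0], vecs[1]))
print(fuzzy_score(names[0], names[2]), cosine(vecs[0], vecs[2]))
```

Under these assumptions, both scores rank "Jon Smith" closer to "John A. Smith" than "Maria Gonzalez". A hybrid approach could combine the two scores, but how the repository actually does so is described in its report, not here.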

You can also find "check_one_person_4_feats.py" and "check_one_person_1_feat.ipynb", two simple interactive tools for searching personal information inside our subset of the Panama Papers in less than 3 seconds.
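As an illustration of what a single-person lookup on one feature might look like, here is a small sketch built on `difflib.get_close_matches` from the Python standard library. It does not reproduce the scripts named above: `find_person`, the record fields, the cutoff value, and the sample records are hypothetical.

```python
from difflib import get_close_matches

# Made-up records with hypothetical field names, not data from the dataset.
records = [
    {"name": "John A. Smith", "country": "Country A"},
    {"name": "Maria Gonzalez", "country": "Country B"},
    {"name": "Wei Zhang", "country": "Country C"},
]


def find_person(query: str, records: list[dict], cutoff: float = 0.6,
                limit: int = 3) -> list[dict]:
    """Return the records whose name is closest to the query, best first."""
    # Index records by lowercased name so matching is case-insensitive.
    index = {r["name"].lower(): r for r in records}
    hits = get_close_matches(query.lower(), list(index), n=limit, cutoff=cutoff)
    return [index[h] for h in hits]


for match in find_person("Jon Smith", records):
    print(match["name"], "-", match["country"])
```

The `cutoff` parameter trades recall for precision: lowering it returns more distant candidates, raising it keeps only near-exact spellings.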