When doing NER/NEL to UMLS/CUI entities, is there any way to configure the nlp pipe to exclude candidates using a predefined list of CUIs or TUIs? For example, to exclude any detected CUI with TUI T079 (Temporal Concept)?
Currently I'm doing it by post-hoc filtering, which is inelegant and inefficient, and doesn't help remove noisy detections. For example, if the linker returns the first detected entity from a text and post-hoc filtering then removes it because of its TUI, I miss the relevant entities.
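To make the problem concrete: if only the top-ranked candidate is kept and excluded types are removed afterwards, a mention whose best candidate is a Temporal Concept ends up with nothing, whereas filtering the ranked list first keeps the next valid candidate. The sketch below is plain Python under stated assumptions: the `(cui, score)` pairs mimic the shape of scispacy's `entity._.kb_ents`, while `keep_allowed`, the `CUI_TO_TUIS` dictionary and the `C_*` CUIs are hypothetical stand-ins (in practice the types would come from `linker.kb.cui_to_entity[cui].types`).

```python
from typing import AbstractSet, Dict, Iterable, List, Tuple

Candidate = Tuple[str, float]  # (CUI, score), same shape as entity._.kb_ents


def keep_allowed(candidates: Iterable[Candidate],
                 cui_to_tuis: Dict[str, List[str]],
                 exclude_tuis: AbstractSet[str],
                 exclude_cuis: AbstractSet[str] = frozenset()) -> List[Candidate]:
    """Hypothetical helper: drop candidates whose CUI is excluded or that carry
    any excluded TUI, preserving the original rank order. CUIs missing from
    `cui_to_tuis` are treated as having no types and are kept."""
    return [
        (cui, score)
        for cui, score in candidates
        if cui not in exclude_cuis
        and exclude_tuis.isdisjoint(cui_to_tuis.get(cui, []))
    ]


# Hypothetical ranked candidates for a single mention (CUIs are made up)
CUI_TO_TUIS = {"C_TIME": ["T079"], "C_HEART_RATE": ["T201"], "C_RECOVERY": ["T040"]}
kb_ents = [("C_TIME", 0.95), ("C_HEART_RATE", 0.91), ("C_RECOVERY", 0.88)]
EXCLUDE = {"T079", "T093"}

top1_then_filter = keep_allowed(kb_ents[:1], CUI_TO_TUIS, EXCLUDE)  # keep best, then filter
filter_then_top1 = keep_allowed(kb_ents, CUI_TO_TUIS, EXCLUDE)[:1]  # filter, then keep best

print(top1_then_filter)  # []
print(filter_then_top1)  # [('C_HEART_RATE', 0.91)]
```

With a real linker, `cui_to_tuis.get(cui, [])` would be replaced by a lookup such as `linker.kb.cui_to_entity[cui].types`, and the order of "filter" and "truncate" is what decides whether the relevant entity survives.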
Current code extract:
```python
from itertools import compress

from scispacy.linking import EntityLinker  # noqa: F401  registers the "scispacy_linker" factory

nlp.add_pipe("scispacy_linker",
             config={"resolve_abbreviations": True,
                     "linker_name": "umls",
                     "max_entities_per_mention": 4,  # 5
                     "threshold": 0.87,  # default is 0.8, paper mentions 0.99 as thresh
                     })
#...
EXCLUDE_TUIS_LIST = ["T079", "T093"]  # UMLS semantic types (TUIs) to exclude
linker = nlp.get_pipe("scispacy_linker")  # look the pipe up once, outside the loop

novel_cols_candidates_names = []
no_entities_list = []
novel_candidate_cuis = []
novel_candidate_cuis_nomenclatures = []
TUIs_list = []

for f in icu_feature_terms["name"]:
    print(f)
    doc = nlp(f)
    if len(doc.ents) > 0:
        for j, entity in enumerate(doc.ents):
            print(f"Entity #{j}:{entity}")
            list_feature_cuis = [cui for cui, _score in entity._.kb_ents]
            # TUI filter: keep a candidate only if its first semantic type is not excluded
            tui_filter_mask = [linker.kb.cui_to_entity[c].types[0] not in EXCLUDE_TUIS_LIST
                               for c in list_feature_cuis]
            list_feature_cuis = list(compress(list_feature_cuis, tui_filter_mask))
            # Canonical names of the CUIs that survived the filter
            list_cuis_nomenclatures = [linker.kb.cui_to_entity[c].canonical_name
                                       for c in list_feature_cuis]
            num_candidates = len(list_feature_cuis)
            for c in list_feature_cuis:
                TUIs_list.append(linker.kb.cui_to_entity[c].types[0])
            # Extend once per entity (not once per CUI) so the lists stay aligned
            novel_cols_candidates_names.extend([f] * num_candidates)
            novel_candidate_cuis.extend(list_feature_cuis)
            novel_candidate_cuis_nomenclatures.extend(list_cuis_nomenclatures)
    else:
        no_entities_list.append(f)
        print(f"No Entity candidates for {f}")
```
Hi, this is not something that exists right now, although it is a reasonable feature request if you want to give implementing it a go! Otherwise, I recommend doing what you are doing and post-hoc filtering (setting the threshold such that you get enough candidates after filtering).
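One way to read that suggestion: have the linker return more candidates than you need (a lower `threshold` and a higher `max_entities_per_mention` in the `scispacy_linker` config), then apply the TUI filter, your own stricter score cut-off, and a cap on how many you keep. The sketch below shows only that last step in plain Python; the function `rerank_after_filter`, its parameters and the example data are hypothetical, and `cui_to_tuis` stands in for looking up `linker.kb.cui_to_entity[cui].types`.

```python
from typing import AbstractSet, Dict, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (CUI, score) as found in entity._.kb_ents


def rerank_after_filter(candidates: Sequence[Candidate],
                        cui_to_tuis: Dict[str, List[str]],
                        exclude_tuis: AbstractSet[str],
                        min_score: float,
                        max_keep: int) -> List[Candidate]:
    """Hypothetical post-hoc step: drop excluded types, apply a stricter score
    cut-off than the linker used, and keep at most `max_keep` candidates."""
    kept = [
        (cui, score)
        # re-sort defensively by score, highest first
        for cui, score in sorted(candidates, key=lambda c: c[1], reverse=True)
        if score >= min_score and exclude_tuis.isdisjoint(cui_to_tuis.get(cui, []))
    ]
    return kept[:max_keep]


# Made-up candidates from a linker run with a loose threshold
cui_to_tuis = {"C_A": ["T079"], "C_B": ["T033"], "C_C": ["T093"], "C_D": ["T201"], "C_E": ["T033"]}
loose_candidates = [("C_A", 0.97), ("C_B", 0.92), ("C_C", 0.90), ("C_D", 0.88), ("C_E", 0.80)]
exclude = {"T079", "T093"}

print(rerank_after_filter(loose_candidates, cui_to_tuis, exclude, min_score=0.85, max_keep=4))
# [('C_B', 0.92), ('C_D', 0.88)]

# A strict linker threshold would only have returned C_A, leaving nothing after filtering
strict_candidates = [c for c in loose_candidates if c[1] >= 0.95]
print(rerank_after_filter(strict_candidates, cui_to_tuis, exclude, min_score=0.85, max_keep=4))
# []
```

The cost of this approach is that the linker scores more candidates than are finally kept; the benefit is that it needs no changes to scispacy itself.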