I believe the bug is related to document splitting, specifically data.py line 222: "text": doc['text'][low_offset: high_offset],
I changed the code so that each chunk's text field contains only the text within its own offsets. I have not verified this, but when constructing the results text, the code takes the text field from the first chunk only.
This corresponds to line 42 in inference.py: text = documents[doc][0]["text"]
That line takes only the first chunk of the document as the text.
We can either fix the code in inference or in data.
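For the inference-side option, here is a minimal sketch of what rebuilding the text could look like. It is not verified against the repository. It assumes `documents[doc]` is a list of chunk dicts in document order, and that the chunks are contiguous and non-overlapping, so their `"text"` fields can simply be concatenated. The helper name `full_text` and the sample data are made up for illustration.

```python
from typing import Dict, List


def full_text(chunks: List[dict]) -> str:
    """Concatenate the "text" field of every chunk of one document.

    Assumes the chunks are in document order, contiguous, and
    non-overlapping (an assumption, not confirmed from data.py).
    """
    return "".join(chunk["text"] for chunk in chunks)


# Hypothetical data shaped like the `documents` mapping used in inference.py.
documents: Dict[str, List[dict]] = {
    "doc-1": [
        {"text": "Alice met "},
        {"text": "Bob in Paris."},
    ]
}

# Current behaviour: only the first chunk's text is used.
first_chunk_only = documents["doc-1"][0]["text"]

# Sketch of the fix: rebuild the text from all chunks.
rebuilt = full_text(documents["doc-1"])
```

If the chunks overlap, plain concatenation would duplicate text. In that case it is probably simpler to fix this in data.py by keeping the full document text available alongside the per-chunk text.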
I believe merge #2 introduced a bug where some entities are missing their associated text in the output file. Here's a comparison:
New output file:
Old output file (from commit 11da870):