Fs encoding fix (#320)
* Force encoding to utf-8

* Update change log
---------

Co-authored-by: Adam REMAKI
Aremaki authored Sep 9, 2024
1 parent 1dbccec commit 21833eb
Showing 2 changed files with 3 additions and 2 deletions.
1 change: 1 addition & 0 deletions changelog.md
@@ -9,6 +9,7 @@
### Fixed

- Numbers are now only detected without trying to remove the pollution in between digits, ie `55 @ 77777` could be detected as a full number before, but not anymore.
+ - Fix fsspec open file encoding to "utf-8".

### Changed

4 changes: 2 additions & 2 deletions edsnlp/data/standoff.py
@@ -77,7 +77,7 @@ def parse_standoff_file(
relations = []
events = {}

- with fs.open(txt_path, "r") as f:
+ with fs.open(txt_path, "r", encoding="utf-8") as f:
text = f.read()

if not len(ann_paths):
@@ -86,7 +86,7 @@ def parse_standoff_file(
}

for ann_file in ann_paths:
- with fs.open(ann_file, "r") as f:
+ with fs.open(ann_file, "r", encoding="utf-8") as f:
for line_idx, line in enumerate(f):
try:
if line.startswith("T"):
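
The effect of this change can be illustrated with a minimal sketch. It assumes that a text-mode file handle decodes bytes through `io.TextIOWrapper`, and that without an explicit `encoding` the decoder falls back to a platform-dependent default. The sample annotation line, the `read_text` helper, and the choice of `cp1252` as the stand-in legacy default are hypothetical and not taken from the commit.

```python
import io

# Hypothetical standoff annotation line containing a non-ASCII character,
# stored on disk as UTF-8 bytes.
raw = "T1\tdrug 0 11\tparacétamol\n".encode("utf-8")


def read_text(data: bytes, encoding: str) -> str:
    """Decode bytes the way a text-mode file handle would."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as f:
        return f.read()


# Explicit UTF-8, as in the patched calls: the text round-trips intact.
explicit = read_text(raw, "utf-8")

# A legacy single-byte default (cp1252 is used here as an example):
# the two-byte UTF-8 sequence for "é" is decoded as two separate characters.
legacy = read_text(raw, "cp1252")

print(repr(explicit))
print(repr(legacy))
```

Besides corrupting the text, the mis-decoded string is one character longer, so character offsets stored in the annotation would no longer line up with the text that was read.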