Adding symbols with overwrite=True in encode_line and add_file_to_dictionary

After fixing the behaviour of add_symbol, two of the unit tests were failing because they called the function with the default value of overwrite (False).
lydianish authored Sep 21, 2023
1 parent 0968083 commit eed21c0
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions fairseq/data/dictionary.py
@@ -337,7 +337,7 @@ def encode_line(

         for i, word in enumerate(words):
             if add_if_not_exist:
-                idx = self.add_symbol(word)
+                idx = self.add_symbol(word, overwrite=True)
             else:
                 idx = self.index(word)
             if consumer is not None:
@@ -367,7 +367,7 @@ def _add_file_to_dictionary_single_worker(
     def add_file_to_dictionary(filename, dict, tokenize, num_workers):
         def merge_result(counter):
             for w, c in sorted(counter.items()):
-                dict.add_symbol(w, c)
+                dict.add_symbol(w, c, overwrite=True)

         local_file = PathManager.get_local_path(filename)
         offsets = find_offsets(local_file, num_workers)
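
One way to check that both call sites now pass overwrite=True is to run them against a stub that only records its arguments. The sketch below is illustrative and makes several assumptions: RecordingDictionary, encode_words and the free-standing merge_result are hypothetical stand-ins, not fairseq code. The n=1 default for the count argument is assumed from the positional call in the diff. The stub deliberately does not model what add_symbol does with the flag, since this commit does not show that.

```python
from collections import Counter


class RecordingDictionary:
    """Hypothetical stub: records add_symbol calls instead of storing symbols."""

    def __init__(self):
        self.calls = []

    def add_symbol(self, word, n=1, overwrite=False):
        # The signature follows the calls in the diff (n=1 is an assumption);
        # the body is a stand-in that only logs the arguments it received.
        self.calls.append((word, n, overwrite))
        return len(self.calls) - 1


def encode_words(dictionary, words):
    """Simplified stand-in for the add_if_not_exist branch of encode_line."""
    return [dictionary.add_symbol(word, overwrite=True) for word in words]


def merge_result(dictionary, counter):
    """Simplified stand-in for merge_result in add_file_to_dictionary."""
    for w, c in sorted(counter.items()):
        dictionary.add_symbol(w, c, overwrite=True)


d = RecordingDictionary()
encode_words(d, ["hello", "world"])
merge_result(d, Counter({"b": 2, "a": 3}))
calls = d.calls
```

Each recorded tuple is (word, n, overwrite), so a test can assert that the flag is True for every call and that merge_result visits words in sorted order.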
