Commit

fix overwrite bug when adding symbol to dictionary
This bug caused tokens that were meant to be overwritten to be appended to the end of the dictionary symbols instead.

For example, a dictionary with 50K tokens that already has `<s>`, `</s>`, `<pad>` and `<unk>` with the #fairseq:overwrite tag will end up having 50004 tokens when loaded, instead of 50000.
lydianish authored Sep 15, 2023
1 parent b5d89cd commit 3ed453a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion fairseq/data/dictionary.py
@@ -126,7 +126,7 @@ def unk_string(self, escape=False):

     def add_symbol(self, word, n=1, overwrite=False):
         """Adds a word to the dictionary"""
-        if word in self.indices and not overwrite:
+        if word in self.indices and overwrite:
             idx = self.indices[word]
             self.count[idx] = self.count[idx] + n
             return idx
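
The token counts in the commit message can be reproduced with a minimal sketch. It is not fairseq's code: `MiniDictionary`, `load`, and the `fixed` flag are hypothetical names used only for illustration. The append branch is an assumption based on the commit message, since the diff does not show it. The sketch also assumes the four special symbols are registered before the vocabulary entries are added.

```python
class MiniDictionary:
    """Hypothetical stand-in for the dictionary class; models add_symbol only."""

    def __init__(self, fixed):
        self.fixed = fixed  # True: condition after this commit; False: before it
        self.symbols, self.count, self.indices = [], [], {}
        # Assumption: the special symbols exist before any file is loaded.
        for special in ("<s>", "<pad>", "</s>", "<unk>"):
            self.add_symbol(special)

    def add_symbol(self, word, n=1, overwrite=False):
        """Adds a word to the dictionary"""
        reuse = overwrite if self.fixed else not overwrite
        if word in self.indices and reuse:
            # Reuse the existing slot and update its count.
            idx = self.indices[word]
            self.count[idx] = self.count[idx] + n
            return idx
        # Assumed fallback (not shown in the diff): append as a new entry.
        idx = len(self.symbols)
        self.indices[word] = idx
        self.symbols.append(word)
        self.count.append(n)
        return idx


def load(fixed, vocab_size=50_000):
    """Simulates loading a vocabulary whose special symbols are tagged for overwrite."""
    d = MiniDictionary(fixed)
    for special in ("<s>", "<pad>", "</s>", "<unk>"):
        d.add_symbol(special, overwrite=True)  # as with #fairseq:overwrite
    for i in range(vocab_size - 4):
        d.add_symbol(f"tok{i}")
    return d


print(len(load(fixed=False).symbols))  # 50004
print(len(load(fixed=True).symbols))   # 50000
```

With the old condition, each overwritten symbol gets a second slot at the end of `symbols`. With the new condition, its existing index is reused.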
