There are many cases where a token is better off invisible to a sequence tagger or shift-reduce parser (e.g. the bullet in a bulleted list, ®, °, etc.). If such symbols were not seen during training, they may have unexpected effects on the output sequence. It would be convenient to be able to mask specific tokens for particular steps of the annotation pipeline, selecting them with a regular expression or a list of strings.
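```scala
// Veil every '®' and '*' so that the processor does not see them.
// Assumes `text: String` and `processor` (a clulab processor) are already in scope.
val veil = text.indices
  .filter { index =>
    val char = text.charAt(index)
    char == '®' || char == '*'
  }
  .map { index => index to index } // one single-character Range per veiled char
val veiledText = new VeiledText(text, veil)
val document = processor.annotate(veiledText.mkDocument(processor))
```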