Clarification about alignment for transformers #368
kirianguiller asked this question in Q&A (unanswered)
Hi everyone, thanks for all of your amazing work on Marian :).
I'm opening this discussion (and I hope it's in the right place) because I'm a little confused about the alignment functionality of the Marian transformer.
So far, my understanding is that if you want a transformer model that produces alignments, you need to precompute alignments on your training corpus and feed them to training (with the --guided-alignment parameter).
However, because the transformer needs to be fed sub tokens (produced by SentencePiece or another tokenizer), I am assuming that the precomputed alignments need to be based on those sub tokens.
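For example (with a hypothetical segmentation), the alignment indices in Pharaoh format no longer line up once a sentence is split into sub tokens, so the file passed to --guided-alignment would have to be built on the sub-token sequences:

```
# word level (2 source words, 2 target words):
the homeowner ||| der Hausbesitzer
0-0 1-1

# sub-token level after SentencePiece (3 source, 3 target sub tokens):
▁the ▁home owner ||| ▁der ▁Haus besitzer
0-0 1-1 2-2
```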
Therefore, the pipeline for training would also need to be the following (see the sketch below):
1. Train a SentencePiece model and encode both sides of the training corpus into sub tokens.
2. Run a word aligner (fast_align, eflomal, ...) on the sub-tokenized corpus to get sub-token-level alignments.
3. Train Marian on the sub-tokenized corpus, passing the alignment file with --guided-alignment.
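A minimal sketch of that pipeline, assuming the SentencePiece command-line tools and fast_align/atools are installed; all file names and every Marian option except --guided-alignment are illustrative:

```bash
# 1. Learn a sub-token model and encode both sides of the corpus.
spm_train --input=corpus.src,corpus.trg --model_prefix=spm --vocab_size=32000
spm_encode --model=spm.model < corpus.src > corpus.sp.src
spm_encode --model=spm.model < corpus.trg > corpus.sp.trg

# 2. Align the sub-tokenized corpus (fast_align expects "src ||| trg" lines),
#    then symmetrize the two directions.
paste corpus.sp.src corpus.sp.trg | sed 's/\t/ ||| /' > corpus.sp.src-trg
fast_align -i corpus.sp.src-trg -d -o -v    > forward.align
fast_align -i corpus.sp.src-trg -d -o -v -r > reverse.align
atools -i forward.align -j reverse.align -c grow-diag-final-and > corpus.align

# 3. Train on the sub-tokenized corpus with guided alignment.
marian --type transformer --model model.npz \
       --train-sets corpus.sp.src corpus.sp.trg \
       --guided-alignment corpus.align
       # ...plus vocabularies and the usual training options
```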
Am I correct? Or is there a less cumbersome pipeline, or one that would only use token-level (word-level) alignments?
For a purely word-level alignment, I guess it is simply impossible for the following reason: the model is trained on the sub-token sequences, so the guided-alignment supervision has to refer to sub-token positions, and word-level indices would not match the sequences the model actually sees.
Thanks in advance for your help!