A Go implementation of the subword segmentation from "Neural Machine Translation of Rare Words with Subword Units". It contains preprocessing code to segment text into subword units. The primary purpose is to facilitate the reproduction of our experiments on neural machine translation with subword units.
go get github.com/khaibin/go-subwordnmt
package main

import (
	"fmt"

	"github.com/khaibin/go-subwordnmt"
)

func main() {
	// Load the BPE merge codes and the vocabulary file.
	bpe := subwordnmt.FastBPE("path/to/codes", "path/to/vocab")

	// Segment raw sentences, one string per sentence.
	result1 := bpe.ApplyString([]string{
		"Roasted barramundi fish",
		"Centrally managed over a client-server architecture",
	})
	fmt.Println(result1)

	// Segment sentences that are already split into tokens.
	result2 := bpe.Apply([][]string{
		{"Roasted", "barramundi", "fish"},
		{"Centrally", "managed", "over", "a", "client-server", "architecture"},
	})
	fmt.Println(result2)
}
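The segmentation methods are described in the paper "Neural Machine Translation of Rare Words with Subword Units" (Sennrich, Haddow, and Birch, ACL 2016).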
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.