This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

How to extract bilingual dictionary from parallel data? #1

Open
HuihuiChyan opened this issue Nov 3, 2022 · 1 comment

Comments

@HuihuiChyan

Thank you for your inspiring work. However, I notice that you assume there is little parallel data, and you construct synthetic parallel data with CRISS. What is the best practice if I already have a lot of parallel data and want to induce a bilingual dictionary from it?
Thank you in advance!

@sidaw
Contributor

sidaw commented Nov 4, 2022

If you have sufficient parallel data, you can use it directly as input to our method for extracting entries.
You could also use SimAlign directly. The difference is that the SimAlign outputs will be noisier (but have higher recall), whereas our method optimizes for a higher-precision dictionary on top of the SimAlign outputs.
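
To make the precision/recall trade-off concrete, here is a minimal sketch of turning word alignments into dictionary entries. It is not the method from this repository: it just counts aligned word pairs and filters them by frequency and relative frequency. The function name `extract_dictionary`, the thresholds `min_count` and `min_prob`, and the toy corpus are all assumptions made for illustration. The alignments are assumed to be lists of `(source_index, target_index)` pairs already produced by a word aligner such as SimAlign.

```python
from collections import Counter, defaultdict


def extract_dictionary(aligned_corpus, min_count=2, min_prob=0.5):
    """Build a source -> [(target, score)] dictionary from word alignments.

    aligned_corpus: iterable of (src_tokens, tgt_tokens, alignment), where
    alignment is a list of (src_index, tgt_index) pairs (assumed to come
    from an external word aligner).
    min_count / min_prob: hypothetical filtering thresholds; raising them
    trades recall for precision.
    """
    pair_counts = Counter()
    src_counts = Counter()
    for src_tokens, tgt_tokens, alignment in aligned_corpus:
        for i, j in alignment:
            src_word = src_tokens[i].lower()
            tgt_word = tgt_tokens[j].lower()
            pair_counts[(src_word, tgt_word)] += 1
            src_counts[src_word] += 1

    dictionary = defaultdict(list)
    for (src_word, tgt_word), count in pair_counts.items():
        # Relative frequency of this target among all links of the source word.
        score = count / src_counts[src_word]
        if count >= min_count and score >= min_prob:
            dictionary[src_word].append((tgt_word, score))

    # Best-scoring translation first for each source word.
    for src_word in dictionary:
        dictionary[src_word].sort(key=lambda item: item[1], reverse=True)
    return dict(dictionary)


# Toy English-German corpus with hand-written alignments (illustrative only).
corpus = [
    (["the", "house"], ["das", "Haus"], [(0, 0), (1, 1)]),
    (["a", "house"], ["ein", "Haus"], [(0, 0), (1, 1)]),
    (["the", "house"], ["das", "Heim"], [(0, 0), (1, 1)]),
    (["the", "cat"], ["die", "Katze"], [(0, 0), (1, 1)]),
]

# Strict thresholds: fewer entries, but each is better supported.
entries = extract_dictionary(corpus, min_count=2, min_prob=0.5)

# No filtering: every aligned pair is kept, including rare or noisy ones.
all_entries = extract_dictionary(corpus, min_count=1, min_prob=0.0)
```

With the strict thresholds only `the` and `house` survive, since their translations are each seen twice; with no filtering every aligned pair is kept, which is closer to using raw aligner output.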
