I followed these steps:

    python data.py <data_dir>
    python main.py <model_name> 256 0.02
    cat data.dev.txt | python punctuator.py <model_path> <model_output_path>
I used the europarl-v7.de-en.de dataset and took:

- 1800 lines for ep.dev.txt
- 1800 lines for ep.test.txt
- 7200 lines for ep.train.txt
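For reference, a split like this could be produced with a short script. This is a minimal sketch, not part of the project: the function name `split_corpus`, the output directory, and taking consecutive lines in file order (no shuffling) are assumptions. It also assumes the source file already has one sentence per line.

```python
from itertools import islice
from pathlib import Path


def split_corpus(src, out_dir,
                 sizes=(("dev", 1800), ("test", 1800), ("train", 7200)),
                 prefix="ep"):
    """Hypothetical helper: write <prefix>.<split>.txt files from consecutive lines.

    Blank lines are skipped. Lines are taken in file order: the first
    block goes to the first split, the next block to the second, and so on.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts = {}
    with open(src, encoding="utf-8") as f:
        # Stream lines lazily so a large corpus is not read into memory at once.
        stripped = (line.strip() for line in f)
        non_empty = (s for s in stripped if s)
        for name, size in sizes:
            chunk = list(islice(non_empty, size))
            if len(chunk) < size:
                raise ValueError(f"not enough lines for {name}: got {len(chunk)}, need {size}")
            (out / f"{prefix}.{name}.txt").write_text("\n".join(chunk) + "\n", encoding="utf-8")
            counts[name] = len(chunk)
    return counts
```

Usage would be something like `split_corpus("europarl-v7.de-en.de", "data_dir")`, where `data_dir` is a placeholder for the directory later passed to `python data.py <data_dir>`.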
For data.dev.txt I used a single long line of text from Kaldi, a speech-to-text engine. It is all lowercase, contains some misrecognized words, and has no punctuation.

The file written to <model_output_path> is identical to data.dev.txt, so no punctuation is added.
Is the solution to train on more lines, or do I have to preprocess data.dev.txt? If the latter, how?
I think you need to have one sentence per line, and many more samples. I used https://www.statmt.org/wmt14/training-monolingual-europarl-v7/europarl-v7.fr.gz and modified run.sh to use this file. Run it and see what the output .txt files look like.
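If a training file has several sentences on one line, a rough way to put each sentence on its own line is sketched below. It is a hypothetical helper, not what run.sh does: `to_one_sentence_per_line` and `convert_gz` are made-up names, and the sentence-boundary rule (".", "!" or "?" followed by whitespace) is a naive assumption that will mis-split abbreviations.

```python
import gzip
import re

# Naive assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
# Abbreviations such as "z. B." or "Dr. Smith" will be split incorrectly.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def to_one_sentence_per_line(lines):
    """Yield one whitespace-normalized sentence per item, dropping blanks."""
    for line in lines:
        for sentence in SENTENCE_END.split(line.strip()):
            sentence = " ".join(sentence.split())  # collapse internal whitespace
            if sentence:
                yield sentence


def convert_gz(src_gz, dst_txt):
    """Decompress a gzipped text corpus and write one sentence per line."""
    count = 0
    with gzip.open(src_gz, "rt", encoding="utf-8") as src, \
            open(dst_txt, "w", encoding="utf-8") as dst:
        for sentence in to_one_sentence_per_line(src):
            dst.write(sentence + "\n")
            count += 1
    return count
```

For example, `convert_gz("europarl-v7.fr.gz", "europarl-v7.fr.txt")` would write the result to a plain text file (the output name is a placeholder). For corpora that are already sentence-per-line, this mainly drops empty lines and normalizes spacing.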