eof Error #41

ankur220693 · 2019-02-13T12:32:45Z

python3.4 train_pos_mai.py

Using TensorFlow backend.
Generate new embeddings files for a dataset
Read file: komninos_english_embeddings.gz
Traceback (most recent call last):
File "Train_POS.py", line 48, in <module>
pickleFile = perpareDataset(embeddingsPath, datasets)
File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 42, in perpareDataset
embeddings, word2Idx = readEmbeddings(embeddingsPath, datasets, frequencyThresholdUnknownTokens, reducePretrainedEmbeddings)
File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 135, in readEmbeddings
for line in embeddingsIn:
File "/usr/lib64/python3.4/gzip.py", line 389, in read1
while self.extrasize <= 0 and self._read():
File "/usr/lib64/python3.4/gzip.py", line 449, in _read
self._read_eof()
File "/usr/lib64/python3.4/gzip.py", line 482, in _read_eof
crc32, isize = struct.unpack("<II", self._read_exact(8))
File "/usr/lib64/python3.4/gzip.py", line 286, in _read_exact
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

nreimers · 2019-02-13T12:41:54Z

It looks like the downloaded file is incomplete. Deleting it and downloading it again may solve the problem.

Also, please test with Python 3.6. Unfortunately, I can't help with older Python versions.
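
One way to check for a truncated download before rerunning the training script is to decompress the whole archive once and see whether it reaches the end of the stream. The following is a minimal sketch using only the standard library; the helper name `gzip_is_complete` and the chunk size are illustrative assumptions and are not part of this repository.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses to the end.

    Hypothetical helper for illustration; not part of the repository.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the very end so the trailing checksum is verified.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the file ends before the end-of-stream marker.
        # OSError / zlib.error: not a gzip file, or corrupted data.
        return False
    return True
```

Calling `gzip_is_complete("komninos_english_embeddings.gz")` should return `False` for a file that triggers the `EOFError` above, in which case deleting the file and downloading it again is the likely fix.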

ankur220693 · 2019-02-13T12:53:56Z

thanks

ankur220693 · 2019-02-18T13:20:27Z

I can't figure out this error:

python3.6 train_pos_mai.py
Using TensorFlow backend.
Generate new embeddings files for a dataset
Read file: maiwiki-20180920-stub-articles.xml
Traceback (most recent call last):
File "train_pos_mai.py", line 48, in <module>
pickleFile = perpareDataset(embeddingsPath, datasets)
File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 42, in perpareDataset
embeddings, word2Idx = readEmbeddings(embeddingsPath, datasets, frequencyThresholdUnknownTokens, reducePretrainedEmbeddings)
File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 156, in readEmbeddings
vector = np.array([float(num) for num in split[1:]])
File "/home/Ankur_JRF/Backup_Ubuntu/LSTM/bilstm/util/preprocessing.py", line 156, in <listcomp>
vector = np.array([float(num) for num in split[1:]])
ValueError: could not convert string to float: 'xmlns="http://www.mediawiki.org/xml/export-0.10/"'

nreimers · 2019-02-18T13:47:18Z

What type of embeddings file are you using?

The system expects an input file in a format similar to the GloVe embeddings.

Each line contains a token, followed by its vector, e.g. 300 space-separated floats.

It looks like your embeddings file is some kind of XML file. If so, you would need to convert it to a format like the GloVe embeddings.
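
To see whether a file matches the format described above before passing it to the training script, the first lines can be parsed the same way the traceback shows (`float(num) for num in split[1:]`). This is a rough sketch under assumptions: the function name `check_embeddings_format`, the `max_lines` limit, and the choice of `gzip.open` for `.gz` paths are illustrative and not taken from the repository.

```python
import gzip


def check_embeddings_format(path, max_lines=1000):
    """Return (dimension, problems) for the first `max_lines` lines.

    Hypothetical helper for illustration; not part of the repository.
    `problems` is a list of (line_number, message) tuples.
    """
    opener = gzip.open if path.endswith(".gz") else open
    dimension = None
    problems = []
    with opener(path, "rt", encoding="utf8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line_no > max_lines:
                break
            split = line.rstrip().split(" ")
            if len(split) < 2:
                problems.append((line_no, "no vector values"))
                continue
            try:
                # Everything after the token must parse as a float.
                vector = [float(num) for num in split[1:]]
            except ValueError as err:
                problems.append((line_no, str(err)))
                continue
            if dimension is None:
                dimension = len(vector)
            elif len(vector) != dimension:
                problems.append(
                    (line_no, "expected %d values, got %d" % (dimension, len(vector)))
                )
    return dimension, problems
```

For an XML file such as the one in the traceback, the returned `problems` list should contain "could not convert string to float" entries for the offending lines, while a well-formed embeddings file should yield an empty list and a single consistent dimension.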
