embeddings = np.array(embeddings) MemoryError #36

ghost · 2019-01-28T12:37:08Z

Hi,

I get a MemoryError when I run the NER model with the word2vec embeddings from this link (http://evexdb.org/pmresources/vec-space-models/). However, I can run the ELMo-BiLSTM model with the same embeddings without any error. Is there a way to fix this? My embeddings file is 13.2 GB, and I have 16 GB of RAM.

nreimers · 2019-01-28T12:57:33Z

13.2 GB for an embedding file is extremely large. Are you sure you need all of these embeddings?

You often get really good performance with much smaller embedding files, e.g. with the Komninos embeddings:
https://public.ukp.informatik.tu-darmstadt.de/reimers/embeddings/

Or with the GloVe embeddings.

Some embedding files contain many unnecessary entries. The original word2vec embeddings, for example, also contain embeddings for bigrams (which cannot be used in this architecture). The Komninos embeddings from his webpage also contain embeddings for dependency relations (which cannot be used with this architecture either).

If you still want to use your linked embeddings:

The perpareDataset method in util.py has an argument:
reducePretrainedEmbeddings=False

Set this argument to True.

With this argument, only the needed embeddings are loaded from disk and stored in memory. Word embeddings that do not appear in train/dev/test are not loaded.
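To illustrate the idea, here is a minimal sketch of that kind of filtering. It is not the code from this repository: the function name `load_reduced_embeddings` and its parameters are hypothetical, and it assumes a plain-text embeddings file with one `word v1 v2 ...` entry per line (a binary word2vec file would need to be converted first).

```python
import numpy as np


def load_reduced_embeddings(path, needed_words):
    """Stream a text-format embeddings file and keep only the needed words.

    Hypothetical helper for illustration. Returns (word2idx, matrix), where
    matrix[word2idx[w]] is the vector for word w.
    """
    needed = set(needed_words)
    word2idx = {}
    vectors = []
    with open(path, encoding="utf8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # Skip blank lines and an optional "vocab_size dim" header line.
            if len(parts) < 3:
                continue
            word = parts[0]
            # Ignore words that are not in the datasets, and duplicates.
            if word not in needed or word in word2idx:
                continue
            word2idx[word] = len(vectors)
            vectors.append(np.array([float(x) for x in parts[1:]], dtype=np.float32))
    if not vectors:
        return word2idx, np.zeros((0, 0), dtype=np.float32)
    # Only the retained vectors are ever held in memory.
    return word2idx, np.vstack(vectors)
```

Because the file is read line by line and unneeded lines are discarded immediately, peak memory depends on the vocabulary of train/dev/test rather than on the size of the embeddings file.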

ghost · 2019-01-29T11:05:28Z

I was able to run the code by setting this argument: "reducePretrainedEmbeddings=True".

But I am wondering why I was able to do NER with ELMo and the word2vec embeddings (13.2 GB file) without setting that argument to True. Can you help me understand that?
