embeddings = np.array(embeddings) MemoryError #36

Open
ghost opened this issue Jan 28, 2019 · 2 comments

Comments

ghost commented Jan 28, 2019

Hi,

I get a MemoryError when I run the NER model with the word2vec embeddings from this link (http://evexdb.org/pmresources/vec-space-models/). However, I am able to run the ELMo-BiLSTM model with these embeddings without any error. Is there a way to fix this issue? My embeddings file is 13.2 GB, and I have 16 GB of RAM.

@nreimers
Member

13.2 GB for an embedding file is extremely large. Are you sure you need all of these embeddings?

You often get really good performance with much smaller embedding files, e.g. with the Komninos embeddings:
https://public.ukp.informatik.tu-darmstadt.de/reimers/embeddings/

Or with the GloVe embeddings.

Some embedding files contain many unnecessary entries. The original word2vec embeddings, for example, also contain embeddings for bigrams (which cannot be used in this architecture). The Komninos embeddings you get from his webpage also contain embedding information for dependency relations (which also cannot be used with this architecture).

If you still want to use your linked embeddings:

The perpareDataset method in util.py has an argument:
reducePretrainedEmbeddings=False

Set this argument to True.

With this argument, only the embeddings that are actually needed are loaded from disk and kept in memory. Embeddings for words that do not appear in train/dev/test are not loaded.
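
To illustrate the idea behind this kind of filtering, here is a minimal sketch. It is not the repository's actual implementation: the function name `load_reduced_embeddings`, the sample words, and the plain-text layout `<word> <v1> ... <vn>` are assumptions made for the example.

```python
import io

import numpy as np


def load_reduced_embeddings(lines, needed_words):
    """Keep only the vectors whose word occurs in needed_words.

    `lines` is any iterable of text lines in the assumed layout
    "<word> <v1> <v2> ... <vn>" (hypothetical plain-text format).
    """
    words, vectors = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0]
        if word not in needed_words:
            continue  # skip before parsing, so unused vectors are never built
        vectors.append(np.array([float(x) for x in parts[1:]], dtype=np.float32))
        words.append(word)
    # Stack the kept vectors into one (num_words, dim) matrix.
    matrix = np.stack(vectors) if vectors else np.empty((0, 0), dtype=np.float32)
    return words, matrix


# Tiny in-memory stand-in for a large embeddings file (made-up values).
sample_file = io.StringIO(
    "protein 0.1 0.2 0.3\n"
    "of_the 0.4 0.5 0.6\n"
    "kinase 0.7 0.8 0.9\n"
)
vocab = {"protein", "kinase"}  # words assumed to occur in train/dev/test
kept_words, kept_matrix = load_reduced_embeddings(sample_file, vocab)
print(kept_words, kept_matrix.shape)
```

Because the file is read line by line and unneeded rows are dropped before their values are parsed, peak memory in this sketch depends on the size of the dataset vocabulary rather than on the size of the embeddings file.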

ghost commented Jan 29, 2019

I was able to run the code by setting the argument "reducePretrainedEmbeddings=True".

But I am wondering why I was able to do NER with ELMo and the word2vec embeddings (13.2 GB file) without setting that argument to True. Can you help me understand that?
