Adding new embeddings to a trained model. #58

Open
S4ltedF1sh opened this issue Feb 16, 2020 · 4 comments

@S4ltedF1sh

Hi, I'm currently using this model for poem sentiment analysis. I trained the model on a certain number of poems, where each line is used as a token and each line has its own embedding in the embedding file. The problem is that after training, I want to use the model on other, unseen poems (whose embeddings are not in the embedding file). When I tried to add their embeddings to the embedding file and ran the model, it returned this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,4] = 4837 is not in [0, 4827) [[Node: word_embeddings/embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](word_embeddings/embeddings/read, _arg_words_input_0_0, word_embeddings/embedding_lookup/axis)]]

I assume this means that after training, the embedding size of the model is fixed and you can't add any further embeddings. So I want to ask: how can I add new embeddings to the model, or how can I use the model to predict unseen poems?
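[Editor's note: the failure mode behind this error can be reproduced in miniature with plain NumPy, since the embedding lookup simply gathers rows from a fixed-size matrix and any index at or beyond the row count fails. The sizes below are illustrative stand-ins, not the repository's actual vocabulary size.]

```python
import numpy as np

# Toy stand-in for the trained embedding matrix: the model baked in
# 4827 rows at training time (here, 5 rows for illustration).
embeddings = np.random.rand(5, 3).astype(np.float32)

def lookup(indices):
    """Mimics word_embeddings/embedding_lookup: gather rows by index."""
    return embeddings[np.asarray(indices)]

print(lookup([0, 4]).shape)   # (2, 3) -- indices inside [0, 5) work

try:
    lookup([0, 7])            # 7 is outside [0, 5), like 4837 vs. [0, 4827)
except IndexError as err:
    print("out-of-range index:", err)
```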

@S4ltedF1sh (Author)

this is the image of the full error: https://imgur.com/a/4v8VoSJ

@nreimers (Member)

Hi @S4ltedF1sh
this is not quite straightforward.

The model is loaded here:

def loadModel(modelPath):

What you need to call on your bilstm-models is this function:

bilstm.setMappings(new_mappings, new_embeddings)

It is important to call it before the method buildModel is invoked.

Best
Nils Reimers
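[Editor's note: for concreteness, a hypothetical sketch of what extending the mappings and embedding matrix might look like before setMappings is called. The vocabulary size, token names, and dict layout here are assumptions for illustration; the repository's actual mapping structure may differ.]

```python
import numpy as np

# Existing trained vocabulary (sizes are illustrative, not from the repo)
old_embeddings = np.random.rand(4827, 100).astype(np.float32)
mappings = {'tokens': {f'line_{i}': i for i in range(4827)}}

# Embeddings for the unseen poem lines, one 100-dim row per new token
new_lines = ['unseen_line_a', 'unseen_line_b']
extra = np.random.rand(len(new_lines), 100).astype(np.float32)

# Append the rows and continue the index sequence where the old vocab ended
new_embeddings = np.vstack([old_embeddings, extra])
for offset, line in enumerate(new_lines):
    mappings['tokens'][line] = old_embeddings.shape[0] + offset

# Then, before buildModel is invoked on the loaded model:
# bilstm.setMappings(mappings, new_embeddings)
```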

@S4ltedF1sh (Author)

Hi @nreimers,
many thanks for the quick answer; however, I'm not really sure what you mean by:

It is important to call it before the method buildModel is invoked.

As I understand it, the buildModel method is only called once before training starts and isn't invoked while loading a trained model. So where should I call the setMappings method when I load my trained model? Or is it only possible to add more embeddings before training? I checked the code, and there is a cap on the maximum number of features, which I assume is the index of the token in the embedding file (line 105, BiLSTM.py, buildModel function):

tokens = Embedding(input_dim=self.embeddings.shape[0], output_dim=self.embeddings.shape[1], weights=[self.embeddings], trainable=False, name='word_embeddings')(tokens_input)

So because of this input_dim=self.embeddings.shape[0], I think it's capped at the current size of the embedding file and you can't add any more embeddings after training. Is that right?

Many thanks in advance,
Minh Vu Pham

@nreimers (Member)

Hi @S4ltedF1sh
The quoted line creates a Keras embedding layer with the size of your numpy self.embeddings matrix. If you add new embeddings to self.embeddings, they will also be used by Keras in the embedding layer.

However, it is important that you add these new embeddings before tokens = Embedding(...) is invoked.

This buildModel method is invoked when training or inference is started.

Best
Nils Reimers
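[Editor's note: the ordering described above can be sketched with a minimal stand-in class; TinyModel below is hypothetical and only illustrates that input_dim is read from whatever self.embeddings holds at the moment buildModel runs, so rows appended beforehand are included.]

```python
import numpy as np

class TinyModel:
    """Hypothetical stand-in illustrating the order of operations only."""
    def __init__(self, embeddings):
        self.embeddings = embeddings
        self.input_dim = None

    def buildModel(self):
        # Mirrors: Embedding(input_dim=self.embeddings.shape[0], ...)
        # input_dim is sized from self.embeddings as it is *now*.
        self.input_dim = self.embeddings.shape[0]

model = TinyModel(np.zeros((4827, 100), dtype=np.float32))

# Append the new rows BEFORE buildModel, and the layer sizes itself
# to the enlarged matrix:
model.embeddings = np.vstack([model.embeddings,
                              np.zeros((11, 100), dtype=np.float32)])
model.buildModel()
print(model.input_dim)   # 4838 -- index 4837 is now inside [0, 4838)
```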
