I have a custom pretrained T5 model that predicts the solution to quadratic equations, so its output is a different length than its input (in all the examples I saw, they are the same length). I'm trying to visualize attention like this:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from bertviz import model_view
tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content", output_attentions=True)
inputs = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt", add_special_tokens=True)
encoder_input_ids = inputs.input_ids
outputs = model.generate(inputs.input_ids, attention_mask=inputs.attention_mask, max_length=80, min_length=10, output_attentions=True, return_dict_in_generate=True)
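For reference, this is roughly how I inspect what comes back from generate (the shape comments are my understanding of the return_dict_in_generate output, so treat them as assumptions rather than documented behaviour):

# Encoder self-attention: one tensor per layer.
print(len(outputs.encoder_attentions))         # num_layers
print(outputs.encoder_attentions[0].shape)     # (batch, num_heads, src_len, src_len)
# Cross-attention: one tuple per generated token, each holding one tensor per layer.
print(len(outputs.cross_attentions))           # num_generated_tokens
print(len(outputs.cross_attentions[0]))        # num_layers
print(outputs.cross_attentions[0][0].shape)    # (batch, num_heads, 1, src_len)
# Decoder self-attention is nested the same way, with the key length growing each step.
print(outputs.decoder_attentions[0][0].shape)  # (batch, num_heads, 1, step + 1)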
For example, the predicted sequence is: "D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0".
with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0", return_tensors="pt", add_special_tokens=True).input_ids
encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])
So encoder_text length is 18 and decoder_text length is 79. For some reason, when I get all the attentions from the outputs, they come as tuples (the cross-attention is even a tuple of tuples). I can't figure out how to use this function correctly, or why my attentions have the wrong dimensions.
model_view(
    cross_attention=outputs.cross_attentions,
    encoder_attention=encoder_attention,
    decoder_attention=decoder_attention,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text,
)
Is the problem that the output length is different from the input length?
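If it helps, here is what I'm considering trying instead: re-running a plain forward pass over the generated sequence, since (as far as I understand) the forward pass returns one full attention matrix per layer instead of the per-step tuples that generate produces. This is only a sketch based on my understanding, so I'm not sure it is the intended usage:

# Forward pass with the generated/target ids, so every layer returns a full matrix.
full_outputs = model(input_ids=encoder_input_ids, decoder_input_ids=decoder_input_ids)

model_view(
    encoder_attention=full_outputs.encoder_attentions,  # num_layers x (batch, heads, 18, 18) in my case
    decoder_attention=full_outputs.decoder_attentions,  # num_layers x (batch, heads, 79, 79)
    cross_attention=full_outputs.cross_attentions,      # num_layers x (batch, heads, 79, 18)
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text,
)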