
Passing Attention Masks #24

Open
leffff opened this issue Sep 14, 2023 · 3 comments

Comments

leffff (Contributor) commented Sep 14, 2023

Hi! Is there a way to pass an attention mask, like in the transformers library, or a src_key_padding_mask as in nn.Transformer, so that the model doesn't "pay attention" to padding tokens?
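For reference, this is roughly what I mean in plain PyTorch (a minimal sketch; pad_id and the model sizes are just placeholders):

```python
import torch
import torch.nn as nn

pad_id = 0
batch = torch.tensor([[5, 7, 9, pad_id, pad_id],
                      [3, 4, pad_id, pad_id, pad_id]])   # (batch, seq_len)
src_key_padding_mask = batch.eq(pad_id)                  # True at padded positions

embed = nn.Embedding(100, 32, padding_idx=pad_id)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
# padded positions are excluded from attention
out = encoder(embed(batch), src_key_padding_mask=src_key_padding_mask)
```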

leffff (Contributor, Author) commented Sep 14, 2023

Moreover, how do you recommend pooling the output embeddings into a single vector? For example, BERT uses a [CLS] token that aggregates information from the whole sequence. As I understand it, here the last vector in the sequence encodes the information (as in RNNs).

Jamie-Stirling (Owner) commented
Hi!

Thanks for your interest in this implementation. Please see the work of the original authors for more information; since I'm not an author, I'm best placed to answer implementation-specific questions only. That said, I can have a go at answering your questions.

Regarding padding: if your padding is placed after the input tokens, there's no need to mask the retention mechanism itself, since information can only flow forwards anyway. You'll probably want to mask out the losses at padded positions during training, though.
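A minimal sketch of what I mean by masking the losses, assuming a standard next-token cross-entropy setup (pad_id is whatever your tokenizer uses for padding):

```python
import torch.nn.functional as F

# logits: (batch, seq_len, vocab), targets: (batch, seq_len) padded with pad_id
loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    targets.reshape(-1),
    ignore_index=pad_id,  # padded positions contribute nothing to the loss
)
```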

Regarding getting an embedding of an entire sequence: the recurrent state S (in the recurrent formulation) and R (in the chunk-wise one) should share a large amount of mutual information with the preceding tokens, so they may serve as a useful vector (or rather, matrix) representation of the sequence. That remains to be investigated further, though.
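For illustration only, something along these lines; the method name forward_recurrent, its (x_n, s_n_1s, n) signature, and the state shapes are assumptions about this implementation, so adapt them to the actual code:

```python
import torch

def sequence_state(retnet, x, layers, heads, hidden_dim):
    """Hypothetical sketch: feed tokens one at a time through the recurrent
    formulation and keep the final per-layer states S as a fixed-size summary."""
    batch, seq_len, _ = x.shape
    head_dim = hidden_dim // heads
    # one zero-initialised state per layer and head (shapes are a guess)
    s_n_1s = [[torch.zeros(batch, head_dim, head_dim) for _ in range(heads)]
              for _ in range(layers)]
    for n in range(seq_len):
        _, s_n_1s = retnet.forward_recurrent(x[:, n], s_n_1s, n + 1)
    return s_n_1s  # final recurrent states S
```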

leffff (Contributor, Author) commented Sep 14, 2023

Thanks for the answer, it's clear to me now! I'll just take the last non-PAD token.
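Something like this (a sketch; pad_id and the (batch, seq_len, dim) output shape are assumptions):

```python
import torch

# outputs: (batch, seq_len, dim) token representations; input_ids: (batch, seq_len)
# with pad_id appended after the real tokens
lengths = input_ids.ne(pad_id).sum(dim=1)                  # number of real tokens
last_idx = (lengths - 1).clamp(min=0)                      # index of last non-PAD token
pooled = outputs[torch.arange(outputs.size(0)), last_idx]  # (batch, dim)
```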
