Need help understanding the number of tokens #22

Open
miguelcarvtalka opened this issue Nov 18, 2024 · 3 comments

Comments


miguelcarvtalka commented Nov 18, 2024

How do you ensure that the number of tokens doesn't exceed the maximum token length defined for the model? In the case of the Llama 3.2 1B decoder, the max token length seems to be 16k, but nowhere in the paper do you specify a maximum number of tokens for the video; everything seems to be threshold-based, so it appears entirely possible to exceed the context window even after STC. What do you do if, even after STC, the context still exceeds the maximum defined in the config file?

@xiaoqian-shen (Collaborator)

Hi @miguelcarvtalka,
If the number of tokens after compression still exceeds the context length, we force-truncate the excess tokens in each sliding window, as implemented here. In our reported results, we set model_max_length to 8k (8192) for a fair comparison with the baselines.
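For intuition, here is a minimal sketch of that kind of per-window force-truncation. All names here are hypothetical (including the equal-share budget split and the reserved text budget); see the linked code for the actual implementation:

```python
import torch

def truncate_windows(window_tokens, model_max_length=8192, reserved=256):
    """Force-truncate visual tokens so the total fits the context window.

    window_tokens: list of per-sliding-window tensors (num_tokens, hidden_dim).
    reserved: hypothetical budget left over for the text prompt.
    """
    budget = model_max_length - reserved
    per_window = budget // max(len(window_tokens), 1)  # equal share per window
    # Drop the trailing tokens of each window that exceed its share.
    return [w[:per_window] for w in window_tokens]

# Example: 6 windows of 2048 tokens each (12288 total) squeezed under 8192.
windows = [torch.randn(2048, 4096) for _ in range(6)]
truncated = truncate_windows(windows)
assert sum(w.shape[0] for w in truncated) <= 8192
```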


miguelcarvtalka commented Nov 19, 2024

Thank you for your reply! Another question: is there a way for the model to understand which tokens come from low-res images and which tokens are the output of the STC module? In other words, can the model distinguish whether a token belongs to a full image or not?

For example, just off the top of my head, you could include extra learned tokens that delimit the full image or the output of the STC module (the tokens that changed from the first frame in the window), or you could sum learned embeddings onto those tokens...
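To make the suggestion concrete, here is a rough sketch of the "sum learned embeddings" idea: add one of two learned type embeddings to each visual token so the decoder can tell full-frame tokens from STC-reduced ones. This is purely illustrative and not part of LongVU:

```python
import torch
import torch.nn as nn

class TokenTypeMarker(nn.Module):
    """Add one of two learned embeddings to each visual token:
    type 0 = token from a full (uncompressed) frame,
    type 1 = token kept by the STC module."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.type_embed = nn.Embedding(2, hidden_dim)

    def forward(self, tokens: torch.Tensor, token_type: torch.Tensor):
        # tokens: (seq_len, hidden_dim); token_type: (seq_len,) of 0/1 ints
        return tokens + self.type_embed(token_type)

# Example: mark the first 576 tokens as full-frame, the rest as STC output.
marker = TokenTypeMarker(hidden_dim=4096)
tokens = torch.randn(1000, 4096)
token_type = torch.cat([torch.zeros(576, dtype=torch.long),
                        torch.ones(424, dtype=torch.long)])
marked = marker(tokens, token_type)
```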

@xiaoqian-shen (Collaborator)

@miguelcarvtalka Thank you for sharing your thoughts! You can keep track of the indices of the frames where the tokens have been reduced. However, I’m not entirely clear on the meaning of the last sentence, "sum learned embeddings."
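For concreteness, a minimal sketch of that bookkeeping, assuming an `stc` callable that returns the kept tokens for each frame (all names hypothetical; this is not LongVU's code):

```python
# Track which frame indices had their tokens reduced by STC.
def apply_stc_with_bookkeeping(frame_tokens, stc):
    kept, reduced_frame_indices = [], []
    for idx, tokens in enumerate(frame_tokens):
        out = stc(tokens)                   # (num_kept, hidden_dim)
        kept.append(out)
        if out.shape[0] < tokens.shape[0]:  # some tokens were dropped
            reduced_frame_indices.append(idx)
    return kept, reduced_frame_indices
```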
