
About Evaluate.py #35

Open
jun0wanan opened this issue Sep 12, 2024 · 3 comments

@jun0wanan commented Sep 12, 2024

Hi,

In this code:

```python
def joint_embed(
    self,
    input_ids: torch.Tensor = None,
    frames: torch.Tensor = None,
):
    if frames is None:
        return self.get_input_embeddings()(input_ids)
    if input_ids is None:
        return self.visual_embed(frames)
    inputs_embeds = self.get_input_embeddings()(input_ids.clamp(max=self.vocab_size - 1))
    v_mask = input_ids == self.config.v_placeholder_id
    if v_mask.any():
        inputs_embeds[v_mask] = self.visual_embed(frames)
    return inputs_embeds
```

I found that when I run evaluate.py on its own, `frames` ends up being None, so execution enters the first if branch. Is this correct? Should it not enter this branch?

Process: I ran evaluate.py directly with the model you provided; I just wanted to check the metrics :)

Looking forward to your reply, thank you!
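For context, here is a standalone toy sketch of what the placeholder-scatter branch does when `frames` is provided (this is not the repository's code; the sizes and the `v_placeholder_id` value below are made up for illustration). When `frames` is None, this scatter never happens and the placeholder tokens keep their ordinary text embeddings, which is why the branch matters here.

```python
import torch

# Toy mirror of the placeholder-scatter branch of joint_embed.
# hidden_size, vocab_size, and v_placeholder_id are illustrative values.
hidden_size, vocab_size, v_placeholder_id = 8, 10, 9
embed = torch.nn.Embedding(vocab_size, hidden_size)

input_ids = torch.tensor([[1, 2, v_placeholder_id, 3, v_placeholder_id]])
frame_embeds = torch.randn(2, hidden_size)      # stands in for self.visual_embed(frames)

inputs_embeds = embed(input_ids.clamp(max=vocab_size - 1))
v_mask = input_ids == v_placeholder_id          # True at the two placeholder positions
inputs_embeds[v_mask] = frame_embeds            # visual embeddings replace the placeholders
print(v_mask.sum().item())                      # 2 frame embeddings were scattered in

# If frames is None, joint_embed returns the text embeddings directly and the
# scatter above never happens.
```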

@jun0wanan (Author) commented Sep 12, 2024

I have another question. I noticed that the demo uses the class `LiveInfer`. How does this class differ from the one used before? Why was it separated into its own class? 😊

Looking forward to your reply, thank you!

@chenjoya (Collaborator)

> I have another question. I noticed that the demo uses the class `LiveInfer`. How does this class differ from the one used before? Why was it separated into its own class?

Hi, this class is only used during inference; it is better suited to frame-by-frame streaming inference. In contrast, training and evaluation run the forward pass in parallel over all frames at once.
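To make the contrast concrete, here is a rough sketch of the two call patterns. It assumes a HuggingFace-style causal-LM forward (`inputs_embeds`, `past_key_values`, `use_cache`); the function names `parallel_eval_forward` and `streaming_infer_step` are illustrative and not part of the repository.

```python
import torch

def parallel_eval_forward(model, input_ids: torch.Tensor, frames: torch.Tensor):
    # Training/evaluation path: the whole sequence is embedded at once.
    # input_ids carries v_placeholder_id tokens wherever frames belong, so
    # joint_embed scatters the visual embeddings into those positions.
    inputs_embeds = model.joint_embed(input_ids=input_ids, frames=frames)
    return model(inputs_embeds=inputs_embeds)

def streaming_infer_step(model, new_frame: torch.Tensor, past_key_values=None):
    # LiveInfer-style path: a single new frame is embedded on its own and the
    # KV cache carries the earlier context across steps.
    frame_embeds = model.joint_embed(frames=new_frame)
    outputs = model(inputs_embeds=frame_embeds,
                    past_key_values=past_key_values,
                    use_cache=True)
    return outputs, outputs.past_key_values
```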

@chenjoya (Collaborator)

> I found that when I run evaluate.py on its own, `frames` ends up being None, so execution enters the first if branch. Is this correct? Should it not enter this branch?
>
> Process: I ran evaluate.py directly with the model you provided; I just wanted to check the metrics :)

Could you share the script you ran? It seems that the frames are not being passed properly.
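As a starting point for checking that, here is a minimal sanity check one could run against the evaluation dataloader. The name `eval_dataloader` and the `"frames"` batch key are assumptions; adjust them to the actual script.

```python
def check_frames(eval_dataloader):
    """Hypothetical sanity check: confirm evaluation batches carry frames
    before they reach joint_embed (the "frames" key is an assumption)."""
    batch = next(iter(eval_dataloader))
    frames = batch.get("frames")
    assert frames is not None, "frames missing from the evaluation batch"
    print("frames shape:", tuple(frames.shape))
```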
