
About Evaluate.py #35

Open
jun0wanan opened this issue Sep 12, 2024 · 3 comments

@jun0wanan commented Sep 12, 2024

Hi,

In this code:

```python
def joint_embed(
    self,
    input_ids: torch.Tensor = None,
    frames: torch.Tensor = None,
):
    if frames is None:
        return self.get_input_embeddings()(input_ids)
    if input_ids is None:
        return self.visual_embed(frames)
    inputs_embeds = self.get_input_embeddings()(input_ids.clamp(max=self.vocab_size - 1))
    v_mask = input_ids == self.config.v_placeholder_id
    if v_mask.any():
        inputs_embeds[v_mask] = self.visual_embed(frames)
    return inputs_embeds
```

I found that when I run evaluate.py on its own, `frames` ends up being None, so execution enters the first if branch. Is this correct? Should it not enter this branch?

Process: I ran evaluate.py directly with the model you provided; I just wanted to check the metrics :)

Looking forward to your reply, thank you!
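For context, here is a standalone toy sketch of what the placeholder-scatter branch does when `frames` is provided (this is not the repository's code; the sizes and the `v_placeholder_id` value below are made up for illustration). When `frames` is None, this scatter never happens and the placeholder tokens keep their ordinary text embeddings, which is why the branch matters here.

```python
import torch

# Toy mirror of the placeholder-scatter branch of joint_embed.
# hidden_size, vocab_size, and v_placeholder_id are illustrative values.
hidden_size, vocab_size, v_placeholder_id = 8, 10, 9
embed = torch.nn.Embedding(vocab_size, hidden_size)

input_ids = torch.tensor([[1, 2, v_placeholder_id, 3, v_placeholder_id]])
frame_embeds = torch.randn(2, hidden_size)      # stands in for self.visual_embed(frames)

inputs_embeds = embed(input_ids.clamp(max=vocab_size - 1))
v_mask = input_ids == v_placeholder_id          # True at the two placeholder positions
inputs_embeds[v_mask] = frame_embeds            # visual embeddings replace the placeholders
print(v_mask.sum().item())                      # 2 frame embeddings were scattered in

# If frames is None, joint_embed returns the text embeddings directly and the
# scatter above never happens.
```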

@jun0wanan (Author) commented Sep 12, 2024

I have another question. I noticed that the demo uses the class `LiveInfer`. How does this class differ from the one used before? Why was it separated into its own class? 😊

Looking forward to your reply, thank you!

@chenjoya (Collaborator)

> I have another question. I noticed that the demo uses the class `LiveInfer`. How does this class differ from the one used before? Why was it separated into its own class?

Hi, this class is only used during inference; it is better suited to frame-by-frame streaming inference. In contrast, training and evaluation run the forward pass in parallel over all frames at once.
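To make the contrast concrete, here is a rough sketch of the two call patterns. It assumes a HuggingFace-style causal-LM forward (`inputs_embeds`, `past_key_values`, `use_cache`); the function names `parallel_eval_forward` and `streaming_infer_step` are illustrative and not part of the repository.

```python
import torch

def parallel_eval_forward(model, input_ids: torch.Tensor, frames: torch.Tensor):
    # Training/evaluation path: the whole sequence is embedded at once.
    # input_ids carries v_placeholder_id tokens wherever frames belong, so
    # joint_embed scatters the visual embeddings into those positions.
    inputs_embeds = model.joint_embed(input_ids=input_ids, frames=frames)
    return model(inputs_embeds=inputs_embeds)

def streaming_infer_step(model, new_frame: torch.Tensor, past_key_values=None):
    # LiveInfer-style path: a single new frame is embedded on its own and the
    # KV cache carries the earlier context across steps.
    frame_embeds = model.joint_embed(frames=new_frame)
    outputs = model(inputs_embeds=frame_embeds,
                    past_key_values=past_key_values,
                    use_cache=True)
    return outputs, outputs.past_key_values
```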

@chenjoya (Collaborator)

> I found that when I run evaluate.py on its own, `frames` ends up being None, so execution enters the first if branch. Is this correct? Should it not enter this branch?
>
> Process: I ran evaluate.py directly with the model you provided; I just wanted to check the metrics :)

Could you share the script you ran? It seems that the frames are not being passed properly.
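As a starting point for checking that, here is a minimal sanity check one could run against the evaluation dataloader. The name `eval_dataloader` and the `"frames"` batch key are assumptions; adjust them to the actual script.

```python
def check_frames(eval_dataloader):
    """Hypothetical sanity check: confirm evaluation batches carry frames
    before they reach joint_embed (the "frames" key is an assumption)."""
    batch = next(iter(eval_dataloader))
    frames = batch.get("frames")
    assert frames is not None, "frames missing from the evaluation batch"
    print("frames shape:", tuple(frames.shape))
```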
