
Question about frame_difference metric calculation #31

Open
yankee624 opened this issue Aug 29, 2024 · 4 comments

@yankee624 commented Aug 29, 2024

Hi, thanks for the great work!

I have a question about the following line:

to_append_num_frames = min(next_turn_num_frames, turn_num_frames - 1) # avoid bias. current as center, two equal left/right side

This line caps frame_difference at a certain value. (The comment says it is to 'avoid bias', but I don't quite understand that; I would really appreciate a more detailed explanation.)
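To make sure I'm reading it correctly, here is a minimal sketch of how I understand the cap (the surrounding evaluation logic is paraphrased by me, so everything except the quoted line is my assumption, not the repo's actual code):

```python
# My paraphrase of the cap; only the quoted line is taken from the repo.
def capped_frame_diff(reply_delay: int, turn_num_frames: int, next_turn_num_frames: int) -> int:
    # reply_delay: how many frames after the ground-truth reply frame the model actually replies
    # (it can be large if the model keeps silent well past the current turn).
    to_append_num_frames = min(next_turn_num_frames, turn_num_frames - 1)  # the quoted line
    # The measured frame difference can never exceed the number of appended frames.
    return min(reply_delay, to_append_num_frames)

print(capped_frame_diff(reply_delay=10, turn_num_frames=3, next_turn_num_frames=5))  # 2
```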

Specifically, when the model fails to reply (i.e., output EOS) before the current turn ends:

  • next_turn_num_frames: this tests whether the model can reply within the next turn, but not within the next-next turn or any later turn. The model might still be able to reply in a later turn, so I don't understand why the maximum is set to the next turn.

  • number of frames in current turn - 1: I also don't understand this. In the extreme case where the current turn has only 1 frame, it simply forces frame_diff to 0.

    • For example, let's assume the following frames and responses:
      [a] [a] [b] "b appeared" [c] "c appeared" [c] [c] [d] "d appeared"
      where [k] is a frame with content k, and responses are shown in quotes.
      Turn 1: [a] [a] [b] "b appeared" (num frames = 3)
      Turn 2: [c] "c appeared" (num frames = 1)
      Turn 3: [c] [c] [d] "d appeared" (num frames = 3)

    Let's say we're currently in Turn 2. Then, even if the model fails to reply at the first (and only) frame [c], frame_diff becomes 0 because number of frames in current turn - 1 = 0. I think we also have to consider whether the model can reply within the frames of Turn 3 (a short sketch of this case follows below).
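Plugging the example numbers into the quoted line (just the arithmetic, under my reading above):

```python
# Turn 2 of the example: a single frame [c] in the current turn, three frames in Turn 3.
turn_num_frames = 1
next_turn_num_frames = 3
to_append_num_frames = min(next_turn_num_frames, turn_num_frames - 1)
print(to_append_num_frames)  # 0, so frame_diff is clamped to 0 no matter when the model replies
```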

@chenjoya (Collaborator)

Hi, 'avoiding the bias' means preventing the model from exploiting a simple behavior, e.g. always predicting right after the ground-truth timestamp. The motivation is to give an equal chance to (1) predicting in advance and (2) predicting afterwards.

The case you provided is correct; the problem does exist. Many thanks for the careful checking! Maybe we can make the evaluation temporal boundary the span between the last response and the next response, with no min() operation here.
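Roughly like this (just a sketch of the idea, not a tested patch):

```python
next_turn_num_frames = 3  # example value
# current evaluation:
# to_append_num_frames = min(next_turn_num_frames, turn_num_frames - 1)
# possible change discussed here: let the boundary run up to the next response, with no extra cap.
to_append_num_frames = next_turn_num_frames
```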

BTW, do you have any other advice on how to improve this? I can update the results in a new arXiv version.

@yankee624 (Author) commented Aug 30, 2024

Thank you for the reply!
I think 'avoiding the bias' may make sense at training time, where we can train the model to produce early and late responses equally. But the frame_diff metric is computed at evaluation time, where the model has already been trained to respond in a particular way, and we are forcefully capping the maximum frame_diff, which seemed a bit unnatural to me. As you said, using to_append_num_frames = next_turn_num_frames without the min() operation seems better (although this still caps frame_diff at next_turn_num_frames... it may also be helpful to report something like "frame_diff relative to turn_num_frames"; see the rough sketch below).
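Something like this, purely illustrative (the name and normalization are made up by me):

```python
# Report the delay relative to the turn length, so the same absolute frame_diff
# is judged differently in short turns vs. long turns.
def relative_frame_diff(frame_diff: int, turn_num_frames: int) -> float:
    return frame_diff / max(turn_num_frames, 1)

print(relative_frame_diff(2, 3))   # ~0.67 for a 3-frame turn
print(relative_frame_diff(2, 30))  # ~0.07 for a 30-frame turn
```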

Another thought: I can't think of a scenario where an early response is useful. For example, as shown in Figure 4 of the paper, the user might query something like "Remind me when a yellow card appears"; in that case an early response is useless, since the yellow card hasn't appeared yet, and only a response after the yellow card occurs is useful. So I thought only just-in-time or late responses should be considered in the frame_diff calculation. But I'm not sure about this; if there is a scenario where an early response can also be useful, please point it out.

@chenjoya (Collaborator) commented Sep 5, 2024

Hi, many thanks for the discussion! I think limiting by next_turn_num_frames is reasonable, since the next turn may contain new context that requires the model to produce output. Therefore, it is not suitable to calculate frame_diff using frames from turns after the next one.

However, I agree with you that we can also evaluate model performance without the min(). I will update the results here first and add a table in the supplementary material to reflect that. For now, I think the metric is still fair for comparing ablations, since all variants use the same evaluation.

An early response can also be useful. For example, when a dangerous situation is about to occur, it would be better for the model to report in advance rather than with a delay.

@yankee624 (Author)

Thank you so much for the discussion! I really appreciate your help.
Still, for the early-response scenario, I think the model should report at the moment when "a dangerous situation is about to occur" (when some dangerous signals are captured), not earlier than that. An earlier response would mean the model is reporting even when there is no signal of danger yet. So I thought only "just-in-time" and "late" responses are meaningful, not "early" ones.
I will think more about the scenarios. Thank you again for the discussion!
