
Cannot reproduce COIN dataset result #26

Open
bluehawk2k opened this issue Aug 12, 2024 · 14 comments

@bluehawk2k

Hi, here is another issue about the reproducibility of the COIN dataset results.
I have also tried to reproduce your COIN result using 8 A100 GPUs.
However, the evaluation gives much lower performance than the result reported in your paper.

[screenshot: low evaluation results]

Do you have any idea about this issue?

@leebebeto

Hi, I also found a potential bug in the evaluation code for the COIN dataset.
In evaluation_loop() of ./transformers/trainer.py, the labels seem to be wrong.
Below is a screenshot of the logits and labels.
I think the labels should look like the logits, i.e., contain the gt_ids rather than 0.
Because of the wrong labels, I also get very low accuracy.

[screenshot: logits and labels in evaluation_loop()]

Thanks in advance!

@chenjoya (Collaborator)

Hi @bluehawk2k, it seems that the loss is too large and the model has not converged. Could you share your training scripts? Training at the same learning rate can be unstable across different devices; you can decrease the learning rate or extend the training epochs a little. When my model converges, the loss is on the 0.0x scale.

@leebebeto the labels here are not the real labels; they are used as dataset indices.
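
As a minimal sketch of what this means for computing accuracy (my own illustration, not the repo's code; `gt_id` and the dataset access pattern are hypothetical names):

```python
import torch

# Hedged sketch: `labels` from evaluation_loop() holds each sample's index
# into the evaluation dataset, not the ground-truth class id. The ground
# truth is looked up through that index.
def coin_accuracy(logits: torch.Tensor, labels: torch.Tensor, dataset) -> float:
    preds = logits.argmax(dim=-1)  # predicted category ids
    gts = torch.tensor([dataset[i]["gt_id"] for i in labels.tolist()])  # hypothetical field
    return 100.0 * (preds == gts).float().mean().item()
```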

@chenjoya (Collaborator)

I am currently busy with other projects, but please feel free to leave your questions here and I will address them as soon as possible.

@leebebeto

leebebeto commented Aug 13, 2024

Thank you for the reply. The labels are indeed the sample indices rather than the actual gt labels.

By the way, my loss also converges to the 0.x scale rather than the 0.0x scale.
Regarding the performance, the indices into self.mapping_categories from the predictions and the answers are significantly different, so I get very low accuracy. I think the training indeed did not converge.
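
For reference, a quick way to eyeball this mismatch (toy data only; in the repo, `mapping_categories` would be the model's id-to-name list and the ids would come from the eval outputs):

```python
# Hypothetical diagnostic: print predicted vs. answer category names side
# by side to see how far the predictions are from the ground truth.
mapping_categories = ["InstallCeilingFan", "MakeCoffee", "ChangeTire"]  # toy values
pred_ids = [1, 2, 0]
answer_ids = [0, 2, 0]
for p, a in zip(pred_ids, answer_ids):
    print(f"pred={mapping_categories[p]:<18} gt={mapping_categories[a]}")
```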

I used your released training script: https://github.com/showlab/videollm-online/blob/main/scripts/coin/live1%2B.sh
Could you please double-check this released training script? Or should I decrease the learning rate or extend the training epochs relative to it?

Thank you!

@chenjoya (Collaborator)

Hi, I am currently working on a new cluster. I have finished all the data preparation and have now started training on COIN. Please wait several hours and I will get back to you!

[screenshot: training in progress on the new cluster]

@chenjoya (Collaborator)

chenjoya commented Aug 15, 2024

Hi, the bug in the COIN evaluation has been fixed. I reproduced the results from scratch on a new cluster.

The main cause should be unstable training. My current environment is torch 2.4.0, CUDA 12.4, transformers 4.44.0. With lr = 2e-4 the training does not converge, which I have also met in other scenarios (although previously it did converge...). Halving the lr to 1e-4, still with 5 epochs, gives the following results:

[screenshot: evaluation results at lr = 1e-4]

These are lower than the paper due to the halved lr. I will update once I find good epoch and lr parameters.

Meanwhile, I made some updates to the COIN dataset code and removed some unused code. I recommend pulling them.
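
For anyone applying the same change, here is a hedged sketch of the halved learning rate expressed through standard Hugging Face `TrainingArguments` (the repo's actual entry point and flags may differ; the output path is hypothetical):

```python
from transformers import TrainingArguments

# Sketch only: halve the learning rate from 2e-4 to 1e-4 and keep 5 epochs,
# as described in the comment above.
args = TrainingArguments(
    output_dir="outputs/coin_live1+",  # hypothetical path
    learning_rate=1e-4,                # halved from 2e-4
    num_train_epochs=5,
)
```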

@chenjoya (Collaborator)

chenjoya commented Aug 26, 2024

Hi, sorry that I have recently been busy with other projects and do not have many GPUs for re-implementation. In my experience, the keys to high accuracy on COIN are: (1) avoid training loss spikes. This is very important; please check the log, and if a spike happens, stop and try the next config 😂 (2) use as high an lr as possible while ensuring no loss spike appears. A simple way to scan a log for spikes is sketched below.
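
A minimal, self-contained sketch of such a check (my own illustration; `find_loss_spikes` is a hypothetical helper, fed with loss values parsed from the trainer log however you record them):

```python
# Flag any step whose loss jumps well above the recent running average.
def find_loss_spikes(losses, window=50, factor=3.0):
    spikes = []
    for i in range(window, len(losses)):
        avg = sum(losses[i - window:i]) / window
        if losses[i] > factor * avg:
            spikes.append((i, losses[i], avg))  # (step, loss, recent average)
    return spikes

# Example: a spike at step 3 relative to the 3-step running average.
print(find_loss_spikes([0.9, 0.5, 0.4, 4.2, 0.3], window=3))
```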

Yesterday I tried some simple parameters, lr = 0.00015 and stream_loss_weight = 0.5, and got results similar to the paper:

[screenshot: evaluation results]

'eval_coin_step_test_accuracy': 62.7394328517924
'eval_coin_next_test_accuracy': 48.5977275995973
'eval_coin_task_test_accuracy': 92.22408026755853
'eval_coin_procedure_test_accuracy': 48.84329225761451
'eval_coin_taskprocedure_test_accuracy': 53.641383318802674

Training for more epochs should match the paper, but my GPUs are not free again right now. I will update once it is done.

Many thanks for your patience!

@yankee624

yankee624 commented Sep 4, 2024

@chenjoya Do you think the training script for Ego4D narration (https://github.com/showlab/videollm-online/blob/main/scripts/ego4d/narration/live1%2B.sh) should also be fixed? I'm trying to train the model on ego4d_refined_narration_stream_train, but the loss stops going down after it reaches 0.3~0.4 in the middle of epoch 1. Is this loss value typical for the Ego4D dataset?

[screenshot: Ego4D narration training loss curve]

@chenjoya (Collaborator)

chenjoya commented Sep 5, 2024

Hello @yankee624, this is very good since there is no loss spike. The training loss on COIN is low because it contains only single video-text pairs, instead of the multiple video-text streams in Ego4D narration.

@yankee624
Copy link

@chenjoya Thank you so much for confirming! The metrics look good, so I guess it trained well!

@nguyentthong

Hi @chenjoya, can I ask how many test samples you evaluated on? When I evaluated on 512 samples, the step test accuracy was 65%; however, on 10k samples it was only 10%.

@chenjoya (Collaborator)

Hi @nguyentthong, I tested on the standard test set of the COIN dataset. Did you get the correct video IDs?

@nguyentthong

nguyentthong commented Oct 15, 2024

Hi @chenjoya, I used the official annotations from this link: https://github.com/coin-dataset/annotations/blob/master/COIN.json

My loss curve is as follows:

[screenshot: training loss curve]

Is this a normal loss curve? Also, could you release the checkpoint trained on the COIN dataset so that it is easier to examine the problem?
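
On the video-id question above, a hedged sketch of pulling the standard test split from that COIN.json (assuming its usual layout: a top-level "database" dict whose entries carry a "subset" field):

```python
import json

# Collect the video ids of the standard COIN test split, rather than
# evaluating on an arbitrary 512-sample subset.
with open("COIN.json") as f:
    database = json.load(f)["database"]

test_ids = [vid for vid, meta in database.items() if meta.get("subset") == "testing"]
print(f"{len(test_ids)} test videos")
```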

@nguyentthong

Additionally, I used the preprocessing scripts here: https://github.com/showlab/videollm-online/tree/main/data/preprocess

I assume these apply to both the COIN and Ego4D datasets. Can you confirm whether my assumption is correct? @chenjoya
