
Cannot reproduce COIN dataset result #26

Open
bluehawk2k opened this issue Aug 12, 2024 · 14 comments

@bluehawk2k

Hi, here is another issue about the reproducibility of the COIN dataset results.
I have also tried to reproduce your COIN result using 8 A100 GPUs.
However, the evaluation gives much lower performance than the result reported in your paper.

[screenshot: low evaluation results]

Do you have any idea about this issue?

@leebebeto

Hi, I also found a potential bug in the evaluation code for the COIN dataset.
In evaluation_loop() of ./transformers/trainer.py, the labels seem to be wrong.
Below is a screenshot of the logits and labels.
I think the labels should look like the logits, i.e., contain the gt_ids rather than 0.
Because of the wrong labels, I also get very low accuracy.

[screenshot: logits and labels in evaluation_loop()]

Thanks in advance!

@chenjoya (Collaborator)

Hi @bluehawk2k, it seems that the loss is too large and the model has not converged. Could you share your training scripts? Training at the same learning rate can be unstable across different devices; you can decrease the learning rate or extend the training epochs a little. When my model converges, the loss is on the 0.0x scale.

@leebebeto the labels here are not the real labels; they are used as dataset indices.
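
As a minimal sketch of what this means for computing accuracy (my own illustration, not the repo's code; `gt_id` and the dataset access pattern are hypothetical names):

```python
import torch

# Hedged sketch: `labels` from evaluation_loop() holds each sample's index
# into the evaluation dataset, not the ground-truth class id. The ground
# truth is looked up through that index.
def coin_accuracy(logits: torch.Tensor, labels: torch.Tensor, dataset) -> float:
    preds = logits.argmax(dim=-1)  # predicted category ids
    gts = torch.tensor([dataset[i]["gt_id"] for i in labels.tolist()])  # hypothetical field
    return 100.0 * (preds == gts).float().mean().item()
```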

@chenjoya (Collaborator)

I am currently busy with other projects, but please feel free to leave your questions here and I will address them as soon as possible.

@leebebeto

leebebeto commented Aug 13, 2024

Thank you for the reply. The labels are indeed the sample indices rather than the actual gt labels.

By the way, my loss also converges to the 0.x scale rather than the 0.0x scale.
Regarding the performance, the indices into self.mapping_categories from the predictions and the answers are significantly different, so I get very low accuracy. I think the training indeed did not converge.
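
For reference, a quick way to eyeball this mismatch (toy data only; in the repo, `mapping_categories` would be the model's id-to-name list and the ids would come from the eval outputs):

```python
# Hypothetical diagnostic: print predicted vs. answer category names side
# by side to see how far the predictions are from the ground truth.
mapping_categories = ["InstallCeilingFan", "MakeCoffee", "ChangeTire"]  # toy values
pred_ids = [1, 2, 0]
answer_ids = [0, 2, 0]
for p, a in zip(pred_ids, answer_ids):
    print(f"pred={mapping_categories[p]:<18} gt={mapping_categories[a]}")
```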

I used your released training script: https://github.com/showlab/videollm-online/blob/main/scripts/coin/live1%2B.sh
Could you please double-check this released training script? Or should I decrease the learning rate or extend the training epochs relative to it?

Thank you!

@chenjoya (Collaborator)

Hi, I am currently working on a new cluster. I have finished all the data preparation and have now started training on COIN. Please wait several hours and I will get back to you!

[screenshot: training in progress on the new cluster]

@chenjoya (Collaborator)

chenjoya commented Aug 15, 2024

Hi, the bug in the COIN evaluation has been fixed. I reproduced the results from scratch on a new cluster.

The main cause should be unstable training. My current environment is torch 2.4.0, CUDA 12.4, transformers 4.44.0. With lr = 2e-4 the training does not converge, which I have also met in other scenarios (although previously it did converge...). Halving the lr to 1e-4, still with 5 epochs, gives the following results:

[screenshot: evaluation results at lr = 1e-4]

These are lower than the paper due to the halved lr. I will update once I find good epoch and lr parameters.

Meanwhile, I made some updates to the COIN dataset code and removed some unused code. I recommend pulling them.
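
For anyone applying the same change, here is a hedged sketch of the halved learning rate expressed through standard Hugging Face `TrainingArguments` (the repo's actual entry point and flags may differ; the output path is hypothetical):

```python
from transformers import TrainingArguments

# Sketch only: halve the learning rate from 2e-4 to 1e-4 and keep 5 epochs,
# as described in the comment above.
args = TrainingArguments(
    output_dir="outputs/coin_live1+",  # hypothetical path
    learning_rate=1e-4,                # halved from 2e-4
    num_train_epochs=5,
)
```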

@chenjoya (Collaborator)

chenjoya commented Aug 26, 2024

Hi, sorry that I have recently been busy with other projects and do not have many GPUs for re-implementation. In my experience, the keys to high accuracy on COIN are: (1) avoid training loss spikes. This is very important; please check the log, and if a spike happens, stop and try the next config 😂 (2) use as high an lr as possible while ensuring no loss spike appears. A simple way to scan a log for spikes is sketched below.
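
A minimal, self-contained sketch of such a check (my own illustration; `find_loss_spikes` is a hypothetical helper, fed with loss values parsed from the trainer log however you record them):

```python
# Flag any step whose loss jumps well above the recent running average.
def find_loss_spikes(losses, window=50, factor=3.0):
    spikes = []
    for i in range(window, len(losses)):
        avg = sum(losses[i - window:i]) / window
        if losses[i] > factor * avg:
            spikes.append((i, losses[i], avg))  # (step, loss, recent average)
    return spikes

# Example: a spike at step 3 relative to the 3-step running average.
print(find_loss_spikes([0.9, 0.5, 0.4, 4.2, 0.3], window=3))
```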

Yesterday I tried some simple parameters, lr = 0.00015 and stream_loss_weight = 0.5, and got results similar to the paper:

[screenshot: evaluation results]

'eval_coin_step_test_accuracy': 62.7394328517924
'eval_coin_next_test_accuracy': 48.5977275995973
'eval_coin_task_test_accuracy': 92.22408026755853
'eval_coin_procedure_test_accuracy': 48.84329225761451
'eval_coin_taskprocedure_test_accuracy': 53.641383318802674

Training for more epochs should match the paper, but my GPUs are not free again right now. I will update once it is done.

Many thanks for your patience!

@yankee624

yankee624 commented Sep 4, 2024

@chenjoya Do you think the training script for Ego4D narration (https://github.com/showlab/videollm-online/blob/main/scripts/ego4d/narration/live1%2B.sh) should also be fixed? I'm trying to train the model on ego4d_refined_narration_stream_train, but the loss stops going down after it reaches 0.3~0.4 in the middle of epoch 1. Is this loss value typical for the Ego4D dataset?

[screenshot: Ego4D narration training loss curve]

@chenjoya (Collaborator)

chenjoya commented Sep 5, 2024

Hello @yankee624, this is very good since there is no loss spike. The training loss on COIN is low because it contains only single video-text pairs, instead of the multiple video-text streams in Ego4D narration.

@yankee624
Copy link

@chenjoya Thank you so much for confirming! The metrics look good, so I guess it trained well!

@nguyentthong

Hi @chenjoya, can I ask how many test samples you evaluated on? When I evaluated on 512 samples, the step test accuracy was 65%; however, on 10k samples it was only 10%.

@chenjoya (Collaborator)

Hi @nguyentthong, I tested on the standard test set of the COIN dataset. Did you get the correct video IDs?

@nguyentthong

nguyentthong commented Oct 15, 2024

Hi @chenjoya, I used the official annotations from this link: https://github.com/coin-dataset/annotations/blob/master/COIN.json

My loss curve is as follows:

[screenshot: training loss curve]

Is this a normal loss curve? Also, could you release the checkpoint trained on the COIN dataset so that it is easier to examine the problem?
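
On the video-id question above, a hedged sketch of pulling the standard test split from that COIN.json (assuming its usual layout: a top-level "database" dict whose entries carry a "subset" field):

```python
import json

# Collect the video ids of the standard COIN test split, rather than
# evaluating on an arbitrary 512-sample subset.
with open("COIN.json") as f:
    database = json.load(f)["database"]

test_ids = [vid for vid, meta in database.items() if meta.get("subset") == "testing"]
print(f"{len(test_ids)} test videos")
```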

@nguyentthong

Additionally, I used the preprocessing scripts here: https://github.com/showlab/videollm-online/tree/main/data/preprocess

I assume these apply to both the COIN and Ego4D datasets. Can you confirm whether my assumption is correct? @chenjoya
