
Low Accuracy on 80 Tasks After Fine-Tuning Meta-Llama-3-8B-Instruct (19/400 = 4.75%) #3

Open
bingwork opened this issue Nov 29, 2024 · 5 comments

@bingwork

I used the 80 tasks from the file task_info_selected.csv in this repository and fine-tuned Meta-Llama-3-8B-Instruct with the train.sh script (train.sh.txt attached below) generated from this repository. I then generated the predict.sh script (predict.sh.txt attached below) for inference, following the instructions in the same repository. However, the final result I got is a competition accuracy of 19/400 = 0.0475, meaning only 19 tasks were solved correctly. Can you help me identify what might be wrong?

In /workspace/wubing/marc/test_time_train.py, I updated the code below to use the selected 80 tasks:

    if args.num_tasks is not None:
        if args.num_tasks_selected:
            import pandas as pd
            df = pd.read_csv('/workspace/wubing/marc/task_info_selected.csv')
            selected_tasks = df['task_id'].to_list()
            arc_test_tasks = [task for task in arc_test_tasks if task.name.replace("-0", "") in selected_tasks]
            print("Use selected tasks as ttt paper")
        else:
            arc_test_tasks = arc_test_tasks[: args.num_tasks]
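For reference, the filtering logic in that snippet can be sketched in isolation with stand-in data (the task IDs and the minimal `SimpleNamespace` task objects here are hypothetical; the real script builds `arc_test_tasks` elsewhere and reads the IDs from task_info_selected.csv with pandas):

```python
from types import SimpleNamespace

# Stand-in for df['task_id'].to_list() from task_info_selected.csv
selected_tasks = ["0a1d4ef5", "12345678"]

# Stand-in test tasks; each only needs a .name attribute for the filter
arc_test_tasks = [SimpleNamespace(name=n)
                  for n in ["0a1d4ef5-0", "deadbeef-0", "12345678-0"]]

# Keep only tasks whose name, with the "-0" suffix stripped, is in the CSV.
# Note: str.replace removes every "-0" occurrence, which only matters if an
# ID ever contained "-0" mid-string.
arc_test_tasks = [t for t in arc_test_tasks
                  if t.name.replace("-0", "") in selected_tasks]

print([t.name for t in arc_test_tasks])  # ['0a1d4ef5-0', '12345678-0']
```

If the filter yields fewer than 80 tasks, a likely cause is a mismatch between the names in `arc_test_tasks` and the IDs in the CSV.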

predict.log
train.log
predict.sh.txt
train.sh.txt

@ekinakyurek (Owner)

Which checkpoints do you use in the Meta-Llama-3-8B-Instruct folder?

@ekinakyurek (Owner) commented Nov 30, 2024

The expected value for this model is around 36 tasks. I have attached the logs of the inference run and the resulting predictions for what (as I understand it) you are trying to reproduce, along with the TTT loss log for one of the tasks. The checkpoint used in this run should be the same as https://huggingface.co/ekinakyurek/marc-8B-finetuned-llama3/tree/main (I can verify further if necessary).

0a1d4ef5_tt.txt
8B_grids_no_lm_generated_model_tti.zip

==========

We also have some verification notebooks on Kaggle now. These use the modal branch and the BARC checkpoints. The runs should be fully replicable, but they require a Modal account and some credits. Even without running them, you can look at the logs, settings, etc., and also see the details of the induction part (from the BARC team).

For the BARC checkpoints, make sure the torchtune tokenizer is in BARC mode; this requires editing the tokenizer file under your torchtune installation: https://github.com/ekinakyurek/torchtune/blob/efd85e000e83dcf6803c623cf83943e4a817377a/torchtune/models/llama3/_tokenizer.py#L51-L55

Here are the notebooks:
Problem 0-99: (5033.9s)
https://www.kaggle.com/code/xu3cpn/dev-ensemble-induction-and-transduction?scriptVersionId=209665817
Problem 100-199: (4017.6s)
https://www.kaggle.com/code/xu3cpn/dev-ensemble-induction-and-transduction?scriptVersionId=209687715
Problem 200-299: (5571.9s)
https://www.kaggle.com/code/xu3cpn/dev-ensemble-induction-and-transduction?scriptVersionId=209747304
Problem 300-399: (4822.4s)
https://www.kaggle.com/code/xu3cpn/dev-ensemble-induction-and-transduction?scriptVersionId=209779088

Score: 251.5/400 = 62.875%

@bingwork (Author) commented Dec 1, 2024

> Which checkpoints do you use in the Meta-Llama-3-8B-Instruct folder?

I am using the downloaded safetensors from Meta-Llama-3-8B-Instruct.

@bingwork (Author) commented Dec 1, 2024

For task 0a1d4ef5, my log is in 0a1d4ef5_log_1732848905.txt.txt.
I noticed that my loss values are quite different from yours. Here are mine:

Step 1: loss: 1.4690, lr: 7.14e-06, tokens_per_second_per_gpu: 6844.40
...
Step 144: loss: 0.0259, lr: 0.0, tokens_per_second_per_gpu: 6964.37
Below are yours:

Step 1: loss: 0.3976, lr: 7.14e-06, tokens_per_second_per_gpu: 4387.33
...
Step 144: loss: 7.03e-05, lr: 0.0, tokens_per_second_per_gpu: 7082.98

@ekinakyurek (Owner)

> > Which checkpoints do you use in the Meta-Llama-3-8B-Instruct folder?
>
> I am using the downloaded safetensors from Meta-Llama-3-8B-Instruct.

Okay, that's the problem, I guess. Can you use our fine-tuned checkpoints as in the paper?

https://huggingface.co/ekinakyurek/marc-8B-finetuned-llama3/tree/main
