Thanks for your work!
I'm now reproducing your paper, but I'm having some difficulties. When training the language modeling task with the default parameters from the README, I ran into unstable training. The details are as follows:
0/200 [train] loss=5.945 [val] loss=5.917, pp=371.43, acc=0.185491 [time per itr] 1403.69ms [lr] 0.00003
0/400 [train] loss=5.655 [val] loss=5.477, pp=239.12, acc=0.196609 [time per itr] 1223.88ms [lr] 0.00005
0/600 [train] loss=5.285 [val] loss=5.259, pp=192.26, acc=0.200577 [time per itr] 1213.27ms [lr] 0.00010
0/800 [train] loss=5.326 [val] loss=5.250, pp=190.52, acc=0.197866 [time per itr] 1206.03ms [lr] 0.00015
0/1000 [train] loss=4.970 [val] loss=5.168, pp=175.63, acc=0.202474 [time per itr] 1197.92ms [lr] 0.00022
0/1200 [train] loss=5.088 [val] loss=5.093, pp=162.88, acc=0.206467 [time per itr] 1198.33ms [lr] 0.00031
0/1400 [train] loss=nan [val] loss=nan, pp=nan, acc=0.000956 [time per itr] 1183.48ms [lr] 0.00041
0/1600 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001068 [time per itr] 1086.21ms [lr] 0.00052
0/1800 [train] loss=nan [val] loss=nan, pp=nan, acc=0.000971 [time per itr] 1090.91ms [lr] 0.00063
0/2000 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001470 [time per itr] 1092.99ms [lr] 0.00075
0/2200 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001216 [time per itr] 1090.09ms [lr] 0.00088
0/2400 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001114 [time per itr] 1089.34ms [lr] 0.00101
0/2600 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001083 [time per itr] 1090.32ms [lr] 0.00114
0/2800 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001165 [time per itr] 1089.56ms [lr] 0.00127
0/3000 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001149 [time per itr] 1085.54ms [lr] 0.00139
0/3200 [train] loss=nan [val] loss=nan, pp=nan, acc=0.001246 [time per itr] 1087.70ms [lr] 0.00151
You can see that the learning rate keeps climbing and eventually settles at 0.0200. Because of the company firewall, I couldn't access the Internet while running the code, so instead of using the tiktoken library for the tokenizer, I used the GPT2Tokenizer provided by the transformers library. (I downloaded the vocab.json and merges.txt files to my local machine and uploaded them to the server.) The modified code (in /root/xxx/landmark-attention/lm_benchmark/data/pg19/prepare.py) is as follows:
import os
from transformers import GPT2Tokenizer

vocab_file_path = "/root/xxx/landmark-attention/lm_benchmark/data/pg19/vocab.json"
merges_file_path = "/root/xxx/landmark-attention/lm_benchmark/data/pg19/merges.txt"

gpt2_tokenizer = GPT2Tokenizer(
    vocab_file=vocab_file_path,
    merges_file=merges_file_path,
)

def _read_directory(path):
    # ... (keep the same)
    with open(os.path.join(path, filename), 'r') as f:
        texts.extend(gpt2_tokenizer.encode(f.read()))
        texts.append(gpt2_tokenizer.eos_token_id)
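One way to rule the tokenizer swap in or out as the cause is to compare the two encoders directly on a machine where tiktoken can fetch its files. A rough sketch (the sample string is arbitrary; both encoders should report a vocabulary of 50257 and an end-of-text id of 50256):

import tiktoken
from transformers import GPT2Tokenizer

# Sketch: compare tiktoken's "gpt2" encoding with the offline GPT2Tokenizer
# built from the same vocab.json / merges.txt files.
enc = tiktoken.get_encoding("gpt2")
tok = GPT2Tokenizer(vocab_file="vocab.json", merges_file="merges.txt")

sample = "The quick brown fox jumps over the lazy dog.\nA second line, with punctuation!"

print("vocab size:", enc.n_vocab, len(tok))        # both should print 50257
print("eos id:", enc.eot_token, tok.eos_token_id)  # both should print 50256
print("ids match:", enc.encode(sample) == tok.encode(sample))

If the ids match on representative text, the tokenizer replacement is unlikely to be the source of the NaNs.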
I want to know whether this change is related to the training instability, because I have not made any other changes besides it. Thank you very much for your reply!
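For anyone debugging the same symptom, a small guard in the training loop makes it easier to catch the first batch that turns the loss non-finite. This is a generic sketch with placeholder model, optimizer, get_batch, and max_iters names, not the repository's actual trainer:

import torch

# Generic sketch: stop at the first non-finite loss so the offending step, batch,
# and learning rate can be inspected instead of logging nan from then on.
# model, optimizer, get_batch, and max_iters are placeholders for the real trainer.
for step in range(max_iters):
    x, y = get_batch("train")
    logits, loss = model(x, targets=y)
    if not torch.isfinite(loss):
        print(f"non-finite loss at step {step}, lr={optimizer.param_groups[0]['lr']:.5f}")
        torch.save({"step": step, "x": x, "y": y}, "nan_batch.pt")
        break
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # optional: cap gradient norm
    optimizer.step()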
Thank you very much for your reply! I'm using float32 for training. The problem turned out to be torch.compile: training is stable without it.
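For reference, the workaround amounts to guarding the compile call so it can be switched off while debugging. A minimal sketch, assuming a hypothetical compile_model flag and build_model helper (not the repository's actual names):

import torch

compile_model = False  # hypothetical flag; set to False to avoid the instability seen above

model = build_model()  # placeholder for however the model is constructed
if compile_model and hasattr(torch, "compile"):
    # torch.compile requires PyTorch >= 2.0; skipping it keeps eager mode for debugging
    model = torch.compile(model)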