Usually, with dialogue data, gradient descent is only applied to the model's responses, e.g. by setting the label index of every non-model-output token to -100. Something like this:

<start>user Hello<end> <start>assistant How can I help you?<end>

A Transformer only takes gradients on the tokens after `assistant` (see the sketch below). Is an architecture like RWKV less suited to this kind of response-only training? If it is suitable, how is this set up when training with RWKV-LM?
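For reference, here is a minimal sketch of the -100 convention described above. It is not taken from RWKV-LM; the token ids, the `assistant_mask`, and the vocabulary size are made up purely for illustration.

```python
import torch
import torch.nn.functional as F

# A tokenized conversation: user tokens followed by assistant tokens (hypothetical ids).
input_ids = torch.tensor([[11, 12, 13, 21, 22, 23, 24]])  # shape (B, T)
assistant_mask = torch.tensor([[0, 0, 0, 1, 1, 1, 1]])    # 1 = assistant token

logits = torch.randn(1, 7, 50_000)   # stand-in for model output, shape (B, T, V)

# Copy the inputs as labels, then blank out user positions with -100.
labels = input_ids.clone()
labels[assistant_mask == 0] = -100

# Standard next-token shift: predict token t+1 from tokens <= t.
shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
shift_labels = labels[:, 1:].reshape(-1)

# cross_entropy ignores positions whose label equals ignore_index (-100 by default),
# so only assistant tokens contribute to the gradient.
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
print(loss)
```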
RWKV is trained in exactly the same way as GPT and Llama. Training on the user tokens seems to have no side effects.
Thanks. The issue is that users often make mistakes and the model is expected to correct them; I don't want those mistakes to be learned as well.
Please consider using a loss with a mask: you can skip the user's tokens when calculating the cross-entropy, as in the sketch below.
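A generic way to do this is to compute the per-token cross-entropy with `reduction="none"` and then average only over the masked-in (assistant) positions. This is an illustrative sketch, not RWKV-LM's or RWKV-PEFT's actual code; the function and argument names are assumptions.

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, targets, loss_mask):
    """Cross-entropy averaged only over positions where loss_mask == 1.

    logits:    (B, T, V) model outputs
    targets:   (B, T)    next-token targets
    loss_mask: (B, T)    1.0 for assistant tokens, 0.0 for user tokens
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)
    loss_mask = loss_mask.float()
    # Sum only the assistant-token losses, normalize by their count.
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)

# Example usage with dummy tensors:
logits = torch.randn(1, 7, 50_000)
targets = torch.randint(0, 50_000, (1, 7))
mask = torch.tensor([[0, 0, 0, 1, 1, 1, 1]])
print(masked_lm_loss(logits, targets, mask))
```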
Also, you can use: https://github.com/JL-er/RWKV-PEFT