- Introducing randomness might make the training process more unstable, and convergence could be affected.
- The formula used in the loss calculation and the formula used for the final text generation (with temperature, top_k, and sampling) differ in how they handle the randomness applied to the final logits, essentially creating two distinct paths: one for training and one for using the model. I am wondering whether incorporating the text_generation (modified) logit calculation into the training loss calculation could benefit the model's performance. (A rough sketch of the two paths follows below.)
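For concreteness, here is a minimal PyTorch-style sketch of the two paths being described. The function names, shapes, and default values are illustrative assumptions, not code from this repository:

```python
import torch
import torch.nn.functional as F

# Training path: cross-entropy is computed on the raw, unmodified logits.
def training_loss(logits, targets):
    # logits: (batch, seq_len, vocab_size), targets: (batch, seq_len)
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())

# Generation path: logits are rescaled by temperature and filtered by
# top-k before sampling -- the divergence the question refers to.
def sample_next_token(logits, temperature=1.0, top_k=50):
    # logits: (batch, vocab_size), for the last position only
    logits = logits / temperature  # >1 flattens, <1 sharpens the distribution
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        # Mask out every logit below the k-th largest per row.
        logits = torch.where(
            logits < top_logits[:, [-1]],
            torch.full_like(logits, float("-inf")),
            logits,
        )
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

One likely reason the two paths stay separate: the top-k mask and the multinomial draw are not differentiable, so they cannot simply be folded into the loss, which echoes the concern above about randomness destabilizing training.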