- Introducing randomness might make the training process more unstable, and convergence could be affected.
- The formula used in the loss calculation and the formula used for the final text generation (with temperature, top_k, and sampling) differ in how they handle the randomness applied to the final logits, essentially creating two distinct paths: one for training and one for using the model. I am wondering whether incorporating the text_generation (modified) logit calculation into the training loss calculation could benefit the model's performance. (A rough sketch of the two paths follows below.)
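For concreteness, here is a minimal PyTorch-style sketch of the two paths being described. The function names, shapes, and default values are illustrative assumptions, not code from this repository:

```python
import torch
import torch.nn.functional as F

# Training path: cross-entropy is computed on the raw, unmodified logits.
def training_loss(logits, targets):
    # logits: (batch, seq_len, vocab_size), targets: (batch, seq_len)
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())

# Generation path: logits are rescaled by temperature and filtered by
# top-k before sampling -- the divergence the question refers to.
def sample_next_token(logits, temperature=1.0, top_k=50):
    # logits: (batch, vocab_size), for the last position only
    logits = logits / temperature  # >1 flattens, <1 sharpens the distribution
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        # Mask out every logit below the k-th largest per row.
        logits = torch.where(
            logits < top_logits[:, [-1]],
            torch.full_like(logits, float("-inf")),
            logits,
        )
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```

One likely reason the two paths stay separate: the top-k mask and the multinomial draw are not differentiable, so they cannot simply be folded into the loss, which echoes the concern above about randomness destabilizing training.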