How to get hidden_state value? #13

taewhankim · 2023-11-15T18:16:44Z

Thanks for the great paper!

I am curious how to get the hidden_states value.

Also, why is its second dimension more than double what I expected? I thought the shape would be (b, 50, 768), not (b, 140, 768).

Could you explain why?

Thanks!
gpt2.py, line 87:

hidden_states: Optional[Tuple[torch.FloatTensor]]

https://github.com/RitaRamo/smallcap/blob/513f4f795950328129014eb37f011d686ab6ed24/src/gpt2.py#L87C13-L87C13


YovaKem · 2023-11-25T04:45:04Z

The shape of the hidden_states tensor is batch size × sequence length × hidden size, so 140 is the sequence length.
I'm not sure what you mean by how you get this value. Do you want to have it as an output of the model or are you asking how the value is computed?
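
For readers wondering how to have the hidden states returned as an output, here is a minimal sketch. It assumes the Hugging Face `transformers` GPT-2 implementation and uses a tiny, randomly initialised model with made-up sizes (hidden size 64 instead of 768, 2 layers) so nothing is downloaded. It is not the SmallCap model itself, only an illustration of the batch size × sequence length × hidden size layout.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Hypothetical tiny configuration, chosen only to keep the example fast.
config = GPT2Config(vocab_size=100, n_positions=256, n_embd=64, n_layer=2, n_head=2)
model = GPT2Model(config)
model.eval()

batch_size, seq_len = 3, 140  # 140 mirrors the sequence length discussed above
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))

with torch.no_grad():
    # output_hidden_states=True asks the model to return every layer's output.
    out = model(input_ids=input_ids, output_hidden_states=True)

# hidden_states is a tuple: the embedding output plus one tensor per layer.
hidden_states = out.hidden_states
print(len(hidden_states))                # number of tensors in the tuple
print(tuple(hidden_states[-1].shape))    # (batch size, sequence length, hidden size)
```

The second dimension follows the length of the token sequence passed to the decoder, so it changes with the input rather than being a fixed property of the model.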

taewhankim · 2024-01-19T12:54:10Z

Thanks for the reply!
I asked because I wasn't familiar with the GPT-2 architecture, but I have now figured it out.
My bad, sorry.

But I have another question.
When running the vanilla code, could you tell me why the loss value changes on every training run even though the seed is fixed?

As far as I know, Hugging Face's Trainer uses a fixed seed of 42 (I also fixed the seeds separately in my code). But in this project the loss value changes every time I run it, so the metric results change every time as well. Could you tell me why?

Same code, different loss and lr results.
As training progresses, the difference in the loss and lr values increases.

checkpoint-8856/trainer_state.json
case 1:

  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.00022583559169e-05,
      "loss": 2.397,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null

case 2:

  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4005,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null

case 3:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4008,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}
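
The learning rates logged above can be turned into an implied number of scheduler steps with a little arithmetic. The sketch below assumes a base learning rate of 1e-4 and a linear decay to zero with no warmup; neither is stated in this thread, so treat both as assumptions. The helper names are hypothetical. Under those assumptions, a run that had taken all 8856 scheduler steps would log exactly 9e-05.

```python
BASE_LR = 1e-4      # assumed base learning rate (not stated in the thread)
MAX_STEPS = 88560   # "max_steps" from trainer_state.json
GLOBAL_STEP = 8856  # "global_step" from trainer_state.json


def linear_lr(scheduler_steps: int) -> float:
    """Learning rate after a number of steps of linear decay to zero (no warmup)."""
    return BASE_LR * (MAX_STEPS - scheduler_steps) / MAX_STEPS


def implied_scheduler_steps(logged_lr: float) -> int:
    """Invert linear_lr: how many scheduler steps would give this logged value."""
    return round(MAX_STEPS * (1 - logged_lr / BASE_LR))


logged = {
    "case 1": 9.00022583559169e-05,
    "case 2": 9.000338753387534e-05,
    "case 3": 9.000338753387534e-05,
}

for name, lr in logged.items():
    steps = implied_scheduler_steps(lr)
    print(name, "implied scheduler steps:", steps, "short of global_step by:", GLOBAL_STEP - steps)
```

Under these assumptions, case 1 corresponds to 8854 scheduler steps and cases 2 and 3 to 8853, so the runs would differ in how many scheduler steps were actually taken, not only in the loss. The thread does not establish why that would happen.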

taewhankim closed this as completed Nov 16, 2023

taewhankim reopened this Nov 16, 2023
