
How to get hidden_state value? #13

Open
taewhankim opened this issue Nov 15, 2023 · 2 comments

Comments

@taewhankim commented Nov 15, 2023

Thanks for the great paper!

I'm curious how to get the hidden_state value.

Also, why is the sequence dimension more than double what I expected? I thought the shape would be (b, 50, 768), not (b, 140, 768).

Could you explain why that is?

Thanks!!!
gpt2.py, line 87:

hidden_states: Optional[Tuple[torch.FloatTensor]]

https://github.com/RitaRamo/smallcap/blob/513f4f795950328129014eb37f011d686ab6ed24/src/gpt2.py#L87C13-L87C13

@YovaKem (Collaborator) commented Nov 25, 2023

The shape of the hidden_states matrix is batch_size × sequence_length × hidden_size, so 140 is the sequence length.
I'm not sure what you mean by "how to get this value". Do you want it as an output of the model, or are you asking how the value is computed?
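For the first reading of the question (getting hidden states as a model output), here is a minimal sketch using the standard HuggingFace GPT-2 API. It uses a tiny randomly initialised config so nothing is downloaded; with the real 'gpt2' checkpoint the last dimension would be 768, matching the shapes discussed above.

```python
# Sketch: exposing hidden states from a HuggingFace GPT-2 model.
# A tiny random-init config stands in for the real checkpoint.
import torch
from transformers import GPT2Config, GPT2Model

config = GPT2Config(vocab_size=100, n_positions=32, n_embd=8, n_layer=2, n_head=2)
model = GPT2Model(config)
model.eval()

input_ids = torch.randint(0, config.vocab_size, (2, 5))  # batch=2, seq_len=5
with torch.no_grad():
    outputs = model(input_ids, output_hidden_states=True)

# outputs.hidden_states is a tuple: the embedding output plus one
# entry per transformer layer, each of shape (batch, seq_len, hidden)
hidden_states = outputs.hidden_states
print(len(hidden_states))       # n_layer + 1 entries
print(hidden_states[-1].shape)  # torch.Size([2, 5, 8])
```

With the pretrained model the same flag works: `GPT2Model.from_pretrained("gpt2")` followed by a forward pass with `output_hidden_states=True` yields tensors of shape (batch, sequence_length, 768).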

@taewhankim (Author) commented Jan 19, 2024

Thanks for the reply!
I asked because I didn't know the GPT-2 structure, but I've now figured it out. My bad, sorry.

But I have another question.
When running the vanilla code, could you tell me why the loss value changes across training runs even though the seed is fixed?

As far as I know, Hugging Face's Trainer uses a fixed seed of 42 (I even fixed the seeds separately in code). But in this project, the loss changes every time I run it, so the metric results change as well. Could you tell me why?

Same code, different loss & lr results.
As training progresses, the difference in loss & lr values increases.

checkpoint-8856/trainer_state.json

case 1:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.00022583559169e-05,
      "loss": 2.397,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}
case 2:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4005,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}

case 3:

{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 1.0,
  "global_step": 8856,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 1.0,
      "learning_rate": 9.000338753387534e-05,
      "loss": 2.4008,
      "step": 8856
    }
  ],
  "max_steps": 88560,
  "num_train_epochs": 10,
  "total_flos": 0.0,
  "trial_name": null,
  "trial_params": null
}
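For reference, a run-to-run drift like the one above usually means some source of randomness is not covered by the Trainer's seed (CUDA kernels in particular). A minimal sketch of the seeding and determinism switches that typically need to be set before training; the `seed_everything` helper name is my own, not part of this repo or of `transformers`:

```python
# Sketch: pinning down all common sources of nondeterminism in a
# PyTorch training run; set_seed(42) alone does not cover cuDNN autotuning.
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN: pick deterministic kernels and disable autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Some CUDA matmul/scatter ops also require this env var (CUDA >= 10.2)
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out instead of silently using a nondeterministic kernel
    torch.use_deterministic_algorithms(True)

seed_everything(42)
a = torch.randn(3)
seed_everything(42)
b = torch.randn(3)
print(torch.equal(a, b))  # True
```

Even with all of the above, some ops have no deterministic CUDA implementation, and results can still differ across GPU models or library versions, which may explain the small drift between the three cases.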
