-
I am wondering why the code for training only the last layer works, because that branch only executes `pass`:

    for param in model.parameters():
        param.requires_grad = False
    ...
    if args.trainable_layers == "last_layer":
        pass
    ...

I have set it more specifically, with the same results:

    for param in model.parameters():
        param.requires_grad = False
    ...
    if args.trainable_layers == "last_layer":
        # Get the last layer
        last_layer = list(model.children())[-1]
        # Make the last layer trainable
        for param in last_layer.parameters():
            param.requires_grad = True
    ...

What do you think about also adding a test for the last two layers plus the last two transformer blocks, as in your article about finetuning LLMs?
Replies: 2 comments 13 replies
-
That was a good suggestion. The performance when training the last two blocks is quite good!
Oh, that's simply because we first make all layers untrainable, and then we replace the last layer with `nn.Linear`, and `nn.Linear` is trainable by default. That's because `nn.Linear` uses `nn.Parameter`, which has `requires_grad=True` by default.
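
To make the point above concrete, here is a minimal sketch using a toy `nn.Sequential` as a stand-in for the real model. The layer sizes, the two-class head, and the toy model itself are assumptions for illustration only and are not taken from the repository's training script.

```python
import torch.nn as nn

# Toy stand-in for the pretrained model (sizes are arbitrary assumptions).
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),  # original output layer
)

# Step 1: freeze every parameter that currently exists in the model.
for param in model.parameters():
    param.requires_grad = False

# Step 2: replace the last layer. The new nn.Linear creates fresh
# nn.Parameter objects, and those have requires_grad=True by default.
model[2] = nn.Linear(16, 2)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)

# Explicitly unfreezing the last child module changes nothing,
# because its parameters are already trainable.
last_layer = list(model.children())[-1]
for param in last_layer.parameters():
    param.requires_grad = True

trainable_after = [name for name, p in model.named_parameters() if p.requires_grad]
print("unchanged:", trainable_after == trainable)
```

In this sketch only the parameters of the replaced layer end up trainable, which is why the `pass` branch and the explicit version in the question give the same results.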