Questions about discriminative_fine_tuning #5

wlhgtc · 2020-06-12T07:51:16Z

Section 5.4.3 says: "We find that assign a lower learning rate to the lower layer is effective to fine-tuning BERT, and an appropriate setting is ξ=0.95 and lr=2.0e-5."
I compared this with the code at https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier.py#L812
It seems that you divide the BERT layers into 3 groups (4 layers per group) and set a different learning rate for each group.
I have some questions about it:

How does the decay factor 0.95 correspond to the number 2.6 in the code?
The final classifier layer does not seem to be included. Is there no need to set a learning rate for it?

xuyige · 2020-06-25T14:31:30Z

Thank you for your issue!

The number 2.6 was set for the early experiments. After that, we used run_classifier_discriminative.py for discriminative fine-tuning.
The link to run_classifier_discriminative.py is https://github.com/xuyige/BERT4doc-Classification/blob/master/codes/fine-tuning/run_classifier_discriminative.py
The classifier layer is included in run_classifier_discriminative.py.
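
For readers following this thread, here is a minimal sketch of per-layer learning rate decay with the settings quoted above (ξ=0.95, lr=2.0e-5), assuming a 12-layer encoder whose top layer uses the base rate and each layer below it uses the rate of the layer above multiplied by ξ. It is not taken from run_classifier_discriminative.py. The function names and the `layer.{k}.` name matching are hypothetical, and giving every parameter outside the encoder layers (for example the classifier) the base rate is a simplifying assumption.

```python
def layerwise_learning_rates(base_lr, decay, num_layers):
    """Return one learning rate per encoder layer, lowest layer first.

    The top layer gets base_lr; each layer below it gets the rate of
    the layer above multiplied by decay.
    """
    return [base_lr * decay ** (num_layers - 1 - k) for k in range(num_layers)]


def lr_for_parameter(name, layer_lrs, base_lr):
    """Choose a learning rate from a parameter name (hypothetical naming scheme).

    Names containing 'layer.{k}.' get the rate of encoder layer k.
    Anything else (e.g. the classifier) falls back to base_lr, which is
    an assumption made for this sketch.
    """
    for k, lr in enumerate(layer_lrs):
        if f"layer.{k}." in name:
            return lr
    return base_lr


rates = layerwise_learning_rates(base_lr=2.0e-5, decay=0.95, num_layers=12)
for k, lr in enumerate(rates):
    print(f"layer {k:2d}: lr = {lr:.3e}")
```

Under these assumptions the lowest layer ends up with roughly 0.57 times the base rate (0.95 to the 11th power), which is a much gentler spread than dividing by 2.6 between groups. In PyTorch, rates like these would typically be passed to the optimizer as per-parameter groups, each with its own `lr` entry.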

wlhgtc · 2020-06-28T04:12:54Z

Thanks for your reply, I will try it!
