Hi, I noticed that in 'train_nasty.py', when the KL divergence is computed, the normal teacher's output (output_stu) is treated as the input and the nasty teacher's output (output_tch) as the target. However, in standard KD, the fixed model (the teacher) is usually the target, and the model being updated is the input.
I wonder why you adopted the opposite order in the KL loss function. Is there a reason for this? Thanks!
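For reference, the conventional KD order I have in mind looks roughly like this (my own minimal sketch with made-up tensor shapes, not the repository's code; the temperature `T` is an assumed value):

```python
import torch
import torch.nn.functional as F

# Toy logits standing in for the two models (shapes assumed).
student_logits = torch.randn(8, 10, requires_grad=True)  # model being updated
teacher_logits = torch.randn(8, 10)                      # fixed teacher

T = 4.0  # softmax temperature (assumed value)

# F.kl_div(input, target): `input` must be log-probabilities,
# `target` plain probabilities. Conventionally the trainable
# student sits in the `input` slot and the fixed teacher in `target`.
kd_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=1),  # input: student (log-probs)
    F.softmax(teacher_logits / T, dim=1),      # target: teacher (probs)
    reduction="batchmean",
) * (T * T)
```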
Thanks for asking, and sorry for the ambiguous variable names. The naming does not affect the results in our paper.
Since we aim to build a nasty teacher, I named the output of the nasty teacher (the model we want to update) output_tch (https://github.com/VITA-Group/Nasty-Teacher/blob/main/train_nasty.py#L56).
I named the other one "output_stu" simply because, at the very beginning of this project, I tried to use a student network there and co-train the two models. That idea didn't work out, but I kept the variable names for my other experiments.
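So, roughly speaking, the training objective looks like the sketch below (a paraphrase of the idea, not a verbatim copy of train_nasty.py; the weight `w`, temperature `T`, and tensor shapes are placeholder values). The nasty teacher's output sits in the `target` slot of `F.kl_div`, but PyTorch's `kl_div` is differentiable with respect to both arguments, so the KL term still produces gradients for the model we update:

```python
import torch
import torch.nn.functional as F

# Toy logits; names follow this thread's variables, values are made up.
output_tch = torch.randn(8, 10, requires_grad=True)  # nasty teacher (updated)
output_stu = torch.randn(8, 10)                      # normal, fixed teacher

T, w = 4.0, 0.01  # temperature and adversarial weight (assumed values)
labels = torch.randint(0, 10, (8,))

# Maximize the KL divergence between the nasty teacher and the fixed
# normal teacher (hence the minus sign) while keeping accuracy via CE.
kl = F.kl_div(
    F.log_softmax(output_stu / T, dim=1),  # input: fixed teacher (log-probs)
    F.softmax(output_tch / T, dim=1),      # target: nasty teacher (probs)
    reduction="batchmean",
) * (T * T)
loss = F.cross_entropy(output_tch, labels) - w * kl
loss.backward()  # gradients reach output_tch through the `target` slot
```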