The asymmetry of KL divergence. #2

Open
xuguodong03 opened this issue Apr 8, 2021 · 1 comment
@xuguodong03

Hi, I notice that in 'train_nasty.py', when the KL divergence is computed, the normal teacher's output (output_stu) is treated as the input and the nasty teacher's output (output_tch) as the target. However, in standard KD, the fixed model (the teacher) is usually the target and the model being updated is the input.

I wonder why you adopt the opposite order in the KL loss function. Is there a reason for this? Thanks!
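For context, this is the argument order I had in mind for standard KD; it is just an illustrative sketch with made-up logits and temperature, not your exact code:

```python
import torch
import torch.nn.functional as F

T = 4.0                               # temperature (illustrative value)
student_logits = torch.randn(8, 10)   # model being updated
teacher_logits = torch.randn(8, 10)   # fixed teacher

# F.kl_div(input, target) expects `input` as log-probabilities and `target`
# as probabilities, and computes KL(target || input): in standard KD the
# fixed teacher is the target and the model being updated is the input.
kd_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=1),
    F.softmax(teacher_logits / T, dim=1),
    reduction="batchmean",
) * (T * T)
print(kd_loss.item())
```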

@HowieMa (Collaborator) commented Apr 8, 2021

Thanks for asking. I am sorry for the ambiguous variable names. Nevertheless, the variable names do not affect the results of our paper.

Since we aim to build a nasty teacher model, I named the output of the nasty teacher (the model we want to update) output_tch (https://github.com/VITA-Group/Nasty-Teacher/blob/main/train_nasty.py#L56).
I named the other output "output_stu" simply because, at the very beginning of this project, I tried to use a student network there and co-train the two models. That idea didn't work out, and I kept the variable name for my other experiments.

Maybe I should change the name of the output from the fixed model (output_stu in https://github.com/VITA-Group/Nasty-Teacher/blob/main/train_nasty.py#L64) to output_adv to make things clearer.
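With that renaming, the KL term would look roughly like the sketch below (illustrative only, with an assumed temperature value; see train_nasty.py for the actual code):

```python
import torch.nn.functional as F

def nasty_kl_term(output_tch, output_adv, T=4.0):
    """KL term between the nasty teacher (output_tch, the model being updated)
    and the fixed pretrained network (output_adv, currently named output_stu)."""
    # KL divergence is asymmetric, so swapping the two arguments of F.kl_div
    # changes which direction is measured. Here the fixed model's
    # log-probabilities are the `input` and the nasty teacher's probabilities
    # are the `target`, mirroring the order described in this issue.
    return F.kl_div(
        F.log_softmax(output_adv / T, dim=1),
        F.softmax(output_tch / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
```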
