Hi, I noticed that in 'train_nasty.py', when the KL divergence is computed, the normal teacher's output (output_stu) is treated as the input and the nasty teacher's output (output_tch) as the target. However, in standard KD, the fixed model (the teacher) is usually the target, and the model being updated is the input.
I wonder why you adopted the opposite order in the KL loss function. Is there a reason for this? Thanks!
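For reference, the conventional KD order I have in mind looks roughly like this (my own minimal sketch with made-up tensor shapes, not the repository's code; the temperature `T` is an assumed value):

```python
import torch
import torch.nn.functional as F

# Toy logits standing in for the two models (shapes assumed).
student_logits = torch.randn(8, 10, requires_grad=True)  # model being updated
teacher_logits = torch.randn(8, 10)                      # fixed teacher

T = 4.0  # softmax temperature (assumed value)

# F.kl_div(input, target): `input` must be log-probabilities,
# `target` plain probabilities. Conventionally the trainable
# student sits in the `input` slot and the fixed teacher in `target`.
kd_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=1),  # input: student (log-probs)
    F.softmax(teacher_logits / T, dim=1),      # target: teacher (probs)
    reduction="batchmean",
) * (T * T)
```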
Thanks for asking, and sorry for the ambiguous variable names. The naming does not affect the results in our paper.
Since we aim to build a nasty teacher, I named the output of the nasty teacher (the model we want to update) output_tch (https://github.com/VITA-Group/Nasty-Teacher/blob/main/train_nasty.py#L56).
I named the other one "output_stu" simply because, at the very beginning of this project, I tried to use a student network there and co-train the two models. That idea didn't work out, but I kept the variable names for my other experiments.
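So, roughly speaking, the training objective looks like the sketch below (a paraphrase of the idea, not a verbatim copy of train_nasty.py; the weight `w`, temperature `T`, and tensor shapes are placeholder values). The nasty teacher's output sits in the `target` slot of `F.kl_div`, but PyTorch's `kl_div` is differentiable with respect to both arguments, so the KL term still produces gradients for the model we update:

```python
import torch
import torch.nn.functional as F

# Toy logits; names follow this thread's variables, values are made up.
output_tch = torch.randn(8, 10, requires_grad=True)  # nasty teacher (updated)
output_stu = torch.randn(8, 10)                      # normal, fixed teacher

T, w = 4.0, 0.01  # temperature and adversarial weight (assumed values)
labels = torch.randint(0, 10, (8,))

# Maximize the KL divergence between the nasty teacher and the fixed
# normal teacher (hence the minus sign) while keeping accuracy via CE.
kl = F.kl_div(
    F.log_softmax(output_stu / T, dim=1),  # input: fixed teacher (log-probs)
    F.softmax(output_tch / T, dim=1),      # target: nasty teacher (probs)
    reduction="batchmean",
) * (T * T)
loss = F.cross_entropy(output_tch, labels) - w * kl
loss.backward()  # gradients reach output_tch through the `target` slot
```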