Hi, I find that there are some details in the implementation of BigGAN worth paying attention to.

First, I notice that the default moments used for batch norm during inference are the accumulated values (compare_gan/example_configs/biggan_imagenet128.gin, line 30 in 3af50e3; compare_gan/compare_gan/architectures/arch_ops.py, lines 299 to 304 in e0b739f). Does this mean that the `decay` hyperparameter for batch norm (compare_gan/example_configs/biggan_imagenet128.gin, line 28 in 3af50e3) is not used at all?
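For reference, here is a minimal NumPy sketch of the distinction in question: an exponential moving average of the moments, which uses a decay hyperparameter, versus accumulated moments, which weight every batch equally and never touch decay. This is an illustration only, not the compare_gan code.

```python
import numpy as np

class MomentTracker:
    """Toy tracker for batch-norm inference statistics (illustrative only)."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.ema_mean, self.ema_var = 0.0, 1.0  # EMA statistics: these use `decay`
        self.acc_mean, self.acc_var = 0.0, 0.0  # accumulated statistics: no decay involved
        self.num_batches = 0                    # counter for the plain running average

    def update(self, batch):
        mean, var = batch.mean(), batch.var()
        # Exponential moving average: recent batches dominate, weighting set by `decay`.
        self.ema_mean = self.decay * self.ema_mean + (1 - self.decay) * mean
        self.ema_var = self.decay * self.ema_var + (1 - self.decay) * var
        # Accumulated moments: every batch weighted equally; `decay` plays no role here.
        self.num_batches += 1
        self.acc_mean += (mean - self.acc_mean) / self.num_batches
        self.acc_var += (var - self.acc_var) / self.num_batches

tracker = MomentTracker(decay=0.9)
for _ in range(100):
    tracker.update(np.random.randn(64) * 2.0 + 3.0)
print(tracker.ema_mean, tracker.acc_mean)  # both approach 3.0, by different routes
```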
Second, I also notice that the shortcuts are added only when `in_channels != out_channels` (compare_gan/compare_gan/architectures/resnet_biggan.py, line 339 in 3af50e3), which is different from BigGAN-PyTorch (https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L388 and https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L427): BigGAN-PyTorch uses shortcuts all the time, and the shortcuts are learnable when `in_channels != out_channels` or when the block is an upsampling or downsampling block.
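A rough Python sketch of the two shortcut policies as described above. This is a paraphrase for illustration, not the actual code of either repository; `conv_sc` stands in for a pre-built 1x1 shortcut convolution, and up/down-sampling of the shortcut tensor is omitted for brevity.

```python
import torch
import torch.nn as nn


def compare_gan_style_shortcut(x, in_channels, out_channels, conv_sc=None):
    # Policy described for compare_gan's resnet_biggan blocks:
    # a learnable 1x1 shortcut only when the channel counts differ, else identity.
    if in_channels != out_channels:
        return conv_sc(x)
    return x


def biggan_pytorch_style_shortcut(x, in_channels, out_channels, resample, conv_sc=None):
    # Policy described for BigGAN-PyTorch: the shortcut path always exists, and it
    # becomes learnable (a 1x1 conv) whenever the channels differ or the block
    # upsamples/downsamples; otherwise it is the identity.
    learnable = (in_channels != out_channels) or resample
    return conv_sc(x) if learnable else x


# Hypothetical usage:
x = torch.randn(2, 64, 32, 32)
conv_sc = nn.Conv2d(64, 128, kernel_size=1)
print(biggan_pytorch_style_shortcut(x, 64, 128, resample=True, conv_sc=conv_sc).shape)
print(compare_gan_style_shortcut(x, 64, 64).shape)  # identity: channel counts match
```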
Third, I find that BigGAN-PyTorch omits the first ReLU activation in the first DBlock by setting `preactivation=False`, which is consistent with the WGAN-GP implementation (I guess that since the range you use for the input of D is [0, 1] instead of [-1, 1], the first ReLU does no harm). Also, in the shortcut connection of the first DBlock in WGAN-GP and BigGAN-PyTorch, pooling comes before convolution, while in this repo convolution comes before pooling, as in the other DBlocks.
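To make the ordering difference concrete, here is an illustrative PyTorch sketch of the two shortcut orderings for a downsampling block. It mirrors the description above rather than the actual code of either repository, and it shows only the shortcut path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FirstDBlockShortcut(nn.Module):
    """Illustrative shortcut path of a downsampling first D block (main path omitted)."""

    def __init__(self, in_ch, out_ch, pool_before_conv=True):
        super().__init__()
        self.conv_sc = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.pool_before_conv = pool_before_conv

    def forward(self, x):
        if self.pool_before_conv:
            # Ordering described for the first DBlock in WGAN-GP / BigGAN-PyTorch:
            # average-pool first, then the 1x1 shortcut convolution.
            return self.conv_sc(F.avg_pool2d(x, 2))
        # Ordering described for compare_gan (and for the other DBlocks):
        # 1x1 shortcut convolution first, then pooling.
        return F.avg_pool2d(self.conv_sc(x), 2)


x = torch.randn(2, 3, 128, 128)  # e.g. a batch of RGB inputs to D
pool_first = FirstDBlockShortcut(3, 96, pool_before_conv=True)
conv_first = FirstDBlockShortcut(3, 96, pool_before_conv=False)
print(pool_first(x).shape, conv_first(x).shape)  # both torch.Size([2, 96, 64, 64])
```

Incidentally, if the shortcut convolution is 1x1 and the pooling is average pooling (an assumption made for this sketch, not something verified against either codebase), the two orderings compute the same function, since a pointwise convolution and spatial averaging commute; pooling first is simply cheaper. With max pooling or a larger shortcut kernel, the two orderings genuinely differ.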
Do you think these discrepancies would have a significant influence on the performance of BigGAN?

Thanks
> that uses shortcuts all the time and the shortcuts are learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.

Are you sure about that? The logic for doing the `conv_sc` stuff appears to be the same in both compare_gan and BigGAN-PyTorch: check whether the channels differ, and if they don't, don't add one.

You may have a point about the pooling/convolution order. Have you tried swapping them? I hope it doesn't make a difference. (mooch noted that compare_gan never converged to the quality of the original BigGAN or BigGAN-PyTorch, but no one knew why; we found the same thing: no matter how many runs we did, the final quality was never nearly as good as it should be. Convolution-then-pooling instead of pooling-then-convolution doesn't seem like it ought to matter that much... but who knows?) Do you have a diff for that, or have you tried running it?