Hi, I find that there are some details in the implementation of BigGAN worth paying attention to.

First, I notice that the default moments used for batch norm during inference are the accumulated values (compare_gan/example_configs/biggan_imagenet128.gin, line 30 in 3af50e3; compare_gan/compare_gan/architectures/arch_ops.py, lines 299 to 304 in e0b739f). Does this mean that the `decay` hyperparameter for batch norm (compare_gan/example_configs/biggan_imagenet128.gin, line 28 in 3af50e3) is not used at all?
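For reference, here is a minimal NumPy sketch of the distinction in question: an exponential moving average of the moments, which uses a decay hyperparameter, versus accumulated moments, which weight every batch equally and never touch decay. This is an illustration only, not the compare_gan code.

```python
import numpy as np

class MomentTracker:
    """Toy tracker for batch-norm inference statistics (illustrative only)."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.ema_mean, self.ema_var = 0.0, 1.0  # EMA statistics: these use `decay`
        self.acc_mean, self.acc_var = 0.0, 0.0  # accumulated statistics: no decay involved
        self.num_batches = 0                    # counter for the plain running average

    def update(self, batch):
        mean, var = batch.mean(), batch.var()
        # Exponential moving average: recent batches dominate, weighting set by `decay`.
        self.ema_mean = self.decay * self.ema_mean + (1 - self.decay) * mean
        self.ema_var = self.decay * self.ema_var + (1 - self.decay) * var
        # Accumulated moments: every batch weighted equally; `decay` plays no role here.
        self.num_batches += 1
        self.acc_mean += (mean - self.acc_mean) / self.num_batches
        self.acc_var += (var - self.acc_var) / self.num_batches

tracker = MomentTracker(decay=0.9)
for _ in range(100):
    tracker.update(np.random.randn(64) * 2.0 + 3.0)
print(tracker.ema_mean, tracker.acc_mean)  # both approach 3.0, by different routes
```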
Second, I also notice that the shortcuts are added only when `in_channels != out_channels` (compare_gan/compare_gan/architectures/resnet_biggan.py, line 339 in 3af50e3), which is different from BigGAN-PyTorch (https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L388 and https://github.com/ajbrock/BigGAN-PyTorch/blob/98459431a5d618d644d54cd1e9fceb1e5045648d/layers.py#L427): BigGAN-PyTorch uses shortcuts all the time, and the shortcuts are learnable when `in_channels != out_channels` or when the block is an upsampling or downsampling block.
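A rough Python sketch of the two shortcut policies as described above. This is a paraphrase for illustration, not the actual code of either repository; `conv_sc` stands in for a pre-built 1x1 shortcut convolution, and up/down-sampling of the shortcut tensor is omitted for brevity.

```python
import torch
import torch.nn as nn


def compare_gan_style_shortcut(x, in_channels, out_channels, conv_sc=None):
    # Policy described for compare_gan's resnet_biggan blocks:
    # a learnable 1x1 shortcut only when the channel counts differ, else identity.
    if in_channels != out_channels:
        return conv_sc(x)
    return x


def biggan_pytorch_style_shortcut(x, in_channels, out_channels, resample, conv_sc=None):
    # Policy described for BigGAN-PyTorch: the shortcut path always exists, and it
    # becomes learnable (a 1x1 conv) whenever the channels differ or the block
    # upsamples/downsamples; otherwise it is the identity.
    learnable = (in_channels != out_channels) or resample
    return conv_sc(x) if learnable else x


# Hypothetical usage:
x = torch.randn(2, 64, 32, 32)
conv_sc = nn.Conv2d(64, 128, kernel_size=1)
print(biggan_pytorch_style_shortcut(x, 64, 128, resample=True, conv_sc=conv_sc).shape)
print(compare_gan_style_shortcut(x, 64, 64).shape)  # identity: channel counts match
```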
Third, I find that BigGAN-PyTorch omits the first ReLU activation in the first DBlock by setting `preactivation=False`, which is consistent with the WGAN-GP implementation (I guess that since the range you use for the input of D is [0, 1] instead of [-1, 1], the first ReLU does no harm). Also, in the shortcut connection of the first DBlock in WGAN-GP and BigGAN-PyTorch, pooling comes before convolution, while in this repo convolution comes before pooling, as in the other DBlocks.
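To make the ordering difference concrete, here is an illustrative PyTorch sketch of the two shortcut orderings for a downsampling block. It mirrors the description above rather than the actual code of either repository, and it shows only the shortcut path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FirstDBlockShortcut(nn.Module):
    """Illustrative shortcut path of a downsampling first D block (main path omitted)."""

    def __init__(self, in_ch, out_ch, pool_before_conv=True):
        super().__init__()
        self.conv_sc = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.pool_before_conv = pool_before_conv

    def forward(self, x):
        if self.pool_before_conv:
            # Ordering described for the first DBlock in WGAN-GP / BigGAN-PyTorch:
            # average-pool first, then the 1x1 shortcut convolution.
            return self.conv_sc(F.avg_pool2d(x, 2))
        # Ordering described for compare_gan (and for the other DBlocks):
        # 1x1 shortcut convolution first, then pooling.
        return F.avg_pool2d(self.conv_sc(x), 2)


x = torch.randn(2, 3, 128, 128)  # e.g. a batch of RGB inputs to D
pool_first = FirstDBlockShortcut(3, 96, pool_before_conv=True)
conv_first = FirstDBlockShortcut(3, 96, pool_before_conv=False)
print(pool_first(x).shape, conv_first(x).shape)  # both torch.Size([2, 96, 64, 64])
```

Incidentally, if the shortcut convolution is 1x1 and the pooling is average pooling (an assumption made for this sketch, not something verified against either codebase), the two orderings compute the same function, since a pointwise convolution and spatial averaging commute; pooling first is simply cheaper. With max pooling or a larger shortcut kernel, the two orderings genuinely differ.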
Do you think these discrepancies would have a significant influence on the performance of BigGAN?

Thanks
> that uses shortcuts all the time and the shortcuts are learnable when in_channels != out_channels or when the block is an upsampling or downsampling block.

Are you sure about that? The logic for doing the `conv_sc` stuff appears to be the same in both compare_gan and BigGAN-PyTorch: check whether the channels differ, and if they don't, don't add one.

You may have a point about the pooling/convolution order. Have you tried swapping them? I hope it doesn't make a difference. (mooch noted that compare_gan never converged to the quality of the original BigGAN or BigGAN-PyTorch, but no one knew why; we found the same thing: no matter how many runs we did, the final quality was never nearly as good as it should be. Convolution-then-pooling instead of pooling-then-convolution doesn't seem like it ought to matter that much... but who knows?) Do you have a diff for that, or have you tried running it?