This repository has been archived by the owner on Mar 15, 2024. It is now read-only.

No 3D-Awareness #19

Open
fentopa opened this issue Mar 27, 2022 · 9 comments

Comments

@fentopa

fentopa commented Mar 27, 2022

Hi,

thanks for the work! I trained the model on FFHQ (default settings) and on another dataset that you did not use and that is not public yet. I tuned the camera parameters a bit for the other dataset. The image quality is great, but when sampling from different camera poses with render_rotation_camera, the output is basically always the middle view, so there is no 3D rotation. In the paper you mention observing that this might happen when only the NeRF path regularization is used. With some training seeds it works a bit (still not as well as in the images you have shown), and with others not at all, so it is also unstable. Do you have any ideas for preventing this, especially with a new dataset? Do I have to be careful with specific parameters, such as any of the camera parameters?

@MultiPath
Contributor

Can you share your training script? I can re-run it on my side. I did not find this issue to be serious previously.

@fentopa
Author

fentopa commented Mar 27, 2022

Thanks for the quick answer! I used the standard training command (python run_train.py outdir=${OUTDIR} data=${DATASET} spec=paper512 model=stylenerf_ffhq) with the unmodified code. On FFHQ it is OK, so there is no need to re-run, but on the other dataset I am using I mostly get flat outputs, from the beginning of training and throughout. So there is no 3D-awareness. Is there any advice for preventing this on new datasets? Perhaps while developing the architecture you found that certain things, such as specific parameters, lead to flat outputs?

@MultiPath
Contributor

Is that dataset publicly available?

Basically, in this version we may still need to set the camera hyperparameters manually. In the config you can see range_u and range_v, which define the camera distribution.
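To make the idea of a camera distribution concrete, here is a minimal sketch of sampling camera parameters inside a range, either uniformly or from a clipped Gaussian. It is not the StyleNeRF implementation: the helper name `sample_camera_angles`, the choice of standard deviation, and the numeric ranges are all assumptions for illustration, and how the repository actually interprets range_u and range_v is not reproduced here.

```python
import numpy as np

def sample_camera_angles(n, range_u, range_v, dist="uniform", rng=None):
    """Sample n (u, v) camera parameters inside the given ranges.

    Hypothetical helper, not part of StyleNeRF.
    "uniform" spreads samples evenly over each range.
    "gaussian" concentrates them around the centre of each range.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo = np.array([range_u[0], range_v[0]], dtype=float)
    hi = np.array([range_u[1], range_v[1]], dtype=float)
    if dist == "uniform":
        return rng.uniform(lo, hi, size=(n, 2))
    if dist == "gaussian":
        mean = (lo + hi) / 2.0
        std = (hi - lo) / 4.0  # assumption: the range spans +/- 2 std
        samples = rng.normal(mean, std, size=(n, 2))
        return np.clip(samples, lo, hi)  # keep samples inside the range
    raise ValueError(f"unknown distribution: {dist!r}")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative ranges only; the right values depend on the dataset.
    for dist in ("uniform", "gaussian"):
        s = sample_camera_angles(10_000, (-0.6, 0.6), (-0.2, 0.2), dist, rng)
        print(dist, "std of u:", s[:, 0].std(), "std of v:", s[:, 1].std())
```

Under these assumptions, the uniform variant puts more samples near the edges of the range than the Gaussian one, so the generator is asked to render off-centre views more often.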

@fentopa
Author

fentopa commented Mar 27, 2022

Unfortunately it is not publicly available. The dataset only contains faces, so I thought the FFHQ parameters might work well. I have already increased range_u to -0.6 0.6, which helped a little (anything larger led to bad results), but the outputs are still much flatter than with FFHQ. I wasn't sure how to tune the parameters, though (how should I set u and v?), so it would be great if you could help me improve the 3D-awareness with your model. I should add that I use lower-resolution, downsampled images (64x64) and only a 64-dimensional latent code and hidden dimension to reduce the model capacity. Could that be a problem as well?

@MultiPath
Contributor

Did you also try a uniform distribution instead of a Gaussian for the camera?

@fentopa
Author

fentopa commented Mar 27, 2022

Not yet, I will try it out, thanks! I will get back to you once I have tried it. So you think it comes down to the camera parameters?

@MultiPath
Contributor

In my experience the camera matters a lot. I also have another configuration, containing unpublished code, which might be helpful. But let's first see whether tuning the camera helps.

@fentopa
Author

fentopa commented Apr 6, 2022

I played around with the camera parameters; using a uniform distribution did not help either. I have the feeling that the 3D-awareness is fine while learning at the coarsest scale, but gets worse as progressive training continues at the finer scales. Have you had a similar experience? Is it because the NeRF path regularization is only applied at the coarsest scale? I am training on 32x32 and then 64x64, so only on two scales progressively; using less than 32x32 did not help. Don't you think it is related to the NeRF path regularization?

Edit: Or is the 2D upsampling the reason? I am leaving resolution_vol at 32, so this might introduce the problem. When training on 32x32 in the first stage, the results are good; when continuing on 64x64, the 3D-awareness is gone and I get flat outputs. Training directly on 64x64 also leads to bad results. Does it make sense to use more n_reg_samples or to train longer on 32x32? I am afraid it will get even worse once I go beyond 64x64. It seems the first stage works fine but the NeRF path regularization does not prevent the flat outputs. I also do not have enough memory to use more than resolution_vol=32.
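One way to check the impression that 3D-awareness degrades between the 32x32 and 64x64 stages is to put a number on how much the output changes across camera poses. The sketch below is a rough diagnostic under stated assumptions: `view_variation` is a hypothetical helper, the input is a stack of images of the same sample rendered from different poses (obtained however you render them), and the score only shows whether the output reacts to the camera at all, not whether the geometry is correct.

```python
import numpy as np

def view_variation(views):
    """Mean absolute per-pixel deviation of each view from the mean view.

    Hypothetical diagnostic, not part of StyleNeRF.
    views: array-like of shape (num_views, H, W) or (num_views, H, W, C),
           all rendered from the same latent code at different camera poses.
    A score near 0 means every pose yields almost the same image ("flat").
    """
    views = np.asarray(views, dtype=np.float64)
    if views.ndim < 3 or views.shape[0] < 2:
        raise ValueError("need at least two views of shape (H, W[, C])")
    mean_view = views.mean(axis=0, keepdims=True)
    return float(np.abs(views - mean_view).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.random((32, 32, 3))  # stand-in for one rendered image

    # "Flat" generator: the same image regardless of camera pose.
    flat = np.stack([base] * 5)

    # Stand-in for a pose-dependent output: a horizontal shift per view.
    shifted = np.stack([np.roll(base, shift, axis=1) for shift in range(-2, 3)])

    print("flat score:   ", view_variation(flat))
    print("shifted score:", view_variation(shifted))
```

Computing the same score on checkpoints from each progressive stage, with fixed latent codes and fixed poses, would show whether the variation really drops once training moves to 64x64. Scores are only comparable at matching resolution, so downsample the 64x64 renders to 32x32 before comparing.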

@KyriaAnnwyn

@fentopa How long did it take to train on FFHQ? Could you please share your trained pkl?

I have the same issue on a different dataset. I guess it could be because FFHQ contains many different poses of the person, whereas mine has mostly frontal images.

3 participants