
Intrinsic/Extrinsic parameters between different datasets #22

Open
hedjm opened this issue Sep 20, 2020 · 2 comments

@hedjm commented Sep 20, 2020

Thank you for this great work. In your comment in #15 (comment) you said, "Here the Adaptive Graph U-Net is exactly learning this transformation for a very specific camera and angle condition." But in the paper, in the last paragraph of the introduction, you mention that you pretrained the 2D-to-3D GraphUNet on synthetic data (ObMan), which has totally different intrinsic/extrinsic parameters. Would you please clarify this?
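To make the camera-dependence concrete, here is a minimal pinhole-projection sketch: the same 3D point lands on different pixels under different intrinsics, so the inverse (pixel-to-3D) mapping a network learns is tied to one camera. The two intrinsic matrices below are made-up illustrative values, not the actual ObMan or FPAD calibrations.

```python
import numpy as np

def project(K, X):
    """Pinhole projection: x = K @ X, then divide by depth."""
    x = K @ X
    return x[:2] / x[2]

# Two hypothetical intrinsic matrices (illustrative values only).
K_a = np.array([[480.0,   0.0, 128.0],
                [  0.0, 480.0, 128.0],
                [  0.0,   0.0,   1.0]])    # e.g. a 256x256 synthetic render
K_b = np.array([[1400.0,    0.0, 960.0],
                [   0.0, 1400.0, 540.0],
                [   0.0,    0.0,   1.0]])  # e.g. a 1920x1080 real capture

X = np.array([0.05, -0.02, 0.60])  # one 3D point in camera coordinates (metres)
print(project(K_a, X))  # ~[168., 112.]
print(project(K_b, X))  # ~[1076.7, 493.3]
```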

Thank you again for your work.

@bardiadoosti (Owner) commented

Hi Mohamed,
Yes, GraphUNet's performance is conditioned on the camera parameters of a particular dataset, and a model trained on one dataset will not work on another dataset with different conditions. But note that pre-training does not have to use a dataset with exactly the same conditions. First, ObMan can be helpful because of transfer learning: in transfer learning, the model is trained with a different objective that does not directly relate to the desired task (e.g. using a model pre-trained on ImageNet for a completely different task). Also, in this case ObMan helped more to pre-train the image encoder, which gave us better initial 2D estimates. Of course, it may be helpful in the graph parts as well.
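As a rough illustration of what pre-training followed by fine-tuning can look like, here is a minimal sketch. The model, checkpoint path, frozen layer, and hyperparameters are all hypothetical placeholders, not the repo's actual training code; a small MLP stands in for GraphUNet.

```python
import torch
from torch import nn, optim

# Stand-in MLP for the 2D -> 3D lifting network; the real GraphUNet
# lives in the repo and has a different architecture.
model = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 3))

# Loading ObMan-pretrained weights would look like this (path is hypothetical):
# model.load_state_dict(torch.load('checkpoints/obman_pretrained.pkl'))

# Freeze the first layer and fine-tune the rest with a learning rate
# well below the one used for pre-training.
for p in model[0].parameters():
    p.requires_grad = False

optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-5)
criterion = nn.MSELoss()

# One fine-tuning step on a dummy batch: 8 samples, 21 joints, 2D in, 3D out.
pts2d = torch.randn(8, 21, 2)
pts3d = torch.randn(8, 21, 3)
optimizer.zero_grad()
loss = criterion(model(pts2d), pts3d)
loss.backward()
optimizer.step()
```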

@hedjm (Author) commented Sep 21, 2020

@bardiadoosti Thank you for replying to my comment.

I was referring to this paragraph in the introduction: "we are not limited to training on only annotated real images, but can instead pretrain the 2D to 3D network separately with synthetic images rendered from 3D meshes of hands interacting with objects (e.g. ObMan dataset [10])."

What I don't agree with in this sentence is that you do not mention anything about pre-training the image encoder; what I understand from it is that you are only pre-training the 2D-to-3D network (GraphUNet). Also, when we do fine-tuning we need to be careful when choosing the learning rate, freezing some layers' weights, etc. Furthermore, as far as I know, the 2D keypoints in the ObMan dataset lie in images of size 256x256, which is much smaller than the 2D keypoints in the FPAD dataset (and you are not resizing the 2D to 224x224 in your case). Anyway, I am very interested in your amazing work, and I find that it opens many opportunities for further improvements.
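For reference, mapping keypoints between the two resolutions is just a per-axis scale; this helper is a hypothetical sketch, not something the repo provides.

```python
import numpy as np

def rescale_keypoints(kp, src_wh, dst_wh):
    """Map 2D keypoints from one image resolution to another.

    kp     : (N, 2) array of pixel coordinates
    src_wh : (width, height) of the source image
    dst_wh : (width, height) of the target image
    """
    scale = np.array([dst_wh[0] / src_wh[0], dst_wh[1] / src_wh[1]])
    return kp * scale

# e.g. bring 256x256 ObMan keypoints into a 224x224 crop
kp_obman = np.array([[120.0, 90.0], [200.0, 180.0]])
print(rescale_keypoints(kp_obman, (256, 256), (224, 224)))
```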

Thank you.
