Thank you for this great work. You said in your comment here (#15 (comment)): "Here the Adaptive Graph U-Net is exactly learning this transformation for a very specific camera and angle condition." But in the paper, in the last paragraph of the introduction, you mention that you pretrained the 2D-to-3D GraphUNet on synthetic data (ObMan), which has totally different intrinsic/extrinsic parameters. Would you please clarify this?
Thank you again for your work.
Hi Mohamed,
Yes, GraphUNet's performance is conditioned on the camera parameters of a particular dataset, and a model trained on one dataset will not work on another dataset with different conditions. But pre-training does not have to use a dataset with exactly the same conditions. First, ObMan can be helpful through transfer learning, where the model is trained with a different objective function that does not directly relate to the desired task (e.g. using a model pre-trained on ImageNet for a completely different task). In our case, ObMan mainly helped pre-train the image encoder and gave us better initial 2D estimations; of course, it may help the graph parts as well.
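For concreteness, here is a minimal PyTorch sketch of that transfer-learning idea: pretrain on synthetic data, save the weights, and use them to initialise training on the real dataset. The `ImageEncoder` and `GraphUNet` classes below are toy stand-ins for illustration, not the actual architectures from the paper or repository:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's components; the real
# architectures live in the authors' repository.
class ImageEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, 21 * 2),  # 21 hand keypoints in 2D
        )

    def forward(self, x):
        return self.backbone(x).view(-1, 21, 2)

class GraphUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lift = nn.Linear(2, 3)  # toy 2D -> 3D lifting

    def forward(self, kp2d):
        return self.lift(kp2d)

# 1) Pretrain on synthetic (ObMan-style) images/keypoints ...
encoder, lifter = ImageEncoder(), GraphUNet()
# ... pretraining loop omitted ...
torch.save({"encoder": encoder.state_dict(),
            "lifter": lifter.state_dict()}, "pretrained.pth")

# 2) Transfer: initialise from the synthetic checkpoint,
#    then fine-tune on the real annotated dataset.
ckpt = torch.load("pretrained.pth")
encoder.load_state_dict(ckpt["encoder"])
lifter.load_state_dict(ckpt["lifter"])
```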
I was referring to this paragraph in the introduction: "we are not limited to training on only annotated real images, but can instead pretrain the 2D to 3D network separately with synthetic images rendered from 3D meshes of hands interacting with objects (e.g. ObMan dataset [10])." What I disagree with in this sentence is that it does not mention anything about pre-training the image encoder; what I understand is that you are only pre-training the 2D-to-3D network (GraphUNet). Also, when we fine-tune we need to be careful about choosing the learning rate, freezing some layers' weights, etc. Furthermore, as far as I know, the 2D keypoints in the ObMan dataset live in images of size 256x256, which is much smaller than the 2D keypoints in the FPAD dataset (you are not resizing the 2D coordinates to 224x224 in your case). Anyway, I am very interested in your amazing work, and I find that it opens many opportunities for further improvement.
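To illustrate the two concerns above, here is a small hypothetical PyTorch sketch of careful fine-tuning (freezing a pretrained layer and using a small learning rate) and of rescaling 2D keypoints between crop sizes. The toy model and the `rescale_keypoints` helper are made up for illustration and are not from the authors' code:

```python
import torch
import torch.nn as nn

# Toy model standing in for the image encoder + GraphUNet pipeline.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))

# Freeze the first (pretrained) layer; fine-tune the rest with a
# smaller learning rate than was used for pretraining.
for p in model[0].parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5)

# Rescale 2D keypoints when the crop size the annotations live in
# (e.g. 256x256 for ObMan) differs from the network input (224x224).
def rescale_keypoints(kp2d: torch.Tensor, src: float = 256.0,
                      dst: float = 224.0) -> torch.Tensor:
    return kp2d * (dst / src)

kp = torch.tensor([[128.0, 128.0]])   # centre of a 256x256 crop
print(rescale_keypoints(kp))          # tensor([[112., 112.]]) in 224x224
```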