Training on 3DMatch dataset is not working #21

Open
ramdrop opened this issue Apr 29, 2022 · 5 comments

Comments

ramdrop commented Apr 29, 2022

My system configuration:

  • System: Ubuntu 18.04
  • PyTorch 1.9.0 + CUDA 11.1, A100
(torch-points3d-s_H0q_C5-py3.9) (base) torch-points3d$ pip list
Package                           Version
--------------------------------- ------------
absl-py                           1.0.0
addict                            2.4.0
aiohttp                           3.7.4.post0
alabaster                         0.7.12
antlr4-python3-runtime            4.8
anyio                             3.5.0
appdirs                           1.4.4
argon2-cffi                       21.1.0
async-timeout                     3.0.1
attrs                             21.2.0
autobahn                          21.3.1
Automat                           20.2.0
Babel                             2.9.1
backcall                          0.2.0
backports.entry-points-selectable 1.1.0
beautifulsoup4                    4.11.1
bleach                            4.1.0
cachetools                        4.2.2
certifi                           2021.5.30
cffi                              1.14.6
chardet                           4.0.0
charset-normalizer                2.0.6
click                             8.1.2
constantly                        15.1.0
cryptography                      3.4.8
cycler                            0.10.0
debugpy                           1.4.3
decorator                         5.1.0
defusedxml                        0.7.1
deprecation                       2.1.0
distlib                           0.3.3
docker-pycreds                    0.4.0
docutils                          0.17.1
entrypoints                       0.3
fastjsonschema                    2.15.3
filelock                          3.1.0
gdown                             4.4.0
gitdb                             4.0.7
GitPython                         3.1.27
google-auth                       1.35.0
google-auth-oauthlib              0.4.6
googledrivedownloader             0.4
graphql-core                      1.1
grpcio                            1.44.0
h5py                              3.6.0
hydra-core                        1.0.5
hyperlink                         21.0.0
idna                              3.2
imageio                           2.18.0
imagesize                         1.2.0
importlib-metadata                4.11.3
incremental                       21.3.0
install                           1.3.5
ipykernel                         6.4.1
ipython                           7.28.0
ipython-genutils                  0.2.0
ipywidgets                        7.7.0
isodate                           0.6.0
jedi                              0.18.0
Jinja2                            3.1.1
joblib                            1.0.1
json5                             0.9.6
jsonpatch                         1.32
jsonpointer                       2.1
jsonschema                        4.4.0
jupyter-client                    7.0.3
jupyter-core                      4.8.1
jupyter-packaging                 0.12.0
jupyter-server                    1.16.0
jupyterlab                        3.3.4
jupyterlab-pygments               0.1.2
jupyterlab-server                 2.13.0
jupyterlab-widgets                1.0.2
kiwisolver                        1.3.2
laspy                             2.1.2
lazy-object-proxy                 1.6.0
llvmlite                          0.38.0
Mako                              1.2.0
Markdown                          3.3.6
MarkupSafe                        2.0.1
matplotlib                        3.4.3
matplotlib-inline                 0.1.3
MinkowskiEngine                   0.5.4
mistune                           0.8.4
multidict                         5.1.0
nbclassic                         0.3.7
nbclient                          0.5.4
nbconvert                         6.5.0
nbformat                          5.3.0
nest-asyncio                      1.5.1
networkx                          2.6.3
ninja                             1.10.2.3
notebook                          6.4.4
notebook-shim                     0.1.0
numba                             0.55.1
numpy                             1.19.5
oauthlib                          3.1.1
omegaconf                         2.0.6
open3d                            0.15.2
packaging                         21.0
pandas                            1.4.2
pandocfilters                     1.5.0
param                             1.11.1
parso                             0.8.2
pathtools                         0.1.2
pexpect                           4.8.0
pickleshare                       0.7.5
Pillow                            8.3.2
pip                               22.0.4
platformdirs                      2.4.0
plyfile                           0.7.4
prometheus-client                 0.11.0
promise                           2.3
prompt-toolkit                    3.0.20
protobuf                          3.20.1
psutil                            5.9.0
ptyprocess                        0.7.0
pyasn1                            0.4.8
pyasn1-modules                    0.2.8
pycparser                         2.20
pycuda                            2020.1
Pygments                          2.10.0
pyparsing                         2.4.7
pyquaternion                      0.9.9
pyrsistent                        0.18.0
PySocks                           1.7.1
python-dateutil                   2.8.2
python-louvain                    0.16
pytools                           2022.1.4
pytorch-metric-learning           1.3.0
pytz                              2021.1
pyvista                           0.34.1
PyWavelets                        1.3.0
PyYAML                            5.4.1
pyzmq                             22.3.0
rdflib                            6.1.1
requests                          2.26.0
requests-oauthlib                 1.3.0
rsa                               4.7.2
scikit-image                      0.19.2
scikit-learn                      1.0
scipy                             1.6.1
scooby                            0.5.12
Send2Trash                        1.8.0
sentry-sdk                        1.5.10
setproctitle                      1.2.3
setuptools                        62.1.0
shortuuid                         1.0.8
six                               1.16.0
smmap                             4.0.0
sniffio                           1.2.0
snowballstemmer                   2.1.0
soupsieve                         2.3.2.post1
sphinxcontrib-applehelp           1.0.2
sphinxcontrib-devhelp             1.0.2
sphinxcontrib-htmlhelp            2.0.0
sphinxcontrib-jsmath              1.0.1
sphinxcontrib-qthelp              1.0.3
sphinxcontrib-serializinghtml     1.1.5
tensorboard                       2.8.0
tensorboard-data-server           0.6.1
tensorboard-plugin-wit            1.8.1
terminado                         0.12.1
testpath                          0.5.0
threadpoolctl                     2.2.0
tifffile                          2022.4.22
tinycss2                          1.1.1
tomlkit                           0.10.2
torch                             1.9.0+cu111
torch-cluster                     1.6.0
torch-geometric                   1.7.2
torch-points-kernels              0.7.0
torch-scatter                     2.0.9
torch-sparse                      0.6.12
torch-spline-conv                 1.2.1
torchaudio                        0.9.0
torchfile                         0.1.0
torchnet                          0.0.4
torchsparse                       1.4.0
torchvision                       0.10.0+cu111
tornado                           6.1
tqdm                              4.64.0
traitlets                         5.1.0
Twisted                           21.7.0
txaio                             21.2.1
typing-extensions                 3.10.0.2
urllib3                           1.26.7
visdom                            0.1.8.9
vtk                               9.1.0
wandb                             0.12.15
wcwidth                           0.2.5
webencodings                      0.5.1
websocket-client                  1.2.1
Werkzeug                          2.1.1
wheel                             0.37.1
widgetsnbextension                3.6.0
wrapt                             1.12.1
wslink                            1.0.7
yacs                              0.1.8
yapf                              0.32.0
yarl                              1.6.3
zipp                              3.8.0
zope.interface                    5.4.0

I launched training with the following command:

poetry run python train.py task=registration models=registration/ms_svconv_base model_name=MS_SVCONV_B2cm_X2_3head data=registration/fragment3dmatch_sparse training=sparse_fragment_reg tracker_options.make_submission=True training.epochs=200 eval_frequency=10

During training, feat_match_ratio on the val and test sets stayed at zero even after ~50 epochs. See the following report for more details:
https://wandb.ai/ramdrop/registration/reports/-humanpose1-MS-SVConv---VmlldzoxOTE5Mjg1?accessToken=a1b84890nit3x8cacs2aja05u9zglukq9hb616ym39jbav31ekztml4qihed1t19
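For readers unfamiliar with the metric, here is a minimal sketch of how a feature matching ratio is commonly defined on 3DMatch-style benchmarks: the fraction of fragment pairs whose inlier ratio exceeds a threshold. The function names and the thresholds `tau1` (10 cm) and `tau2` (5%) are assumptions for illustration, not taken from this repository's tracker, which may compute `feat_match_ratio` differently.

```python
import math


def inlier_ratio(src_pts, tgt_pts, tau1=0.10):
    """Fraction of matched point pairs closer than tau1 (metres).

    src_pts[i] is matched to tgt_pts[i] by nearest neighbour in feature
    space; src_pts are assumed to be already aligned with the
    ground-truth transform.
    """
    if not src_pts:
        return 0.0
    inliers = sum(1 for p, q in zip(src_pts, tgt_pts) if math.dist(p, q) < tau1)
    return inliers / len(src_pts)


def feature_match_ratio(pairs, tau1=0.10, tau2=0.05):
    """Fraction of fragment pairs whose inlier ratio exceeds tau2."""
    if not pairs:
        return 0.0
    hits = sum(1 for src, tgt in pairs if inlier_ratio(src, tgt, tau1) > tau2)
    return hits / len(pairs)


# Toy example (hypothetical data): two fragment pairs.
pairs = [
    ([(0, 0, 0), (1, 0, 0)], [(0, 0, 0.05), (1, 0, 0.5)]),  # one inlier of two
    ([(0, 0, 0)], [(1, 1, 1)]),                             # no inliers
]
fmr = feature_match_ratio(pairs)
```

Under this definition, a value of exactly zero means that no fragment pair reaches the inlier threshold, which points to descriptors that carry no useful matching signal rather than to slightly degraded training.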

humanpose1 (Owner) commented Apr 30, 2022

Sorry, I forgot to mention this. The model is stuck in a local minimum. To train MS-SVConv with 3 heads, you must first train MS-SVConv with one head and then transfer its weights to the 3-head model.

poetry run python train.py task=registration models=registration/ms_svconv_base model_name=MS_SVCONV_B2cm_X2_1head data=registration/fragment3dmatch_sparse training=sparse_fragment_reg tracker_options.make_submission=True training.epochs=20 eval_frequency=10

Then run the 3-head training, pointing it at the pretrained one-head weights:
poetry run python train.py task=registration models=registration/ms_svconv_base model_name=MS_SVCONV_B2cm_X2_3head data=registration/fragment3dmatch_sparse training=sparse_fragment_reg tracker_options.make_submission=True training.wandb.log=True training.batch_size=4 models.path_pretrained="PATH TO THE .pt model of MS-SVConv with one head"
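In the repository, the transfer is handled by the `models.path_pretrained` option. To illustrate the idea only, here is a minimal sketch of what "transfer the weights to 3 heads" can mean: every parameter of the one-head model is copied under a per-head key. The function name, the `unet.<i>.` key layout, and the toy state dict are hypothetical and are not the repository's implementation.

```python
import copy


def replicate_to_heads(single_head_state, num_heads=3, prefix="unet"):
    """Copy each entry of a one-head state dict under '<prefix>.<i>.<key>'.

    The key layout is an assumption for illustration; a real checkpoint
    may name its parameters differently.
    """
    multi_head_state = {}
    for head in range(num_heads):
        for key, value in single_head_state.items():
            # Deep copy so that the heads do not share the same storage.
            multi_head_state[f"{prefix}.{head}.{key}"] = copy.deepcopy(value)
    return multi_head_state


# Toy state dict (hypothetical parameter name, plain lists instead of tensors).
one_head = {"conv1.weight": [1.0, 2.0]}
three_heads = replicate_to_heads(one_head)
```

With PyTorch, a dict built this way would typically be passed to `model.load_state_dict(..., strict=False)`, so that parameters existing only in the 3-head model keep their fresh initialization.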

ramdrop (Author) commented May 1, 2022

No problem. I tried the first command (training MS-SVConv with one head):

poetry run python train.py task=registration models=registration/ms_svconv_base model_name=MS_SVCONV_B2cm_X2_1head data=registration/fragment3dmatch_sparse training=sparse_fragment_reg tracker_options.make_submission=True training.epochs=20 eval_frequency=10

but the training results still did not make sense after 10 epochs (hit ratio and feature matching ratio remain zero):
[Image: train]
For more training details: https://wandb.ai/ramdrop/registration/reports/training-results--VmlldzoxOTI3MzE4?accessToken=vjoy1lurnsnd5a050ym1lj9ht6rumez302ofoiq9ggjttw7fgob6e21cqomz2ivy

humanpose1 (Owner) commented May 1, 2022

hydra-config.zip
This is the exact conf file I used (it covers training only, not fragment generation).

For MS-SVConv with 3 heads: https://wandb.ai/humanpose1/registration/reports/MS-SVConv-3-head-3DMatch--VmlldzoxOTI3NDAw?accessToken=y687z8bnv3ch8mxc2yxmmvjy6je4jmvf0xo69hw4ko4z7yi4un0a6ycl5ynbgf2o

ramdrop (Author) commented May 2, 2022

Many thanks for the additional information. With your conf file (#21 (comment)), I trained MS-SVConv with one head using the command:
poetry run python train.py task=registration models=registration/ms_svconv_base model_name=MS_SVCONV_B2cm_X2_1head data=registration/fragment3dmatch_sparse training=sparse_fragment_reg tracker_options.make_submission=True training.epochs=20 eval_frequency=2
The training results show that:

  1. My performance on the val set now looks reasonable (good news), but the results on the test set still do not make sense.
  2. My performance is lower than yours.
    (My training performance: https://wandb.ai/ramdrop/registration/reports/MS-SVConv-3DMatch-1head--VmlldzoxOTMxNzMx?accessToken=xucwivw8rc7k8vqem283a8uwxqvei1p0h3e38qmfvd3ws7p6qogy5lyrwb9dqit1
    Your training performance: https://wandb.ai/humanpose1/registration/reports/MS-SVConv-3DMatch-1head--VmlldzoxOTI3MzUx?accessToken=pxnuheilrl516fl7xrjfyzrhwp7zvlkpxtrwh18id4k14qr4our6q5h1gbuuu55v)

Since the conf file you provided covers training and not fragment generation, could problems (1) and (2) come from the data preprocessing step?
