
OOM issue #29

Open
JunhyeongDoyle opened this issue Jun 22, 2023 · 3 comments

Comments

@JunhyeongDoyle

Hi! Thanks for sharing this awesome work.

I'm trying to train a model with the DyNeRF data, but I keep encountering an OOM issue.

(I adjusted down_sample and num_steps in the config for the initial DyNeRF training.)

'save_outputs': True,
 'scene_bbox': [[-3.0, -1.8, -1.2], [3.0, 1.8, 1.2]],
 'scheduler_type': 'warmup_cosine',
 'single_jitter': False,
 'time_smoothness_weight': 0.001,
 'time_smoothness_weight_proposal_net': 1e-05,
 'train_fp16': True,
 'use_proposal_weight_anneal': True,
 'use_same_proposal_network': False,
 'valid_every': 30000}
2023-06-23 04:43:45,251|    INFO| Loading Video360Dataset with downsample=4.0
Loading train data: 100%|███████████████████████████████████████████████████████████████| 19/19 [00:41<00:00,  2.20s/it]
2023-06-23 04:44:53,937|    INFO| Computed 1953572400 ISG weights in 24.48s.
killed

When I checked with dmesg, the following error appeared:

[2916867.742639] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-1004.slice/session-1589.scope,task=python,pid=3555848,uid=1004
[2916867.742721] Out of memory: Killed process 3555848 (python) total-vm:167336276kB, anon-rss:123357620kB, file-rss:4kB, shmem-rss:8kB, UID:1004 pgtables:243752kB oom_score_adj:0
[2916871.967326] oom_reaper: reaped process 3555848 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
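
For a rough sense of scale (my own back-of-the-envelope estimate, not something printed by the code): the 1953572400 ISG weights reported in the log are only about 7 GiB at float32, while dmesg shows the process resident at well over 100 GiB when it was killed, so most of the memory presumably goes to the preprocessing step holding the decoded video frames in RAM rather than to the weight tensor itself.

# Hypothetical sanity check of the numbers reported above (Python)
num_weights = 1_953_572_400                 # from the "Computed ... ISG weights" log line
weights_gib = num_weights * 4 / 2**30       # assuming float32 (4 bytes per weight)
anon_rss_gib = 123_357_620 / 2**20          # anon-rss from dmesg is reported in kB
print(f"ISG weight tensor: ~{weights_gib:.1f} GiB")     # ~7.3 GiB
print(f"Process RSS at kill: ~{anon_rss_gib:.1f} GiB")  # ~117.6 GiB
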
@sarafridov
Owner

The preprocessing step for dynerf is pretty CPU-memory-intensive; I remember getting similar issues if I tried to run too many of these in parallel or without downsampling. I uploaded my precomputed sampling weights for some of the scenes (the .pt files here for salmon and in flamesteak_explicit and searsteak_explicit), so you can try downloading those weights into your data folders and then running the actual training step to see if it works. Note that the salmon scene is a bit more memory-intensive than the others.
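
If it helps, here is a minimal sketch of what I mean, assuming a scene folder like data/dynerf/flame_steak and a weight file named isg_weights.pt (both are placeholders; check the dataset loader for the exact paths and filenames it expects). The idea is to drop the downloaded .pt file next to the scene's videos and confirm it loads, so the loader can pick it up and the expensive ISG computation is skipped at training time.

import torch

scene_dir = "data/dynerf/flame_steak"          # hypothetical path; use your own data folder
weights_path = f"{scene_dir}/isg_weights.pt"   # hypothetical filename; match what the loader expects

# Load on CPU just to confirm the downloaded file is intact before launching training.
weights = torch.load(weights_path, map_location="cpu")
print(type(weights), getattr(weights, "shape", None))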

@JunhyeongDoyle
Author

@sarafridov Thanks for sharing :)

@JokerYan

Thank you very much for sharing. It is very helpful.
