Help understanding prepare_batches and prepare_subfunctions #140
Hi! First of all, I am a massive fan of MIScnn. I've been playing around with it for several months now and it has been incredibly useful. Huge props to the developers and those who help maintain the code-base.
I have a quick question regarding the prepare_* functions in preprocessor.py. Let's say we have several data augmentations, for example:
cycles=2, scaling=True, rotations=True, elastic_deform=True, mirror=True
and also several subfunctions, for example:
subfunctions = [sf_resample, sf_clipping, sf_normalize]
What is actually happening behind the scenes when we prepare either one of these to save on disk? Is there a specific order in which data augmentation, subfunctions, and saving happen? It seems like these functions should save time, since we avoid recomputation at the expense of disk space.
Since subfunctions are sequential and always the same, it seems like we should always use prepare_subfunctions when we have the space. Would you agree that this is the case?
For prepare_batches, I have a quick question. When we say prepare_batches=True, does that randomly take a training batch, apply a random cycle of the above data augmentations, and save it to disk? And then all future epochs re-use that training batch with the same augmentation? Because then it seems like each epoch we will be training on the same data augmentation for that batch, whereas we may wish to take new random data augmentation cycles per batch per epoch (which seems like something we might compute online, per epoch).
Or is it just saying that we are saving the raw image batch on disk in pickle format, but then we apply the data augmentation after loading?
Thanks!
Replies: 1 comment 1 reply
Hey @ChrisJWest,
Thank you! Always happy to hear that MIScnn is useful for the community :)
Sure, the general process behind the scenes is the following:
So now we have the two parameters: prepare_batches and prepare_subfunctions.
prepare_batches:
What does this mean?
Why should we ever use this? I want my images to be randomly altered in each epoch!
If you save the batches to disk and reload the batches, doesn't that mean that the sample grouping is always the same?
prepare_subfunctions:
What does this mean?
Sounds good! Any adverse effects?
Hope that this is a bit helpful :)
Yes, correct!
Yes, absolutely correct! In my opinion, this is only useful if you have enough images and either don't want to apply image augmentation at all or want to apply it only very marginally. But for my experiments, I always use and recommend online augmentation (so I always disable prepare_batches).
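As an illustration only, the difference between batches prepared on disk and online augmentation can be sketched in plain Python. This is not MIScnn code: the helpers `augment`, `prepared_batches`, and `online_batches` are hypothetical names, a "batch" is just a list of numbers, and the "augmentation" is a random offset. The sketch assumes, as described in this thread, that a prepared batch is augmented once, pickled, and then reloaded unchanged in every epoch.

```python
import pickle
import random
import tempfile
from pathlib import Path


def augment(batch, rng):
    """Toy stand-in for random augmentation: add a random offset to each value."""
    return [x + rng.uniform(-1.0, 1.0) for x in batch]


def prepared_batches(batch, epochs, cache_dir, rng):
    """Augment once, pickle the result, then reload the same file in every epoch."""
    cache_file = Path(cache_dir) / "batch_0.pickle"
    with open(cache_file, "wb") as handle:
        pickle.dump(augment(batch, rng), handle)
    seen = []
    for _ in range(epochs):
        with open(cache_file, "rb") as handle:
            seen.append(pickle.load(handle))
    return seen


def online_batches(batch, epochs, rng):
    """Draw a fresh random augmentation in every epoch."""
    return [augment(batch, rng) for _ in range(epochs)]


raw_batch = [0.0, 1.0, 2.0, 3.0]

with tempfile.TemporaryDirectory() as tmp:
    cached = prepared_batches(raw_batch, epochs=3, cache_dir=tmp, rng=random.Random(0))
online = online_batches(raw_batch, epochs=3, rng=random.Random(0))

# Do all epochs see exactly the same augmented batch?
print(all(epoch == cached[0] for epoch in cached))  # True: the cached batch is reused
print(all(epoch == online[0] for epoch in online))  # False: each epoch is re-augmented
```

In this toy setup the cached variant saves the augmentation work after the first epoch, but the model would see the identical augmented batch every time, which is the reason given above for preferring online augmentation.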
No, this is only done for the prepare_subfunctions parameter.
Cheers,
And also nice to see that the new Discussion Forum is useful ;)