Found the dataloader in https://github.com/deepmodeling/deepmd-kit/blob/master/deepmd/train/trainer.py. It uses https://github.com/deepmodeling/deepmd-kit/tree/master/deepmd/utils/random.py and data_system.py in the same utils directory. random.py is just a wrapper around NumPy's legacy RandomState generator (frozen rather than strictly deprecated; NumPy recommends Generator for new code), but it is seeded from the input JSON file, so that should work fine. Otherwise, the frames are simply chosen using this RNG. That is also strange: you would expect to train on ALL the frames, not a random subset that could potentially contain repetitions. In any case, at the DeePMD level the data loading does appear to be deterministic. There may still be some kind of streaming happening at the TF or Horovod level, though.
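To make the two points above concrete, here is a minimal sketch. It is not DeePMD-kit's code: `pick_frames` is a hypothetical stand-in, and sampling with replacement via `randint` is an assumption made for illustration. It shows that a seeded `RandomState` reproduces the same frame choices run after run, and that random picking revisits frames.

```python
import numpy as np

def pick_frames(n_frames, batch_size, n_steps, seed):
    """Hypothetical seeded frame picker (NOT DeePMD-kit's implementation).

    Assumes frames are drawn uniformly with replacement at every step.
    """
    rng = np.random.RandomState(seed)  # legacy NumPy RNG, seeded once
    return [rng.randint(0, n_frames, size=batch_size) for _ in range(n_steps)]

# Two "runs" with the same seed produce identical batches at every step.
run_a = pick_frames(n_frames=100, batch_size=4, n_steps=50, seed=1)
run_b = pick_frames(n_frames=100, batch_size=4, n_steps=50, seed=1)
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))

# 200 draws from 100 frames must repeat some frames.
n_draws = 4 * 50
n_unique = len(np.unique(np.concatenate(run_a)))

print("identical across runs:", same)
print("draws:", n_draws, "unique frames seen:", n_unique)
```

If the real loader behaves like this, a fixed seed in the input JSON is enough for reproducible frame selection on a single process, and any remaining nondeterminism would have to come from a layer above it.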
Next, need to check the TF/Horovod levels of distributed training to see whether there is any task stealing, asynchronous data streaming, or similar.
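One possible way to test this empirically, sketched here under assumptions: hash every batch as it is fed to the model, log the hashes per step, and compare the logs from two runs (or two ranks). `batch_fingerprint` and `first_divergence` are hypothetical helpers, not part of DeePMD-kit, TensorFlow, or Horovod, and where to hook them into the training loop is left open.

```python
import hashlib
import numpy as np

def batch_fingerprint(batch):
    """Short, stable hash of an array's dtype, shape, and raw bytes."""
    arr = np.ascontiguousarray(batch)
    h = hashlib.sha256()
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    h.update(arr.tobytes())
    return h.hexdigest()[:16]

def first_divergence(log_a, log_b):
    """Return the first step where two fingerprint logs differ, else None."""
    for step, (a, b) in enumerate(zip(log_a, log_b)):
        if a != b:
            return step
    return None

# Demo with synthetic batches: the second log is perturbed at step 3.
rng = np.random.RandomState(0)
batches = [rng.rand(2, 3) for _ in range(5)]
log_a = [batch_fingerprint(b) for b in batches]
log_b = list(log_a)
log_b[3] = batch_fingerprint(batches[3] + 1.0)

print("same log:", first_divergence(log_a, log_a))
print("perturbed log diverges at step:", first_divergence(log_a, log_b))
```

If the logs match across runs at every step, the data path is deterministic end to end and the search can move on to the compute side (for example, reduction order).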
Take a deep dive into the DeePMD code base. We need to understand fundamentally how it works.
source
people
find where tensorflow is being invoked
find the dataloader