Releases · marovira/helios-ml
0.1.8
Updates
- Allow easy access to the datasets held by the `DataModule`. Previously there was no direct way of accessing them without going through the private members of the `DataModule`, which complicated cases where the length of the dataset was required.
- Added a way to halt training based on arbitrary conditions (see the sketch after this list). The main use case is to allow `Model` sub-classes to halt training when the trained network has converged to a value, or when the network is diverging and there is no reason to continue.
- Addresses a potential crash that occurs whenever training starts with a `None` checkpoint path.
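For a rough idea of the kind of condition the halt mechanism is meant to support, here is a minimal, standalone sketch of a convergence/divergence check; the class and method names are illustrative and not part of helios's API.

```python
import math


class ConvergenceMonitor:
    """Tracks a running loss and decides when training should halt."""

    def __init__(self, patience: int = 100, min_delta: float = 1e-5) -> None:
        self._patience = patience
        self._min_delta = min_delta
        self._best = float("inf")
        self._last = float("inf")
        self._stale_steps = 0

    def update(self, loss: float) -> None:
        # Count consecutive steps that failed to improve on the best loss.
        self._last = loss
        if loss < self._best - self._min_delta:
            self._best = loss
            self._stale_steps = 0
        else:
            self._stale_steps += 1

    def should_stop(self) -> bool:
        # Halt when the loss has stopped improving (converged) or has become
        # NaN/inf (diverging), since further training is pointless either way.
        diverging = math.isnan(self._last) or math.isinf(self._last)
        return diverging or self._stale_steps >= self._patience
```

A `Model` sub-class would update such a monitor after each training step and report `should_stop()` through whatever hook the trainer polls.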
Full Changelog
0.1.7
Updates
- For iteration training, the global iteration is now updated correctly. Previously it was updated in the middle of the training loop, which caused the progress bar and the log flag passed in to the model after the batch was over to be out of sync with the global iteration count. This has now been addressed by updating the global iteration count at the top of the iteration loop.
- Removes the callback system from the trainer. Given the current implementation, there's nothing that the callbacks could do that couldn't be performed by overriding the corresponding function in the model or the datamodule.
- Adds wrappers for printing which allow the user to choose which rank (global or local) the print should happen on.
- Adds a wrapper for `torch.distributed.barrier` which works in both distributed and regular contexts (sketched below).
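As a sketch of what these wrappers do conceptually (the actual function names in helios may differ), a rank-aware print and a context-agnostic barrier can be written with plain `torch.distributed` checks:

```python
import torch.distributed as dist


def print_on_rank(*args, rank: int = 0, **kwargs) -> None:
    # Print only from the requested rank; in a non-distributed run there is
    # effectively a single rank 0, so the message always prints.
    current = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
    if current == rank:
        print(*args, **kwargs)


def safe_barrier() -> None:
    # Synchronise all processes when distributed training is active and do
    # nothing otherwise, so calling code does not need to special-case it.
    if dist.is_available() and dist.is_initialized():
        dist.barrier()
```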
Full Changelog
0.1.6
Updates
- Adds a getter for `swa_utils.EMA` so the underlying network can be easily retrieved (a sketch of the pattern follows below).
- Better import support for the core package. Aliasing the core package and importing sub-modules from it is now supported.
- The trainer no longer prints duplicate messages when using distributed training.
- Allows the trainer to populate the registries itself. This provides better support for distributed training that uses spawn.
- The internal distributed flag for the trainer is now correctly set when invoked through `torchrun`.
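To illustrate the EMA getter pattern, here is a standalone sketch built on PyTorch's `AveragedModel` that exposes the averaged network through a property; the names are illustrative, not helios's exact interface.

```python
from torch import nn
from torch.optim.swa_utils import AveragedModel


class EMA:
    """Keeps an exponentially averaged copy of a network."""

    def __init__(self, network: nn.Module, decay: float = 0.999) -> None:
        self._decay = decay
        # AveragedModel stores the averaged copy and applies avg_fn on update.
        self._shadow = AveragedModel(
            network,
            avg_fn=lambda avg, new, _: self._decay * avg + (1 - self._decay) * new,
        )

    def update(self, network: nn.Module) -> None:
        self._shadow.update_parameters(network)

    @property
    def module(self) -> nn.Module:
        # The getter added in this release: direct access to the averaged
        # network, e.g. for evaluation or export.
        return self._shadow.module
```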
Full Changelog
0.1.5
Updates
- Fixes an issue where printing/saving were incorrectly called whenever training by iteration used accumulation steps. This was caused by an incorrect guarding of the printing, validation, and saving operations.
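The essence of the fix, expressed as a standalone sketch rather than the library's actual code: with gradient accumulation, periodic operations have to key off completed accumulation cycles, not raw batch indices.

```python
def should_run_periodic_ops(
    batch_idx: int, accumulation_steps: int, global_iteration: int, every_n: int
) -> bool:
    # Only treat a batch as a "real" iteration once the accumulation cycle
    # has finished; otherwise printing/validation/saving fire too often.
    finished_cycle = (batch_idx + 1) % accumulation_steps == 0
    return finished_cycle and global_iteration % every_n == 0
```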
Full Changelog
0.1.4
Updates
- Added a context manager to disable cuDNN benchmark within a scope.
- Fixes an issue where cuDNN is disabled upon entering the validation code but is never re-enabled. This could lead to poor performance after the first validation cycle.
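A minimal sketch of such a context manager (the name here is illustrative and may not match helios's API):

```python
import contextlib

import torch


@contextlib.contextmanager
def cudnn_benchmark_disabled():
    # Temporarily turn cuDNN benchmark mode off and restore the previous
    # value on exit, even if the wrapped code raises.
    previous = torch.backends.cudnn.benchmark
    torch.backends.cudnn.benchmark = False
    try:
        yield
    finally:
        torch.backends.cudnn.benchmark = previous
```

Wrapping the validation loop in `with cudnn_benchmark_disabled():` also sidesteps the re-enable bug described above, since restoration happens in the `finally` block.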
Full Changelog
0.1.3
Updates
- `args` and `kwargs` are now consistently typed throughout the code.
- Re-works the way `strip_training_data` works in the model to make it more flexible. The new function is now called `trained_state_dict` and returns the state of the final trained model. It accepts arbitrary arguments for further flexibility (a sketch follows after this list).
- Progress bars now restart correctly when training by iteration. Previously the progress bar would restart at 0 instead of using the last saved iteration.
- Saved checkpoints now have epoch numbers starting at 1 instead of 0.
- Improved running loss system. The model now contains a table of running losses that is automatically updated from the main loss table and is reset at the end of every iteration cycle.
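A sketch of what overriding the new function might look like; the stand-in class, signature, and key names below are assumptions based on the description above rather than the library's confirmed API.

```python
from typing import Any

from torch import nn


class MyModel:  # stand-in for a helios Model sub-class
    def __init__(self) -> None:
        self._net = nn.Linear(16, 4)

    def trained_state_dict(self, *args: Any, **kwargs: Any) -> dict[str, Any]:
        # Return only what is needed to use the final trained network,
        # dropping optimiser/scheduler state and other training-only data.
        return {"net": self._net.state_dict()}
```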
Full Changelog
0.1.2
Updates
- When saving a checkpoint, metadata set by the model wasn't being correctly saved. This has now been addressed.
Full Changelog
0.1.1
Updates
- Removes all instances of the name "Pyro" from the code base.
- Replaces the README with RST instead of Markdown. Hopefully this will make PyPI render things better.