Releases: nmichlo/disent
v0.4.0
Major Additions
- Added `disent.dataset.DisentIterDataset` to complement `DisentDataset`, for datasets without a known size.
- Added `Cars3d64Data` and `SmallNorb64Data` to `disent.dataset.data`. These classes are optimised versions of their respective datasets that have their transforms pre-computed. This is much faster than resizing the observations during training, as most disentanglement benchmarks are based on datasets of width and height 64x64.
- Added `disent.dataset.sampling.GroundTruthRandomWalkSampler`. This ground-truth dataset sampler simulates random walks around the factor space. For example: if there are two ground-truth factors `x` and `y` corresponding to a grid, this sampler would simulate an agent randomly moving around the grid (see the sketch after this list).
- Improvements to the registry. Augments, reconstruction losses and latent distributions can now be registered with disent using `disent.registry.KERNELS`, `disent.registry.RECON_LOSSES` and `disent.registry.LATENT_HANDLERS`. This affects:
  - `disent.frameworks.helper.latent_distributions.make_latent_distribution`
  - `disent.frameworks.helper.reconstructions.make_reconstruction_loss`
  - `disent.dataset.transform._augment.get_kernel`
- Refactored `disent.frameworks.DisentFramework`, which now also supports PyTorch Lightning `training`, `validation` and `test` steps.
- Split the `Ae` and `Vae` hierarchy.
  - This is so that we can directly check if a framework is an instance of one or the other. Previously `Vae` was a subclass of `Ae`, which was unintuitive.
- Rewrote `disent.registry` to make it more intuitive and useful throughout `disent`. Custom regex resolvers can now also be registered, and there are now different types of registries. Registries also provide examples for each item that can be constructed. See `disent.registry._registry` for more information.
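As a rough illustration of the new sampler, the following sketch pairs it with `DisentDataset`. The exact constructor arguments of `GroundTruthRandomWalkSampler` (the `num_samples` name below) are assumptions, not the confirmed API:

```python
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthRandomWalkSampler

data = XYObjectData()
# num_samples=3 (triplets) is an assumed argument name, check the sampler's signature
dataset = DisentDataset(data, sampler=GroundTruthRandomWalkSampler(num_samples=3))
obs = dataset[0]  # observations sampled along a random walk through the factor space
```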
Other Improvements
- Improvements to `disent.dataset.DisentDataset` (see the sketch after this list):
  - Added `sampler`, `transform` and `augment` properties.
  - Improved `shallow_copy` and `unwrapped_shallow_copy` logic and available arguments.
  - Can now return the ground-truth factors by specifying `DisentDataset(return_factors=True)`.
  - Improved handling of batches and collating.
- Added `state_space_copy(...)` to `disent.dataset.data.GroundTruthData`; this function returns a copy of the underlying state space.
  - `disent.dataset.sampling` samplers now store a copy of the state space instead of the original dataset.
- Added `sample(...)` to `disent.dataset.sampling.BaseDisentSampler`, which is a more explicit alias of the original `__call__(...)` method.
- `to_img_tensor_u8` and `to_img_tensor_f32` now check the size of the observations before resizing; if the size is unchanged, performance is greatly improved! This affects `ToImgTensorF32` and `ToImgTensorU8` from `disent.dataset.transform`.
- Added the `factor_multipliers` property to `disent.dataset.util.state_space.StateSpace`, which allows custom implementations of `pos_to_idx` and `idx_to_pos`.
- Added torch math helper functions to `disent.nn.functional`, including: `torch_norm`, `torch_dist`, `torch_norm_euclidean`, `torch_norm_manhattan`, and `torch_dist_hamming`.
- Added `triplet_soft_loss` and `dist_triplet_soft_loss` to `disent.nn.loss.triplet`.
- Added more modes to `disent.nn.weights.init_model_weights`.
- Added `FixedValueSchedule` and `MultiplySchedule` to `disent.schedule`. These schedules are useful for setting a constant value throughout a run, overriding the values actually set in the config.
- Added `modify_name_keep_ext` to `disent.util.inout.paths`, for adding prefixes or suffixes to file names without affecting the extension.
- Added the `try_njit` decorator to `disent.util.jit`. This decorator tries to wrap the function with `numba.njit`, otherwise a warning is displayed. Numba is an optional dependency and is not specified in the requirements.
- Split `disent.util.lightning.callbacks` into separate files.
  - Added many new features and fixes to these callbacks for the new versions.
- Added `disent.util.math.integer` for computing the `gcd` and `lcm` of arbitrary-precision values.
- Added `disent.util.visualize.vis_img` with various features for visualising both tensors and numpy images.
  - Tensors are by default considered to be in `CHW` format, while numpy arrays are considered to be in `HWC` format. These defaults can be overridden; see `torch_to_images(...)` and `numpy_to_images(...)` for more details.
  - Other duplicated functions throughout the library will be replaced with these in future.
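A minimal sketch of the new `return_factors` option; the key under which the factors appear in the returned dict (`"factors"` below) is an assumption:

```python
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import SingleSampler

dataset = DisentDataset(XYObjectData(), sampler=SingleSampler(), return_factors=True)
item = dataset[0]
print(item['x_targ'])   # the observation(s), as in previous versions
print(item['factors'])  # assumed key: the corresponding ground-truth factors
```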
Breaking Changes
- Temporarily removed `DSpritesImagenetData`. This dataset contains research code for my MSc and was not intended to be in previous releases. It will be re-added soon.
- `disent.dataset.transform._augment.make_kernel` default scale mode changed to `"none"` from `"sum"`.
  - This affects various other locations in the code, including `disent.frameworks.helper.reconstructions.AugmentedReconLossHandler`, which uses kernels to augment loss functions.
- Split the `Ae` and `Vae` hierarchy: `Vae` is no longer an instance of `Ae`.
- Metrics are now instances of `disent.metrics.utils.Metric` (see the sketch after this list).
  - This callable class can easily be created using the `disent.metrics.utils.make_metric` decorator over existing metric functions.
  - The purpose of this change is to make metric default arguments self-contained. The `Metric` class has the functions `compute` and `compute_fast`, which wrap the underlying decorated function. Arguments can be overridden as usual; however, the two versions use different default arguments when called.
- Renamed and removed functions inside `disent.util.visualize.vis_latents`.
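A rough sketch of the new decorator over a custom metric function; the decorator's parameter names (`default_kwargs`, `fast_kwargs`) and the metric signature shown here are assumptions, not the confirmed API:

```python
from disent.metrics.utils import make_metric

# `default_kwargs` and `fast_kwargs` are assumed parameter names, check the
# actual signature of `disent.metrics.utils.make_metric` before using this
@make_metric('my_metric', default_kwargs=dict(num_samples=10000), fast_kwargs=dict(num_samples=1000))
def metric_my_metric(dataset, get_repr, num_samples=10000):
    # ... compute disentanglement scores over `num_samples` observations ...
    return {'my_metric.score': 0.0}

# metric_my_metric.compute(...) would use the default arguments, while
# metric_my_metric.compute_fast(...) would use the fast arguments.
```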
Fixes
- Fixed a numerical precision error in `disent.dataset.sampling.GroundTruthDistSampler` when computing scaled factor distances. Without this fix there is up to a 1.5% chance of making a sampling error over certain datasets.
- Updated `disent.nn.functional._pca` for newer torch versions.
- Renamed the `disent.nn.loss.softsort.torch_soft_sort(...)` parameter `dims_at_end` to `leave_dims_at_end`. This now matches `torch_soft_rank(...)`.
- `disent.nn.loss.triplet_mining.configured_idx_mine(...)` now exits early if the mode is set to `"none"`.
Config Changes
- Removed `augment/basic.yaml` and added `augment/example.yaml` instead.
- Added the config group `run_plugins`, which can be used to register a callback that is run by the experiment to register custom items with the disent framework, such as new reconstruction losses or kernels.
- `dataset/cars3d.yaml` and `dataset/smallnorb.yaml` now point to the optimized 64x64 versions of the datasets by default.
- Renamed `disable_decoder` to `detach_decoder` in the `Ae` and `Vae` configs.
- Removed the `disable_posterior_scale` option from the `Ae` and `Vae` configs.
- `models/*.yaml` now directly point to a model target instead of a separate encoder and decoder.
- `run_callbacks/*.yaml` now directly point to class targets rather than using pre-defined keys.
- `run_logging/*.yaml` now directly point to class targets rather than using pre-defined keys.
- Rewrote `experiment.run` to be more general. The hydra and experiment functionality can now be called or used from anywhere.
  - The ability to register your own config overrides without extending or forking disent has been added. We enable this by adding to the hydra search path. All a user needs to do is set the `DISENT_CONFIGS_PREPEND` environment variable to a new config folder. Anything inside this new config folder will recursively take priority over the existing `experiment/config` folder.
- Rewrote `HydraDataModule` to only accept necessary arguments rather than the raw config. Configs are updated accordingly to specify these parameters directly.
- Added `experiment.util.hydra_main`, which can be used anywhere to launch a hydra experiment using the disent configs (see the sketch after this list).
  - `hydra_main(...)` is used to run an experiment that passes a config to the given callback.
  - `patch_hydra()` can instead be used just to initialise hydra if you want to set everything up yourself. The search path plugin that looks for `DISENT_CONFIGS_PREPEND` is registered, as well as various OmegaConf resolvers, including:
    - `${exit:<msg>}` exits the program if accessed. We can use this to deprecate functionality, or to force variables to be overridden!
    - `${run_num:<root_dir>}` returns the current experiment number.
    - `${run_dir:<root_dir>,<name>}` returns the current experiment folder with the name appended.
    - `${fmt:"{:04d}",42}` returns "0042", exactly the same as `str.format`.
    - `${abspath:<rel_path>}` converts a relative path to an absolute path using the original hydra working directory, not the changed experiment dir.
    - `${rsync_dir:<src>/<name>,<dst>/<name>}` is useful if datasets are already prepared on a shared drive and need to be copied to a temp drive, for example!
- Added `experiment.util.path_utils`, which adds support for automatically obtaining an experiment number from a directory of number-prefixed files. The number returned is the existing maximum number plus one.
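A minimal sketch of launching an experiment through the new entry point; the exact `hydra_main(...)` parameter names (`callback`, `config_name`) and import path are assumptions taken from the notes above:

```python
import os
from experiment.util.hydra_main import hydra_main  # import path assumed from the notes

# prepend your own config folder so it takes priority over `experiment/config`
os.environ['DISENT_CONFIGS_PREPEND'] = '/path/to/my/configs'

def run(cfg):
    print(cfg)  # the fully-resolved hydra config is passed to this callback

# `callback` and `config_name` are assumed parameter names, check the actual signature
hydra_main(callback=run, config_name='config')
```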
Test Changes
- Updated `tests.test_experiment` to use the new `experiment.util.hydra_main` functionality.
- Pickle tests for frameworks.
- Tests for the torch norm functions.
- Registry test fixes.
- Extensive tests for the new `disent.util.visualize.vis_img` functions and their returned datatypes.
- Added a `temp_environ` context manager.
v0.3.4
Fixes
- Leftover research config values have been fixed, addressing #23. Defaults should now just work locally.
Added
- Frameworks previously did not implement validation and test steps for data; these have been added, addressing #22. Schedules may be unintentionally affected by this change if used with test & validation datasets. An issue has been opened to investigate this.
v0.3.3
v0.3.2
Fixes
- Fixed `FftKernel`: tensor weights were accidentally not frozen.
- Fixed callbacks logging the L1 instead of the L2 distance.
- Fixed callbacks failing if metrics are NaN.
- `dsprites_imagenet` macOS prepare fix.
Added
- Added the `run_action=skip` experiment action, to simply test whether hydra is working.
- VAEs now log the ratios between different loss terms.
Breaking
- `experiment.run.hydra_check_cuda` renamed to `hydra_get_gpus`. It now returns an integer number of GPUs to use, intended to be passed to a PyTorch Lightning Trainer.
- Removed the `XYObjectData` warning that things are now different.
v0.3.1
Experiment Fixes
- `run_action=prepare_data` has been fixed.
Experiment Additions
- new tests to ensure this continues to work properly
Experiment Changes
- The correct action is now chosen via the `experiment.run.run_action(cfg)` method.
  - `experiment.run.train` renamed to `action_train`.
  - `experiment.run.prepare_data` renamed to `action_prepare_data`.
- The input config is no longer mutated.
v0.3.0
This release touches most of the codebase.
Major Additions
- added the `XYObjectShadedData` dataset, which is exactly the same as `XYObjectData` but with differing ground truth factors. This might be useful for testing how metrics are affected by the ground truth representation of factors. Note that `XYObjectData` differs from previous versions due to this.
- added the `DSpritesImagenetData` dataset, which is the same as `DSpritesData` but masks the background or foreground depending on the mode, and replaces the content with deterministic data from tiny-imagenet.
- added `disent.framework.vae.AdaGVaeMinimal`, which is a minimal implementation of `AdaVae` configured to run in `gvae` mode.
- added `disent.util.lightning.callbacks.VaeGtDistsLoggingCallback`, which logs various distance matrices computed from averaged ground truth factor traversals.
- Updated experiment files to use hydra 1.1.
  - can now switch between `train` and `prepare_data` modes with the defaults group `run_action=train`
Other Additions
- added `shallow_copy` to `disent.dataset.DisentDataset`, enabling a shallow copy of the dataset while overriding specific properties such as the transform.
- added the new `disent.dataset.transform` module, including `ToImgTensorF32` (was `ToStandardisedTensor`) and `ToImgTensorU8`.
- additions to `H5Builder`:
  - `add_dataset_from_array`, which constructs and fills a dataset in the hdf5 file from an array.
  - converted into a context manager instead of manually opening the hdf5 file.
- additions to `StateSpace` (and ground truth dataset child classes):
  - `normalise_factor_idx`: convert the name of a ground truth factor into its numerical value.
  - `normalise_factor_idxs`: convert a name, an idx, a list of names, or a list of idxs into the numerical values of the ground truth factors.
- `disent.dataset.util.stats`: added `compute_data_mean_std(data)` to compute the mean and std of datasets (see the sketch after this list).
- added `disent.schedule.SingleSchedule`.
- improved `disent.util.deprecate.deprecated`: it now prints the stack trace for the call location of the deprecated function by default. This can be disabled.
- added a `restart` method to `disent.util.profiling.Timer` for easy use within a loop.
- added `disent.util.visualize.plot`, which contains various matplotlib helper code used throughout the library and the PyTorch Lightning callbacks.
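For example, computing normalisation statistics for a dataset might look like the following; the exact return format of `compute_data_mean_std` (a `(mean, std)` tuple below) is an assumption:

```python
from disent.dataset.data import XYObjectData
from disent.dataset.util.stats import compute_data_mean_std  # module path as listed above

data = XYObjectData()
mean, std = compute_data_mean_std(data)  # assumed to return per-channel (mean, std)
print(f'vis_mean: {mean}\nvis_std: {std}')  # eg. for use in the dataset configs
```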
Breaking Changes
- removed the confusing `observation_shape` and `obs_shape` properties from `GroundTruthData` and any child classes. Any methods that required these properties across disent had their names updated too. For example, the `ArrayGroundTruthData` class now takes `x_shape` (see the sketch after this list).
  - `observation_shape` `(H, W, C)` should be replaced with `img_shape`; you will need to update your overrides in child classes.
  - `obs_shape` `(C, H, W)` should be replaced with `x_shape`.
- `XYObjectData` default parameters updated for `XYObjectShadedData`; the dataset and colour palettes differ slightly from previous versions.
- moved the module `disent.nn.transform` to `disent.dataset.transform`.
  - renamed `ToStandardisedTensor` to `ToImgTensorF32`.
- `H5Builder` converted into a context manager, with a similar API to `open` or `h5py.File`.
- `ReconLossHandlerMse` changed to not scale or centre the output; this is because we now normalise the data instead, which is more correct.
- `AdaVae` and inheriting classes have various functions renamed for clarity.
- `disent.metrics` functions have the `ground_truth_dataset` parameter renamed to `dataset`.
- `disent.model.ae`: renamed `DecoderTest` and `EncoderTest` to `DecoderLinear` and `EncoderLinear`.
- `disent.registry` updated to use a new, simpler class structure and format. Some variables have been renamed, and registry names have been changed to plurals, eg. `OPTIMIZER` is now `OPTIMIZERS`.
- `disent.schedule` cleaned up.
  - renamed various variables and parameters: `min_step` -> `start_step`, `max_step` -> `end_step`.
  - removed the `disent.schedule.lerp.scale()` function, as it is the same as `lerp`, just not clipped.
- `disent.util.lightning.callbacks.VaeDisentanglementLoggingCallback` renamed to `VaeMetricLoggingCallback`.
- `docs.examples` updated to use the new `XYObjectData` version and the `ToImgTensorF32` transform.
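To update custom datasets, overrides move from `observation_shape` to `img_shape`. A minimal sketch of a child class against the new property names follows; whether these are declared as properties or plain class attributes, and the exact `_get_observation` hook name, may differ between disent versions:

```python
import numpy as np
from disent.dataset.data import GroundTruthData

class MyGridData(GroundTruthData):
    name = 'my_grid'

    @property
    def factor_names(self): return ('x', 'y')

    @property
    def factor_sizes(self): return (8, 8)

    @property
    def img_shape(self): return (64, 64, 3)  # (H, W, C), replaces the old `observation_shape`

    def _get_observation(self, idx):
        # recover the factor position for this index, then render it
        x, y = np.unravel_index(idx, self.factor_sizes)
        obs = np.zeros(self.img_shape, dtype='uint8')
        obs[y*8:(y+1)*8, x*8:(x+1)*8, :] = 255  # draw a white square at the factor position
        return obs
```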
Deprecations
- deprecated the `ground_truth_data` property on `DisentDataset`; this should be replaced with the shorter `gt_data` property. References to `ground_truth_data` have been replaced throughout disent.
Fixes
- Fixed the `Mpi3dData` datasets, and added file hashes.
- Updated requirements.
- Many minor fixes, usability and error message improvements.
Hydra Experiment Changes
Hydra config has finally been updated from version 1.0 to 1.1, adding support for recursive defaults and recursive instantiation. This allows us to remove all of our custom & hacky hydra helper code that previously enabled these features.
- hydra now supports recursive instantiation
- value based specialisation can now be done with recursive defaults using dummy groups
Updating hydra was a good opportunity to re-structure the configuration format.
- All settings defined in the root config that are referenced elsewhere are now under the `settings` key.
- Default settings defined in various subgroups that are referenced elsewhere are often placed under the `dsettings` key.
- Keys for various objects were renamed for clarity, eg. `augment.transform` was renamed to `augment.augment_cls`.
- All datasets now require the `meta.vis_mean` and `meta.vis_std` keys, which are used both to normalise the dataset and to re-scale it between [0, 1] for visualisation during training.
Every config file has been touched; the best approach is probably to look at the new system. The general structure remains the same, but the recursive defaults from Hydra 1.1 allow us to implement various things more cleanly.
- new defaults group `run_launcher` to easily swap between `slurm` and `local`.
- the defaults group `run_location` now only specifies machine resources and paths.
- new defaults group `sampling` specifies the details and sampling strategy to be used by the frameworks.
- new defaults group `run_action` to switch between training (`train`) and downloading and installing datasets (`prepare_data`).
v0.2.1
Under the hood, quite a lot of code has been added or changed for this release; however, the API remains very much the same.
Additions
- Wrapped datasets: instances of `disent.dataset.wrapper.WrappedDataset` are datasets that have some sort of mask applied to them, hiding the true state space and resizing the dataset.
  - `disent.dataset.wrapper.DitheredDataset` applies an n-dimensional dithering operation to the ground truth factors.
  - `disent.dataset.wrapper.MaskedDataset` applies a provided boolean mask over the dataset.
- `disent.dataset.DisentDataset` now supports wrapped datasets (instances of `disent.dataset.wrapper.WrappedDataset`). New methods and properties have been added to complement this feature (see the sketch after this list):
  - `is_wrapped_data`: check if there is wrapped data.
  - `is_wrapped_gt_data`: check if there is wrapped data and the wrapped data is ground truth data.
  - `wrapped_data`: obtain the wrapped data, otherwise throw an error.
  - `wrapped_gt_data`: obtain the wrapped ground truth data, otherwise throw an error.
  - `unwrapped_disent_dataset`: creates a copy of the disent dataset with everything the same, except that the data is unwrapped.
- `disent.util.lightning.callbacks` additions:
  - Support for wrapped datasets. The callbacks automatically try to unwrap them to obtain the ground truth data, which can be used to compute metrics and perform visualisations.
  - Support for scaling model outputs to a certain range of values, fixing visualisations when using `VaeLatentCycleLoggingCallback`.
- new utilities:
  - `disent.util.math.dither`
  - `disent.util.math.random`
- Self-contained HDF5 ground-truth datasets. These store all the information needed to construct the dataset and state space in one file, including the factor names.
  - Added `disent.dataset.data.SelfContainedHdf5GroundTruthData` to read these files.
  - Added `disent.dataset.util.H5Builder` for creating these files. (The API is not yet finalised.)
- `disent.dataset.util.StateSpace`: added the helper function `iter_traversal_indices`.
- `disent.nn.transform`: added `ToUint8Tensor`, which acts like `ToStandardisedTensor`, but loads images as `uint8` instead of `float32`. This is useful when you need to use datasets outside of an ML model context, eg. when performing analysis, and takes up less memory.
  - a corresponding functional version exists: `to_uint_tensor`, complementing `to_standardised_tensor`.
- Begun work on a component & function registry, although do not use this yet as the API will change significantly.
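A rough sketch of how the new wrapper properties might be used; the `MaskedDataset` constructor argument names shown below are assumptions:

```python
import numpy as np
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.wrapper import MaskedDataset

data = XYObjectData()
mask = np.random.rand(len(data)) < 0.5  # keep a random half of the observations
# `data` and `mask` are assumed argument names, check the actual signature
dataset = DisentDataset(MaskedDataset(data=data, mask=mask))

if dataset.is_wrapped_gt_data:
    gt_data = dataset.wrapped_gt_data  # the underlying ground truth data
```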
API Breakages
- Under the hood, implementing wrapped data and `DisentDataset` copying requires the ability to copy samplers, so each sampler implementation should now also implement the `uninit_copy` method.
- `ArrayGroundTruthData` is stricter about the `observation_shape`: it must be `(H, W, C)` or `(C, H, W)` depending on `array_chn_is_last`.
- Removed reconstruction losses: `ReconLossHandlerMse4` aka. `"mse4"`, and `ReconLossHandlerMae2` aka. `"mae2"`.
- Renamed `disent.util.visualize.get_factor_traversal` to `get_idx_traversal`.
Deprecations
- `GroundTruthData` property aliases:
  - `img_shape`: new property for the deprecated `observation_shape`.
  - `obs_shape`: new property for the deprecated `x_shape`.
  - `img_channels`: new property for the number of channels in the image.
Fixes
- `disent.util.inout.files.AtomicSaveFile`: minor fix to overwriting files.
- `disent.util.lightning.callbacks.LoggerProgressCallback`: fixes to datatypes and potential crashes due to floats.
- More stable experiment runs when performing sweeps, with better error handling, error messages and error catching.
- fixes to the various `requirements*.txt` files.
- many other minor fixes.
v0.2.0
API Breakages
- `DisentFramework` no longer takes in a `make_optimizer_fn` callback; instead the optimizer is specified as part of the `cfg` via the `optimizer` and `optimizer_kwargs` keys (see the sketch after this list).
- `Ae`-derived subclasses now take an instantiated `AutoEncoder` via the `model` param instead of the `make_model_fn` callback.
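A sketch of constructing a framework against the new API, following the style of the readme examples; the specific model classes and import paths shown may differ for this version:

```python
from disent.frameworks.vae import BetaVae
from disent.model import AutoEncoder
from disent.model.ae import DecoderConv64, EncoderConv64

module = BetaVae(
    # the model is now passed in as an instance, not a `make_model_fn` callback
    model=AutoEncoder(
        encoder=EncoderConv64(x_shape=(3, 64, 64), z_size=10, z_multiplier=2),
        decoder=DecoderConv64(x_shape=(3, 64, 64), z_size=10),
    ),
    # the optimizer is now part of the config, replacing `make_optimizer_fn`
    cfg=BetaVae.cfg(optimizer='adam', optimizer_kwargs=dict(lr=1e-3), beta=4),
)
```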
Additions
- `DisentDataset` can now return observation indices in the `"idx"` field if `return_indices=True`.
- `sample_random_obs_traversal` added to `GroundTruthData`.
- new basic experiment test.
Changes
- python 3.8 and 3.9 support (3.7 is unsupported due to missing standard library typing features)
- `TempNumpySeed` now inherits from `contextlib.ContextDecorator`.
- updated `hydra-core` to `1.0.7`.
Fixes
- `SmallNorbData` by default now returns observations of size `(96, 96, 1)` instead of `(96, 96)`.
- Removed the `Deprecated` dependency, which could not be pickled, fixing hydra submitit issues.
- `LoggerProgressCallback` displays more reliable information and now supports PyTorch Lightning 1.4.
- `HydraDataModule` now supports PyTorch Lightning 1.4.
- `merge_specializations` fixed to depend on OmegaConf rather than Hydra.
v0.1.0 - Initial Release
Overview
The initial release of Disent. Please see the docs and readme for usage examples; changes should be easy to make to existing code, most notably the `DisentDataset` and `DisentSampler` changes.
Changes
- Replaced the sampling datasets with one common class, `disent.dataset.DisentDataset` (see the sketch at the end of this section):
  - Wraps other datasets (`torch.utils.data.Dataset` or `disent.dataset.data.GroundTruthData`).
  - Accepts an implemented subclass of `disent.dataset.sampling.BaseDisentSampler`, which controls how many observations are sampled and returned (eg. for triplet networks).
  - eg. `disent.dataset.groundtruth.GroundTruthDatasetPairs` is now `disent.dataset.sampling.GroundTruthPairSampler`.
- Removed all experimental code & features unique to Disent. Hydra configs and runners for non-experimental features remain. These features will be cleaned up and re-added once I submit my dissertation.
  - ❌ experimental frameworks
  - ❌ experimental datasets
  - ❌ experimental metrics
  - ❌ experimental models
  - ❌ experimental augmentations
  - ❌ experiment files
- Verified models: some models had potentially diverged from their original implementations and papers.
  - Added new test models: `EncoderTest` & `DecoderTest`.
- `disent.nn` changes:
  - Added `disent.nn.activations.Swish`.
  - Removed the loss reduction mode `"sum"` in `disent.nn.loss.reduction`.
  - Split the triplet mining logic out of the frameworks into `disent.nn.loss.triplet_mining`.
  - Replaced the `BatchView`, `Unsqueeze3D` and `Flatten3D` modules with their pytorch 1.9 equivalents.
  - Backwards compatible opt-in enhancements to `disent.nn.transform.ToStandardisedTensor`.
- `disent.util` refactor, grouping logic into submodules:
  - `disent.util.inout`: utilities for working with paths, files and saving files.
  - `disent.util.lightning`: various helper functions and callbacks for pytorch lightning, some incorporated from past experiment files.
  - `disent.util.strings`: utilities for working with strings and ansi escape codes.
  - `disent.util.visualise`: moved `disent.visualise` into this module, separating framework logic and helper logic within disent.
- Cleaned up `requirements.txt`:
  - optional requirements moved into `requirements-test.txt` and `requirements-exp.txt`.
- New tests:
  - samplers
  - models
- And many bug-fixes.
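As referenced above, a minimal sketch of the new dataset and sampler API, based on the readme examples (transforms omitted):

```python
from torch.utils.data import DataLoader
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthPairSampler

# the sampler controls how many observations are returned per index (here, pairs)
dataset = DisentDataset(XYObjectData(), sampler=GroundTruthPairSampler())
dataloader = DataLoader(dataset=dataset, batch_size=4, shuffle=True)
batch = next(iter(dataloader))
```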
v0.0.1.dev14
Overview
This release is mostly a large set of refactors, and reproducibility improvements with regards to seeds and datasets.
Notable Changes
- Data now relies on `disent.data.datafile.DataFile`s, which are deterministic, hash- and cache-based file generators that can fetch or pre-process data.
- Added `XYSquaresMinimalData`, a minimal, faster version of `XYSquaresData` without any configuration options. With default parameters, data from `XYSquaresData` should equal `XYSquaresMinimalData`.
- Added `PickleH5pyFile`, which can pickle an hdf5 file and dataset. This is intended to be used with `torch` `DataLoader`s or multiprocessing.
Definitely Breaking Changes
- renamed classes:
  - renamed `AugmentableDataset` to `DisentDataset`
  - renamed `BaseFramework` to `DisentFramework`
  - renamed `BaseEncoderModule` to `DisentEncoder`
  - renamed `BaseDecoderModule` to `DisentDecoder`
- consolidated maths and helper functions into the new submodule `disent.nn`:
  - `disent.nn.weights`: initialisation functions, originally from `disent.model.init`.
  - `disent.nn.modules`: basic modules from various locations, including `DisentModule`, `DisentLightningModule`, `BatchView`, `Unsqueeze3D`, `Flatten3D`.
  - `disent.nn.transform`: transform and augment functions and classes from `disent.transform`; still needs to be cleaned up in future releases.
  - `disent.nn.loss`: various loss functions from other places, including the `triplet`, `kl`, `softsort` and `reduction` modules.
  - `disent.nn.functional`: various differentiable torch helper functions, mostly from `disent.util.math`, including functions for computing the covariance, correlation, generalised mean, PCA, DCT, channel-wise convolutions and more! Some functions, such as kernel generation, still need to be moved out of here.
- split up and consolidated utilities:
  - `disent.util.cache`: caching utilities, including the `stalefile` decorator, which only runs the wrapped function if the specified file is stale (its hash does not match, or the file does not exist).
  - `disent.util.colors`: ANSI escape codes.
  - `disent.util.function`: wrapper, decorator and inspect utilities.
  - `disent.util.hashing`: compute the `full` hash of a file, or a `fast` hash based on the README of the imohash algorithm.
  - `disent.util.in_out`: originally from `disent.data.util`, for handling file retrieval/downloading/copying and saving.
  - `disent.util.iters`: general iterators and map functions, including `iter_chunks` and `iter_rechunk`.
  - `disent.util.paths`: path handling and file or directory management.
  - `disent.util.profiling`: timers & memory usage.
  - `disent.util.seeds`: seed management contexts and functions (see the sketch after this list).
  - `disent.util.strings`: string formatting helper functions.
- removed and cleaned up functions from:
  - `disent.data.hdf5`
  - `disent.dataset.__init__`
  - `disent.util.__init__`
  - `disent.schedule.lerp`: renamed `activate` to `scale_ratio` and removed other functions.
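As a small example of the seed utilities, a temporary numpy seed context (named `TempNumpySeed` in later release notes) can be used like this; the `disent.util.seeds` import location is an assumption taken from the list above:

```python
import numpy as np
from disent.util.seeds import TempNumpySeed  # assumed location, per the list above

# temporarily fix the global numpy seed, restoring the previous random state afterwards
with TempNumpySeed(777):
    a = np.random.rand(3)
with TempNumpySeed(777):
    b = np.random.rand(3)
assert np.allclose(a, b)  # the same values are produced under the same seed
```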
Other Changes
- Replaced the `GroundTruthData` specialisations with general loading from `DataFile`s.
- `StateSpace` now stores `factor_names` instead of `GroundTruthData`, in preparation for a rewrite of the datasets to use dependency injection and samplers.
Experiment Config & Runner Changes
- Many config fixes for refactors
- Experiments can now be seeded
New Tests
- test `PickleH5pyFile` multiprocessing support
- test `XYSquaresData` and `XYSquaresMinimalData` similarity