31 Mar 10:32

nmichlo

9c7fe40

v0.4.0

Major Additions

Added disent.dataset.DisentIterDataset to complement DisentDataset for datasets without a known size.
Added Cars3d64Data and SmallNorb64Data to disent.dataset.data. These classes are optimised versions of their respective datasets with their transforms pre-computed. This is much faster than resizing the observations during training, since most disentanglement benchmarks are based on datasets with a width and height of 64x64.
Added disent.dataset.sampling.GroundTruthRandomWalkSampler. This ground-truth dataset sampler simulates random walks around the factor space. For example: if there are two ground-truth factors x and y corresponding to a grid, this sampler would simulate an agent randomly moving around the grid.
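A minimal sketch of the random-walk idea, using only the standard library. The function random_walk_factors and its parameters are hypothetical and are not the GroundTruthRandomWalkSampler API; the sketch assumes that each step changes exactly one factor by -1 or +1 while staying inside the grid.

```python
import random
from typing import List, Sequence, Tuple


def random_walk_factors(
    factor_sizes: Sequence[int],
    start: Sequence[int],
    num_steps: int,
    rng: random.Random,
) -> List[Tuple[int, ...]]:
    """Hypothetical sketch: walk through the grid of ground-truth factor positions.

    Each step picks one factor that can change and moves it by -1 or +1,
    never leaving the valid range [0, size).
    """
    # factors with a single value can never move, so they are never picked
    movable = [i for i, size in enumerate(factor_sizes) if size > 1]
    pos = list(start)
    path = [tuple(pos)]
    for _ in range(num_steps):
        if movable:
            i = rng.choice(movable)
            # keep only the directions that stay inside the grid
            steps = [d for d in (-1, 1) if 0 <= pos[i] + d < factor_sizes[i]]
            pos[i] += rng.choice(steps)
        path.append(tuple(pos))
    return path


# usage: an "agent" moving around a 3x4 grid of (x, y) factor values
path = random_walk_factors(factor_sizes=(3, 4), start=(0, 0), num_steps=10, rng=random.Random(42))
```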
Improvements to the registry. Augments, reconstruction losses and latent distributions can now be registered with disent using disent.registry.KERNELS, disent.registry.RECON_LOSSES and disent.registry.LATENT_HANDLERS. This affects:
- disent.frameworks.helper.latent_distributions.make_latent_distribution
- disent.frameworks.helper.reconstructions.make_reconstruction_loss
- disent.dataset.transform._augment.get_kernel
Refactored disent.frameworks.DisentFramework, which now also supports PyTorch Lightning training, validation and test steps.
Split the Ae and Vae hierarchy
- This is so that we can directly check if a framework is an instance of one or the other. Previously Vae was a subclass of Ae which was unintuitive.
Rewrote disent.registry to make it more intuitive and useful throughout disent. Custom regex resolvers can now also be registered, and there are now different types of registries. Registries also provide examples for each item that can be constructed. See disent.registry._registry for more information.

Other Improvements

Improvements to disent.dataset.DisentDataset:
- Added sampler, transform and augment properties.
- Improved shallow_copy and unwrapped_shallow_copy logic and available arguments.
- Can now return the ground-truth factors by specifying DisentDataset(return_factors=True)
- Improved handling of batches and collating
Added state_space_copy(...) to disent.dataset.data.GroundTruthData, this function returns a copy of the underlying state space.
- disent.dataset.sampling samplers now store a copy of the state space instead of the original dataset
Added sample(...) to disent.dataset.sampling.BaseDisentSampler, which is a more explicit alias to the original __call__(...) method.
to_img_tensor_u8 and to_img_tensor_f32 now check the size of the observations before resizing and skip resizing if the size is unchanged, which greatly improves performance. This affects ToImgTensorF32 and ToImgTensorU8 from disent.dataset.transform.
Added factor_multipliers property to disent.dataset.util.state_space.StateSpace which allows custom implementations of pos_to_idx and idx_to_pos.
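For reference, a sketch of how factor multipliers can drive pos_to_idx and idx_to_pos, assuming a row-major (C-order) layout where the last factor changes fastest. The standalone functions below are illustrative, not the StateSpace methods themselves; a custom implementation would change the ordering by supplying different multipliers.

```python
from typing import List, Sequence


def factor_multipliers(factor_sizes: Sequence[int]) -> List[int]:
    """Row-major multipliers: multiplier[i] is the product of all later factor sizes."""
    multipliers = [1] * len(factor_sizes)
    for i in range(len(factor_sizes) - 2, -1, -1):
        multipliers[i] = multipliers[i + 1] * factor_sizes[i + 1]
    return multipliers


def pos_to_idx(pos: Sequence[int], multipliers: Sequence[int]) -> int:
    # the flat index is a weighted sum of the factor positions
    return sum(p * m for p, m in zip(pos, multipliers))


def idx_to_pos(idx: int, factor_sizes: Sequence[int], multipliers: Sequence[int]) -> List[int]:
    # undo the weighted sum: integer-divide by each multiplier, then wrap by the factor size
    return [(idx // m) % s for s, m in zip(factor_sizes, multipliers)]


sizes = (3, 4, 5)
mults = factor_multipliers(sizes)  # [20, 5, 1]
```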
Added torch math helper functions to: disent.nn.functional
- including: torch_norm, torch_dist, torch_norm_euclidean, torch_norm_manhattan, and torch_dist_hamming.
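The norms and distances listed above can be sketched in plain Python for a single pair of vectors. These stand-in functions (norm, dist, norm_euclidean, norm_manhattan, dist_hamming) are hypothetical; the disent versions operate on torch tensors and may take additional arguments, such as the dimension to reduce over.

```python
from typing import Sequence


def norm(x: Sequence[float], p: float = 2.0) -> float:
    # generalised p-norm: (sum_i |x_i|^p)^(1/p)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def dist(a: Sequence[float], b: Sequence[float], p: float = 2.0) -> float:
    # a distance is the norm of the element-wise difference
    return norm([x - y for x, y in zip(a, b)], p=p)


def norm_euclidean(x: Sequence[float]) -> float:
    return norm(x, p=2.0)


def norm_manhattan(x: Sequence[float]) -> float:
    return norm(x, p=1.0)


def dist_hamming(a: Sequence[float], b: Sequence[float]) -> int:
    # number of positions where the two vectors differ
    return sum(1 for x, y in zip(a, b) if x != y)
```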
Added triplet_soft_loss and dist_triplet_soft_loss to disent.nn.loss.triplet.
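A sketch of a soft triplet loss, assuming the common soft-margin formulation log(1 + exp(d(a, p) - d(a, n))), where the hinge with a margin is replaced by a softplus. Whether disent uses exactly this formulation is an assumption; the dist_ variant here simply takes precomputed distances.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    # numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dist_triplet_soft_loss(d_ap: float, d_an: float) -> float:
    # small when the anchor-positive distance is much smaller than the anchor-negative distance
    return softplus(d_ap - d_an)


def triplet_soft_loss(anchor: Sequence[float], pos: Sequence[float], neg: Sequence[float]) -> float:
    # Euclidean distances between single vectors (math.dist requires Python 3.8+)
    return dist_triplet_soft_loss(math.dist(anchor, pos), math.dist(anchor, neg))
```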
Added more modes to disent.nn.weights.init_model_weights.
Added FixedValueSchedule and MultiplySchedule to disent.schedule. These schedules are useful for setting a constant value throughout a run, and for overriding the values actually set in the config.
Added modify_name_keep_ext to disent.util.inout.paths, for adding prefixes or suffixes to file names without affecting the extension.
Added the decorator try_njit to disent.util.jit. This decorator tries to wrap the function with numba.njit; if that is not possible, a warning is displayed instead. Numba is intended to be an optional dependency and is not specified in the requirements.
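A hypothetical sketch of the two schedules, assuming a schedule is called with the current step and the value set in the config, and returns the value to use. The call signature and the parameter r are assumptions, not the disent.schedule API.

```python
class FixedValueSchedule:
    """Hypothetical sketch: ignore the configured value and always use `value`."""

    def __init__(self, value: float):
        self.value = value

    def __call__(self, step: int, value: float) -> float:
        # the configured value is deliberately discarded
        return self.value


class MultiplySchedule:
    """Hypothetical sketch: scale the configured value by a constant factor `r`."""

    def __init__(self, r: float):
        self.r = r

    def __call__(self, step: int, value: float) -> float:
        return value * self.r
```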
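A possible implementation using os.path; the signature (path, prefix, suffix) is an assumption. Note that os.path.splitext only treats the last suffix as the extension, so for "archive.tar.gz" the extension kept is ".gz" and the prefix and suffix are applied around "archive.tar".

```python
import os


def modify_name_keep_ext(path: str, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical sketch: add a prefix and/or suffix to a file name, keeping its extension."""
    dirname, basename = os.path.split(path)
    name, ext = os.path.splitext(basename)
    # os.path.join("", name) returns just the name, so bare file names work too
    return os.path.join(dirname, f"{prefix}{name}{suffix}{ext}")
```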
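A sketch of the fallback pattern, assuming the decorator is applied with parentheses and forwards its arguments to numba.njit. This is not the disent.util.jit source; it only shows the import-or-warn behaviour described above.

```python
import warnings


def try_njit(*args, **kwargs):
    """Hypothetical sketch: jit compile with numba if it is installed, otherwise warn."""
    def decorator(fn):
        try:
            import numba
        except ImportError:
            warnings.warn(f"numba is not installed, {fn.__name__} will not be jit compiled")
            return fn
        # forward any options such as cache=True to numba.njit
        return numba.njit(*args, **kwargs)(fn)
    return decorator


@try_njit()
def add(a, b):
    return a + b
```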
Split disent.util.lightning.callbacks into separate files.
- Added many new features and fixes to these callbacks for the new versions.
Added disent.util.math.integer for computing the gcd and lcm with arbitrary precision values.
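Python integers already have arbitrary precision, so a sketch of gcd and lcm over many values only needs math.gcd and functools.reduce. Arbitrary precision avoids the overflow that fixed-width integers (e.g. numpy int64) can hit when multiplying large values. The helper names are hypothetical.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    # lcm(a, b) = |a * b| / gcd(a, b), with lcm(0, x) defined as 0
    return abs(a * b) // gcd(a, b) if a and b else 0


def gcd_many(*values: int) -> int:
    # gcd(0, x) == x, so 0 is the identity element
    return reduce(gcd, values, 0)


def lcm_many(*values: int) -> int:
    # lcm(1, x) == x, so 1 is the identity element
    return reduce(lcm, values, 1)
```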
Added disent.util.visualise.vis_img with various features for visualising both tensors and numpy images.
- tensors by default are considered to be in CHW format, while numpy arrays are considered to be in HWC format. These defaults can be overridden.
- See torch_to_images(...) and numpy_to_images(...) for more details.
- Other duplicated functions throughout the library will be replaced with these in future.

Breaking Changes

Temporarily removed DSpritesImagenetData. This dataset contains research code for my MSc and was not intended to be in previous releases. This will be re-added soon.
disent.dataset.transform._augment.make_kernel default scale mode changed to "none" from "sum".
- This affects various other locations in the code, including disent.frameworks.helper.reconstructions.AugmentedReconLossHandler which uses kernels to augment loss functions.
Split the Ae and Vae hierarchy
- Vae is no longer an instance of Ae.
Metrics are now instances of disent.metrics.utils.Metric.
- This callable class can easily be created using the disent.metrics.utils.make_metric decorator over existing metric functions.
- The purpose of this change is to make the default arguments of metrics self-contained. The Metric class has the methods compute and compute_fast, which wrap the underlying decorated function. Arguments can be overridden as usual; however, the two versions use different default arguments when called.
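A sketch of the Metric and make_metric pattern described above. The keyword names default_kwargs and fast_kwargs, and the example_metric function, are hypothetical; the real decorator's signature may differ. Defaults are passed as keyword arguments, so metric parameters that have defaults should be given by keyword.

```python
from typing import Any, Callable, Dict, Optional


class Metric:
    """Hypothetical sketch: a callable metric carrying its own default arguments."""

    def __init__(
        self,
        fn: Callable[..., Any],
        default_kwargs: Optional[Dict[str, Any]] = None,
        fast_kwargs: Optional[Dict[str, Any]] = None,
    ):
        self._fn = fn
        self._default_kwargs = dict(default_kwargs or {})
        # the fast defaults only override the keys they specify
        self._fast_kwargs = {**self._default_kwargs, **(fast_kwargs or {})}

    def compute(self, *args, **kwargs):
        # explicitly passed arguments always take priority over the defaults
        return self._fn(*args, **{**self._default_kwargs, **kwargs})

    def compute_fast(self, *args, **kwargs):
        return self._fn(*args, **{**self._fast_kwargs, **kwargs})

    def __call__(self, *args, **kwargs):
        return self.compute(*args, **kwargs)


def make_metric(default_kwargs=None, fast_kwargs=None):
    """Hypothetical decorator that wraps an existing metric function in a Metric."""
    def decorator(fn):
        return Metric(fn, default_kwargs=default_kwargs, fast_kwargs=fast_kwargs)
    return decorator


@make_metric(default_kwargs={"num_samples": 10000}, fast_kwargs={"num_samples": 100})
def example_metric(dataset, num_samples):
    # stand-in for a real metric: just reports what it would use
    return {"dataset_size": len(dataset), "num_samples": num_samples}
```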
Renamed and removed functions inside disent.util.visualise.vis_latents

Fixes

Fixed a numerical precision error in disent.dataset.sampling.GroundTruthDistSampler when computing scaled factor distances. Without this fix, there is up to a 1.5% chance of making a sampling error on certain datasets.
Updated disent.nn.functional._pca for newer torch versions
Renamed disent.nn.loss.softsort.torch_soft_sort(...) parameter dims_at_end to leave_dims_at_end. This now matches torch_soft_rank(...).
disent.nn.loss.triplet_mining.configured_idx_mine(...) now exits early if the mode is set to "none".

Config Changes

Removed augment/basic.yaml and added augment/example.yaml instead.
Added the config group run_plugins which can be used to register a callback that is run by the experiment to register custom items with the disent framework such as new reconstruction losses or kernels.
dataset/cars3d.yaml and dataset/smallnorb.yaml now point to the optimized 64x64 versions of the datasets by default.
Renamed disable_decoder to detach_decoder in Ae and Vae configs
Removed disable_posterior_scale option from Ae and Vae configs
models/*.yaml now directly point to a model target instead of a separate encoder and decoder
run_callbacks/*.yaml now directly point to class targets rather than using pre-defined keys
run_logging/*.yaml now directly point to class targets rather than using pre-defined keys
Rewrote experiment.run to be more general. The hydra and experiment functionality can now be called or used from anywhere.
- Added the ability to register your own config overrides without extending or forking disent. This is enabled by adding to the hydra search path: all a user needs to do is set the DISENT_CONFIGS_PREPEND environment variable to the path of a new config folder. Anything inside this new config folder recursively takes priority over the existing experiment/config folder.
Rewrote HydraDataModule to only accept the necessary arguments rather than the raw config. Configs are updated accordingly to specify these parameters directly.
Added experiment.util.hydra_main which can be used anywhere to launch a hydra experiment using the disent configs.
- hydra_main(...) is used to run an experiment that passes a config to the given callback
- patch_hydra() can instead be used just to initialise hydra if you want to set everything up yourself. It registers the search path plugin that looks for DISENT_CONFIGS_PREPEND, as well as various OmegaConf resolvers, including:
  - ${exit:<msg>} exits the program if accessed. This can be used to deprecate functionality, or to force variables to be overridden!
  - ${run_num:<root_dir>} returns the current experiment number
  - ${run_dir:<root_dir>,<name>} returns the current experiment folder with the name appended
  - ${fmt:"{:04d}",42} returns "0042", the exact same as str.format
  - ${abspath:<rel_path>} converts a relative path to an absolute path using the original hydra working directory, not the changed experiment directory.
  - ${rsync_dir:<src>/<name>,<dst>/<name>} useful if datasets are already prepared on a shared drive and need to be copied to a temp drive for example!
Added experiment.util.path_utils, which adds support for automatically obtaining an experiment number from a directory of number-prefixed files. The number returned is the existing maximum number plus one.
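A sketch of the run-numbering logic, assuming run folders are named with a leading integer (e.g. 00003_my_run) and that an empty or missing directory starts at 1. The function names are hypothetical, not the experiment.util.path_utils API.

```python
import os
import re
from typing import Iterable

# names such as "00003_my_run" start with a run number
_NUM_PREFIX = re.compile(r"^(\d+)")


def next_run_num(names: Iterable[str]) -> int:
    """Hypothetical sketch: return the largest numeric prefix plus one (1 if there are none)."""
    nums = [int(m.group(1)) for m in map(_NUM_PREFIX.match, names) if m]
    return max(nums, default=0) + 1


def next_run_num_in_dir(root_dir: str) -> int:
    # a missing directory is treated like an empty one
    if not os.path.isdir(root_dir):
        return 1
    return next_run_num(os.listdir(root_dir))
```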

Test Changes

Updated tests.test_experiment to use new experiment.util.hydra_main functionality
Pickle tests for frameworks
Tests for torch norm functions
Registry test fixes
Extensive tests for new disent.util.visualize.vis_img functions and returned datatypes
temp_environ context manager
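A sketch of a temp_environ context manager, which is handy for tests that set variables such as DISENT_CONFIGS_PREPEND. The signature (a dict of variables) is an assumption; previous values are restored on exit, and variables that did not exist beforehand are removed again.

```python
import contextlib
import os
from typing import Dict, Iterator


@contextlib.contextmanager
def temp_environ(environment: Dict[str, str]) -> Iterator[None]:
    """Hypothetical sketch: temporarily set environment variables."""
    # remember the previous values (None means the variable was not set)
    saved = {key: os.environ.get(key) for key in environment}
    os.environ.update(environment)
    try:
        yield
    finally:
        for key, value in saved.items():
            if value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = value
```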

06 Feb 11:05

nmichlo

v0.3.4

5695747

v0.3.4

Fixes

Leftover research config values have been fixed, addressing #23. Defaults should now just work locally.

Added

Frameworks now implement validation and test functions, addressing #22 (previously they did not). Schedules may be unintentionally affected by this change if used with test and validation datasets. An issue has been opened to investigate this.

28 Nov 23:14

nmichlo

v0.3.3

c4fb871

v0.3.3

Fixes

disent.util.math was not a module, added empty __init__.py file

22 Nov 12:21

nmichlo

v0.3.2

8a53282

v0.3.2

Fixes

Fixed FftKernel, which accidentally did not freeze its tensor weights.
Fix callbacks logging l1 instead of l2 distance
Fix callbacks failure if metrics are NaN
dsprites_imagenet macos prepare fix

Added

run_action=skip experiment action to just test if hydra is working.
VAEs now log the ratios between different loss terms.

Breaking

experiment.run.hydra_check_cuda renamed to hydra_get_gpus. Now returns an integer for the number of GPUs to use. Intended to be passed to a PyTorch Lightning Trainer.
Removed XYObjectData warning that things are now different

11 Nov 10:14

nmichlo

v0.3.1

82cd508

v0.3.1

Experiment Fixes

run_action=prepare_data has been fixed

Experiment Additions

new tests to ensure this continues to work properly

Experiment Changes

correct action is now chosen via the experiment.run.run_action(cfg) method
- experiment.run.train renamed to action_train
- experiment.run.prepare_data renamed to action_prepare_data
input config is no longer mutated

11 Nov 09:12

nmichlo

v0.3.0

3276d57

v0.3.0

This release touches most of the codebase.

Major Additions

added XYObjectShadedData dataset, which is exactly the same as XYObjectData but the ground truth factors differ. This might be useful for testing how metrics are affected by the ground truth representation of factors. Note that XYObjectData differs from previous versions due to this.
added DSpritesImagenetData dataset, which is the same as DSpritesData but masks the background or foreground depending on the mode and replaces the content with deterministic data from tiny-imagenet
added disent.frameworks.vae.AdaGVaeMinimal, which is a minimal implementation of AdaVae configured to run in gvae mode
added disent.util.lightning.callbacks.VaeGtDistsLoggingCallback, which logs various distance matrices computed from averaged ground truth factor traversals.
Updated experiment files to use hydra 1.1
- can now switch between train and prepare_data modes with the defaults group run_action=train

Other Additions

added shallow_copy to disent.dataset.DisentDataset enabling a shallow copy of the dataset but overriding specific properties such as the transform
added new disent.dataset.transform including ToImgTensorF32 (was ToStandardisedTensor) and ToImgTensorU8
additions to H5Builder
- add_dataset_from_array that constructs and fills a dataset in the hdf5 file from an array
- converted into context manager instead of manually opening the hdf5 file
additions to StateSpace (and ground truth dataset child classes)
- normalise_factor_idx converts the name of a ground truth factor into its numerical index
- normalise_factor_idxs converts a name, an index, a list of names, or a list of indices into the numerical indices of the ground truth factors.
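A sketch of the normalisation behaviour, written as standalone functions over a list of factor names rather than as StateSpace methods. Returning a list (rather than, e.g., a numpy array) is an assumption.

```python
from typing import List, Sequence, Union

FactorRef = Union[str, int]


def normalise_factor_idx(factor: FactorRef, factor_names: Sequence[str]) -> int:
    """Hypothetical sketch: convert a factor name or index into a validated index."""
    if isinstance(factor, str):
        # raises ValueError if the name is unknown
        return list(factor_names).index(factor)
    idx = int(factor)
    if not 0 <= idx < len(factor_names):
        raise IndexError(f"factor index {idx} is out of range for {len(factor_names)} factors")
    return idx


def normalise_factor_idxs(
    factors: Union[FactorRef, Sequence[FactorRef]], factor_names: Sequence[str]
) -> List[int]:
    # a single name or index is treated as a list with one element
    if isinstance(factors, (str, int)):
        factors = [factors]
    return [normalise_factor_idx(f, factor_names) for f in factors]


names = ("x", "y", "color")
```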
disent.dataset.util.stats added compute_data_mean_std(data) to compute the mean and std of datasets
added disent.schedule.SingleSchedule
improved disent.util.deprecate.deprecated, now prints the stack trace for the call location of the deprecated function by default. This can be disabled.
added restart method to disent.util.profiling.Timer for easy use within a loop
added disent.util.visualize.plot, which contains various matplotlib helper code used throughout the library and PyTorch lightning callbacks.

Breaking Changes

removed confusing observation_shape and obs_shape properties from GroundTruthData and any child classes. Any methods across disent that require these properties had their names updated too. For example, the ArrayGroundTruthData class now takes x_shape.
- observation_shape (H, W, C) should be replaced with img_shape, you will need to update your overrides in child classes
- obs_shape (C, H, W) should be replaced with x_shape
XYObjectData default parameters were updated for XYObjectShadedData; the dataset and colour palettes differ slightly from previous versions.
moved module disent.nn.transform to disent.dataset.transform
- renamed ToStandardisedTensor to ToImgTensorF32
H5Builder converted into context manager, similar API to open or h5py.File
ReconLossHandlerMse changed to not scale or centre the output, this is because we now normalise the data instead which is more correct
AdaVae and inheriting classes have various functions renamed for clarity
disent.metrics functions have ground_truth_dataset parameter renamed to dataset
disent.model.ae renamed DecoderTest and EncoderTest to DecoderLinear and EncoderLinear
disent.registry updated to use a new, simpler class structure and format. Some variables have been renamed, and registry names have been changed to plurals, e.g. OPTIMIZER is now OPTIMIZERS
disent.schedule cleaned up
- renamed various variables and parameters min_step -> start_step, max_step -> end_step
- removed disent.schedule.lerp.scale() function, as it is the same as lerp just not clipped
disent.util.lightning.callbacks.VaeDisentanglementLoggingCallback renamed to VaeMetricLoggingCallback
docs.examples updated to use new XYObjectData version and ToImgTensorF32 transform

Deprecations

deprecated the ground_truth_data property on DisentDataset; it should be replaced with the shorter gt_data property. References to ground_truth_data have been replaced throughout disent.

Fixes

Fixed Mpi3dData datasets, and added file hashes
Updated requirements
Many minor fixes, usability and error message improvements

Hydra Experiment Changes

Hydra Config has finally been updated from version 1.0 to 1.1, adding support for recursive defaults and recursive instantiation. This allows us to remove all of our custom & hacky hydra helper code that previously enabled these features.

hydra now supports recursive instantiation
value based specialisation can now be done with recursive defaults using dummy groups

Updating hydra was a good opportunity to re-structure the configuration format.

All settings defined in the root config that are referenced elsewhere are now in the settings key.
Default settings defined in various subgroups that are referenced elsewhere are often placed in the dsettings key.
Keys for various objects were renamed for clarity, eg. augment.transform was renamed to augment.augment_cls
All datasets now require the meta.vis_mean and meta.vis_std keys that are used both to normalise the dataset and used to re-scale it between [0, 1] for visualisation during training.

Every config file has been touched, the best approach is probably to look at the new system. The general structure remains the same, but the recursive defaults from Hydra 1.1 allows us to implement various things in a more clean way.

new defaults group run_launcher to easily swap between slurm and local
defaults group run_location only specifies machine resources and paths
new defaults group sampling specifies details and the sampling strategy to be used by the frameworks
new defaults group run_action to switch between training (train) and downloading and installing datasets (prepare_data)

04 Oct 23:53

nmichlo

v0.2.1

9bdd81d

v0.2.1

Under the hood, quite a lot of code has been added or changed for this release, however the API remains very much the same.

Additions

Wrapped datasets, instances of disent.dataset.wrapper.WrappedDataset are datasets that have some sort of mask applied to them that hides the true state space and resizes the dataset.
- disent.dataset.wrapper.DitheredDataset applies an n-dimensional dithering operation to ground truth factors
- disent.dataset.wrapper.MaskedDataset applies some provided boolean mask over the dataset
disent.dataset.DisentDataset now supports wrapped datasets (instances of disent.dataset.wrapper.WrappedDataset). New methods and properties have been added to complement this feature:
- is_wrapped_data check if there is wrapped data
- is_wrapped_gt_data check if there is wrapped data and the wrapped data is ground truth data
- wrapped_data obtain the wrapped data, otherwise throw an error
- wrapped_gt_data obtain the wrapped ground truth data, otherwise throw an error
- unwrapped_disent_dataset creates a copy of the disent dataset with everything the same, except the data is unwrapped.
disent.util.lightning.callbacks additions
- Support for wrapped datasets. They automatically try to unwrap them to obtain the ground truth data which can be used to compute metrics and perform visualisations.
- Support model output scaling to a certain range of values, fixing visualisations when using VaeLatentCycleLoggingCallback
new utilities
- disent.util.math.dither
- disent.util.math.random
Self contained HDF5 ground-truth datasets. These store all the information needed to construct the dataset and state space in one file, including the factor names.
- Added disent.dataset.data.SelfContainedHdf5GroundTruthData to read these files
- Added disent.dataset.util.H5Builder for creating these files. (API is not yet finalised)
disent.dataset.util.StateSpace added helper function iter_traversal_indices
disent.nn.transform added ToUint8Tensor which acts like ToStandardisedTensor, but instead of loading images as float32, it loads them as uint8. This is useful when you need to use datasets outside of a ML Model context, eg. performing analysis. This takes up less memory.
- a corresponding functional version, to_uint_tensor, exists, complementing to_standardised_tensor
Began work on a component and function registry; however, do not use this yet, as the API will change significantly.

API Breakages

Under the hood, implementing wrapped data and DisentDataset copying requires the ability to copy samplers, so each sampler implementation should have the uninit_copy method implemented too.
ArrayGroundTruthData is stricter: observation_shape must be (H, W, C) or (C, H, W) depending on array_chn_is_last
Removed reconstruction losses:
- ReconLossHandlerMse4 aka. "mse4"
- ReconLossHandlerMae2 aka. "mae2"
Renamed disent.util.visualize.get_factor_traversal to get_idx_traversal

Deprecations

GroundTruthData property aliases:
- img_shape new property for the deprecated observation_shape
- x_shape new property for the deprecated obs_shape
- img_channels new property for the number of channels in the image

Fixes

disent.util.inout.files.AtomicSaveFile minor fix to overwriting files
disent.util.lightning.callbacks.LoggerProgressCallback fix to datatypes and potential crashes due to floats
More stable experiment runs when performing sweeps. Better error handling, error messages and error catching.
fixes to the various requirement*.txt files
many other minor fixes

04 Oct 14:34

nmichlo

v0.2.0

6717851

v0.2.0

API Breakages

DisentFramework no longer takes in make_optimizer_fn callback, but instead includes this as part of the cfg by specifying optimizer and optimizer_kwargs.
Ae derived subclasses now take in an instantiated AutoEncoder instance to the model param instead of the make_model_fn callback.

Additions

DisentDataset can now return observation indices in the "idx" field if return_indices=True
sample_random_obs_traversal added to GroundTruthData
new basic experiment test

Changes

python 3.8 and 3.9 support (3.7 is unsupported due to missing standard library typing features)
TempNumpySeed now inherits from contextlib.ContextDecorator
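A stdlib analogue of the ContextDecorator pattern, swapping numpy's global RNG for Python's random module. TempSeed is a hypothetical name; the same instance works both as a with block and as a decorator, although, because the saved state is stored on the instance, it should not be nested with itself.

```python
import contextlib
import random


class TempSeed(contextlib.ContextDecorator):
    """Hypothetical sketch: temporarily seed the global `random` RNG, then restore it."""

    def __init__(self, seed: int):
        self.seed = seed
        self._state = None

    def __enter__(self):
        # save the global RNG state so it can be restored afterwards
        self._state = random.getstate()
        random.seed(self.seed)
        return self

    def __exit__(self, *exc):
        random.setstate(self._state)
        return False  # do not suppress exceptions


# as a decorator, every call runs with the same seed
@TempSeed(42)
def sample_noise():
    return [random.random() for _ in range(3)]
```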
updated hydra-core to 1.0.7

Fixes

SmallNorbData by default now returns observations of size (96, 96, 1) instead of (96, 96)
Removed the Deprecated dependency, which also couldn't be pickled, fixing hydra submitit issues
LoggerProgressCallback displays more reliable information and now supports PyTorch Lightning 1.4
HydraDataModule now supports PyTorch Lightning 1.4
merge_specializations fixed to depend on OmegaConf not Hydra

28 Jul 12:43

nmichlo

v0.1.0

7f7d757

v0.1.0 - Initial Release

Initial Release

Overview

The initial release of Disent

Please see the docs and readme for new usage examples. Changes to existing code should be easy to make, notably for the DisentDataset and DisentSampler changes.

Changes

Replaced sampling datasets with one common class disent.dataset.DisentDataset
- Wraps other datasets (torch.utils.data.Dataset or disent.dataset.data.GroundTruthData)
- Accepts an implemented subclass of disent.dataset.sampling.BaseDisentSampler which controls how many observations are sampled and returned (eg. for triplet networks).
- eg. disent.dataset.groundtruth.GroundTruthDatasetPairs is now disent.dataset.sampling.GroundTruthPairSampler

Removed all experimental code & features unique to Disent. Hydra configs and runners for non-experimental features remain. These features will be cleaned up and re-added once I submit my dissertation.
- ❌ experimental frameworks
- ❌ experimental datasets
- ❌ experimental metrics
- ❌ experimental models
- ❌ experimental augmentations
- ❌ experiment files

Verified models
- some models had potentially diverged from their original implementations and papers.
- Added a new test model: EncoderTest & DecoderTest
disent.nn Changes:
- Added disent.nn.activations.Swish
- Removed loss reduction mode "sum" in disent.nn.loss.reduction
- Split out triplet mining logic from frameworks into disent.nn.loss.triplet_mining
- Replaced BatchView, Unsqueeze3D and Flatten3D from disent.nn.modules with PyTorch 1.9 equivalents
- Backwards compatible opt-in disent.nn.transform.ToStandardisedTensor enhancements
disent.util Refactor, grouping logic into submodules:
- disent.util.inout: utilities for working with paths, files and saving files.
- disent.util.lightning: various helper functions and callbacks for pytorch lightning, some incorporated from past experiment files.
- disent.util.strings: utilities for working with strings and ansi escape codes
- disent.util.visualise: moved disent.visualise into this module, separating framework logic and helper logic in disent.
Cleaned up requirements.txt
- optional requirements moved into: requirements-test.txt and requirements-exp.txt
New tests
- samplers
- models
And many bug fixes

04 Jun 23:08

nmichlo

v0.0.1.dev14

b2663a1

v0.0.1.dev14 Pre-release

Pre-release

Overview

This release is mostly a large set of refactors, and reproducibility improvements with regards to seeds and datasets.

Notable Changes

Data now relies on disent.data.datafile.DataFiles, which are deterministic, hash- and cache-based file generators that can fetch or pre-process data.
Added XYSquaresMinimalData, which is a minimal faster version of XYSquaresData without any configuration options. With default parameters, data from XYSquaresData should equal XYSquaresMinimalData
Added PickleH5pyFile that can pickle an hdf5 file and dataset. This is intended to be used with torch DataLoaders or multiprocessing.

Definitely Breaking Changes

renamed classes:
- renamed AugmentableDataset to DisentDataset
- renamed BaseFramework to DisentFramework
- renamed BaseEncoderModule to DisentEncoder
- renamed BaseDecoderModule to DisentDecoder
consolidated maths and helper functions into new submodule disent.nn
- disent.nn.weights initialisation functions from originally disent.model.init
- disent.nn.modules basic modules from various locations including DisentModule, DisentLightningModule, BatchView, Unsqueeze3D, Flatten3D
- disent.nn.transform transform and augment functions and classes from disent.transform, still needs to be cleaned up in future releases.
- disent.nn.loss various loss functions from other places including triplet, kl, softsort and reduction modules
- disent.nn.functional various differentiable torch helper functions mostly from disent.util.math, including functions for computing the Covariance, Correlation, Generalised Mean, PCA, DCT, Channel-Wise convolutions and more! Some functions such as kernel generation need to be moved out of here.
split up and consolidated utilities:
- disent.util.cache caching utilities including the stalefile decorator that only runs the wrapped function if the specified file is stale (hash does not match, or file does not exist)
- disent.util.colors ANSI escape codes
- disent.util.function wrapper, decorator and inspect utilities
- disent.util.hashing compute the full hash of a file or a fast hash based on the README for the imohash algorithm.
- disent.util.in_out originally from disent.data.util for handling file retrieval/downloading/copying and saving
- disent.util.iters general iterators or map functions, including iter_chunks and iter_rechunk
- disent.util.paths path handling and file or directory management
- disent.util.profiling timers & memory usage
- disent.util.seeds seed management contexts and functions
- disent.util.strings string formatting helper functions
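A sketch of what iter_chunks and iter_rechunk could look like, assuming both yield lists and that the final chunk may be shorter. The signatures are assumptions, not the disent.util.iters API.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Hypothetical sketch: split an iterable into lists of at most chunk_size items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def iter_rechunk(chunks: Iterable[Iterable[T]], chunk_size: int) -> Iterator[List[T]]:
    # flatten the incoming chunks lazily, then split them again with the new size
    flat = (item for chunk in chunks for item in chunk)
    return iter_chunks(flat, chunk_size)
```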
removed and cleaned up functions from:
- disent.data.hdf5
- disent.dataset.__init__
- disent.util.__init__
- disent.schedule.lerp renamed activate to scale_ratio and removed other functions.

Other Changes

Replaced GroundTruthData specialisations with general loading from DataFiles.
StateSpace now stores factor_names instead of GroundTruthData - preparing for rewrite of datasets to use dependency injections and samplers.

Experiment Config & Runner Changes

Many config fixes for refactors
Experiment can now be seeded

New Tests

test PickleH5pyFile multiprocessing support
test XYSquaresData and XYSquaresMinimalData similarity
