Releases: nmichlo/disent

v0.4.0

31 Mar 10:32

Major Additions

  • Added disent.dataset.DisentIterDataset to complement DisentDataset for datasets without size.
  • Added Cars3d64Data and SmallNorb64Data to disent.dataset.data. These classes are optimised versions of their respective datasets that have their transforms pre-computed. This is much faster than resizing the observations during training, as most disentanglement benchmarks are based on datasets with 64x64 observations.
  • Added disent.dataset.sampling.GroundTruthRandomWalkSampler. This ground-truth dataset sampler simulates random walks around the factor space. For example: if there are two ground-truth factors x and y corresponding to a grid, this sampler would simulate an agent randomly moving around the grid.
  • Improvements to the registry. Augments, reconstruction losses and latent distributions can now be registered with disent using disent.registry.KERNELS, disent.registry.RECON_LOSSES and disent.registry.LATENT_HANDLERS. This affects:
    • disent.frameworks.helper.latent_distributions.make_latent_distribution
    • disent.frameworks.helper.reconstructions.make_reconstruction_loss
    • disent.dataset.transform._augment.get_kernel
  • Refactored disent.frameworks.DisentFramework, which now also supports PyTorch Lightning training, validation and test steps.
  • Split Ae and Vae hierarchy
    • This is so that we can directly check if a framework is an instance of one or the other. Previously Vae was a subclass of Ae which was unintuitive.
  • Rewrite of the disent.registry to make it more intuitive and useful throughout disent. Custom regex resolvers can now also be registered. There are now also different types of registries. Registries now also have examples for each item that can be constructed. See disent.registry._registry for more information.
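
The random walk idea behind GroundTruthRandomWalkSampler can be illustrated with a small standalone sketch. This is not the sampler's implementation: the function name, the one-factor-per-step rule and the clipping at the grid edges are all assumptions made for illustration.

```python
import numpy as np


def random_walk(factor_sizes, num_steps, seed=0):
    """Hypothetical helper: random walk over a discrete factor grid."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(factor_sizes)
    # start at a uniformly random position in the factor space
    pos = rng.integers(0, sizes)
    walk = [pos.copy()]
    for _ in range(num_steps):
        # assumption: move a single randomly chosen factor by one step
        f = rng.integers(len(sizes))
        pos[f] += rng.choice([-1, 1])
        # assumption: clip so the agent stays inside the grid
        pos = np.clip(pos, 0, sizes - 1)
        walk.append(pos.copy())
    return np.stack(walk)


# two ground-truth factors x and y forming a 5x7 grid
walk = random_walk([5, 7], num_steps=100)
```

Each row of `walk` is a position in the factor space, and consecutive rows differ by at most one step in one factor.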

Other Improvements

  • Improvements to disent.dataset.DisentDataset:
    • Added sampler, transform and augment properties.
    • Improved shallow_copy and unwrapped_shallow_copy logic and available arguments.
    • Can now return the ground-truth factors by specifying DisentDataset(return_factors=True)
    • Improved handling of batches and collating
  • Added state_space_copy(...) to disent.dataset.data.GroundTruthData, this function returns a copy of the underlying state space.
    • disent.dataset.sampling Samplers now store the copy of the state space instead of the original dataset
  • Added sample(...) to disent.dataset.sampling.BaseDisentSampler, which is a more explicit alias to the original __call__(...) method.
  • to_img_tensor_u8 and to_img_tensor_f32 now check the size of the observations before resizing. If the size is unchanged, performance is greatly improved! This affects ToImgTensorF32 and ToImgTensorU8 from disent.dataset.transform.
  • Added factor_multipliers property to disent.dataset.util.state_space.StateSpace which allows custom implementations of pos_to_idx and idx_to_pos.
  • Added torch math helper functions to: disent.nn.functional
    • including: torch_norm, torch_dist, torch_norm_euclidean, torch_norm_manhattan, and torch_dist_hamming.
  • Added triplet_soft_loss and dist_triplet_soft_loss to disent.nn.loss.triplet.
  • Added more modes to disent.nn.weights.init_model_weights.
  • Added FixedValueSchedule and MultiplySchedule to disent.schedule. These schedules are useful for setting a constant value throughout a run, and overriding the actually set values in the config.
  • Added modify_name_keep_ext to disent.util.inout.paths. For adding prefixes or suffixes to file names without affecting the extension.
  • Added the decorator try_njit to disent.util.jit. This decorator tries to wrap the function with numba.njit, otherwise a warning is displayed. Numba is intended to be an optional dependency; it is not specified in the requirements.
  • Split disent.util.lightning.callbacks into separate files.
    • Added many new features and fixes to these callbacks for the new versions.
  • Added disent.util.math.integer for computing the gcd and lcm with arbitrary precision values.
  • Added disent.util.visualise.vis_img with various features for visualising both tensors and numpy images.
    • Tensors are by default considered to be in CHW format, while numpy arrays are considered to be in HWC format. These defaults can be overridden.
    • See torch_to_images(...) and numpy_to_images(...) for more details.
    • Other duplicated functions throughout the library will be replaced with these in future.
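
To make the role of factor_multipliers concrete, here is a minimal sketch of how multipliers can map between factor positions and flat indices. It assumes row-major ordering (the last factor changes fastest) and hypothetical factor sizes. The free functions below are illustrative stand-ins, not the StateSpace methods themselves.

```python
import numpy as np

# hypothetical sizes of three ground-truth factors
factor_sizes = np.array([3, 4, 5])

# assumption: each multiplier is the product of the sizes of all later factors
factor_multipliers = np.append(np.cumprod(factor_sizes[::-1])[::-1][1:], 1)


def pos_to_idx(pos):
    """Convert a factor position to a flat index."""
    return int(np.dot(pos, factor_multipliers))


def idx_to_pos(idx):
    """Convert a flat index back to a factor position."""
    return (idx // factor_multipliers) % factor_sizes


idx = pos_to_idx([1, 2, 3])
pos = idx_to_pos(idx)
```

With these sizes the multipliers are `[20, 5, 1]`, so the position `[1, 2, 3]` maps to index 33 and back again. A custom layout would only need different multipliers.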

Breaking Changes

  • Temporarily removed DSpritesImagenetData. This dataset contains research code for my MSc and was not intended to be in previous releases. This will be re-added soon.
  • disent.dataset.transform._augment.make_kernel default scale mode changed to "none" from "sum".
    • This affects various other locations in the code, including disent.frameworks.helper.reconstructions.AugmentedReconLossHandler which uses kernels to augment loss functions.
  • Split Ae and Vae hierarchy
    • Vae is no longer an instance of Ae.
  • Metrics are now instances of disent.metrics.utils.Metric.
    • This callable class can easily be created using the disent.metrics.utils.make_metric decorator over existing metric functions.
    • The purpose of this change is to make metric default arguments self-contained. The Metric class has the functions compute and compute_fast which wrap the underlying decorated function. Arguments can be overridden as usual, however, the two versions when called use different default arguments.
  • Renamed and removed functions inside disent.util.visualise.vis_latents
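
The Metric change can be pictured with the following sketch of a callable wrapper that carries two sets of default arguments. The names MetricSketch and make_metric_sketch, and the keyword arguments they take, are hypothetical. The actual signatures of disent.metrics.utils.Metric and make_metric are not given in these notes.

```python
class MetricSketch:
    """Hypothetical stand-in for a metric with self-contained defaults."""

    def __init__(self, fn, default_kwargs, fast_kwargs):
        self._fn = fn
        self._default_kwargs = dict(default_kwargs)
        self._fast_kwargs = dict(fast_kwargs)

    def compute(self, *args, **kwargs):
        # explicit arguments override the full-quality defaults
        return self._fn(*args, **{**self._default_kwargs, **kwargs})

    def compute_fast(self, *args, **kwargs):
        # explicit arguments override the cheaper defaults
        return self._fn(*args, **{**self._fast_kwargs, **kwargs})

    # assumption: calling the metric directly behaves like compute
    __call__ = compute


def make_metric_sketch(default_kwargs, fast_kwargs):
    def decorator(fn):
        return MetricSketch(fn, default_kwargs, fast_kwargs)
    return decorator


@make_metric_sketch(default_kwargs={"num_samples": 10_000}, fast_kwargs={"num_samples": 100})
def example_metric(dataset, num_samples):
    # placeholder body: a real metric would evaluate a representation here
    return {"num_samples": num_samples, "dataset_len": len(dataset)}
```

Here `example_metric.compute(data)` uses 10,000 samples, `example_metric.compute_fast(data)` uses 100, and either can still be overridden with `num_samples=...`.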

Fixes

  • Fixed disent.dataset.sampling.GroundTruthDistSampler numerical precision error when computing scaled factor distances. Without this fix there is up to a 1.5% chance of making a sampling error over certain datasets.
  • Updated disent.nn.functional._pca for newer torch versions
  • Renamed disent.nn.loss.softsort.torch_soft_sort(...) parameter dims_at_end to leave_dims_at_end. This now matches torch_soft_rank(...).
  • disent.nn.loss.triplet_mining.configured_idx_mine(...) now exits early if the mode is set to "none".

Config Changes

  • Removed augment/basic.yaml and added augment/example.yaml instead.
  • Added the config group run_plugins which can be used to register a callback that is run by the experiment to register custom items with the disent framework such as new reconstruction losses or kernels.
  • dataset/cars3d.yaml and dataset/smallnorb.yaml now point to the optimized 64x64 versions of the datasets by default.
  • Renamed disable_decoder to detach_decoder in Ae and Vae configs
  • Removed disable_posterior_scale option from Ae and Vae configs
  • models/*.yaml now directly point to a model target instead of a separate encoder and decoder
  • run_callbacks/*.yaml now directly point to class targets rather than using pre-defined keys
  • run_logging/*.yaml now directly point to class targets rather than using pre-defined keys
  • Rewrite experiment.run to be more general. The hydra and experiment functionality can now be called from anywhere or used anywhere.
    • Ability to register your own config overrides without extending or forking disent has been added. We enable this by adding to the hydra search path. All that a user needs to do is specify the DISENT_CONFIGS_PREPEND environment variable to a new config folder. Anything inside this new config folder will recursively take priority over the existing experiment/config folder.
  • Rewrite HydraDataModule to only accept necessary arguments rather than the raw config. Configs are updated accordingly to specify these parameters directly.
  • Added experiment.util.hydra_main which can be used anywhere to launch a hydra experiment using the disent configs.
    • hydra_main(...) is used to run an experiment that passes a config to the given callback
    • patch_hydra() can instead be used just to initialise hydra if you want to setup everything yourself. The search path plugin that looks for DISENT_CONFIGS_PREPEND is registered, as well as various OmegaConf resolvers, including:
      • ${exit:<msg>} exits the program if accessed. We can use this to deprecate functionality, or force variables to be overridden!
      • ${run_num:<root_dir>} returns the current experiment number
      • ${run_dir:<root_dir>,<name>} returns the current experiment folder with the name appended
      • ${fmt:"{:04d}",42} returns "0042", the exact same as str.format
      • ${abspath:<rel_path>} convert a relative path to an abs path using the original hydra working directory, not the changed experiment dir.
      • ${rsync_dir:<src>/<name>,<dst>/<name>} useful if datasets are already prepared on a shared drive and need to be copied to a temp drive for example!
  • Added experiment.util.path_utils which adds support for automatically obtaining an experiment number from a directory of number prefixed files. The number returned is the existing maximum number plus one.
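
The experiment numbering described for experiment.util.path_utils can be sketched with the standard library alone. The function name next_run_number is hypothetical, and returning 1 for an empty or missing directory is an assumption. The zero-padded formatting mirrors the str.format behaviour quoted for the ${fmt:...} resolver.

```python
import re
import tempfile
from pathlib import Path


def next_run_number(root_dir):
    """Hypothetical helper: existing maximum numeric prefix plus one."""
    root = Path(root_dir)
    numbers = []
    if root.is_dir():
        for path in root.iterdir():
            # only entries that start with digits count as numbered runs
            match = re.match(r"(\d+)", path.name)
            if match:
                numbers.append(int(match.group(1)))
    # assumption: numbering starts at 1 when nothing exists yet
    return max(numbers, default=0) + 1


with tempfile.TemporaryDirectory() as root:
    for name in ("0001_first", "0007_second", "notes.txt"):
        (Path(root) / name).touch()
    run_num = next_run_number(root)
    run_name = "{:04d}_{}".format(run_num, "third")

print(run_name)  # 0008_third
```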

Test Changes

  • Updated tests.test_experiment to use new experiment.util.hydra_main functionality
  • Pickle tests for frameworks
  • Tests for torch norm functions
  • Registry test fixes
  • Extensive tests for new disent.util.visualize.vis_img functions and returned datatypes
  • temp_environ context manager
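
A context manager like temp_environ is handy for testing environment-driven features such as DISENT_CONFIGS_PREPEND. The sketch below shows one way such a helper could work. Its name, signature and the example path are assumptions, not the test suite's code.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_environ_sketch(**overrides):
    """Hypothetical helper: temporarily set environment variables."""
    # remember previous values (None means the variable was unset)
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        # restore the original environment even if the body raised
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


before = os.environ.get("DISENT_CONFIGS_PREPEND")
with temp_environ_sketch(DISENT_CONFIGS_PREPEND="/tmp/my_configs"):
    inside = os.environ["DISENT_CONFIGS_PREPEND"]
after = os.environ.get("DISENT_CONFIGS_PREPEND")
```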

v0.3.4

06 Feb 11:05

Fixes

  • Leftover research config values have been fixed, addressing #23. Defaults should now just work locally.

Added

  • Frameworks now implement validation and test functions for data, which were previously missing, addressing #22. Schedules may be unintentionally affected by this change if used with test & validation datasets. An issue has been opened to investigate this.

v0.3.3

28 Nov 23:14

Fixes

  • disent.util.math was not a module, added empty __init__.py file

v0.3.2

22 Nov 12:21

Fixes

  • Fix FftKernel, accidentally forgot to freeze tensor weights.
  • Fix callbacks logging l1 instead of l2 distance
  • Fix callbacks failure if metrics are NaN
  • Fix dsprites_imagenet data preparation on macOS

Added

  • run_action=skip experiment action to just test if hydra is working.
  • VAEs now log the ratios between different loss terms.

Breaking

  • experiment.run.hydra_check_cuda renamed to hydra_get_gpus. Now returns an integer for the number of GPUs to use. Intended to be passed to a PyTorch Lightning Trainer.
  • Removed XYObjectData warning that things are now different

v0.3.1

11 Nov 10:14

Experiment Fixes

  • run_action=prepare_data has been fixed

Experiment Additions

  • new tests to ensure this continues to work properly

Experiment Changes

  • correct action is now chosen via the experiment.run.run_action(cfg) method
    • experiment.run.train renamed to action_train
    • experiment.run.prepare_data renamed to action_prepare_data
  • input config is no longer mutated

v0.3.0

11 Nov 09:12

This release touches most of the codebase.

Major Additions

  • added XYObjectShadedData dataset, which is exactly the same as XYObjectData but the ground truth factors differ. This might be useful for testing how metrics are affected by the ground truth representation of factors. Note that XYObjectData differs from previous versions due to this.
  • added DSpritesImagenetData dataset that is the same as DSpritesData but masks the background or foreground depending on the mode and replaces the content with deterministic data from tiny-imagenet
  • added disent.frameworks.vae.AdaGVaeMinimal which is a minimal implementation of AdaVae configured to run in gvae mode
  • added disent.util.lightning.callbacks.VaeGtDistsLoggingCallback which logs various distance matrices computed from averaged ground truth factor traversals.
  • Updated experiment files to use hydra 1.1
    • can now switch between train and prepare_data modes with the defaults group run_action=train

Other Additions

  • added shallow_copy to disent.dataset.DisentDataset enabling a shallow copy of the dataset but overriding specific properties such as the transform
  • added new disent.dataset.transform including ToImgTensorF32 (was ToStandardisedTensor ) and ToImgTensorU8
  • additions to H5Builder
    • add_dataset_from_array that constructs and fills a dataset in the hdf5 file from an array
    • converted into context manager instead of manually opening the hdf5 file
  • additions to StateSpace (and ground truth dataset child classes)
    • normalise_factor_idx convert names of ground truth factors into the numerical value
    • normalise_factor_idxs convert a name, an idx, lists of names, or lists of idxs to the numerical values of the ground truth factors.
  • disent.dataset.util.stats added compute_data_mean_std(data) to compute the mean and std of datasets
  • added disent.schedule.SingleSchedule
  • improved disent.util.deprecate.deprecated, now prints the stack trace for the call location of the deprecated function by default. This can be disabled.
  • added restart method to disent.util.profiling.Timer for easy use within a loop
  • added disent.util.visualize.plot which contains various matplotlib helper code used throughout the library and PyTorch lightning callbacks.
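
The factor normalisation helpers added to StateSpace can be approximated by the standalone sketch below. The factor names are hypothetical and the functions are simplified free functions rather than the actual methods, so details such as error handling are assumptions.

```python
# hypothetical factor names of a ground-truth dataset
factor_names = ("x", "y", "scale")


def normalise_factor_idx(factor):
    """Convert a factor name or index into its numerical index."""
    if isinstance(factor, str):
        # raises ValueError for unknown names
        return factor_names.index(factor)
    idx = int(factor)
    if not 0 <= idx < len(factor_names):
        raise IndexError(f"factor index out of range: {idx}")
    return idx


def normalise_factor_idxs(factors):
    """Accept a name, an idx, a list of names or a list of idxs."""
    if isinstance(factors, (str, int)):
        factors = [factors]
    return [normalise_factor_idx(f) for f in factors]


print(normalise_factor_idxs("y"))         # [1]
print(normalise_factor_idxs(["x", 2]))    # [0, 2]
```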

Breaking Changes

  • removed confusing observation_shape and obs_shape properties from GroundTruthData and any child classes. Any methods that require these properties across disent had their names updated too. For example the ArrayGroundTruthData class now takes x_shape.
    • observation_shape (H, W, C) should be replaced with img_shape, you will need to update your overrides in child classes
    • obs_shape (C, H, W) should be replaced with x_shape
  • XYObjectData default parameters updated for XYObjectShadedData; the dataset and colour palettes differ slightly from previous versions.
  • moved module disent.nn.transform to disent.dataset.transform
    • renamed ToStandardisedTensor to ToImgTensorF32
  • H5Builder converted into context manager, similar API to open or h5py.File
  • ReconLossHandlerMse changed to not scale or centre the output, this is because we now normalise the data instead which is more correct
  • AdaVae and inheriting classes have various functions renamed for clarity
  • disent.metrics functions have ground_truth_dataset parameter renamed to dataset
  • disent.model.ae renamed DecoderTest and EncoderTest to DecoderLinear and EncoderLinear
  • disent.registry updated registry to use new more simple class structure and format. Some variables have been renamed, and registry names have been changed to plurals, eg. OPTIMIZER is now OPTIMIZERS
  • disent.schedule cleaned up
    • renamed various variables and parameters min_step -> start_step, max_step -> end_step
    • removed disent.schedule.lerp.scale() function, as it is the same as lerp just not clipped
  • disent.util.lightning.callbacks.VaeDisentanglementLoggingCallback renamed to VaeMetricLoggingCallback
  • docs.examples updated to use new XYObjectData version and ToImgTensorF32 transform
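
The distinction between a clipped lerp and the removed unclipped scale() can be shown with a short sketch. The function name lerp_step and its signature are hypothetical. Only the start_step and end_step parameter names come from the notes above.

```python
def lerp_step(step, start_step, end_step, start_value, end_value):
    """Hypothetical helper: linear interpolation over steps, clipped to the range."""
    ratio = (step - start_step) / (end_step - start_step)
    # clipping keeps the value fixed before start_step and after end_step
    ratio = min(max(ratio, 0.0), 1.0)
    return start_value + ratio * (end_value - start_value)


print(lerp_step(0, 10, 20, 0.0, 1.0))    # 0.0 (before start_step)
print(lerp_step(15, 10, 20, 0.0, 1.0))   # 0.5 (halfway)
print(lerp_step(100, 10, 20, 0.0, 1.0))  # 1.0 (after end_step)
```

Without the clipping line, the last call would return 9.0, which is the unclipped behaviour.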

Deprecations

  • deprecated ground_truth_data property on DisentDataset , this should be replaced with the shorter gt_data property. References to ground_truth_data have been replaced in disent.

Fixes

  • Fixed Mpi3dData datasets, and added file hashes
  • Updated requirements
  • Many minor fixes, usability and error message improvements

Hydra Experiment Changes

Hydra Config has finally been updated from version 1.0 to 1.1, adding support for recursive defaults and recursive instantiation. This allows us to remove all of our custom & hacky hydra helper code that previously enabled these features.

  • hydra now supports recursive instantiation
  • value based specialisation can now be done with recursive defaults using dummy groups

Updating hydra was a good opportunity to re-structure the configuration format.

  • All settings defined in the root config that are referenced elsewhere are now in the settings key.
  • Default settings defined in various subgroups that are referenced elsewhere are often placed in the dsettings key.
  • Keys for various objects were renamed for clarity, eg. augment.transform was renamed to augment.augment_cls
  • All datasets now require the meta.vis_mean and meta.vis_std keys, which are used both to normalise the dataset and to re-scale it to [0, 1] for visualisation during training.
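
As a rough illustration of how per-dataset statistics such as meta.vis_mean and meta.vis_std can serve both purposes, the sketch below normalises an image and then inverts the operation for visualisation. The statistics are made-up values and the two functions are hypothetical. How disent applies these keys internally is not described here.

```python
import numpy as np

# hypothetical per-channel statistics for an RGB dataset
vis_mean = np.array([0.5, 0.4, 0.3])
vis_std = np.array([0.2, 0.2, 0.1])


def normalise(img_hwc):
    """Standardise an HWC image with values in [0, 1]."""
    return (img_hwc - vis_mean) / vis_std


def to_visual(x_hwc):
    """Undo the normalisation and clip back to [0, 1] for display."""
    return np.clip(x_hwc * vis_std + vis_mean, 0.0, 1.0)


rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
restored = to_visual(normalise(img))
```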

Every config file has been touched, so the best approach is probably to look at the new system. The general structure remains the same, but the recursive defaults from Hydra 1.1 allow us to implement various things more cleanly.

  • new defaults group run_launcher to easily swap between slurm and local
  • defaults group run_location only specifies machine resources and paths
  • new defaults group sampling specifies details and the sampling strategy to be used by the frameworks
  • new defaults group run_action to switch between training (train) and downloading and installing datasets (prepare_data)

v0.2.1

04 Oct 23:53

Under the hood, quite a lot of code has been added or changed for this release, however the API remains very much the same.

Additions

  • Wrapped datasets, instances of disent.dataset.wrapper.WrappedDataset are datasets that have some sort of mask applied to them that hides the true state space and resizes the dataset.
    • disent.dataset.wrapper.DitheredDataset applies an n-dimensional dithering operation to ground truth factors
    • disent.dataset.wrapper.MaskedDataset applies some provided boolean mask over the dataset
  • disent.dataset.DisentDataset now supports wrapped datasets (instances of disent.dataset.wrapper.WrappedDataset). New methods and properties have been added to complement this feature:
    • is_wrapped_data check if there is wrapped data
    • is_wrapped_gt_data check if there is wrapped data and the wrapped data is ground truth data
    • wrapped_data obtain the wrapped data, otherwise throw an error
    • wrapped_gt_data obtain the wrapped ground truth data, otherwise throw an error
    • unwrapped_disent_dataset creates a copy of the disent dataset with everything the same, except the data is unwrapped.
  • disent.util.lightning.callbacks additions
    • Support for wrapped datasets. They automatically try to unwrap them to obtain the ground truth data which can be used to compute metrics and perform visualisations.
    • Support model output scaling to a certain range of values, fixing visualisations when using VaeLatentCycleLoggingCallback
  • new utilities
    • disent.util.math.dither
    • disent.util.math.random
  • Self contained HDF5 ground-truth datasets. These store all the information needed to construct the dataset and state space in one file, including the factor names.
    • Added disent.dataset.data.SelfContainedHdf5GroundTruthData to read these files
    • Added disent.dataset.util.H5Builder for creating these files. (API is not yet finalised)
  • disent.dataset.util.StateSpace added helper function iter_traversal_indices
  • disent.nn.transform added ToUint8Tensor which acts like ToStandardisedTensor, but instead of loading images as float32, it loads them as uint8. This is useful when you need to use datasets outside of a ML Model context, eg. performing analysis. This takes up less memory.
    • a corresponding functional version to_uint_tensor exists, complementing to_standardised_tensor
  • Began work on a component & function registry. Do not use this yet, as the API will change significantly.
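
The idea of a masked wrapper that hides part of a dataset and resizes it can be sketched as follows. MaskedDatasetSketch is a hypothetical class for illustration and does not reproduce the interface of disent.dataset.wrapper.MaskedDataset.

```python
import numpy as np


class MaskedDatasetSketch:
    """Hypothetical wrapper exposing only the elements where mask is True."""

    def __init__(self, data, mask):
        mask = np.asarray(mask, dtype=bool)
        if len(mask) != len(data):
            raise ValueError("mask must have one entry per element of data")
        self._data = data
        # indices into the original data that remain visible
        self._indices = np.flatnonzero(mask)

    def __len__(self):
        # the wrapped dataset appears smaller than the original
        return len(self._indices)

    def __getitem__(self, idx):
        return self._data[self._indices[idx]]


data = ["a", "b", "c", "d", "e", "f"]
masked = MaskedDatasetSketch(data, [True, False, True, False, False, True])
print(len(masked), masked[1])  # 3 c
```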

API Breakages

  • Under the hood, implementing wrapped data and DisentDataset copying requires the ability to copy samplers, so each sampler implementation should have the uninit_copy method implemented too.
  • ArrayGroundTruthData is now stricter: observation_shape must be (H, W, C) or (C, H, W) depending on array_chn_is_last
  • Removed reconstruction losses:
    • ReconLossHandlerMse4 aka. "mse4"
    • ReconLossHandlerMae2 aka. "mae2"
  • Renamed disent.util.visualize.get_factor_traversal to get_idx_traversal

Deprecations

  • GroundTruthData property aliases:
    • img_shape new property for the deprecated observation_shape
    • obs_shape new property for the deprecated x_shape
    • img_channels new property for the number of channels in the image

Fixes

  • disent.util.inout.files.AtomicSaveFile minor fix to overwriting files
  • disent.util.lightning.callbacks.LoggerProgressCallback fix to datatypes and potential crashes due to floats
  • More stable experiment runs when performing sweeps. Better error handling, error messages and error catching.
  • fixes to the various requirement*.txt files
  • many other minor fixes

v0.2.0

04 Oct 14:34

API Breakages

  • DisentFramework no longer takes in make_optimizer_fn callback, but instead includes this as part of the cfg by specifying optimizer and optimizer_kwargs.
  • Ae derived subclasses now take in an instantiated AutoEncoder instance to the model param instead of the make_model_fn callback.

Additions

  • DisentDataset can now return observation indices in the "idx" field if return_indices=True
  • sample_random_obs_traversal added to GroundTruthData
  • new basic experiment test

Changes

  • python 3.8 and 3.9 support (3.7 is unsupported due to missing standard library typing features)
  • TempNumpySeed now inherits from contextlib.ContextDecorator
  • updated hydra-core to 1.0.7
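
Inheriting from contextlib.ContextDecorator lets one class act as both a with-block and a function decorator. The sketch below shows the pattern for a temporary numpy seed. TempNumpySeedSketch is a hypothetical re-creation and may differ from the real TempNumpySeed.

```python
import contextlib

import numpy as np


class TempNumpySeedSketch(contextlib.ContextDecorator):
    """Hypothetical helper: seed numpy temporarily, then restore the old state."""

    def __init__(self, seed):
        self._seed = seed
        self._state = None

    def __enter__(self):
        # save the global random state so it can be restored afterwards
        self._state = np.random.get_state()
        np.random.seed(self._seed)
        return self

    def __exit__(self, *exc_info):
        np.random.set_state(self._state)
        return False  # never suppress exceptions


# usage as a decorator: every call sees the same seed
@TempNumpySeedSketch(777)
def sample():
    return np.random.rand(3)


a = sample()
b = sample()

# usage as a context manager
with TempNumpySeedSketch(777):
    c = np.random.rand(3)
```

All three arrays are identical because each draw happens under the same temporary seed.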

Fixes

  • SmallNorbData by default now returns observations of size (96, 96, 1) instead of (96, 96)
  • Removed Deprecated dependency which also couldn't be pickled, fixing hydra submittit issues
  • LoggerProgressCallback displays more reliable information and now supports PyTorch Lightning 1.4
  • HydraDataModule now supports PyTorch Lightning 1.4
  • merge_specializations fixed to depend on OmegaConf not Hydra

v0.1.0 - Initial Release

28 Jul 12:43

Initial Release

Overview

The initial release of Disent

  • please see the docs and readme for new usage examples, changes should be easy to make to existing code, notably the DisentDataset and DisentSampler changes.

Changes

  • Replaced sampling datasets with one common class disent.dataset.DisentDataset
    • Wraps other datasets (torch.utils.data.Dataset or disent.dataset.data.GroundTruthData)
    • Accepts an implemented subclass of disent.dataset.sampling.BaseDisentSampler which controls how many observations are sampled and returned (eg. for triplet networks).
    • eg. disent.dataset.groundtruth.GroundTruthDatasetPairs is now disent.dataset.sampling.GroundTruthPairSampler
  • Removed all experimental code & features unique to Disent. Hydra configs and runners for non-experimental features remain. These features will be cleaned up and re-added once I submit my dissertation.
    • ❌ experimental frameworks
    • ❌ experimental datasets
    • ❌ experimental metrics
    • ❌ experimental models
    • ❌ experimental augmentations
    • ❌ experiment files
  • Verified models

    • some models had potentially diverged from their original implementations and papers.
    • Added a new test model: EncoderTest & DecoderTest
  • disent.nn Changes:

    • Added disent.nn.activations.Swish
    • Removed loss reduction mode "sum" in disent.nn.loss.reduction
    • Split out triplet mining logic from frameworks into disent.nn.loss.triplet_mining
    • Replaced BatchView, Unsqueeze3D and Flatten3D from disent.nn.modules with pytorch 1.9 equivalents
    • Backwards compatible opt-in disent.nn.transform.ToStandardisedTensor enhancements
  • disent.util Refactor, grouping logic into submodules:

    • disent.util.inout: utilities for working with paths, files and saving files.
    • disent.util.lightning: various helper functions and callbacks for pytorch lightning, some incorporated from past experiment files.
    • disent.util.strings: utilities for working with strings and ansi escape codes
    • disent.util.visualise: moved disent.visualise into this module, separating framework logic and helper logic in disent.
  • Cleaned up requirements.txt

    • optional requirements moved into: requirements-test.txt and requirements-exp.txt
  • New tests

    • samplers
    • models
  • And many bug fixes
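
The split between a dataset wrapper and a sampler that decides how many observations are returned can be pictured with this sketch. Both class names are hypothetical, and the real DisentDataset and BaseDisentSampler interfaces are richer than what is shown.

```python
import random


class PairSamplerSketch:
    """Hypothetical sampler: returns two indices for every requested index."""

    def __init__(self, dataset_len, seed=0):
        self._len = dataset_len
        self._rng = random.Random(seed)

    def __call__(self, idx):
        # the first observation is the requested one, the second is random
        return idx, self._rng.randrange(self._len)


class SampledDatasetSketch:
    """Hypothetical wrapper: the sampler controls what each item contains."""

    def __init__(self, data, sampler):
        self._data = data
        self._sampler = sampler

    def __len__(self):
        return len(self._data)

    def __getitem__(self, idx):
        return tuple(self._data[i] for i in self._sampler(idx))


data = list(range(100, 110))
dataset = SampledDatasetSketch(data, PairSamplerSketch(len(data)))
pair = dataset[3]
```

Swapping in a sampler that returns three indices would turn the same wrapped data into a triplet dataset without changing the wrapper.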

v0.0.1.dev14

04 Jun 23:08
Pre-release

Overview

This release is mostly a large set of refactors, and reproducibility improvements with regards to seeds and datasets.

Notable Changes

  • Data now relies on disent.data.datafile.DataFiles, which are deterministic, hash and cache based, file generators that can fetch or pre-process data.
  • Added XYSquaresMinimalData, which is a minimal faster version of XYSquaresData without any configuration options. With default parameters, data from XYSquaresData should equal XYSquaresMinimalData
  • Added PickleH5pyFile that can pickle an hdf5 file and dataset. This is intended to be used with torch DataLoaders or multiprocessing.

Definitely Breaking Changes

  • renamed classes:

    • renamed AugmentableDataset to DisentDataset
    • renamed BaseFramework to DisentFramework
    • renamed BaseEncoderModule to DisentEncoder
    • renamed BaseDecoderModule to DisentDecoder
  • consolidated maths and helper functions into new submodule disent.nn

    • disent.nn.weights initialisation functions originally from disent.model.init
    • disent.nn.modules basic modules from various locations including DisentModule, DisentLightningModule, BatchView, Unsqueeze3D, Flatten3D
    • disent.nn.transform transform and augment functions and classes from disent.transform, still needs to be cleaned up in future releases.
    • disent.nn.loss various loss functions from other places including triplet, kl, softsort and reduction modules
    • disent.nn.functional various differentiable torch helper functions mostly from disent.util.math, including functions for computing the Covariance, Correlation, Generalised Mean, PCA, DCT, Channel-Wise convolutions and more! Some functions such as kernel generation need to be moved out of here.
  • split up and consolidated utilities:

    • disent.util.cache caching utilities including the stalefile decorator that only runs the wrapped function if the specified file is stale (hash does not match, or file does not exist)
    • disent.util.colors ANSI escape codes
    • disent.util.function wrapper, decorator and inspect utilities
    • disent.util.hashing compute the full hash of a file or a fast hash based on the README for the imohash algorithm.
    • disent.util.in_out originally from disent.data.util for handling file retrieval/downloading/copying and saving
    • disent.util.iters general iterators or map functions, including iter_chunks and iter_rechunk
    • disent.util.paths path handling and file or directory management
    • disent.util.profiling timers & memory usage
    • disent.util.seeds seed management contexts and functions
    • disent.util.strings string formatting helper functions
  • removed and cleaned up functions from:

    • disent.data.hdf5
    • disent.dataset.__init__
    • disent.util.__init__
    • disent.schedule.lerp renamed activate to scale_ratio and removed other functions.
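
The behaviour described for the stalefile decorator (only run the wrapped function when the target file is missing or its hash does not match) can be sketched like this. The name stalefile_sketch, its arguments and the use of MD5 are assumptions, not the disent.util.cache implementation.

```python
import functools
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def stalefile_sketch(file, expected_hash):
    """Hypothetical decorator: run fn only if `file` is missing or stale."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            path = Path(file)
            if path.exists() and file_md5(path) == expected_hash:
                return path  # file is fresh, skip the work
            fn(path, *args, **kwargs)
            return path
        return wrapper
    return decorator


calls = []
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data.bin"

    @stalefile_sketch(target, hashlib.md5(b"hello").hexdigest())
    def make_data(path):
        calls.append(str(path))
        path.write_bytes(b"hello")

    make_data()  # file is missing, so the function runs
    make_data()  # file exists with the expected hash, so it is skipped

print(len(calls))  # 1
```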

Other Changes

  • Replaced GroundTruthData specialisations with general loading from DataFiles.
  • StateSpace now stores factor_names instead of GroundTruthData - preparing for rewrite of datasets to use dependency injections and samplers.

Experiment Config & Runner Changes

  • Many config fixes for refactors
  • Experiment can now be seeded

New Tests

  • test PickleH5pyFile multiprocessing support
  • test XYSquaresData and XYSquaresMinimalData similarity