- Cap numpy to less than 2.0.0 until CTGAN supports it - Issue #387 by @gsheni
- Redundant whitespace in the demo data - Issue #233
- Add workflow to generate release notes - Issue #404 by @amontanez24
- Switch to using ruff for Python linting and code formatting - Issue #335 by @gsheni
- Add support for numpy 2.0.0 - Issue #386 by @R-Palazzo
This release removes a warning that was cluttering the console.
- Cleanup automated PR workflows - Issue #370 by @R-Palazzo
- Only run unit and integration tests on oldest and latest python versions for macos - Issue #375 by @R-Palazzo
- Remove FutureWarning: Setting an item of incompatible dtype is deprecated - Issue #373 by @fealho
This release adds support for Python 3.12!
- Support Python 3.12 - Issue #324 by @fealho
- Remove scikit-learn dependency - Issue #346 by @R-Palazzo
- Add bandit workflow - Issue #353 by @R-Palazzo
- Replace integration test that uses the iris demo data - Issue #352 by @R-Palazzo
- Fix minimum version workflow when pointing to github branch - Issue #355 by @R-Palazzo
This release changes the `loss_values` attribute of a CTGAN model to contain floats instead of `torch.Tensor` objects.
- Return loss values as float values not PyTorch objects - Issue #332 by @fealho
- Transition from using setup.py to pyproject.toml to specify project metadata - Issue #333 by @R-Palazzo
- Remove bumpversion and use bump-my-version - Issue #334 by @R-Palazzo
- Add dependency checker - Issue #336 by @amontanez24
This release makes CTGAN sampling more efficient by saving the frequency of each categorical value.
- Improve DataSampler efficiency - Issue [#327](https://github.com/sdv-dev/CTGAN/issues/327) by @fealho
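The idea of saving category frequencies can be illustrated with a minimal sketch. This is not the CTGAN `DataSampler` implementation; the class name `CachedCategorySampler` and its methods are hypothetical, and the sketch only assumes that counting each categorical value once and reusing the counts avoids rescanning the column on every draw.

```python
import random
from collections import Counter


class CachedCategorySampler:
    """Hypothetical helper: count category frequencies once, reuse them per draw."""

    def __init__(self, values, seed=None):
        counts = Counter(values)  # single pass over the column
        self.categories = list(counts)
        self.weights = [counts[c] for c in self.categories]
        self._rng = random.Random(seed)

    def sample(self, n):
        # Only the cached weights are used here; the column is not rescanned.
        return self._rng.choices(self.categories, weights=self.weights, k=n)


column = ["a", "b", "a", "c", "a", "b"]
sampler = CachedCategorySampler(column, seed=0)
draws = sampler.sample(5)
```

Each call to `sample` then costs time proportional to the number of distinct categories and draws, not to the number of rows in the original column.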
This release adds a progress bar that will show when setting the `verbose` parameter to `True` when initializing `TVAE`.
- Add verbosity TVAE (progress bar + save the loss values) - Issue #300 by @frances-h
This release adds a progress bar that will show when setting the `verbose` parameter to `True` when initializing `CTGAN`. It also removes a warning that was showing.
- Remove model_missing_values from ClusterBasedNormalizer call - PR #310 by @fealho
- Switch default branch from master to main - Issue #311 by @amontanez24
- Remove or implement CTGAN tests - Issue #312 by @fealho
- Add progress bar for CTGAN fitting (+ save the loss values) - Issue #298 by @frances-h
This release adds support for Python 3.11 and drops support for Python 3.7.
- Why is there an upper bound in the packaging requirement? (packaging<22) - Issue #276 by @fealho
- Add support for Python 3.11 - Issue #296 by @fealho
- Drop support for Python 3.7 - Issue #302 by @fealho
This release adds support for Torch 2.0!
- Torch 2.0 fails with cuda=False - Issue #288 by @amontanez24
- Upgrade to torch 2.0 - Issue #280 by @frances-h
This release adds support for Pandas 2.0! It also fixes a bug in the `load_demo` function.
- load_demo raises urllib.error.HTTPError: HTTP Error 403: Forbidden - Issue #284 by @amontanez24
- Remove upper bound for pandas - Issue #282 by @frances-h
This release fixes a bug that prevented the `CTGAN` model from being saved after sampling.
- Cannot save CTGANSynthesizer after sampling (TypeError) - Issue #270 by @pvk-developer
This release adds support for Python 3.10 and drops support for Python 3.6. It also fixes a couple of the most common warnings that were surfacing.
- Support Python 3.10 and 3.11 - Issue #259 by @pvk-developer
- Fix SettingWithCopyWarning (may be leading to a numerical calculation bug) - Issue #215 by @amontanez24
- FutureWarning in data_transformer with pandas 1.5.0 - Issue #246 by @amontanez24
- CTGAN Package Maintenance Updates - Issue #257 by @amontanez24
This release renames the models in CTGAN. `CTGANSynthesizer` is now called `CTGAN` and `TVAESynthesizer` is now called `TVAE`.
- Rename synthesizers - Issue #243 by @amontanez24
This release updates CTGAN to use the latest version of RDT. It also includes performance and robustness updates to the data transformer.
- Bump rdt version - Issue #242 by @katxiao
- Single thread data transform is slow for huge table - Issue #151 by @mfhbree
- Fix RDT api - Issue #232 by @pvk-developer
- Update macos to use latest version. - Issue #237 by @pvk-developer
- Update the RDT version to 1.0 - Issue #224 by @pvk-developer
- Update slack invite link. - Issue #222 by @pvk-developer
- robustness fix, when data have less rows than the default number of cl… - Issue #211 by @Deathn0t
This release fixes a bug with the decoder instantiation, and also allows users to set a random state for the model fitting and sampling.
- Update self.decoder with correct variable name - Issue #203 by @tejuafonja
- Add random state - Issue #204 by @katxiao
This release adds support for Python 3.9, updates dependencies to ensure compatibility with the rest of the SDV ecosystem, and upgrades to the latest RDT release.
- Add support for Python 3.9 - Issue #177 by @pvk-developer
- Add pip check to CI workflows - Issue #174 by @pvk-developer
- Typo in `CTGAN` code - Issue #158 by @ori-katz100 and @fealho
Dependency upgrades to ensure compatibility with the rest of the SDV ecosystem.
This release fixes the way the loss function of the `TVAE` model was computed. In addition, the default value of `discriminator_decay` has been changed to a more suitable value (`1e-6`), and some improvements were made to the tests.
- `TVAE`: loss function - Issue #143 by @fealho and @DingfanChen
- Set `discriminator_decay` to `1e-6` - Pull request #145 by @fealho
- Adds unit tests - Pull requests #140 by @fealho
This release exposes all the hyperparameters which the user may find useful for both `CTGAN` and `TVAE`. Also, `TVAE` can now be fitted on datasets that are shorter than the batch size, and it drops the last batch only if the data size is not divisible by the batch size.
- `TVAE`: Adapt `batch_size` to data size - Issue #135 by @fealho and @csala
- `ValueError` from `validate_discre_columns` with `uniqueCombinationConstraint` - Issue #133 by @fealho and @MLjungg
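One possible reading of the batching behavior in this release can be sketched as follows. This is an illustration under stated assumptions, not code taken from `TVAE`: the function `plan_batches` is hypothetical, and it assumes that the batch size is capped at the number of rows and that a trailing incomplete batch is the one that gets dropped.

```python
def plan_batches(n_rows, batch_size):
    """Hypothetical helper: return (effective_batch_size, [(start, stop), ...])."""
    if n_rows <= 0 or batch_size <= 0:
        raise ValueError("n_rows and batch_size must be positive")

    # Assumption: a dataset shorter than batch_size is handled by shrinking the batch.
    effective = min(batch_size, n_rows)

    # Only full batches are kept; a remainder exists only when n_rows is not
    # divisible by the effective batch size, and that remainder is dropped.
    full_batches = n_rows // effective
    ranges = [(i * effective, (i + 1) * effective) for i in range(full_batches)]
    return effective, ranges


small = plan_batches(3, 500)       # dataset shorter than the batch size
even = plan_batches(1000, 500)     # divisible: nothing is dropped
uneven = plan_batches(1050, 500)   # not divisible: the last 50 rows are dropped
```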
Maintenance release to upgrade dependencies to ensure compatibility with the rest of the SDV libraries.
Also add a validation on the CTGAN `condition_column` and `condition_value` inputs.
- Validate condition_column and condition_value - Issue #124 by @fealho
- Check discrete_columns valid before fitting - Issue #35 by @fealho
- ValueError: max() arg is an empty sequence - Issue #115 by @fealho
In this release we add a new `TVAE` model which was presented in the original CTGAN paper. It also exposes more hyperparameters and moves `epochs` and `log_frequency` from `fit` to the constructor.
A new `verbose` argument has been added to optionally disable unnecessary printing, and a new hyperparameter called `discriminator_steps` has been added to CTGAN to control the number of optimization steps performed in the discriminator for each generator epoch.
The code has also been reorganized and cleaned up for better readability and interpretability.
Special thanks to @Baukebrenninkmeijer @fealho @leix28 @csala for the contributions!
- Add TVAE - Issue #111 by @fealho
- Move `log_frequency` to `__init__` - Issue #102 by @fealho
- Add discriminator steps hyperparameter - Issue #101 by @Baukebrenninkmeijer
- Code cleanup / Expose hyperparameters - Issue #59 by @fealho and @leix28
- Publish to conda repo - Issue #54 by @fealho
- Fixed NaN != NaN counting bug. - Issue #100 by @fealho
- Update dependencies and testing - Issue #90 by @csala
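The role of `discriminator_steps` in this release can be sketched with a toy loop that only counts updates. This is not the CTGAN training loop: the function `count_updates` and the `steps_per_epoch` parameter are hypothetical, and the sketch assumes that each generator update is preceded by `discriminator_steps` discriminator updates.

```python
def count_updates(epochs, steps_per_epoch, discriminator_steps=1):
    """Hypothetical helper: count optimizer updates in a GAN-style loop."""
    d_updates = 0
    g_updates = 0
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            for _ in range(discriminator_steps):
                d_updates += 1  # stands in for one discriminator optimizer step
            g_updates += 1      # stands in for one generator optimizer step
    return d_updates, g_updates


d_updates, g_updates = count_updates(epochs=2, steps_per_epoch=3, discriminator_steps=5)
```

With these inputs the discriminator is updated five times as often as the generator, which is the ratio the hyperparameter is meant to control.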
In this release we introduce several minor improvements to make CTGAN more versatile and properly support new types of data, such as categorical NaN values, as well as conditional sampling and features to save and load models.
Additionally, the dependency ranges and Python versions have been updated to support up-to-date runtimes.
Many thanks @fealho @leix28 @csala @oregonpillow and @lurosenb for working on making this release possible!
- Drop Python 3.5 support - Issue #79 by @fealho
- Support NaN values in categorical variables - Issue #78 by @fealho
- Sample synthetic data conditioning on a discrete column - Issue #69 by @leix28
- Support recent versions of pandas - Issue #57 by @csala
- Easy solution for restoring original dtypes - Issue #26 by @oregonpillow
- Loss to nan - Issue #73 by @fealho
- Swapped the sklearn utils testing import statement - Issue #53 by @lurosenb
Minor version including changes to ensure the logs are properly printed and the option to disable the log transformation to the discrete column frequencies.
Special thanks to @kevinykuo for the contributions!
- Option to sample from true data frequency instead of logged frequency - Issue #16 by @kevinykuo
- Flush stdout buffer for epoch updates - Issue #14 by @kevinykuo
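The difference between sampling from the true and the logged frequency of a discrete column can be sketched as below. This is an illustration, not the library's code: the function `category_probabilities` is hypothetical, and the `log(count + 1)` form of the log transformation is an assumption made for the example.

```python
import math
from collections import Counter


def category_probabilities(values, log_frequency=True):
    """Hypothetical helper: turn category counts into sampling probabilities."""
    counts = Counter(values)
    if log_frequency:
        # Assumed transformation: log(count + 1) flattens large imbalances.
        weights = {c: math.log(n + 1) for c, n in counts.items()}
    else:
        # True data frequency: weights are the raw counts.
        weights = {c: float(n) for c, n in counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


column = ["a"] * 99 + ["b"]
true_probs = category_probabilities(column, log_frequency=False)
log_probs = category_probabilities(column, log_frequency=True)
```

In this example the rare category `b` is drawn more often under the logged frequency than under the true frequency, which is why an option to disable the transformation can matter for imbalanced columns.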
Reorganization of the project structure with a new Python API, new Command Line Interface and increased data format support.
- Reorganize the project structure - Issue #10 by @csala
- Move epochs to the fit method - Issue #5 by @csala
First Release - NeurIPS 2019 Version.