Releases · tensorflow/transform

24 May 19:27

dhruvesh09

v1.0.0

520ebb4

TensorFlow Transform 1.0.0

Major Features and Improvements

Bug Fixes and Other Changes

Depends on apache-beam[gcp]>=2.29,<3.
Depends on
tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6.
Depends on tensorflow-metadata>=1.0.0,<1.1.0.
Depends on tfx-bsl>=1.0.0,<1.1.0.
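
To make the tensorflow constraint above easier to read, here is a minimal sketch of how such a specifier string can be evaluated. The helper names `parse` and `satisfies` are hypothetical and only handle the `>=X`, `<X` and `!=X.Y.*` clauses that appear in these notes, with purely numeric versions; real installers implement the full PEP 440 rules.

```python
def parse(version):
    """Turn a numeric version string such as '2.5.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated specifier (hypothetical helper).

    Supports only '>=X', '<X' and '!=X.Y.*' clauses.
    """
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            if v < parse(clause[2:]):
                return False
        elif clause.startswith("!="):
            # '!=2.0.*' excludes every version whose prefix is (2, 0).
            prefix = parse(clause[2:].rstrip("*").rstrip("."))
            if v[:len(prefix)] == prefix:
                return False
        elif clause.startswith("<"):
            if v >= parse(clause[1:]):
                return False
        else:
            raise ValueError("unsupported clause: " + clause)
    return True


# The tensorflow constraint listed for the 1.0.0 release.
TF_SPEC = ">=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6"
```

Under this sketch, `satisfies("1.15.2", TF_SPEC)` and `satisfies("2.5.0", TF_SPEC)` hold, while any 2.0 through 2.4 version and anything from 2.6 onward is rejected.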

Breaking Changes

tft.ptransform_analyzer has been moved under tft.experimental. The order
of args in the API has also been changed.
tft_beam.PTransformAnalyzer has been moved under tft_beam.experimental.
The default value of the drop_unused_features parameter to
TFTransformOutput.transform_raw_features is now True.

Deprecations

26 Apr 22:09

dhruvesh09

v0.30.0

cd8490f

TensorFlow Transform 0.30.0

Major Features and Improvements

Bug Fixes and Other Changes

Removed the dataset_schema module; most methods in it have been deprecated
since version 0.14.
Fix a bug where having an analyzer operate on the output of tft.vocabulary
would cause it to evaluate incorrectly when force_tf_compat_v1=False with
TF2 behaviors enabled.
Depends on tensorflow-metadata>=0.30.0,<0.31.0.
Depends on tfx-bsl>=0.30.0,<0.31.0.

Breaking Changes

DatasetMetadata no longer accepts a dict as its input schema. schema is
expected to be a Schema proto now.
TF 1.15 specific APIs apply_saved_model and
apply_function_with_checkpoint were removed from the tft namespace. They
are still available under the pretrained_models module.
tft.AnalyzeDataset, tft.AnalyzeDatasetWithCache,
tft.AnalyzeAndTransformDataset and tft.TransformDataset will use the
native TF2 implementation of tf.transform unless TF2 behaviors are
explicitly disabled. The previous behavior can still be obtained by setting
tft.Context.force_tf_compat_v1=True.

Deprecations

25 Mar 17:27

dhruvesh09

v0.29.0

3400ce2

TensorFlow Transform 0.29.0

Major Features and Improvements

tft.AnalyzeAndTransformDataset and tft.TransformDataset can now output
pyarrow.RecordBatches. This is controlled by a parameter
output_record_batches which is set to False by default.

Bug Fixes and Other Changes

Added tft.make_and_track_object to load and track tf.Trackable objects
created inside the preprocessing_fn (for example, tf.hub models). This API
should only be used when force_tf_compat_v1=False and TF2 behavior is
enabled.
The decode method of the available coders (tft.coders.CsvCoder and
tft.coders.ExampleProtoCoder) has been removed. It was deprecated in the
0.25 release. Canned TFXIO implementations should be used to read and decode
data instead.
Previously deprecated APIs were removed: tft.uniques (replaced by
tft.vocabulary), tft.string_to_int (replaced by
tft.compute_and_apply_vocabulary), tft.apply_vocab (replaced by
tft.apply_vocabulary), and tft.apply_function (identity function).
Removed the always_return_num_quantiles arg of tft.quantiles and
tft.bucketize which was deprecated in version 0.26.
Added support for the count_params method to the TransformFeaturesLayer.
This allows calling a Keras Model's summary() method if the model is
using the TransformFeaturesLayer.
Depends on absl-py>=0.9,<0.13.
Depends on tensorflow-metadata>=0.29.0,<0.30.0.
Depends on tfx-bsl>=0.29.0,<0.30.0.

Breaking Changes

Existing caches (for all analyzers) are automatically invalidated.

Deprecations

23 Feb 21:27

dhruvesh09

v0.28.0

e851c82

TensorFlow Transform 0.28.0

Major Features and Improvements

Large vocabularies are now computed faster due to partially parallelizing
VocabularyOrderAndWrite.

Bug Fixes and Other Changes

Generic tf.SparseTensor input support has been added to
tft.scale_to_0_1, tft.scale_to_z_score, tft.scale_by_min_max,
tft.min, tft.max, tft.mean, tft.var, tft.sum, tft.size and
tft.word_count.
Optimize SavedModel written out by tf.Transform when using native TF2 to
speed up loading it.
Added tft_beam.PTransformAnalyzer as a base PTransform class for
tft.ptransform_analyzer users who wish to have access to a base temporary
directory.
Fix an issue where >2D SparseTensors may be incorrectly represented in
instance_dicts format.
Added support for out-of-vocabulary keys for per_key mappers.
Added tft.get_num_buckets_for_transformed_feature which provides the
number of buckets for a transformed feature if it is a direct output of
tft.bucketize, tft.apply_buckets, tft.compute_and_apply_vocabulary or
tft.apply_vocabulary.
Depends on apache-beam[gcp]>=2.28,<3.
Depends on numpy>=1.16,<1.20.
Depends on tensorflow-metadata>=0.28.0,<0.29.0.
Depends on tfx-bsl>=0.28.1,<0.29.0.
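
As a rough illustration of why the number of buckets of a bucketized feature is useful to know: n bucket boundaries split the number line into n + 1 buckets, so a downstream embedding or one-hot encoding needs n + 1 slots. The sketch below is plain Python, not the tf.Transform implementation; `num_buckets` and `assign_bucket` are hypothetical names, and assigning a value that lies exactly on a boundary to the higher bucket is an assumption.

```python
import bisect


def num_buckets(boundaries):
    """n sorted boundaries define n + 1 buckets."""
    return len(boundaries) + 1


def assign_bucket(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    Assumption: a value equal to a boundary goes to the higher bucket.
    """
    return bisect.bisect_right(boundaries, value)


boundaries = [0.0, 10.0]
indices = [assign_bucket(v, boundaries) for v in (-5.0, 5.0, 10000.0)]
```

Here `indices` covers all three buckets that the two boundaries define.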

Breaking changes

Autograph is disabled when the preprocessing fn is traced using tf.function
when force_tf_compat_v1=False and TF2 behavior is enabled.

Deprecations

27 Jan 23:13

dhruvesh09

v0.27.0

8187629

TensorFlow Transform 0.27.0

Major Features and Improvements

Added QuantilesCombiner.compact method that moves some amount of work done
by tft.quantiles from non-parallelizable to parallelizable stage of the
computation.

Bug Fixes and Other Changes

Strip only newlines instead of all whitespace in the TFTransformOutput
vocabulary_by_name method.
Switch analyzers that output asset files to return an eager tensor
containing the asset file path instead of a tf.saved_model.Asset object when
force_tf_compat_v1=False. If this file is then used to initialize a table,
this ensures the input to the tf.lookup.TextFileInitializer is the file
path as the initializer handles wrapping this in a tf.saved_model.Asset
object.
Added tft.annotate_asset for annotating asset files with a string key that
can be used to retrieve them in tft.TFTransformOutput.
Depends on apache-beam[gcp]>=2.27,<3.
Depends on pyarrow>=1,<3.
Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<2.5.
Depends on tensorflow-metadata>=0.27.0,<0.28.0.
Depends on tfx-bsl>=0.27.0,<0.28.0.
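
The newline-stripping change matters for vocabulary entries that begin or end with spaces or tabs. The following plain-Python sketch contrasts the two behaviors; `read_vocab_lines` is a hypothetical helper, not the code used by TFTransformOutput.

```python
def read_vocab_lines(lines, strip_all_whitespace=False):
    """Clean raw vocabulary lines read from a text file (hypothetical helper).

    With strip_all_whitespace=True, leading and trailing spaces and tabs are
    lost; otherwise only the trailing newline is removed.
    """
    if strip_all_whitespace:
        return [line.strip() for line in lines]
    return [line.rstrip("\n") for line in lines]


raw = [" hello\n", "world \n", "\tfoo\n"]
newline_only = read_vocab_lines(raw)
all_whitespace = read_vocab_lines(raw, strip_all_whitespace=True)
```

With newline-only stripping, `" hello"` and `"hello"` remain distinct entries, which is what a lookup against the original tokens needs.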

Breaking changes

Deprecations

Parameter use_tfxio in the initializer of Context is removed (it was
deprecated in 0.24.0).

16 Dec 21:15

dhruvesh09

v0.26.0

1a65548

TensorFlow Transform 0.26.0

Major Features and Improvements

Added initial support for >2D SparseTensors as inputs and outputs of the
preprocessing_fn. Note that mappers and analyzers may not support those
yet, and output >2D SparseTensors will have an unknown dense shape.

Bug Fixes and Other Changes

Switched to calling tables and initializers within tf.init_scope when the
preprocessing_fn is traced using tf.function to avoid re-initializing
them on every invocation of the traced tf.function.
Switched to a (notably) faster and more accurate implementation of
tft.quantiles analyzer.
Fix an issue where graphs become non-hermetic if a TF2 transform_fn is
loaded in a TF1 Graph context, by making sure all assets are added to the
ASSET_FILEPATHS collection.
Depends on apache-beam[gcp]>=2.25,!=2.26.*,<3.
Depends on pyarrow>=0.17,<0.18.
Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4.
Depends on tensorflow-metadata>=0.26.0,<0.27.0.
Depends on tfx-bsl>=0.26.0,<0.27.0.

Breaking changes

Existing tft.quantiles, tft.min and tft.max caches are invalidated.

Deprecations

Parameter always_return_num_quantiles of tft.quantiles and
tft.bucketize is now deprecated. Both now always generate the requested
number of buckets. Setting always_return_num_quantiles will have no effect
and it will be removed in the next version.

04 Nov 22:48

dhruvesh09

v0.25.0

6dd163b

TensorFlow Transform 0.25.0

Major Features and Improvements

Updated the "Getting Started" guide and examples to demonstrate the support
for both the "instance dict" and the "TFXIO" format. Users are encouraged to
start using the "TFXIO" format, especially in cases where pre-canned TFXIO
implementations are available, as it offers better performance.
From this release TFT will also be hosting nightly packages on
https://pypi-nightly.tensorflow.org. To install the nightly package use the
following command:
```
pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-transform
```
Note: These nightly packages are unstable and breakages are likely to
happen. Depending on the complexity involved, it can often take a week or
more for fixed wheels to become available on the PyPI cloud service. You can
always use the stable version of TFT available on PyPI by running the
command pip install tensorflow-transform.

Bug Fixes and Other Changes

TFTransformOutput.transform_raw_features and TransformFeaturesLayer can
be used when a transform fn is exported as a TF2 SavedModel and imported in
graph mode.
Utility methods in tft.inspect_preprocessing_fn now take an optional
parameter force_tf_compat_v1. If this is False, the preprocessing_fn is
traced using tf.function in TF 2.x when TF 2 behaviors are enabled.
Switching to a wrapper for collections.namedtuple to ensure compatibility
with PySpark which modifies classes produced by the factory.
Caching has been disabled for tft.tukey_h_params, tft.tukey_location and
tft.tukey_scale due to the cached accumulator being non-deterministic.
Track variables created within the preprocessing_fn in the native TF 2
implementation.
TFTransformOutput.transform_raw_features returns a wrapped python dict
that overrides pop to return None instead of raising a KeyError when called
with a key not found in the dictionary. This is done as preparation for
switching the default value of drop_unused_features to True.
Vocabularies written in tfrecord_gzip format no longer filter out entries
that are empty or that include a newline character.
Depends on apache-beam[gcp]>=2.25,<3.
Depends on tensorflow-metadata>=0.25,<0.26.
Depends on tfx-bsl>=0.25,<0.26.
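
The wrapped dict returned by TFTransformOutput.transform_raw_features can be pictured with a small dict subclass. This is only a sketch of the described behavior (pop returning None for a missing key); the class name `PopNoneDict` is hypothetical and the actual wrapper in tf.Transform may differ.

```python
class PopNoneDict(dict):
    """dict whose pop() returns None for a missing key instead of raising."""

    def pop(self, key, default=None):
        # A plain dict raises KeyError when no default is given; here the
        # default is always supplied, so a missing key yields None.
        return super().pop(key, default)


features = PopNoneDict(age=[42], name=["a"])
popped_present = features.pop("age")
popped_missing = features.pop("unused_feature")
```

Code that pops features which are no longer present in the output therefore keeps running rather than failing with a KeyError.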

Breaking changes

Deprecations

The decode method of the available coders (tft.coders.CsvCoder and
tft.coders.ExampleProtoCoder) has been deprecated and removed.
Canned TFXIO implementations
should be used to read and decode data instead.

24 Sep 21:24

dhruvesh09

v0.24.1

81143e7

TensorFlow Transform 0.24.1

Major Features and Improvements

Bug Fixes and Other Changes

Depends on apache-beam[gcp]>=2.24,<3.
Depends on tfx-bsl>=0.24.1,<0.25.

Breaking changes

Deprecations

14 Sep 20:34

dhruvesh09

v0.24.0

0816ad8

TensorFlow Transform 0.24.0

Major Features and Improvements

Added native TF 2 implementation of Transform's Beam APIs -
tft.AnalyzeDataset, tft.AnalyzeDatasetWithCache,
tft.AnalyzeAndTransformDataset and tft.TransformDataset. The default
behavior will continue to use TensorFlow's compat.v1 APIs. This can be
overridden by setting tft.Context.force_tf_compat_v1=False. The default
behavior for TF 2 users will be switched to the new native implementation in
a future release.

Bug Fixes and Other Changes

Added a small fanout to analyzers' CombineGlobally for improved
performance.
Depends on absl-py>=0.9,<0.11.
Depends on protobuf>=3.9.2,<4.
Depends on tensorflow-metadata>=0.24,<0.25.
Depends on tfx-bsl>=0.24,<0.25.

Breaking changes

Deprecations

Deprecating Py3.5 support.
Parameter use_tfxio in the initializer of Context is deprecated. TFT
Beam APIs now accept both "instance dicts" and "TFXIO" input formats.
Setting use_tfxio will have no effect and it will be removed in the next
version.

24 Aug 15:46

dhruvesh09

v0.23.0

49c98bb

Version 0.23.0

Major Features and Improvements

Added tft.scale_to_gaussian to transform input to standard gaussian.
Vocabulary related analyzers and mappers now accept a file_format argument
allowing the vocabulary to be saved in TFRecord format. The default format
remains text (TFRecord format requires tensorflow>=2.4).

Bug Fixes and Other Changes

Enable SavedModelLoader to import and apply TF2 SavedModels.
tft.min, tft.max, tft.sum, tft.covariance and tft.pca now have
default output values to properly process empty analysis datasets.
tft.scale_by_min_max, tft.scale_to_0_1 and the corresponding per-key
versions now apply a sigmoid function to scale tensors if the analysis
dataset is either empty or contains a single distinct value.
Added best-effort tf.text op registration when loading transformation
graphs.
Vocabularies computed over numerical features will now assign values to
entries with equal frequency in reverse lexicographical order as well,
similarly to string features.
Fixed an issue that causes the TABLE_INITIALIZERS graph collection to
contain a tensor instead of an op when a TF2 SavedModel or a TF2 Hub Module
containing a table is loaded inside the preprocessing_fn.
Fixed an issue where the output tensors of tft.TransformFeaturesLayer
would all have unknown shapes.
Stopped depending on avro-python3.
Depends on apache-beam[gcp]>=2.23,<3.
Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4.
Depends on tensorflow-metadata>=0.23,<0.24.
Depends on tfx-bsl>=0.23,<0.24.
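
The sigmoid fallback for min-max scaling can be sketched in plain Python. Ordinary min-max scaling divides by (max - min), which is zero when the analysis dataset holds a single distinct value and undefined when it is empty. The function name `scale_to_0_1_sketch`, its signature, and the use of None to represent an empty analysis dataset are assumptions made for illustration; this is not the tf.Transform implementation.

```python
import math


def scale_to_0_1_sketch(values, analysis_min=None, analysis_max=None):
    """Scale values to [0, 1] using min/max from an analysis dataset.

    Assumption: analysis_min/analysis_max are None for an empty analysis
    dataset. In that case, or when min == max, fall back to a sigmoid.
    """
    degenerate = (
        analysis_min is None
        or analysis_max is None
        or analysis_min >= analysis_max
    )
    if degenerate:
        return [1.0 / (1.0 + math.exp(-v)) for v in values]
    span = analysis_max - analysis_min
    return [(v - analysis_min) / span for v in values]


regular = scale_to_0_1_sketch([0.0, 5.0, 10.0], 0.0, 10.0)
single_value = scale_to_0_1_sketch([0.0], 3.0, 3.0)
empty_analysis = scale_to_0_1_sketch([0.0])
```

In both degenerate cases the output stays strictly between 0 and 1 instead of producing a division by zero.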

Breaking changes

Existing caches (for all analyzers) are automatically invalidated.

Deprecations

Deprecating Py2 support.
Note: We plan to remove Python 3.5 support after this release.
