Releases: tensorflow/transform

TensorFlow Transform 1.0.0

24 May 19:27
520ebb4

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Depends on apache-beam[gcp]>=2.29,<3.
  • Depends on
    tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6.
  • Depends on tensorflow-metadata>=1.0.0,<1.1.0.
  • Depends on tfx-bsl>=1.0.0,<1.1.0.
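The TensorFlow constraint above combines a lower bound, several excluded release series, and an upper bound. The following is a minimal, standard-library-only sketch of how such a specifier can be evaluated. The helper names `parse_version` and `satisfies_tf_constraint` are hypothetical, pre-release and local version tags are ignored, and pip itself applies the fuller PEP 440 rules.

```python
def parse_version(text):
    """Turn '2.5.0' into (2, 5, 0); assumes purely numeric release segments."""
    return tuple(int(part) for part in text.split("."))


def satisfies_tf_constraint(version):
    """Check tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6."""
    release = parse_version(version)
    excluded_series = {(2, 0), (2, 1), (2, 2), (2, 3), (2, 4)}
    if release < (1, 15, 2):  # lower bound: >=1.15.2
        return False
    if release >= (2, 6):  # upper bound: <2.6
        return False
    # Exclusions such as !=2.4.* match on the (major, minor) prefix.
    return release[:2] not in excluded_series


if __name__ == "__main__":
    for candidate in ["1.15.5", "2.4.1", "2.5.0", "2.6.0"]:
        print(candidate, satisfies_tf_constraint(candidate))
```

Under these assumptions, the only TensorFlow 2.x series this release accepts is 2.5, alongside 1.15.2 and later 1.x versions.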

Breaking Changes

  • tft.ptransform_analyzer has been moved under tft.experimental. The order
    of args in the API has also been changed.
  • tft_beam.PTransformAnalyzer has been moved under tft_beam.experimental.
  • The default value of the drop_unused_features parameter to
    TFTransformOutput.transform_raw_features is now True.

Deprecations

  • N/A

TensorFlow Transform 0.30.0

26 Apr 22:09
cd8490f

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Removed the dataset_schema module; most of its methods have been deprecated
    since version 0.14.
  • Fix a bug where having an analyzer operate on the output of tft.vocabulary
    would cause it to evaluate incorrectly when force_tf_compat_v1=False with
    TF2 behaviors enabled.
  • Depends on tensorflow-metadata>=0.30.0,<0.31.0.
  • Depends on tfx-bsl>=0.30.0,<0.31.0.

Breaking Changes

  • DatasetMetadata no longer accepts a dict as its input schema. schema is
    expected to be a Schema proto now.
  • TF 1.15 specific APIs apply_saved_model and
    apply_function_with_checkpoint were removed from the tft namespace. They
    are still available under the pretrained_models module.
  • tft.AnalyzeDataset, tft.AnalyzeDatasetWithCache,
    tft.AnalyzeAndTransformDataset and tft.TransformDataset will use the
    native TF2 implementation of tf.transform unless TF2 behaviors are
    explicitly disabled. The previous behaviour can still be obtained by setting
    tft.Context.force_tf_compat_v1=True.

Deprecations

  • N/A

TensorFlow Transform 0.29.0

25 Mar 17:27
3400ce2

Major Features and Improvements

  • tft.AnalyzeAndTransformDataset and tft.TransformDataset can now output
    pyarrow.RecordBatches. This is controlled by a parameter
    output_record_batches which is set to False by default.

Bug Fixes and Other Changes

  • Added tft.make_and_track_object to load and track tf.Trackable objects
    created inside the preprocessing_fn (for example, tf.hub models). This API
    should only be used when force_tf_compat_v1=False and TF2 behavior is
    enabled.
  • The decode method of the available coders (tft.coders.CsvCoder and
    tft.coders.ExampleProtoCoder) has been removed. It was deprecated in the
    0.25 release. Canned TFXIO implementations should be used to read and
    decode data instead.
  • Previously deprecated APIs were removed: tft.uniques (replaced by
    tft.vocabulary), tft.string_to_int (replaced by
    tft.compute_and_apply_vocabulary), tft.apply_vocab (replaced by
    tft.apply_vocabulary), and tft.apply_function (identity function).
  • Removed the always_return_num_quantiles arg of tft.quantiles and
    tft.bucketize which was deprecated in version 0.26.
  • Added support for the count_params method to the TransformFeaturesLayer.
    This makes it possible to call a Keras Model's summary() method when the
    model uses the TransformFeaturesLayer.
  • Depends on absl-py>=0.9,<0.13.
  • Depends on tensorflow-metadata>=0.29.0,<0.30.0.
  • Depends on tfx-bsl>=0.29.0,<0.30.0.

Breaking Changes

  • Existing caches (for all analyzers) are automatically invalidated.

Deprecations

  • N/A

TensorFlow Transform 0.28.0

23 Feb 21:27
e851c82

Major Features and Improvements

  • Large vocabularies are now computed faster due to partially parallelizing
    VocabularyOrderAndWrite.

Bug Fixes and Other Changes

  • Generic tf.SparseTensor input support has been added to
    tft.scale_to_0_1, tft.scale_to_z_score, tft.scale_by_min_max,
    tft.min, tft.max, tft.mean, tft.var, tft.sum, tft.size and
    tft.word_count.
  • Optimize SavedModel written out by tf.Transform when using native TF2 to
    speed up loading it.
  • Added tft_beam.PTransformAnalyzer as a base PTransform class for
    tft.ptransform_analyzer users who wish to have access to a base temporary
    directory.
  • Fix an issue where >2D SparseTensors may be incorrectly represented in
    instance_dicts format.
  • Added support for out-of-vocabulary keys for per_key mappers.
  • Added tft.get_num_buckets_for_transformed_feature which provides the
    number of buckets for a transformed feature if it is a direct output of
    tft.bucketize, tft.apply_buckets, tft.compute_and_apply_vocabulary or
    tft.apply_vocabulary.
  • Depends on apache-beam[gcp]>=2.28,<3.
  • Depends on numpy>=1.16,<1.20.
  • Depends on tensorflow-metadata>=0.28.0,<0.29.0.
  • Depends on tfx-bsl>=0.28.1,<0.29.0.

Breaking changes

  • Autograph is disabled when the preprocessing_fn is traced using
    tf.function, that is, when force_tf_compat_v1=False and TF2 behavior is
    enabled.

Deprecations

  • N/A

TensorFlow Transform 0.27.0

27 Jan 23:13
8187629

Major Features and Improvements

  • Added the QuantilesCombiner.compact method, which moves some of the work
    done by tft.quantiles from the non-parallelizable stage of the computation
    to the parallelizable stage.

Bug Fixes and Other Changes

  • Strip only newlines instead of all whitespace in the TFTransformOutput
    vocabulary_by_name method.
  • Switch analyzers that output asset files to return an eager tensor
    containing the asset file path instead of a tf.saved_model.Asset object when
    force_tf_compat_v1=False. If this file is then used to initialize a table,
    this ensures the input to the tf.lookup.TextFileInitializer is the file
    path as the initializer handles wrapping this in a tf.saved_model.Asset
    object.
  • Added tft.annotate_asset for annotating asset files with a string key that
    can be used to retrieve them in tft.TFTransformOutput.
  • Depends on apache-beam[gcp]>=2.27,<3.
  • Depends on pyarrow>=1,<3.
  • Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<2.5.
  • Depends on tensorflow-metadata>=0.27.0,<0.28.0.
  • Depends on tfx-bsl>=0.27.0,<0.28.0.
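The vocabulary_by_name change above matters for tokens that legitimately start or end with spaces or tabs. The sketch below contrasts the two stripping behaviours on plain Python strings. It is an illustration only, not the library's implementation, and `read_vocab_lines` is a hypothetical helper.

```python
def read_vocab_lines(lines, strip_all_whitespace=False):
    """Return vocabulary tokens from raw lines of a text vocabulary file."""
    tokens = []
    for line in lines:
        if strip_all_whitespace:
            # Older behaviour as described: leading/trailing whitespace is lost.
            tokens.append(line.strip())
        else:
            # Newer behaviour as described: only the line terminator is removed.
            tokens.append(line.rstrip("\n"))
    return tokens


if __name__ == "__main__":
    raw = [" hello\n", "world \n", "\tfoo\n"]
    print(read_vocab_lines(raw, strip_all_whitespace=True))
    print(read_vocab_lines(raw))
```

With newline-only stripping, " hello" and "hello" remain distinct entries, so the tokens read back match the tokens that were written.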

Breaking changes

  • N/A

Deprecations

  • Parameter use_tfxio in the initializer of Context is removed (it was
    deprecated in 0.24.0).

TensorFlow Transform 0.26.0

16 Dec 21:15
1a65548

Major Features and Improvements

  • Added initial support for >2D SparseTensors as inputs and outputs of the
    preprocessing_fn. Note that mappers and analyzers may not support those
    yet, and output >2D SparseTensors will have an unknown dense shape.

Bug Fixes and Other Changes

  • Switched to calling tables and initializers within tf.init_scope when the
    preprocessing_fn is traced using tf.function to avoid re-initializing
    them on every invocation of the traced tf.function.
  • Switched to a (notably) faster and more accurate implementation of
    tft.quantiles analyzer.
  • Fix an issue where graphs become non-hermetic if a TF2 transform_fn is
    loaded in a TF1 Graph context, by making sure all assets are added to the
    ASSET_FILEPATHS collection.
  • Depends on apache-beam[gcp]>=2.25,!=2.26.*,<3.
  • Depends on pyarrow>=0.17,<0.18.
  • Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4.
  • Depends on tensorflow-metadata>=0.26.0,<0.27.0.
  • Depends on tfx-bsl>=0.26.0,<0.27.0.

Breaking changes

  • Existing tft.quantiles, tft.min and tft.max caches are invalidated.

Deprecations

  • Parameter always_return_num_quantiles of tft.quantiles and
    tft.bucketize is now deprecated. Both now always generate the requested
    number of buckets. Setting always_return_num_quantiles will have no effect
    and it will be removed in the next version.

TensorFlow Transform 0.25.0

04 Nov 22:48
6dd163b

Major Features and Improvements

  • Updated the "Getting Started" guide and examples to demonstrate the support
    for both the "instance dict" and the "TFXIO" format. Users are encouraged to
    start using the "TFXIO" format, especially in cases where pre-canned TFXIO
    implementations are available, as it offers better performance.

  • From this release TFT will also be hosting nightly packages on
    https://pypi-nightly.tensorflow.org. To install the nightly package use the
    following command:

    pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-transform

    Note: These nightly packages are unstable and breakages are likely to
    happen. Depending on the complexity involved, it can often take a week or
    more for fixed wheels to become available on the PyPI cloud service. You
    can always use the stable version of TFT available on PyPI by running the
    command pip install tensorflow-transform.

Bug Fixes and Other Changes

  • TFTransformOutput.transform_raw_features and TransformFeaturesLayer can
    be used when a transform fn is exported as a TF2 SavedModel and imported in
    graph mode.
  • Utility methods in tft.inspect_preprocessing_fn now take an optional
    parameter force_tf_compat_v1. If this is False, the preprocessing_fn is
    traced using tf.function in TF 2.x when TF 2 behaviors are enabled.
  • Switched to a wrapper for collections.namedtuple to ensure compatibility
    with PySpark, which modifies classes produced by the factory.
  • Caching has been disabled for tft.tukey_h_params, tft.tukey_location and
    tft.tukey_scale due to the cached accumulator being non-deterministic.
  • Track variables created within the preprocessing_fn in the native TF 2
    implementation.
  • TFTransformOutput.transform_raw_features returns a wrapped python dict
    that overrides pop to return None instead of raising a KeyError when called
    with a key not found in the dictionary. This is done as preparation for
    switching the default value of drop_unused_features to True.
  • Vocabularies written in tfrecord_gzip format no longer filter out entries
    that are empty or that include a newline character.
  • Depends on apache-beam[gcp]>=2.25,<3.
  • Depends on tensorflow-metadata>=0.25,<0.26.
  • Depends on tfx-bsl>=0.25,<0.26.
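The transform_raw_features note above describes a dict whose pop tolerates missing keys, so code that pops a feature keeps working once unused features are dropped by default. A minimal sketch of that idea in plain Python follows. The class name `PopTolerantDict` is hypothetical and this is not the library's actual wrapper.

```python
class PopTolerantDict(dict):
    """A dict whose pop() returns None for a missing key instead of raising."""

    def pop(self, key, default=None):
        # dict.pop raises KeyError only when no default is supplied,
        # so always forwarding a default makes missing keys harmless.
        return super().pop(key, default)


if __name__ == "__main__":
    features = PopTolerantDict(age=42)
    print(features.pop("age"))    # 42
    print(features.pop("label"))  # None, where a plain dict would raise KeyError
```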

Breaking changes

  • N/A

Deprecations

  • The decode method of the available coders (tft.coders.CsvCoder and
    tft.coders.ExampleProtoCoder) has been deprecated and removed.
    Canned TFXIO implementations
    should be used to read and decode data instead.

TensorFlow Transform 0.24.1

24 Sep 21:24
81143e7

Major Features and Improvements

  • N/A

Bug Fixes and Other Changes

  • Depends on apache-beam[gcp]>=2.24,<3.
  • Depends on tfx-bsl>=0.24.1,<0.25.

Breaking changes

  • N/A

Deprecations

  • N/A

TensorFlow Transform 0.24.0

14 Sep 20:34
0816ad8

Major Features and Improvements

  • Added native TF 2 implementation of Transform's Beam APIs -
    tft.AnalyzeDataset, tft.AnalyzeDatasetWithCache,
    tft.AnalyzeAndTransformDataset and tft.TransformDataset. The default
    behavior will continue to use Tensorflow's compat.v1 APIs. This can be
    overridden by setting tft.Context.force_tf_compat_v1=False. The default
    behavior for TF 2 users will be switched to the new native implementation in
    a future release.

Bug Fixes and Other Changes

  • Added a small fanout to analyzers' CombineGlobally for improved
    performance.
  • Depends on absl-py>=0.9,<0.11.
  • Depends on protobuf>=3.9.2,<4.
  • Depends on tensorflow-metadata>=0.24,<0.25.
  • Depends on tfx-bsl>=0.24,<0.25.
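The fanout mentioned above refers to combining in two stages: elements are first spread over a few intermediate accumulators that can be processed in parallel, and those partial results are then merged into the global result. The sketch below imitates that pattern for a mean in plain Python. It is a conceptual illustration under stated assumptions, not Beam or tf.Transform code: `combine_mean_with_fanout` is a hypothetical helper, and round-robin assignment stands in for however a runner distributes elements.

```python
def combine_mean_with_fanout(values, fanout):
    """Compute a mean through `fanout` partial (sum, count) accumulators."""
    partials = [[0.0, 0] for _ in range(fanout)]
    for index, value in enumerate(values):
        shard = partials[index % fanout]  # stand-in for the runner's key assignment
        shard[0] += value
        shard[1] += 1
    # Final stage: merge the partial accumulators into one global result.
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total / count if count else float("nan")


if __name__ == "__main__":
    print(combine_mean_with_fanout([1, 2, 3, 4, 5, 6], fanout=3))  # 3.5
```

The result is the same as a single-stage combine; only the amount of work that lands on the final, non-parallel merge step changes.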

Breaking changes

  • N/A

Deprecations

  • Deprecating Py3.5 support.
  • Parameter use_tfxio in the initializer of Context is deprecated. TFT
    Beam APIs now accept both "instance dicts" and "TFXIO" input formats.
    Setting it will have no effect and it will be removed in the next version.

Version 0.23.0

24 Aug 15:46
49c98bb

Major Features and Improvements

  • Added tft.scale_to_gaussian to transform input to standard gaussian.
  • Vocabulary related analyzers and mappers now accept a file_format argument
    allowing the vocabulary to be saved in TFRecord format. The default format
    remains text (TFRecord format requires tensorflow>=2.4).

Bug Fixes and Other Changes

  • Enable SavedModelLoader to import and apply TF2 SavedModels.
  • tft.min, tft.max, tft.sum, tft.covariance and tft.pca now have
    default output values to properly process empty analysis datasets.
  • tft.scale_by_min_max, tft.scale_to_0_1 and the corresponding per-key
    versions now apply a sigmoid function to scale tensors if the analysis
    dataset is either empty or contains a single distinct value.
  • Added best-effort tf.text op registration when loading transformation
    graphs.
  • Vocabularies computed over numerical features will now assign values to
    entries with equal frequency in reverse lexicographical order as well,
    similarly to string features.
  • Fixed an issue that causes the TABLE_INITIALIZERS graph collection to
    contain a tensor instead of an op when a TF2 SavedModel or a TF2 Hub Module
    containing a table is loaded inside the preprocessing_fn.
  • Fixes an issue where the output tensors of tft.TransformFeaturesLayer
    would all have unknown shapes.
  • Stopped depending on avro-python3.
  • Depends on apache-beam[gcp]>=2.23,<3.
  • Depends on tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4.
  • Depends on tensorflow-metadata>=0.23,<0.24.
  • Depends on tfx-bsl>=0.23,<0.24.
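The sigmoid fallback described above for tft.scale_to_0_1 and tft.scale_by_min_max avoids dividing by zero when the analysis data has no spread. A minimal sketch of the idea in plain Python follows. The function name `scale_to_0_1_sketch` is hypothetical, and the placeholder min/max used for an empty analysis dataset is an assumption, not the library's actual default.

```python
import math


def scale_to_0_1_sketch(values, analysis_values):
    """Min-max scale values to [0, 1], falling back to a sigmoid when max == min."""
    if analysis_values:
        low, high = min(analysis_values), max(analysis_values)
    else:
        low = high = 0.0  # assumed placeholder for an empty analysis dataset
    if high > low:
        return [(value - low) / (high - low) for value in values]
    # Empty dataset or a single distinct value: squash with a sigmoid instead.
    return [1.0 / (1.0 + math.exp(-value)) for value in values]


if __name__ == "__main__":
    print(scale_to_0_1_sketch([0, 5, 10], analysis_values=[0, 10]))  # [0.0, 0.5, 1.0]
    print(scale_to_0_1_sketch([0.0], analysis_values=[7.0, 7.0]))    # [0.5]
```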

Breaking changes

  • Existing caches (for all analyzers) are automatically invalidated.

Deprecations

  • Deprecating Py2 support.
  • Note: We plan to remove Python 3.5 support after this release.