Releases: tensorflow/transform
TensorFlow Transform 1.0.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on `apache-beam[gcp]>=2.29,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6`.
- Depends on `tensorflow-metadata>=1.0.0,<1.1.0`.
- Depends on `tfx-bsl>=1.0.0,<1.1.0`.
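Taken together, the `!=2.N.*` exclusions and the `<2.6` upper bound in the TensorFlow pin above admit only TensorFlow 1.15.2 or later on the 1.x line, plus the 2.5.x series. The sketch below hand-codes that reading in plain Python. It assumes plain `X.Y.Z` release strings and ignores pre-release suffixes, so it illustrates the specifier and does not replace pip's resolver. For full PEP 440 semantics, use `packaging.specifiers.SpecifierSet`.

```python
from typing import Tuple

# Excluded minor series, one per "!=2.N.*" clause in the specifier.
EXCLUDED_SERIES = {(2, 0), (2, 1), (2, 2), (2, 3), (2, 4)}


def parse_release(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' (or 'X.Y') release string into an int tuple."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad 'X.Y' to 'X.Y.0'
    return (parts[0], parts[1], parts[2])


def tft_1_0_accepts_tensorflow(version: str) -> bool:
    """Hand-coded reading of the TFT 1.0.0 TensorFlow pin:
    tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,<2.6
    """
    v = parse_release(version)
    if v < (1, 15, 2):  # tensorflow>=1.15.2
        return False
    if v >= (2, 6, 0):  # tensorflow<2.6
        return False
    return v[:2] not in EXCLUDED_SERIES  # !=2.0.* through !=2.4.*


for candidate in ["1.15.0", "1.15.5", "2.4.1", "2.5.0", "2.6.0"]:
    print(candidate, tft_1_0_accepts_tensorflow(candidate))
```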
Breaking Changes
- `tft.ptransform_analyzer` has been moved under `tft.experimental`. The order of args in the API has also been changed.
- `tft_beam.PTransformAnalyzer` has been moved under `tft_beam.experimental`.
- The default value of the `drop_unused_features` parameter to `TFTransformOutput.transform_raw_features` is now `True`.
Deprecations
- N/A
TensorFlow Transform 0.30.0
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Removed the `dataset_schema` module; most methods in it had been deprecated since version 0.14.
- Fixed a bug where an analyzer operating on the output of `tft.vocabulary` would evaluate incorrectly when `force_tf_compat_v1=False` with TF2 behaviors enabled.
- Depends on `tensorflow-metadata>=0.30.0,<0.31.0`.
- Depends on `tfx-bsl>=0.30.0,<0.31.0`.
Breaking Changes
- `DatasetMetadata` no longer accepts a dict as its input schema; `schema` is expected to be a `Schema` proto now.
- The TF 1.15-specific APIs `apply_saved_model` and `apply_function_with_checkpoint` were removed from the `tft` namespace. They are still available under the `pretrained_models` module.
- `tft.AnalyzeDataset`, `tft.AnalyzeDatasetWithCache`, `tft.AnalyzeAndTransformDataset` and `tft.TransformDataset` will use the native TF2 implementation of tf.transform unless TF2 behaviors are explicitly disabled. The previous behaviour can still be obtained by setting `tft.Context.force_tf_compat_v1=True`.
Deprecations
- N/A
TensorFlow Transform 0.29.0
Major Features and Improvements
- `tft.AnalyzeAndTransformDataset` and `tft.TransformDataset` can now output `pyarrow.RecordBatch`es. This is controlled by the parameter `output_record_batches`, which is set to `False` by default.
Bug Fixes and Other Changes
- Added `tft.make_and_track_object` to load and track `tf.Trackable` objects created inside the `preprocessing_fn` (for example, tf.hub models). This API should only be used when `force_tf_compat_v1=False` and TF2 behavior is enabled.
- The `decode` method of the available coders (`tft.coders.CsvCoder` and `tft.coders.ExampleProtoCoder`) has been removed. It was deprecated in the 0.25 release. Canned TFXIO implementations should be used to read and decode data instead.
- Previously deprecated APIs were removed: `tft.uniques` (replaced by `tft.vocabulary`), `tft.string_to_int` (replaced by `tft.compute_and_apply_vocabulary`), `tft.apply_vocab` (replaced by `tft.apply_vocabulary`), and `tft.apply_function` (identity function).
- Removed the `always_return_num_quantiles` arg of `tft.quantiles` and `tft.bucketize`, which was deprecated in version 0.26.
- Added support for the `count_params` method to the `TransformFeaturesLayer`. This makes it possible to call a Keras Model's `summary()` method when the model uses the `TransformFeaturesLayer`.
- Depends on `absl-py>=0.9,<0.13`.
- Depends on `tensorflow-metadata>=0.29.0,<0.30.0`.
- Depends on `tfx-bsl>=0.29.0,<0.30.0`.
Breaking Changes
- Existing caches (for all analyzers) are automatically invalidated.
Deprecations
- N/A
TensorFlow Transform 0.28.0
Major Features and Improvements
- Large vocabularies are now computed faster due to partially parallelizing `VocabularyOrderAndWrite`.
Bug Fixes and Other Changes
- Generic `tf.SparseTensor` input support has been added to `tft.scale_to_0_1`, `tft.scale_to_z_score`, `tft.scale_by_min_max`, `tft.min`, `tft.max`, `tft.mean`, `tft.var`, `tft.sum`, `tft.size` and `tft.word_count`.
- Optimized the SavedModel written out by `tf.Transform` when using native TF2, to speed up loading it.
- Added `tft_beam.PTransformAnalyzer` as a base PTransform class for `tft.ptransform_analyzer` users who wish to have access to a base temporary directory.
- Fixed an issue where >2D `SparseTensor`s may be incorrectly represented in instance_dicts format.
- Added support for out-of-vocabulary keys for per_key mappers.
- Added `tft.get_num_buckets_for_transformed_feature`, which provides the number of buckets for a transformed feature if it is a direct output of `tft.bucketize`, `tft.apply_buckets`, `tft.compute_and_apply_vocabulary` or `tft.apply_vocabulary`.
- Depends on `apache-beam[gcp]>=2.28,<3`.
- Depends on `numpy>=1.16,<1.20`.
- Depends on `tensorflow-metadata>=0.28.0,<0.29.0`.
- Depends on `tfx-bsl>=0.28.1,<0.29.0`.
Breaking changes
- Autograph is disabled when the `preprocessing_fn` is traced using `tf.function`, if `force_tf_compat_v1=False` and TF2 behavior is enabled.
Deprecations
- N/A
TensorFlow Transform 0.27.0
Major Features and Improvements
- Added a `QuantilesCombiner.compact` method that moves some of the work done by `tft.quantiles` from the non-parallelizable to the parallelizable stage of the computation.
Bug Fixes and Other Changes
- Strip only newlines instead of all whitespace in the `TFTransformOutput.vocabulary_by_name` method.
- Switched analyzers that output asset files to return an eager tensor containing the asset file path instead of a `tf.saved_model.Asset` object when `force_tf_compat_v1=False`. If this file is then used to initialize a table, this ensures that the input to the `tf.lookup.TextFileInitializer` is the file path, since the initializer itself handles wrapping it in a `tf.saved_model.Asset` object.
- Added `tft.annotate_asset` for annotating asset files with a string key that can be used to retrieve them in `tft.TFTransformOutput`.
- Depends on `apache-beam[gcp]>=2.27,<3`.
- Depends on `pyarrow>=1,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,<2.5`.
- Depends on `tensorflow-metadata>=0.27.0,<0.28.0`.
- Depends on `tfx-bsl>=0.27.0,<0.28.0`.
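The `vocabulary_by_name` change matters for vocabulary entries that legitimately begin or end with spaces or tabs. The following minimal sketch shows the difference using plain Python string methods instead of TFT itself. It assumes a text vocabulary with one entry per line, and both helper names are hypothetical.

```python
import io


def read_entries_newline_only(lines):
    """Hypothetical helper: drop only the trailing newline (0.27.0 behavior)."""
    return [line.rstrip("\n") for line in lines]


def read_entries_all_whitespace(lines):
    """Hypothetical helper: strip all surrounding whitespace (earlier behavior)."""
    return [line.strip() for line in lines]


# Three vocabulary lines, two of which carry meaningful whitespace.
raw_lines = io.StringIO("hello\n  padded token \n\tindented\n").readlines()
print(read_entries_newline_only(raw_lines))    # ['hello', '  padded token ', '\tindented']
print(read_entries_all_whitespace(raw_lines))  # ['hello', 'padded token', 'indented']
```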
Breaking changes
- N/A
Deprecations
- Parameter `use_tfxio` in the initializer of `Context` is removed (it was deprecated in 0.24.0).
TensorFlow Transform 0.26.0
Major Features and Improvements
- Initial support added for >2D `SparseTensor`s as inputs and outputs of the `preprocessing_fn`. Note that mappers and analyzers may not support those yet, and output >2D `SparseTensor`s will have an unknown dense shape.
Bug Fixes and Other Changes
- Switched to calling tables and initializers within `tf.init_scope` when the `preprocessing_fn` is traced using `tf.function`, to avoid re-initializing them on every invocation of the traced `tf.function`.
- Switched to a (notably) faster and more accurate implementation of the `tft.quantiles` analyzer.
- Fixed an issue where graphs become non-hermetic if a TF2 transform_fn is loaded in a TF1 Graph context, by making sure all assets are added to the `ASSET_FILEPATHS` collection.
- Depends on `apache-beam[gcp]>=2.25,!=2.26.*,<3`.
- Depends on `pyarrow>=0.17,<0.18`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4`.
- Depends on `tensorflow-metadata>=0.26.0,<0.27.0`.
- Depends on `tfx-bsl>=0.26.0,<0.27.0`.
Breaking changes
- Existing `tft.quantiles`, `tft.min` and `tft.max` caches are invalidated.
Deprecations
- Parameter `always_return_num_quantiles` of `tft.quantiles` and `tft.bucketize` is now deprecated. Both now always generate the requested number of buckets. Setting `always_return_num_quantiles` will have no effect, and it will be removed in the next version.
TensorFlow Transform 0.25.0
Major Features and Improvements
- Updated the "Getting Started" guide and examples to demonstrate the support for both the "instance dict" and the "TFXIO" format. Users are encouraged to start using the "TFXIO" format, especially where pre-canned TFXIO implementations are available, as it offers better performance.
- From this release, TFT will also host nightly packages on https://pypi-nightly.tensorflow.org. To install the nightly package, use the following command: `pip install -i https://pypi-nightly.tensorflow.org/simple tensorflow-transform`.
  Note: These nightly packages are unstable and breakages are likely to happen. Depending on the complexity involved, a fix can often take a week or more to become available as wheels on the PyPI cloud service. You can always use the stable version of TFT available on PyPI by running the command `pip install tensorflow-transform`.
Bug Fixes and Other Changes
- `TFTransformOutput.transform_raw_features` and `TransformFeaturesLayer` can be used when a transform fn is exported as a TF2 SavedModel and imported in graph mode.
- Utility methods in `tft.inspect_preprocessing_fn` now take an optional parameter `force_tf_compat_v1`. If this is False, the `preprocessing_fn` is traced using tf.function in TF 2.x when TF 2 behaviors are enabled.
- Switched to a wrapper for `collections.namedtuple` to ensure compatibility with PySpark, which modifies classes produced by the factory.
- Caching has been disabled for `tft.tukey_h_params`, `tft.tukey_location` and `tft.tukey_scale` because the cached accumulator is non-deterministic.
- Variables created within the `preprocessing_fn` are now tracked in the native TF 2 implementation.
- `TFTransformOutput.transform_raw_features` returns a wrapped Python dict that overrides `pop` to return None instead of raising a `KeyError` when called with a key not found in the dictionary. This is done in preparation for switching the default value of `drop_unused_features` to True.
- Vocabularies written in `tfrecord_gzip` format no longer filter out entries that are empty or that include a newline character.
- Depends on `apache-beam[gcp]>=2.25,<3`.
- Depends on `tensorflow-metadata>=0.25,<0.26`.
- Depends on `tfx-bsl>=0.25,<0.26`.
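The `pop` override on the dict returned by `transform_raw_features` can be sketched in a few lines. `PopReturnsNoneDict` is a hypothetical name, not the class TFT actually uses. The motivation given here is inferred: once `drop_unused_features` defaults to True, a raw-only key such as a label may be missing from the result, and code that pops it would otherwise raise a `KeyError`.

```python
class PopReturnsNoneDict(dict):
    """Hypothetical stand-in for the wrapped dict described above: pop() on a
    missing key returns None (or an explicit default) instead of raising."""

    def pop(self, key, default=None):
        # dict.pop(key, default) never raises KeyError when a default is passed.
        return super().pop(key, default)


transformed = PopReturnsNoneDict({"x_scaled": [0.1, 0.9]})
print(transformed.pop("label"))     # None: key absent, no KeyError
print(transformed.pop("x_scaled"))  # [0.1, 0.9]
print(transformed)                  # {}
```

One trade-off of this design is that a missing key and a key explicitly mapped to None look the same to the caller.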
Breaking changes
- N/A
Deprecations
- The `decode` method of the available coders (`tft.coders.CsvCoder` and `tft.coders.ExampleProtoCoder`) has been deprecated and removed. Canned TFXIO implementations should be used to read and decode data instead.
TensorFlow Transform 0.24.1
Major Features and Improvements
- N/A
Bug Fixes and Other Changes
- Depends on `apache-beam[gcp]>=2.24,<3`.
- Depends on `tfx-bsl>=0.24.1,<0.25`.
Breaking changes
- N/A
Deprecations
- N/A
TensorFlow Transform 0.24.0
Major Features and Improvements
- Added a native TF 2 implementation of Transform's Beam APIs: `tft.AnalyzeDataset`, `tft.AnalyzeDatasetWithCache`, `tft.AnalyzeAndTransformDataset` and `tft.TransformDataset`. The default behavior will continue to use TensorFlow's `compat.v1` APIs. This can be overridden by setting `tft.Context.force_tf_compat_v1=False`. The default behavior for TF 2 users will be switched to the new native implementation in a future release.
Bug Fixes and Other Changes
- Added a small fanout to analyzers' `CombineGlobally` for improved performance.
- Depends on `absl-py>=0.9,<0.11`.
- Depends on `protobuf>=3.9.2,<4`.
- Depends on `tensorflow-metadata>=0.24,<0.25`.
- Depends on `tfx-bsl>=0.24,<0.25`.
Breaking changes
- N/A
Deprecations
- Deprecating Py3.5 support.
- Parameter `use_tfxio` in the initializer of `Context` is deprecated. TFT Beam APIs now accept both "instance dicts" and "TFXIO" input formats. Setting it will have no effect, and it will be removed in the next version.
Version 0.23.0
Major Features and Improvements
- Added `tft.scale_to_gaussian` to transform input to a standard Gaussian.
- Vocabulary-related analyzers and mappers now accept a `file_format` argument, allowing the vocabulary to be saved in TFRecord format. The default format remains text (TFRecord format requires tensorflow>=2.4).
Bug Fixes and Other Changes
- Enabled `SavedModelLoader` to import and apply TF2 SavedModels.
- `tft.min`, `tft.max`, `tft.sum`, `tft.covariance` and `tft.pca` now have default output values to properly process empty analysis datasets.
- `tft.scale_by_min_max`, `tft.scale_to_0_1` and the corresponding per-key versions now apply a sigmoid function to scale tensors if the analysis dataset is either empty or contains a single distinct value.
- Added best-effort tf.text op registration when loading transformation graphs.
- Vocabularies computed over numerical features now also assign values to entries with equal frequency in reverse lexicographical order, as is already done for string features.
- Fixed an issue that caused the `TABLE_INITIALIZERS` graph collection to contain a tensor instead of an op when a TF2 SavedModel or a TF2 Hub Module containing a table is loaded inside the `preprocessing_fn`.
- Fixed an issue where the output tensors of `tft.TransformFeaturesLayer` would all have unknown shapes.
- Stopped depending on `avro-python3`.
- Depends on `apache-beam[gcp]>=2.23,<3`.
- Depends on `tensorflow>=1.15.2,!=2.0.*,!=2.1.*,!=2.2.*,<2.4`.
- Depends on `tensorflow-metadata>=0.23,<0.24`.
- Depends on `tfx-bsl>=0.23,<0.24`.
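The tie-breaking rule for vocabularies can be illustrated in plain Python. Entries are ordered by descending frequency, and entries with equal frequency are ordered in reverse lexicographical order. This is a sketch, not TFT's implementation. It assumes that numeric values are compared in their string form, as they would appear in a text vocabulary file, and `order_vocabulary` is a hypothetical helper.

```python
from collections import Counter


def order_vocabulary(values):
    """Order tokens by descending frequency, breaking ties in reverse
    lexicographical order of each value's string form."""
    counts = Counter(str(v) for v in values)
    # Sorting (count, token) pairs with reverse=True puts higher counts first
    # and, among equal counts, lexicographically larger tokens first.
    ranked = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [token for token, _ in ranked]


vocab = order_vocabulary([10, 9, 9, 10, 3, 7])
print(vocab)                                        # ['9', '10', '7', '3']
print({token: i for i, token in enumerate(vocab)})  # {'9': 0, '10': 1, '7': 2, '3': 3}
```

`'9'` sorts ahead of `'10'` here because the comparison is between strings, not numbers.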
Breaking changes
- Existing caches (for all analyzers) are automatically invalidated.
Deprecations
- Deprecating Py2 support.
- Note: We plan to remove Python 3.5 support after this release.