Performance improvements for vocabulary generation when using top_k.
Utility to deep-copy Beam PCollections was added to avoid unnecessary
materialization.
Utilize deep_copy to avoid unnecessary materialization of PCollections when
the input data is immutable. This feature is currently off by default and can
be enabled by setting tft.Context.use_deep_copy_optimization=True.
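A minimal sketch of turning the optimization on, assuming tft.Context (named in the item above) is the Beam context manager wrapped around the analyze/transform calls; the exact import path and usage may vary by version.

```python
# Sketch only: enabling the deep-copy optimization named above.
# Assumes tft.Context is usable as a context manager around the
# Beam analyze/transform calls (an assumption, not confirmed here).
import tensorflow_transform as tft

with tft.Context(use_deep_copy_optimization=True):
    # PCollections consumed more than once inside this block are
    # deep-copied rather than materialized.
    pass  # ... run AnalyzeAndTransformDataset here ...
```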
Add bucketize_per_key which computes separate quantiles for each key and then
bucketizes each value according to the quantiles computed for its key.
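A hedged sketch of per-key bucketization; the (x, key, num_buckets) argument order is an assumption based on tft.bucketize, and the feature names are illustrative.

```python
# Sketch: per-key bucketization as described above.
# Signature and feature names are assumptions for illustration.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Quantile boundaries are computed separately for each value of 'key';
    # each 'value' is then bucketized against its own key's boundaries.
    return {
        'value_bucket': tft.bucketize_per_key(
            inputs['value'], inputs['key'], num_buckets=4),
    }
```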
tft.scale_to_z_score is now implemented with a single pass over the data.
Export schema_utils package to convert from the tensorflow-metadata package
to the (soon to be deprecated) tf_metadata subpackage of tensorflow-transform.
Bug Fixes and Other Changes
Memory reduction during vocabulary generation.
Clarify documentation on return values from tft.compute_and_apply_vocabulary
and tft.string_to_int.
tft.unit now explicitly creates Beam PCollections and validates the
transformed dataset by writing and then reading it from disk.
tft.min, tft.size, tft.sum, tft.scale_to_z_score and tft.bucketize
now support tf.SparseTensor.
Fix to tft.scale_to_z_score so it no longer attempts to divide by 0 when the
variance is 0.
Fix bug where internal graph analysis didn't handle the case where an
operation has control inputs that are operations (as opposed to tensors).
tft.sparse_tensor_to_dense_with_shape added which allows densifying a SparseTensor while specifying the resulting Tensor's shape.
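A sketch of the new helper; the keyword name 'shape' and the feature name are assumptions for illustration.

```python
# Sketch: densifying a SparseTensor while fixing the output shape,
# per the item above. Argument and feature names are assumptions.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    sparse = inputs['tags']  # assumed to be a tf.SparseTensor
    dense = tft.sparse_tensor_to_dense_with_shape(sparse, shape=[None, 10])
    return {'tags_dense': dense}
```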
Add load_transform_graph method to TFTransformOutput to load the transform
graph without applying it. This has the effect of adding variables to the
checkpoint when calling it from the training input_fn when using tf.Estimator.
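A sketch of calling the new method from a training input_fn; the output directory path and the body of the input_fn are placeholders.

```python
# Sketch: loading the transform graph without applying it, so its
# variables are registered with the training graph and saved in the
# tf.Estimator checkpoint. The path is a placeholder.
import tensorflow_transform as tft

tf_transform_output = tft.TFTransformOutput('/path/to/transform_output')

def train_input_fn():
    # Loads the graph without applying it to any features.
    tf_transform_output.load_transform_graph()
    # ... build and return the (features, labels) dataset here ...
```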
tft.vocabulary and tft.compute_and_apply_vocabulary now accept an
optional weights argument. When weights is provided, weighted frequencies
are used instead of frequencies based on counts.
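A sketch of the new weights argument; 'term' and 'weight' are hypothetical input features used only for illustration.

```python
# Sketch: weighted vocabulary generation per the item above.
# Feature names are hypothetical.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    return {
        'term_id': tft.compute_and_apply_vocabulary(
            inputs['term'], weights=inputs['weight']),
    }
```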
tft.quantiles and tft.bucketize now accept an optional weights argument.
When weights is provided, weighted counts are used for quantile computation
instead of the raw counts.
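A sketch of weighted bucketization; feature names are illustrative, and the keyword form of the weights argument is an assumption.

```python
# Sketch: weighted quantile bucketization per the item above.
# Feature names are hypothetical.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    return {
        'x_bucket': tft.bucketize(
            inputs['x'], num_buckets=10, weights=inputs['w']),
    }
```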
Updated examples to construct the schema using dataset_schema.from_feature_spec.
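A sketch of the schema-construction style the examples were updated to use; the two features shown are illustrative, using TF 1.x feature-spec classes.

```python
# Sketch: building a Schema from a feature spec, as the updated
# examples now do. Feature names are illustrative.
import tensorflow as tf
from tensorflow_transform.tf_metadata import dataset_schema

schema = dataset_schema.from_feature_spec({
    'age': tf.FixedLenFeature([], tf.float32),
    'occupation': tf.FixedLenFeature([], tf.string),
})
```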
Updated the census example to allow the 'education-num' feature to be missing
and fill in a default value when it is.
Depends on tensorflow-metadata>=0.9,<1.
Depends on apache-beam[gcp]>=2.6,<3.
Breaking changes
We now validate a Schema in its constructor to make sure that it can be
converted to a feature spec. In particular, only tf.int64, tf.string and tf.float32 types are allowed.
We now disallow default values for FixedColumnRepresentation.
It is no longer possible to set a default value in the Schema, and validation
of shape parameters will occur earlier.
Removed Schema.as_batched_placeholders() method.
Removed all components of DatasetMetadata except the schema, and removed all
related classes and code.
Removed the merge method for DatasetMetadata and related classes.
read_metadata can now only read from a single metadata directory and
read_metadata and write_metadata no longer accept the versions parameter.
They now only read/write the JSON format.
Requires pre-installed TensorFlow >=1.9,<2.
Deprecations
apply_function is no longer needed and is deprecated. apply_function(fn, *args) is now equivalent to fn(*args), since tf.Transform
is able to handle while loops and tables without the user wrapping the
function call in apply_function.
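The before/after can be sketched as follows; my_table_lookup_fn is a hypothetical stand-in for any user function containing tables or while loops.

```python
# Sketch of the deprecation: the wrapper call is no longer needed.
# my_table_lookup_fn is a hypothetical stand-in user function.
def my_table_lookup_fn(s):
    return s  # stand-in body; a real one might do a vocabulary lookup

def preprocessing_fn(inputs):
    # Before: s = tft.apply_function(my_table_lookup_fn, inputs['s'])
    # Now: call the function directly.
    return {'s': my_table_lookup_fn(inputs['s'])}
```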