
Release 0.9.0

@zoyahav zoyahav released this 06 Sep 20:46

Major Features and Improvements

  • Performance improvements for vocabulary generation when using top_k.
  • Utility to deep-copy Beam PCollections was added to avoid unnecessary
    materialization.
  • Utilize deep_copy to avoid unnecessary materialization of PCollections when
    the input data is immutable. This feature is currently off by default and can
    be enabled by setting tft.Context.use_deep_copy_optimization=True.
  • Add bucketize_per_key which computes separate quantiles for each key and then
    bucketizes each value according to the quantiles computed for its key.
  • tft.scale_to_z_score is now implemented with a single pass over the data.
  • Export schema_utils package to convert from the tensorflow-metadata package
    to the (soon to be deprecated) tf_metadata subpackage of
    tensorflow-transform.
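
The per-key bucketization above can be sketched in plain Python: compute quantile boundaries separately for each key, then assign each value a bucket using its own key's boundaries. This is an illustrative sketch of the semantics only, with hypothetical function names, not tft's Beam implementation.

```python
# Sketch of the bucketize-per-key idea: quantile boundaries are computed from
# each key's values alone, and every value is bucketized against the
# boundaries of its own key.
from collections import defaultdict


def per_key_quantile_boundaries(pairs, num_buckets):
    """Compute (num_buckets - 1) approximate quantile boundaries per key."""
    by_key = defaultdict(list)
    for key, value in pairs:
        by_key[key].append(value)
    boundaries = {}
    for key, values in by_key.items():
        values.sort()
        # Evenly spaced order statistics stand in for approximate quantiles.
        boundaries[key] = [
            values[(len(values) * i) // num_buckets]
            for i in range(1, num_buckets)
        ]
    return boundaries


def bucketize_per_key(pairs, num_buckets):
    """Map each (key, value) pair to a bucket index in [0, num_buckets)."""
    boundaries = per_key_quantile_boundaries(pairs, num_buckets)
    return [
        sum(1 for b in boundaries[key] if value >= b)
        for key, value in pairs
    ]


pairs = [("a", 1), ("a", 2), ("a", 3), ("a", 4),
         ("b", 10), ("b", 20), ("b", 30), ("b", 40)]
print(bucketize_per_key(pairs, 2))  # [0, 0, 1, 1, 0, 0, 1, 1]
```

Note that key "b"'s values all land in the same buckets as key "a"'s despite being an order of magnitude larger, because each key gets its own boundaries.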
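
The single-pass implementation of tft.scale_to_z_score can be pictured with Welford's streaming algorithm, which accumulates mean and variance in one sweep over the data. A minimal sketch, illustrative only and not tft's implementation:

```python
# Welford's streaming algorithm: mean and variance in a single pass, the kind
# of computation that lets z-score scaling avoid a second sweep over the data.
def scale_to_z_score(values):
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    std = variance ** 0.5
    if std == 0.0:
        # Avoid dividing by zero when all values are identical.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


print(scale_to_z_score([2.0, 4.0, 6.0, 8.0]))
```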

Bug Fixes and Other Changes

  • Memory reduction during vocabulary generation.
  • Clarify documentation on return values from tft.compute_and_apply_vocabulary
    and tft.string_to_int.
  • tft_unit now explicitly creates Beam PCollections and validates the
    transformed dataset by writing and then reading it from disk.
  • tft.min, tft.size, tft.sum, tft.scale_to_z_score and tft.bucketize
    now support tf.SparseTensor.
  • Fix to tft.scale_to_z_score so it no longer attempts to divide by 0 when the
    variance is 0.
  • Fix bug where internal graph analysis didn't handle the case where an
    operation has control inputs that are operations (as opposed to tensors).
  • tft.sparse_tensor_to_dense_with_shape added which allows densifying a
    SparseTensor while specifying the resulting Tensor's shape.
  • Add load_transform_graph method to TFTransformOutput to load the transform
    graph without applying it. This has the effect of adding variables to the
    checkpoint when calling it from the training input_fn when using
    tf.Estimator.
  • tft.vocabulary and tft.compute_and_apply_vocabulary now accept an
    optional weights argument. When weights is provided, weighted frequencies
    are used instead of frequencies based on counts.
  • tft.quantiles and tft.bucketize now accept an optional weights argument.
    When weights is provided, weighted counts are used for quantiles instead of
    the counts themselves.
  • Updated examples to construct the schema using
    dataset_schema.from_feature_spec.
  • Updated the census example to allow the 'education-num' feature to be missing
    and fill in a default value when it is.
  • Depends on tensorflow-metadata>=0.9,<1.
  • Depends on apache-beam[gcp]>=2.6,<3.
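
The densify-with-shape behavior described above can be pictured in plain Python on a COO-style representation (indices plus values). The function name and representation here are illustrative, not the tft API:

```python
# Sketch of densifying a sparse (COO) tensor while pinning the output shape:
# positions listed in `indices` receive the matching value, everything else
# receives the default, and out-of-bounds indices are rejected.
def sparse_to_dense_with_shape(indices, values, shape, default=0):
    """indices: list of (row, col); values: parallel list; shape: (rows, cols)."""
    rows, cols = shape
    dense = [[default] * cols for _ in range(rows)]
    for (r, c), v in zip(indices, values):
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"index {(r, c)} out of bounds for shape {shape}")
        dense[r][c] = v
    return dense


print(sparse_to_dense_with_shape([(0, 0), (1, 2)], [7, 9], (2, 3)))
# [[7, 0, 0], [0, 0, 9]]
```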
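
The weighted-frequency semantics noted above can be sketched as: each token contributes its weight rather than a count of 1, and the vocabulary is ordered by the accumulated totals. Names here are illustrative, not tft's implementation:

```python
# Sketch of weighted vocabulary frequencies: with no weights, every occurrence
# counts as 1; with weights, each occurrence contributes its weight instead.
from collections import defaultdict


def vocabulary_by_weighted_frequency(tokens, weights=None):
    if weights is None:
        weights = [1.0] * len(tokens)
    totals = defaultdict(float)
    for token, w in zip(tokens, weights):
        totals[token] += w
    # Most heavily weighted tokens first; ties broken alphabetically.
    return sorted(totals, key=lambda t: (-totals[t], t))


tokens = ["cat", "dog", "dog", "bird"]
print(vocabulary_by_weighted_frequency(tokens))                        # plain counts
print(vocabulary_by_weighted_frequency(tokens, [5.0, 1.0, 1.0, 0.5]))  # weighted
```

With weights supplied, "cat" jumps ahead of "dog" even though it occurs only once, because its single occurrence carries more total weight.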

Breaking changes

  • We now validate a Schema in its constructor to make sure that it can be
    converted to a feature spec. In particular only tf.int64, tf.string and
    tf.float32 types are allowed.
  • We now disallow default values for FixedColumnRepresentation.
  • It is no longer possible to set a default value in the Schema, and validation
    of shape parameters will occur earlier.
  • Removed Schema.as_batched_placeholders() method.
  • Removed all components of DatasetMetadata except the schema, and removed all
    related classes and code.
  • Removed the merge method for DatasetMetadata and related classes.
  • read_metadata can now only read from a single metadata directory and
    read_metadata and write_metadata no longer accept the versions parameter.
    They now only read/write the JSON format.
  • Requires pre-installed TensorFlow >=1.9,<2.
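
The constructor-time validation described above can be pictured as a check along these lines (hypothetical names and plain-string dtypes, not the tensorflow-transform API):

```python
# Sketch of constructor-time Schema validation: only dtypes that map cleanly
# to a feature spec are accepted; anything else fails fast in __init__.
ALLOWED_DTYPES = {"tf.int64", "tf.string", "tf.float32"}


class Schema:
    def __init__(self, feature_dtypes):
        for name, dtype in feature_dtypes.items():
            if dtype not in ALLOWED_DTYPES:
                raise ValueError(
                    f"feature {name!r} has dtype {dtype}, which cannot be "
                    "converted to a feature spec"
                )
        self.feature_dtypes = dict(feature_dtypes)


Schema({"age": "tf.int64", "name": "tf.string"})  # accepted
```

Validating at construction surfaces an unsupported dtype immediately, rather than at the later point where the schema is converted to a feature spec.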

Deprecations

  • apply_function is no longer needed and is deprecated.
    apply_function(fn, *args) is now equivalent to fn(*args). tf.Transform
    is able to handle while loops and tables without the user wrapping the
    function call in apply_function.
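
The equivalence can be made concrete with a trivial shim (illustrative only; the real tft.apply_function performed graph analysis that is no longer required):

```python
# apply_function(fn, *args) is now equivalent to calling fn(*args) directly,
# so a compatibility shim reduces to a plain call.
def apply_function(fn, *args):
    return fn(*args)


def preprocess(x):
    return x * 2


# Old style and new style produce the same result.
print(apply_function(preprocess, 21) == preprocess(21))  # True
```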