Merge pull request #191 from fjetter/bugfix/ktk_hash_bucket_in_schema
Remove _KTK_HASH_BUCKET if exists
fjetter authored Dec 17, 2019
2 parents 08a3aec + 43be3c6 commit cec1cb1
Showing 3 changed files with 16 additions and 3 deletions.
11 changes: 8 additions & 3 deletions CHANGES.rst
@@ -2,13 +2,18 @@
 Changelog
 =========

-Version Unreleased
-==================
+Version 3.6.2 (2019-12-17)
+==========================

 Improvements
 ^^^^^^^^^^^^

-- Add more explicit typing to :mod:`kartothek.io.eager`.
+* Add more explicit typing to :mod:`kartothek.io.eager`.
+
+Bug fixes
+^^^^^^^^^
+* Fix an issue where :func:`~kartothek.io.dask.dataframe.update_dataset_from_ddf` would create a column named "_KTK_HASH_BUCKET" in the dataset
+

 Version 3.6.1 (2019-12-11)
 ==========================
2 changes: 2 additions & 0 deletions kartothek/io/dask/_update.py
@@ -126,6 +126,8 @@ def _store_partition(
     df_serializer,
     metadata_version,
 ):
+    if _KTK_HASH_BUCKET in df:
+        df = df.drop(_KTK_HASH_BUCKET, axis=1)
     store = store_factory()
     # I don't have access to the group values
     mps = parse_input_to_metapartition(
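The guard added above can be illustrated with a minimal, library-free sketch. It assumes (this is not stated in the commit itself) that the helper column exists only to group rows into buckets during a shuffle and carries no user data. Rows are modelled as plain dicts rather than DataFrames, and `add_bucket`, `group_by_bucket`, and `strip_helper` are hypothetical names introduced for illustration; none of them are kartothek functions.

```python
from collections import defaultdict

# Same label as in the diff; everything else in this sketch is hypothetical.
HASH_BUCKET = "_KTK_HASH_BUCKET"


def add_bucket(rows, key, num_buckets):
    """Return copies of the rows tagged with a helper bucket label derived from `key`."""
    return [{**row, HASH_BUCKET: hash(row[key]) % num_buckets} for row in rows]


def group_by_bucket(rows):
    """Group tagged rows by their helper bucket label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[HASH_BUCKET]].append(row)
    return dict(groups)


def strip_helper(rows):
    """Drop the helper label if present; rows without it pass through unchanged."""
    return [{k: v for k, v in row.items() if k != HASH_BUCKET} for row in rows]


rows = [{"primary": 1, "secondary": 10}, {"primary": 2, "secondary": 11}]
tagged = add_bucket(rows, key="secondary", num_buckets=2)

# Strip the helper label from every group right before it would be persisted.
stored = [strip_helper(group) for group in group_by_bucket(tagged).values()]
columns = {name for group in stored for row in group for name in row}
```

Stripping the label at the last step before persisting, rather than where the label is created, keeps the grouping logic untouched and also tolerates inputs that never carried the label, which matches the "if exists" wording of the commit message.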
6 changes: 6 additions & 0 deletions tests/io/dask/dataframe/test_update.py
@@ -171,6 +171,12 @@ def test_update_shuffle_buckets(
         range(unique_secondaries)
     )

+    assert set(dataset.table_meta["core"].names) == {
+        "primary",
+        "secondary",
+        "sorted_column",
+    }
+
     factory = DatasetFactory("output_dataset_uuid", store_factory)
     factory.load_all_indices()
