
Woodwork 0.17.2 compatibility :( #3626

Merged: 109 commits merged into main from ww_0.17.0_compatibility on Aug 10, 2022

Conversation

@chukarsten (Contributor) commented Jul 25, 2022

All the work required to get EvalML compatible with Woodwork 0.17.2.

@chukarsten force-pushed the ww_0.17.0_compatibility branch from a253784 to a3aa645 on July 27, 2022 16:41
codecov bot commented Jul 27, 2022

Codecov Report

Merging #3626 (69156a1) into main (987cdcc) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3626     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        335     335             
  Lines      33750   33839     +89     
=======================================
+ Hits       33627   33714     +87     
- Misses       123     125      +2     
Impacted Files Coverage Δ
.../tests/component_tests/test_datetime_featurizer.py 100.0% <ø> (ø)
.../component_tests/test_drop_nan_rows_transformer.py 100.0% <ø> (ø)
evalml/tests/data_checks_tests/test_data_checks.py 100.0% <ø> (ø)
evalml/tests/pipeline_tests/test_pipeline_utils.py 99.8% <ø> (ø)
evalml/model_understanding/metrics.py 100.0% <100.0%> (ø)
evalml/pipelines/classification_pipeline.py 100.0% <100.0%> (ø)
...omponents/estimators/regressors/arima_regressor.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/catboost_regressor.py 100.0% <100.0%> (ø)
...onents/estimators/regressors/lightgbm_regressor.py 100.0% <100.0%> (ø)
...elines/components/transformers/imputers/imputer.py 100.0% <100.0%> (ø)
... and 28 more


@chukarsten changed the title from "Ww 0.17.0 compatibility" to "Woodwork 0.17.0 compatibility :(" on Jul 27, 2022
@ParthivNaresh changed the title from "Woodwork 0.17.0 compatibility :(" to "Woodwork 0.17.2 compatibility :(" on Aug 6, 2022
@ParthivNaresh (Contributor) left a comment:

Awesome work man, this was truly next level finessing. I think some outstanding questions/issues are:

  • Why is the ClassImbalanceDataCheck potentially casting 2.0 to 2?
  • Issue has been filed to keep pseudo floats like 2.0, 12.0, -4.0, etc. as Double instead of Integer or IntegerNullable (a small repro sketch follows this list).
  • Issue has been filed for support of multiple column assignments.
  • Issue has been filed for no exception when casting to a Boolean.
  • Issue has been filed to match ww.init() behaviour across DataFrames and Series.
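For context, a rough repro of the float-inference concern in that second bullet (a sketch, not code from this PR; it only assumes woodwork is installed):

```python
import pandas as pd
import woodwork as ww

# Whole-number floats like 2.0 and 12.0 should stay Double; the filed issue
# reports them being inferred as an integer type instead.
s = ww.init_series(pd.Series([2.0, 12.0, -4.0]))
print(s.ww.logical_type)  # Double is the desired outcome
```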

@@ -161,7 +161,8 @@ def transform(self, X, y=None):
        if self._numeric_cols is not None and len(self._numeric_cols) > 0:
            X_numeric = X.ww[self._numeric_cols.tolist()]
            imputed = self._numeric_imputer.transform(X_numeric)
            X_no_all_null[X_numeric.columns] = imputed
            for numeric_col in X_numeric.columns:

Contributor:
I've filed it

            imputed.bfill(inplace=True)  # Fill in the first value, if missing
            X_not_all_null[X_interpolate.columns] = imputed
        X_not_all_null.ww.init(schema=X_schema)

Contributor:
Reinitializes the dataframe with the original schema excluding IntegerNullable and BooleanNullable types so that they can be reinferred post imputation
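A minimal sketch of that reinitialization idea (illustrative only; it borrows the X and X_not_all_null names from the hunk above and assumes standard Woodwork APIs):

```python
from woodwork.logical_types import BooleanNullable, IntegerNullable

# Keep the logical types we can trust and let Woodwork re-infer the imputed,
# formerly nullable columns.
preserved_types = {
    col: ltype
    for col, ltype in X.ww.logical_types.items()
    if col in X_not_all_null.columns
    and not isinstance(ltype, (IntegerNullable, BooleanNullable))
}
X_not_all_null.ww.init(logical_types=preserved_types)
```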

            y_imputed.bfill(inplace=True)
            y_imputed.ww.init(schema=y.ww.schema)

        X_not_all_null.ww.init(schema=X_schema)

Contributor:
Covered as part of test_numeric_only_input and test_imputer_bool_dtype_object

@@ -311,5 +312,8 @@ def transform(self, X, y=None):

        if cleaned_y is not None:
            cleaned_y = cleaned_y["target"]
            cleaned_y = ww.init_series(cleaned_y)

        cleaned_x.ww.init()

Contributor:
Introduction of nulls makes initialization necessary here

@@ -380,3 +381,27 @@ def make_balancing_dictionary(y, sampling_ratio):
            # this class is already larger than the ratio, don't change
            class_dic[index] = value_counts[index]
    return class_dic


def downcast_int_nullable_to_double(X):

Contributor:
A function that helps with some components not accepting an IntegerArray or being unable to cast values from a float to an int
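Since the body of downcast_int_nullable_to_double isn't visible in this hunk, here is a minimal sketch of what such a helper could look like (an illustration of the described behaviour, not the merged implementation):

```python
from woodwork.logical_types import IntegerNullable


def downcast_int_nullable_to_double(X):
    """Cast IntegerNullable columns to Double for components that can't handle
    pandas' IntegerArray or that fail when casting floats back to ints."""
    int_nullable_cols = [
        col
        for col, ltype in X.ww.logical_types.items()
        if isinstance(ltype, IntegerNullable)
    ]
    if int_nullable_cols:
        X.ww.set_types(logical_types={col: "Double" for col in int_nullable_cols})
    return X
```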

@@ -88,14 +90,14 @@ def test_numeric_only_input(imputer_test_data):
    expected = pd.DataFrame(
        {
            "int col": [0, 1, 2, 0, 3] * 4,
            "float col": [0.0, 1.0, 0.0, -2.0, 5.0] * 4,
            "float col": [0.1, 1.0, 0.0, -2.0, 5.0] * 4,

Contributor:
Filed this

@@ -1982,7 +1982,7 @@ def imputer_test_data():
            ),
            "int col": [0, 1, 2, 0, 3] * 4,
            "object col": ["b", "b", "a", "c", "d"] * 4,
            "float col": [0.0, 1.0, 0.0, -2.0, 5.0] * 4,
            "float col": [0.1, 1.0, 0.0, -2.0, 5.0] * 4,

Contributor:
Filed this

@ParthivNaresh marked this pull request as ready for review on August 7, 2022 16:12

@eccabay (Contributor) left a comment:
LGTM! Just left some small efficiency and clarification questions.

@@ -4016,7 +4016,7 @@ def test_automl_baseline_pipeline_predictions_and_scores_time_series(problem_typ
    expected_predictions = pd.Series(expected_predictions, name="target_delay_1")

    preds = baseline.predict(X_validation, None, X_train, y_train)
    pd.testing.assert_series_equal(expected_predictions, preds)
    pd.testing.assert_series_equal(expected_predictions, preds, check_dtype=False)

Contributor:
This worries me slightly - are there any scenarios where this would cause us issues down the road?
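For what it's worth, the kind of mismatch that check_dtype=False papers over looks roughly like this (illustrative only; assumes a recent pandas where nullable Int64 is in play):

```python
import pandas as pd

expected = pd.Series([1, 2, 3], dtype="int64")   # plain numpy integers
preds = pd.Series([1, 2, 3], dtype="Int64")      # pandas nullable integers

try:
    pd.testing.assert_series_equal(expected, preds)
except AssertionError as err:
    print(f"strict dtype check fails: {err}")

# Relaxing the dtype check compares values only, so this passes.
pd.testing.assert_series_equal(expected, preds, check_dtype=False)
```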

evalml/pipelines/components/utils.py (review comment outdated, resolved)
@@ -288,6 +288,7 @@ def transform(self, X, y=None):
        delayed_features = self._compute_delays(X_ww, y)
        rolling_means = self._compute_rolling_transforms(X_ww, y, original_features)
        features = ww.concat_columns([delayed_features, rolling_means])
        features.ww.init()

Contributor:
Can we reuse any part of the initial schema or use what we know about the dtypes of these features here to reduce the amount of type reinference this might introduce?
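One possible shape of that suggestion (a sketch only, assuming delayed_features and rolling_means still carry Woodwork schemas at this point; not code from this PR):

```python
import woodwork as ww

# Seed ww.init with the logical types we already know from the inputs so only
# genuinely new columns need to be re-inferred.
known_types = {**delayed_features.ww.logical_types, **rolling_means.ww.logical_types}

features = ww.concat_columns([delayed_features, rolling_means])
features.ww.init(
    logical_types={col: ltype for col, ltype in known_types.items() if col in features.columns}
)
```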

evalml/pipelines/classification_pipeline.py (review comment resolved)
@@ -66,7 +66,16 @@ def fit(self, X, y):
        )

        self._fit(X, y)
        self._classes_ = list(ww.init_series(np.unique(y)))

        # TODO: Added this in because numpy's unique() does not support pandas.NA

Contributor:
If there's a workaround for this error, why do we start off by attempting to use numpy? Are there downsides to just using y.unique() in all cases instead?
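A small illustration of the behaviour being discussed (a sketch under the assumption that y can be a nullable-integer Series; not the PR's final code):

```python
import pandas as pd

y = pd.Series([0, 1, pd.NA, 1], dtype="Int64")

# np.unique(y) would fail here (sorting comparisons against pd.NA are ambiguous),
# while the pandas-native path below handles the missing value.
classes = y.dropna().unique()
print(list(classes))  # [0, 1]
```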

@jeremyliweishih (Collaborator) left a comment:
@chukarsten this looks good to me! Apologies for all the "did we file an issue" comments. Just wanted to make sure we're keeping track of removing these fixes once the appropriate WW changes are in. 😄

@@ -66,7 +66,16 @@ def fit(self, X, y):
        )

        self._fit(X, y)
        self._classes_ = list(ww.init_series(np.unique(y)))

        # TODO: Added this in because numpy's unique() does not support pandas.NA

Collaborator:
do we have an issue filed to resolve this?

@@ -165,7 +166,10 @@ def transform(self, X, y=None):

        if self._interpolate_cols is not None:
            X_interpolate = X.ww[self._interpolate_cols]
            imputed = X_interpolate.interpolate()
            # TODO: Revert when pandas introduces Float64 dtype

Collaborator:
do we have an issue filed to track this?

@@ -4016,7 +4016,7 @@ def test_automl_baseline_pipeline_predictions_and_scores_time_series(problem_typ
    expected_predictions = pd.Series(expected_predictions, name="target_delay_1")

    preds = baseline.predict(X_validation, None, X_train, y_train)
    pd.testing.assert_series_equal(expected_predictions, preds)
    pd.testing.assert_series_equal(expected_predictions, preds, check_dtype=False)

Collaborator:
a little confused here - is preds coming out as integers here, and if so, why?

@@ -328,6 +343,7 @@ def test_delayed_feature_extractor_numpy(mock_roll, delayed_features_data):
            "target_delay_11": y_answer.shift(11),
        },
    )
    answer_only_y.ww.init()

Collaborator:
was the new ww.init call in TimeSeriesFeaturizer in response to this? If not, should we file an issue to cover this?

@chukarsten merged commit 12ec98e into main on Aug 10, 2022
@chukarsten deleted the ww_0.17.0_compatibility branch on August 10, 2022 03:37
@chukarsten mentioned this pull request on Aug 10, 2022
chukarsten added a commit that referenced this pull request Aug 15, 2022
* Revert "Woodwork 0.17.2 compatibility :( (#3626)"

This reverts commit 12ec98e.

* Updated the release and pandas version.

Co-authored-by: Karsten Chu <[email protected]>
@ParthivNaresh restored the ww_0.17.0_compatibility branch on August 22, 2022 16:03