Added warning , if column is multi categorical in h2o.cor() #12903 #15674

Mohit1345 · 2023-08-03T20:14:49Z

h2o.cor() will return "NA" if multi-categorical columns are passed to it.
Modified in both the R and Python h2o.cor() methods.

tomasfryda

Thank you @Mohit1345. It will need a few more changes before merging. Please don't be discouraged by the requested changes; h2o's code takes some time to get used to.

tomasfryda · 2023-08-18T08:01:22Z

h2o-py/h2o/frame.py

-        if y is None:
-            y = self
-        if use is None: use = "complete.obs" if na_rm else "everything"


This shouldn't be deleted.

tomasfryda · 2023-08-18T08:06:58Z

h2o-py/h2o/frame.py

-        if y is None:
-            y = self
-        if use is None: use = "complete.obs" if na_rm else "everything"
+        y_categorical = any(self.types[col_name] == "enum" for col_name in y)


This seems incorrect: y is an H2OFrame, not a list of columns. Also, you can use y.isfactor() to check whether it's categorical; the output is a list of boolean values in the same order as the columns.
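As a rough illustration of this suggestion, the sketch below pairs column names with per-column boolean flags to pick out the categorical columns. The lists are stand-ins so the snippet runs without a cluster: in h2o-py they would come from y.columns and y.isfactor(). The helper name categorical_columns is hypothetical and not part of h2o.

```python
def categorical_columns(columns, is_factor):
    """Return the names of columns whose flag is True.

    `columns` stands in for y.columns and `is_factor` for y.isfactor();
    both are assumed to list the columns in the same order.
    """
    return [name for name, flag in zip(columns, is_factor) if flag]


# Stand-in values instead of a live H2OFrame (assumption for illustration).
columns = ["age", "species", "sex"]
is_factor = [False, True, True]
categorical = categorical_columns(columns, is_factor)
```

Because the flags arrive in column order, a single zip is enough and no per-column lookup against the frame is needed.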

tomasfryda · 2023-08-18T08:10:23Z

h2o-py/h2o/frame.py

+        y_categorical = any(self.types[col_name] == "enum" for col_name in y)
+
+        if y_categorical:
+            num_unique_levels = {col: len(self[col].levels()) for col in y}


Instead of len(self[col].levels()), use self[col].nlevels()[0]. nlevels returns just a list of cardinalities, so there is less communication with the backend and lower memory use in the Python client. Also, since y is an H2OFrame (see the assert on line 3182), you can use something like dict(zip(y.columns, y.nlevels())) to get the same thing.
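A minimal sketch of the dict(zip(...)) idea, using plain lists in place of y.columns and y.nlevels() so it runs without a backend. The helper name level_counts is hypothetical, and the 0 shown for the numeric column is an assumption about how non-categorical columns are reported.

```python
def level_counts(columns, nlevels):
    """Map each column name to its number of levels.

    `columns` stands in for y.columns and `nlevels` for y.nlevels();
    both are assumed to be in the same column order.
    """
    return dict(zip(columns, nlevels))


# Stand-in values instead of a live H2OFrame (assumption for illustration).
columns = ["age", "species", "sex"]
nlevels = [0, 3, 2]  # assumed: a numeric column reports 0 levels
num_unique_levels = level_counts(columns, nlevels)
```

Only the cardinalities cross from the backend to the client here, rather than the full list of level names for every column.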

tomasfryda · 2023-08-18T08:15:32Z

h2o-py/h2o/frame.py

+
+        if multi_categorical:
+            import warnings
+            warnings.warn("NA")


Please make the warning more informative.

For example:

    for col, card in num_unique_levels.items():
        if card > 2:
            warnings.warn("Column {} contains {} levels. Only numerical and binary columns are supported.".format(col, card))
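The suggested loop can be tried in isolation with the standard warnings module. This is only a sketch: the function name warn_multicategorical, its max_levels parameter, and the sample dictionary are hypothetical, and the real check would live inside H2OFrame.cor().

```python
import warnings


def warn_multicategorical(num_unique_levels, max_levels=2):
    """Warn once per column with more than `max_levels` levels.

    Returns the names of the offending columns so callers can act on them.
    """
    offending = []
    for col, card in num_unique_levels.items():
        if card > max_levels:
            warnings.warn(
                "Column {} contains {} levels. Only numerical and binary "
                "columns are supported.".format(col, card)
            )
            offending.append(col)
    return offending


# Capture the warnings so the messages can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_multicategorical({"age": 0, "species": 3, "sex": 2})
messages = [str(w.message) for w in caught]
```

Naming the column and its cardinality in the message tells the user which input to drop or encode, which a bare "NA" does not.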

tomasfryda · 2023-08-18T08:16:26Z

h2o-r/h2o-package/R/frame.R


+  if ((x_categorical && length(unique(h2o.levels(x))) > 2) || (y_categorical && length(unique(h2o.levels(y))) > 2)) {
+      warning("NA")


Please make the warning more informative.

Mohit1345 · 2023-08-19T14:42:19Z

> Thank you @Mohit1345. It will need a few more changes before merging. Please don't be discouraged by the requested changes; h2o's code takes some time to get used to.

Sure, will try to make changes

Devanshusisodiya · 2024-02-11T15:40:58Z

Hi @tomasfryda @wendycwong, I have raised a PR for this issue. Please take a look at it here

Devanshusisodiya · 2024-02-13T05:46:15Z

Hi @tomasfryda @wendycwong, please review this PR: #16070

Mohit1345 added 4 commits August 4, 2023 00:35

added warning if User Passes Categorical Columns to h2o.cor()

f7b497e

replaced warning text to NA

e053578

replaced frame.R warning text

276864f

no warning for binary categorical

f2b7654

wendycwong requested a review from tomasfryda August 16, 2023 19:39

tomasfryda requested changes Aug 18, 2023

Devanshusisodiya mentioned this pull request Feb 11, 2024

Added fix for warning if column is multi categorical in h2o.cor() #16070

Open
