From bab2a4db7d4ee66ab2bc10bf0809d3b67641ee16 Mon Sep 17 00:00:00 2001
From: Veronika Maurerova
Date: Thu, 26 Oct 2023 10:58:53 +0200
Subject: [PATCH 1/4] Add doc

---
 h2o-docs/src/product/data-science/gbm.rst     | 23 +++++++++++++++++++
 h2o-docs/src/product/data-science/xgboost.rst | 23 +++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/h2o-docs/src/product/data-science/gbm.rst b/h2o-docs/src/product/data-science/gbm.rst
index abadc2a86db9..e89e7e12c30d 100644
--- a/h2o-docs/src/product/data-science/gbm.rst
+++ b/h2o-docs/src/product/data-science/gbm.rst
@@ -358,6 +358,21 @@ Metrics
 
 Usage is illustrated in the Examples section.
 
+GBM Friedman and Popescu's H statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+
+H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
+
+This statistic can be calculated only for numerical variables. Missing values are supported.
+
+See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+
+Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+
+Usage is illustrated in the Examples section.
+
 Examples
 ~~~~~~~~
 
@@ -394,6 +409,10 @@ Below is a simple example showing how to build a Gradient Boosting Machine model
         # Extract feature interactions:
         feature_interactions <- h2o.feature_interaction(pros_gbm)
 
+        # Get Friedman and Popescu's H statistics
+        h <- h2o.h(pros_gbm, prostate, c('DPROS','DCAPS'))
+        print(h)
+
    .. code-tab:: python
 
@@ -424,6 +443,10 @@
         # Extract feature interactions:
         feature_interactions = pros_gbm.feature_interaction()
 
+        # Get Friedman and Popescu's H statistics
+        h = pros_gbm.h(prostate_train, ['DPROS','DCAPS'])
+        print(h)
+
    .. code-tab:: scala
 
diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 960952be1346..9c40f56316df 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -373,6 +373,21 @@ Metrics
 
 Usage is illustrated in the Examples section.
 
+XGBoost Friedman and Popescu's H statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+
+H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
+
+This statistic can be calculated only for numerical variables. Missing values are supported.
+
+See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+
+Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+
+Usage is illustrated in the Examples section.b
+
 Examples
 ~~~~~~~~
 
@@ -415,6 +430,10 @@ Below is a simple example showing how to build a XGBoost model.
         # Extract feature interactions:
         feature_interactions = h2o.feature_interaction(titanic_xgb)
 
+        # Get Friedman and Popescu's H statistics
+        h <- h2o.h(titanic_xgb, train, c('sex','age'))
+        print(h)
+
    .. code-tab:: python
 
@@ -451,6 +470,10 @@
         # Extract feature interactions:
         feature_interactions = titanic_xgb.feature_interaction()
 
+        # Get Friedman and Popescu's H statistics
+        h = titanic_xgb.h(train, ['sex','age'])
+        print(h)
+
 Note
 ''''

From b9d3016bbb131e4cec5528f0429f748b83be0dbb Mon Sep 17 00:00:00 2001
From: Veronika Maurerova
Date: Thu, 26 Oct 2023 11:21:32 +0200
Subject: [PATCH 2/4] Fix xgboost example

---
 h2o-docs/src/product/data-science/xgboost.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 9c40f56316df..3d252a85c6b9 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -431,7 +431,7 @@ Below is a simple example showing how to build a XGBoost model.
         feature_interactions = h2o.feature_interaction(titanic_xgb)
 
         # Get Friedman and Popescu's H statistics
-        h <- h2o.h(titanic_xgb, train, c('sex','age'))
+        h <- h2o.h(titanic_xgb, train, c('fare','age'))
         print(h)
 
@@ -471,7 +471,7 @@
         feature_interactions = titanic_xgb.feature_interaction()
 
         # Get Friedman and Popescu's H statistics
-        h = titanic_xgb.h(train, ['sex','age'])
+        h = titanic_xgb.h(train, ['fare','age'])
         print(h)
 
 Note
 ''''

From 25bd92c452c15e935064788214a7b7bf6c0c83b3 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Tue, 31 Oct 2023 07:48:30 -0500
Subject: [PATCH 3/4] ht/syntax updates

---
 h2o-docs/src/product/data-science/gbm.rst     | 8 ++++----
 h2o-docs/src/product/data-science/xgboost.rst | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/h2o-docs/src/product/data-science/gbm.rst b/h2o-docs/src/product/data-science/gbm.rst
index e89e7e12c30d..6e276b2adf69 100644
--- a/h2o-docs/src/product/data-science/gbm.rst
+++ b/h2o-docs/src/product/data-science/gbm.rst
@@ -361,17 +361,17 @@ Usage is illustrated in the Examples section.
 GBM Friedman and Popescu's H statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+You can calculate Friedman and Popescu's H statistics to test for the presence of an interaction between specified variables.
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can be calculated only for numerical variables. Missing values are supported.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
 
-Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
 
-Usage is illustrated in the Examples section.
+You can see how it is used in the `Examples section <#examples>`__.
 
 Examples
 ~~~~~~~~
diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 3d252a85c6b9..8a20a933155d 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -376,17 +376,17 @@ Usage is illustrated in the Examples section.
 XGBoost Friedman and Popescu's H statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+You can calculate Friedman and Popescu's H statistics to test for the presence of an interaction between specified variables.
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can be calculated only for numerical variables. Missing values are supported.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
 
-Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
 
-Usage is illustrated in the Examples section.b
+You can see how it is used in the `Examples section <#examples>`__.
 
 Examples
 ~~~~~~~~

From 8e8131742a2921bbba921dbb322ba423e5a73f72 Mon Sep 17 00:00:00 2001
From: Veronika Maurerova
Date: Wed, 1 Nov 2023 13:09:20 +0100
Subject: [PATCH 4/4] Move link to references.

---
 h2o-docs/src/product/data-science/gbm.rst     | 6 +++---
 h2o-docs/src/product/data-science/xgboost.rst | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/h2o-docs/src/product/data-science/gbm.rst b/h2o-docs/src/product/data-science/gbm.rst
index 6e276b2adf69..99e65aad103b 100644
--- a/h2o-docs/src/product/data-science/gbm.rst
+++ b/h2o-docs/src/product/data-science/gbm.rst
@@ -365,9 +365,7 @@ You can calculate Friedman and Popescu's H statistics to test for the prese
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can only be calculated for numerical variables. Missing values are supported.
-
-See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
@@ -504,6 +502,8 @@ York, 2001. `__
 
 `Nee, Daniel, "Calibrating Classifier Probabilities", 2014 `__
 
+`Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954. <http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046>`__
+
 FAQ
 ~~~
 
diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 8a20a933155d..55d34687074b 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -380,9 +380,7 @@ You can calculate Friedman and Popescu's H statistics to test for the prese
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can only be calculated for numerical variables. Missing values are supported.
-
-See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
 
@@ -530,4 +528,6 @@ References
 
 - Mitchell R, Frank E. (2017) Accelerating the XGBoost algorithm using GPU computing. PeerJ Preprints 5:e2911v1 `https://doi.org/10.7287/peerj.preprints.2911v1 <https://doi.org/10.7287/peerj.preprints.2911v1>`__
 
+`Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954. <http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046>`__
+
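Reviewer note: for intuition about what the documented H statistic measures, section 8.1 of the Friedman and Popescu paper defines pairwise H as the share of the joint partial dependence's variance that is not explained by the sum of the two univariate partial dependences. The sketch below is an illustrative toy on precomputed partial-dependence values, not H2O's implementation; the function name and inputs are hypothetical.

```python
import numpy as np

def h_statistic(pd_jk, pd_j, pd_k):
    """Pairwise H (Friedman & Popescu 2008, s. 8.1) from centered
    partial-dependence values evaluated at the same N data points."""
    # Center each partial-dependence function, as the paper requires.
    pd_jk = pd_jk - pd_jk.mean()
    pd_j = pd_j - pd_j.mean()
    pd_k = pd_k - pd_k.mean()
    denom = np.sum(pd_jk ** 2)
    if denom == 0.0:
        # Degenerate case (no joint effect); real implementations can
        # also return NaN when weak main effects spoil the computation.
        return float("nan")
    return float(np.sqrt(np.sum((pd_jk - pd_j - pd_k) ** 2) / denom))

# Purely additive effects: the joint PD is the sum of the univariate
# PDs, so H is exactly 0 (no interaction).
f_j = np.array([1.0, -1.0, 2.0, -2.0])
f_k = np.array([0.5, -0.5, 1.0, -1.0])
print(h_statistic(f_j + f_k, f_j, f_k))  # 0.0

# A non-additive term pushes H above 0.
interaction = np.array([1.0, -1.0, -1.0, 1.0])
print(h_statistic(f_j + f_k + interaction, f_j, f_k))
```

This also shows why the zero-denominator guard matters: with no joint effect at all, the ratio is undefined, which is the NaN case the docs mention.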