From bab2a4db7d4ee66ab2bc10bf0809d3b67641ee16 Mon Sep 17 00:00:00 2001
From: Veronika Maurerova
Date: Thu, 26 Oct 2023 10:58:53 +0200
Subject: [PATCH 1/4] Add doc

---
 h2o-docs/src/product/data-science/gbm.rst     | 23 +++++++++++++++++++
 h2o-docs/src/product/data-science/xgboost.rst | 23 +++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/h2o-docs/src/product/data-science/gbm.rst b/h2o-docs/src/product/data-science/gbm.rst
index abadc2a86db9..e89e7e12c30d 100644
--- a/h2o-docs/src/product/data-science/gbm.rst
+++ b/h2o-docs/src/product/data-science/gbm.rst
@@ -358,6 +358,21 @@ Metrics
 
 Usage is illustrated in the Examples section.
 
+GBM Friedman and Popescu's H statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+
+H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
+
+This statistic can be calculated only for numerical variables. Missing values are supported.
+
+See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+
+Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+
+Usage is illustrated in the Examples section.
+
 Examples
 ~~~~~~~~
 
@@ -394,6 +409,10 @@ Below is a simple example showing how to build a Gradient Boosting Machine model
         # Extract feature interactions:
         feature_interactions <- h2o.feature_interaction(pros_gbm)
 
+        # Get Friedman and Popescu's H statistics
+        h <- h2o.h(pros_gbm, prostate, c('DPROS','DCAPS'))
+        print(h)
+
    .. code-tab:: python
 
@@ -424,6 +443,10 @@
         # Extract feature interactions:
         feature_interactions = pros_gbm.feature_interaction()
 
+        # Get Friedman and Popescu's H statistics
+        h = pros_gbm.h(prostate_train, ['DPROS','DCAPS'])
+        print(h)
+
    .. code-tab:: scala
 
diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 960952be1346..9c40f56316df 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -373,6 +373,21 @@ Metrics
 
 Usage is illustrated in the Examples section.
 
+XGBoost Friedman and Popescu's H statistics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+
+H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
+
+This statistic can be calculated only for numerical variables. Missing values are supported.
+
+See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+
+Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+
+Usage is illustrated in the Examples section.b
+
 Examples
 ~~~~~~~~
 
@@ -415,6 +430,10 @@ Below is a simple example showing how to build a XGBoost model.
         # Extract feature interactions:
         feature_interactions = h2o.feature_interaction(titanic_xgb)
 
+        # Get Friedman and Popescu's H statistics
+        h <- h2o.h(titanic_xgb, train, c('sex','age'))
+        print(h)
+
    .. code-tab:: python
 
@@ -451,6 +470,10 @@
         # Extract feature interactions:
         feature_interactions = titanic_xgb.feature_interaction()
 
+        # Get Friedman and Popescu's H statistics
+        h = titanic_xgb.h(train, ['sex','age'])
+        print(h)
+
 Note
 ''''

From b9d3016bbb131e4cec5528f0429f748b83be0dbb Mon Sep 17 00:00:00 2001
From: Veronika Maurerova
Date: Thu, 26 Oct 2023 11:21:32 +0200
Subject: [PATCH 2/4] Fix xgboost example

---
 h2o-docs/src/product/data-science/xgboost.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 9c40f56316df..3d252a85c6b9 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -431,7 +431,7 @@ Below is a simple example showing how to build a XGBoost model.
         feature_interactions = h2o.feature_interaction(titanic_xgb)
 
         # Get Friedman and Popescu's H statistics
-        h <- h2o.h(titanic_xgb, train, c('sex','age'))
+        h <- h2o.h(titanic_xgb, train, c('fare','age'))
         print(h)
 
@@ -471,7 +471,7 @@
         feature_interactions = titanic_xgb.feature_interaction()
 
         # Get Friedman and Popescu's H statistics
-        h = titanic_xgb.h(train, ['sex','age'])
+        h = titanic_xgb.h(train, ['fare','age'])
         print(h)
 
 Note
 ''''

From 25bd92c452c15e935064788214a7b7bf6c0c83b3 Mon Sep 17 00:00:00 2001
From: Hannah Tillman
Date: Tue, 31 Oct 2023 07:48:30 -0500
Subject: [PATCH 3/4] ht/syntax updates

---
 h2o-docs/src/product/data-science/gbm.rst     | 8 ++++----
 h2o-docs/src/product/data-science/xgboost.rst | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/h2o-docs/src/product/data-science/gbm.rst b/h2o-docs/src/product/data-science/gbm.rst
index e89e7e12c30d..6e276b2adf69 100644
--- a/h2o-docs/src/product/data-science/gbm.rst
+++ b/h2o-docs/src/product/data-science/gbm.rst
@@ -361,17 +361,17 @@ Usage is illustrated in the Examples section.
 GBM Friedman and Popescu's H statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+You can calculate Friedman and Popescu's H statistics to test for the presence of an interaction between specified variables.
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can be calculated only for numerical variables. Missing values are supported.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
 
-Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
 
-Usage is illustrated in the Examples section.
+You can see how it is used in the `Examples section <#examples>`__.
 
 Examples
 ~~~~~~~~
diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 3d252a85c6b9..8a20a933155d 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -376,17 +376,17 @@ Usage is illustrated in the Examples section.
 XGBoost Friedman and Popescu's H statistics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Calculates Friedman and Popescu's H statistics, in order to test for the presence of an interaction between specified variables.
+You can calculate Friedman and Popescu's H statistics to test for the presence of an interaction between specified variables.
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can be calculated only for numerical variables. Missing values are supported.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
 
-Reference implementation: https://pypi.org/project/sklearn-gbmi/ (Python) and https://rdrr.io/cran/gbm/man/interact.gbm.html (R)
+Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
 
-Usage is illustrated in the Examples section.b
+You can see how it is used in the `Examples section <#examples>`__.
 
 Examples
 ~~~~~~~~

From 8e8131742a2921bbba921dbb322ba423e5a73f72 Mon Sep 17 00:00:00 2001
From: Veronika Maurerova
Date: Wed, 1 Nov 2023 13:09:20 +0100
Subject: [PATCH 4/4] Move link to references.

---
 h2o-docs/src/product/data-science/gbm.rst     | 6 +++---
 h2o-docs/src/product/data-science/xgboost.rst | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/h2o-docs/src/product/data-science/gbm.rst b/h2o-docs/src/product/data-science/gbm.rst
index 6e276b2adf69..99e65aad103b 100644
--- a/h2o-docs/src/product/data-science/gbm.rst
+++ b/h2o-docs/src/product/data-science/gbm.rst
@@ -365,9 +365,7 @@ You can calculate Friedman and Popescu's H statistics to test for the prese
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can only be calculated for numerical variables. Missing values are supported.
-
-See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
@@ -504,6 +502,8 @@ York, 2001. `__
 
 `Nee, Daniel, "Calibrating Classifier Probabilities", 2014 `__
 
+`Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954. <http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046>`__
+
 FAQ
 ~~~
 
diff --git a/h2o-docs/src/product/data-science/xgboost.rst b/h2o-docs/src/product/data-science/xgboost.rst
index 8a20a933155d..55d34687074b 100644
--- a/h2o-docs/src/product/data-science/xgboost.rst
+++ b/h2o-docs/src/product/data-science/xgboost.rst
@@ -380,9 +380,7 @@ You can calculate Friedman and Popescu's H statistics to test for the prese
 
 H varies from 0 to 1. It will have a value of 0 if the model exhibits no interaction between specified variables and a correspondingly larger value for a stronger interaction effect between them. NaN is returned if a computation is spoiled by weak main effects and rounding errors.
 
-This statistic can only be calculated for numerical variables. Missing values are supported.
-
-See Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954, http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046, s. 8.1.
+This statistic can only be calculated for numerical variables. Missing values are supported.
 
 Reference implementation: `Python <https://pypi.org/project/sklearn-gbmi/>`__ and `R <https://rdrr.io/cran/gbm/man/interact.gbm.html>`__
 
@@ -530,4 +528,6 @@ References
 
 - Mitchell R, Frank E. (2017) Accelerating the XGBoost algorithm using GPU computing. PeerJ Preprints 5:e2911v1 `https://doi.org/10.7287/peerj.preprints.2911v1 <https://doi.org/10.7287/peerj.preprints.2911v1>`__
 
+`Jerome H. Friedman and Bogdan E. Popescu, 2008, "Predictive learning via rule ensembles", *Ann. Appl. Stat.* **2**:916-954. <http://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908046>`__
+
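Reviewer note: for intuition about what the documented H statistic measures, section 8.1 of the Friedman and Popescu paper defines pairwise H as the share of the joint partial dependence's variance that is not explained by the sum of the two univariate partial dependences. The sketch below is an illustrative toy on precomputed partial-dependence values, not H2O's implementation; the function name and inputs are hypothetical.

```python
import numpy as np

def h_statistic(pd_jk, pd_j, pd_k):
    """Pairwise H (Friedman & Popescu 2008, s. 8.1) from centered
    partial-dependence values evaluated at the same N data points."""
    # Center each partial-dependence function, as the paper requires.
    pd_jk = pd_jk - pd_jk.mean()
    pd_j = pd_j - pd_j.mean()
    pd_k = pd_k - pd_k.mean()
    denom = np.sum(pd_jk ** 2)
    if denom == 0.0:
        # Degenerate case (no joint effect); real implementations can
        # also return NaN when weak main effects spoil the computation.
        return float("nan")
    return float(np.sqrt(np.sum((pd_jk - pd_j - pd_k) ** 2) / denom))

# Purely additive effects: the joint PD is the sum of the univariate
# PDs, so H is exactly 0 (no interaction).
f_j = np.array([1.0, -1.0, 2.0, -2.0])
f_k = np.array([0.5, -0.5, 1.0, -1.0])
print(h_statistic(f_j + f_k, f_j, f_k))  # 0.0

# A non-additive term pushes H above 0.
interaction = np.array([1.0, -1.0, -1.0, 1.0])
print(h_statistic(f_j + f_k + interaction, f_j, f_k))
```

This also shows why the zero-denominator guard matters: with no joint effect at all, the ratio is undefined, which is the NaN case the docs mention.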