diff --git a/examples/01_skpro_intro.ipynb b/examples/01_skpro_intro.ipynb
index 7d8fc3ee9..08029a61c 100644
--- a/examples/01_skpro_intro.ipynb
+++ b/examples/01_skpro_intro.ipynb
@@ -373,7 +373,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.2 simple evaluation workflow for probabilistic predictions"
+ "## 1.2 simple evaluation workflow for probabilistic predictions"
]
},
{
@@ -478,7 +478,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.2.1 diagnostic visualisations"
+ "## 1.3 diagnostic visualisations"
]
},
{
@@ -587,14 +587,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.3 `skpro` objects - `scikit-base` interface, searching for regressors and metrics"
+ "### 1.4 `skpro` objects - `scikit-base` interface, searching for regressors and metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.3.1 primer on `skpro` object interface "
+ "### 1.4.1 primer on `skpro` object interface "
]
},
{
@@ -722,7 +722,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.3.2 searching for regressors and metrics "
+ "### 1.4.2 searching for regressors and metrics "
]
},
{
@@ -820,6 +820,237 @@
"all_objects(\"metric\", as_dataframe=True, filter_tags={\"scitype:y_pred\": \"pred_proba\"})"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Prediction types, metrics, benchmarking "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This section gives more details on:\n",
+ "\n",
+ "* different prediction types, including a methodological primer\n",
+ "* the API of metrics to compare probabilistic predictions to non-probabilistic actuals\n",
+ "* utilities for batch benchmarking of estimators and metrics"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.1 Probabilistic predictions - methodological primer "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**readers familir with, or less interested in theory, may like to skip section 2.1**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In supervised learning - probabilistic or not:\n",
+ "\n",
+ "* we fit estimator to i.i.d samples $(X_1, Y_1), \\dots, (X_N, Y_N) \\sim (X_*, Y_*)$\n",
+ "* and want to predict $y$ given $x$ accurately, for $(x, y) \\sim (X_*, Y_*)$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let $y$ be the (true) value, for an observed feature $x$\n",
+ "\n",
+ "(we consider $y$ a random variable)\n",
+ "\n",
+ "| Name | param | prediction/estimate of | `skpro` |\n",
+ "| ---- | ----- | ---------------------- | -------- |\n",
+ "| point forecast | | conditional expectation $\\mathbb{E}[y\\|x]$ | `predict` |\n",
+ "| variance forecast | | conditional variance $Var[y\\|x]$ | `predict_var` |\n",
+ "| quantile forecast | $\\alpha\\in (0,1)$ | $\\alpha$-quantile of $y\\|x$ | `predict_quantiles` |\n",
+ "| interval forecast | $c\\in (0,1)$| $[a,b]$ s.t. $P(a\\le y \\le b\\| x) = c$ | `predict_interval` |\n",
+ "| distribution forecast | | the law/distribution of $y\\|x$ | `predict_proba` |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### More formal details & intuition:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "let's consider the toy example again"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_diabetes\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+ "X_train, X_new, y_train, _ = train_test_split(X, y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_diabetes\n",
+ "from sklearn.ensemble import RandomForestRegressor\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "from skpro.regression.residual import ResidualDouble\n",
+ "\n",
+ "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+ "X_train, X_new, y_train, _ = train_test_split(X, y)\n",
+ "\n",
+ "\n",
+ "reg_mean = RandomForestRegressor()\n",
+ "reg_proba = ResidualDouble(reg_mean)\n",
+ "\n",
+ "reg_proba.fit(X_train, y_train)\n",
+ "y_pred_proba = reg_proba.predict_proba(X_new)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "* a **\"point forecast\"** is a prediction/estimate of the conditional expectation $\\mathbb{E}[y|x]$.\\\n",
+ " **Intuition**: \"out of many repetitions/worlds, this value is the arithmetic average of all observations\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# if y_pred_proba were *true*, here's how many repetitions would look like:\n",
+ "\n",
+ "# repeating this line is \"one repetition\"\n",
+ "y_pred_proba.sample().head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# \"doing many times and taking the mean\" -> usual point prediction\n",
+ "y_pred_proba.mean().head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "* a **\"variance forecast\"** is a prediction/estimate of the conditional expectation $Var[y|x]$.\\\n",
+ " **Intuition:** \"out of many repetitions/worlds, this value is the average squared distance of the observation to the perfect point forecast\".\n",
+ "* a **\"quantile forecast\"**, at quantile point $\\alpha\\in (0,1)$ is a prediction/estimate of the $\\alpha$-quantile of $y'|y$, i.e., of $F^{-1}_{y|x}(\\alpha)$, where $F^{-1}$ is the (generalized) inverse cdf = quantile function of the random variable y|x.\\\n",
+ " **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $\\alpha$ will have equal or smaller than this value.\"\n",
+ "* an **\"interval forecast\"** or \"predictive interval\" with (symmetric) coverage $c\\in (0,1)$ is a prediction/estimate pair of lower bound $a$ and upper bound $b$ such that $P(a\\le y \\le b| x) = c$ and $P(y \\gneq b| x) = P(y \\lneq a| x) = (1 - c) /2$.\\\n",
+ " **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $c$ will be contained in the interval $[a,b]$, and being above is equally likely as being below\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# same as above - take many samples, and then compute element-wise statistics\n",
+ "\n",
+ "# e.g., predict_var should give the same result as infinite large sample's variance\n",
+ "y_pred_proba.var().head()"
+ ]
+ },
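+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To tie the quantile and interval forecasts above to code, here is a quick sketch using the `predict_quantiles` and `predict_interval` methods listed in the table; the `alpha` and `coverage` argument names below are assumptions, see the respective docstrings for the exact signatures."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# sketch: quantile forecasts from the fitted probabilistic regressor\n",
+    "# the alpha argument name is an assumption - check the predict_quantiles docstring\n",
+    "\n",
+    "# 0.05, 0.5, and 0.95-quantiles of y|x, one row per sample in X_new\n",
+    "y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])\n",
+    "y_pred_quantiles.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# sketch: symmetric predictive interval with 90% coverage, one row per sample in X_new\n",
+    "# the coverage argument name is an assumption - check the predict_interval docstring\n",
+    "y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)\n",
+    "y_pred_interval.head()"
+   ]
+  },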
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "* a **\"distribution forecast\"** or \"full probabilistic forecast\" is a prediction/estimate of the distribution of $y|x$, e.g., \"it's a normal distribution with mean 42 and variance 1\".\\\n",
+ "**Intuition**: exhaustive description of the generating mechanism of many repetitions/worlds."
+ ]
+ },
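+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A small sketch for intuition: `y_pred_proba` is a distribution object, one (univariate) predicted distribution per row of `X_new`, which can be queried directly - the example below assumes the distribution object exposes a `cdf` method taking a `pd.DataFrame` of evaluation points."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# sketch, assuming a cdf method on the distribution object:\n",
+    "# predicted probability that the observation falls below the predicted conditional mean\n",
+    "# (close to 0.5 for roughly symmetric predictive distributions)\n",
+    "y_pred_proba.cdf(y_pred_proba.mean()).head()"
+   ]
+  },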
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "note: the true distribution is unknown, and not accessible easily!\n",
+ "\n",
+ "`y_pred_proba` is a distribution, but in general not equal to the true one!"
+ ]
+ },
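+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To round off the table above in code: point and variance forecasts can also be obtained directly from the fitted regressor, via `predict` and `predict_var` - a short sketch; these should closely match the `mean` and `var` of the distribution forecast computed earlier."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# point forecast (conditional mean) directly from the regressor\n",
+    "y_pred = reg_proba.predict(X_new)\n",
+    "\n",
+    "# variance forecast - should closely match y_pred_proba.var() above\n",
+    "y_pred_var = reg_proba.predict_var(X_new)\n",
+    "y_pred_var.head()"
+   ]
+  },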
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.3 Benchmark evaluation of probabilistic regressors "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "for quick evaluation and benchmarking,\n",
+ "\n",
+ "the `benchmarking.evaluate` utility can be used:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_diabetes\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "from sklearn.model_selection import KFold\n",
+ "\n",
+ "from skpro.benchmarking.evaluate import evaluate\n",
+ "from skpro.metrics import CRPS\n",
+ "from skpro.regression.residual import ResidualDouble\n",
+ "\n",
+ "# 1. specify dataset\n",
+ "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+ "\n",
+ "# 2. specify estimator\n",
+ "estimator = ResidualDouble(LinearRegression())\n",
+ "\n",
+ "# 3. specify cross-validation schema\n",
+ "cv = KFold(n_splits=3)\n",
+ "\n",
+ "# 4. specify evaluation metric\n",
+ "crps = CRPS()\n",
+ "\n",
+ "# 5. evaluate - run the benchmark\n",
+ "results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)\n",
+ "\n",
+ "# results are pd.DataFrame\n",
+ "# each row is one repetition of the cross-validation on one fold fit/predict/evaluate\n",
+ "# columns report performance, runtime, and other optional information (see docstring)\n",
+ "results"
+ ]
+ },
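+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a short follow-up sketch, the fold-wise results can be aggregated with plain pandas, e.g., by averaging all numeric columns (metric values and runtimes) across folds:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# average metric values and runtimes across the cross-validation folds\n",
+    "results.mean(numeric_only=True)"
+   ]
+  },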
{
"cell_type": "markdown",
"metadata": {},