diff --git a/examples/01_skpro_intro.ipynb b/examples/01_skpro_intro.ipynb
index 7d8fc3ee9..08029a61c 100644
--- a/examples/01_skpro_intro.ipynb
+++ b/examples/01_skpro_intro.ipynb
@@ -373,7 +373,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.2 simple evaluation workflow for probabilistic predictions"
+    "## 1.2 simple evaluation workflow for probabilistic predictions"
    ]
   },
   {
@@ -478,7 +478,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.2.1 diagnostic visualisations"
+    "## 1.3 diagnostic visualisations"
    ]
   },
   {
@@ -587,14 +587,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.3 `skpro` objects - `scikit-base` interface, searching for regressors and metrics"
+    "### 1.4 `skpro` objects - `scikit-base` interface, searching for regressors and metrics"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.3.1 primer on `skpro` object interface "
+    "### 1.4.1 primer on `skpro` object interface "
    ]
   },
   {
@@ -722,7 +722,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### 1.3.2 searching for regressors and metrics "
+    "### 1.4.2 searching for regressors and metrics "
    ]
   },
   {
@@ -820,6 +820,237 @@
     "all_objects(\"metric\", as_dataframe=True, filter_tags={\"scitype:y_pred\": \"pred_proba\"})"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Prediction types, metrics, benchmarking "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This section gives more details on:\n",
+    "\n",
+    "* different prediction types, including a methodological primer\n",
+    "* the API of metrics to compare probabilistic predictions to non-probabilistic actuals\n",
+    "* utilities for batch benchmarking of estimators and metrics"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.1 Probabilistic predictions - methodological primer "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**readers familiar with, or less interested in, the theory may like to skip section 2.1**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In supervised learning - probabilistic or not:\n",
+    "\n",
+    "* we fit an estimator to i.i.d. samples $(X_1, Y_1), \\dots, (X_N, Y_N) \\sim (X_*, Y_*)$\n",
+    "* and want to predict $y$ given $x$ accurately, for $(x, y) \\sim (X_*, Y_*)$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let $y$ be the (true) value, for an observed feature $x$\n",
+    "\n",
+    "(we consider $y$ a random variable)\n",
+    "\n",
+    "| Name | parameter | prediction/estimate of | `skpro` method |\n",
+    "| ---- | --------- | ---------------------- | -------------- |\n",
+    "| point forecast | | conditional expectation $\\mathbb{E}[y\\|x]$ | `predict` |\n",
+    "| variance forecast | | conditional variance $Var[y\\|x]$ | `predict_var` |\n",
+    "| quantile forecast | $\\alpha\\in (0,1)$ | $\\alpha$-quantile of $y\\|x$ | `predict_quantiles` |\n",
+    "| interval forecast | $c\\in (0,1)$ | $[a,b]$ s.t. $P(a\\le y \\le b\\| x) = c$ | `predict_interval` |\n",
+    "| distribution forecast | | the law/distribution of $y\\|x$ | `predict_proba` |"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "##### More formal details & intuition:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "let's consider the toy example again"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.datasets import load_diabetes\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+    "X_train, X_new, y_train, _ = train_test_split(X, y)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.datasets import load_diabetes\n",
+    "from sklearn.ensemble import RandomForestRegressor\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "from skpro.regression.residual import ResidualDouble\n",
+    "\n",
+    "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+    "X_train, X_new, y_train, _ = train_test_split(X, y)\n",
+    "\n",
+    "\n",
+    "reg_mean = RandomForestRegressor()\n",
+    "reg_proba = ResidualDouble(reg_mean)\n",
+    "\n",
+    "reg_proba.fit(X_train, y_train)\n",
+    "y_pred_proba = reg_proba.predict_proba(X_new)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* a **\"point forecast\"** is a prediction/estimate of the conditional expectation $\\mathbb{E}[y|x]$.\\\n",
+    "  **Intuition**: \"out of many repetitions/worlds, this value is the arithmetic average of all observations\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# if y_pred_proba were *true*, here's what many repetitions would look like:\n",
+    "\n",
+    "# repeating this line is \"one repetition\"\n",
+    "y_pred_proba.sample().head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# \"doing this many times and taking the mean\" -> usual point prediction\n",
+    "y_pred_proba.mean().head()"
+   ]
+  },
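+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "as a quick sketch tying this back to the table above: `predict` returns the point forecast directly,\n",
+    "and should (approximately) agree with the mean of the predicted distribution computed above"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# point forecast via the tabular interface - compare with y_pred_proba.mean() above\n",
+    "reg_proba.predict(X_new).head()"
+   ]
+  },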
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* a **\"variance forecast\"** is a prediction/estimate of the conditional variance $Var[y|x]$.\\\n",
+    "  **Intuition:** \"out of many repetitions/worlds, this value is the average squared distance of the observation to the perfect point forecast\".\n",
+    "* a **\"quantile forecast\"**, at quantile point $\\alpha\\in (0,1)$, is a prediction/estimate of the $\\alpha$-quantile of $y|x$, i.e., of $F^{-1}_{y|x}(\\alpha)$, where $F^{-1}$ is the (generalized) inverse cdf = quantile function of the random variable $y|x$.\\\n",
+    "  **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $\\alpha$ will be equal to or smaller than this value.\"\n",
+    "* an **\"interval forecast\"** or \"predictive interval\" with (symmetric) coverage $c\\in (0,1)$ is a prediction/estimate pair of lower bound $a$ and upper bound $b$ such that $P(a\\le y \\le b| x) = c$ and $P(y > b| x) = P(y < a| x) = (1 - c) / 2$.\\\n",
+    "  **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $c$ will be contained in the interval $[a,b]$, and being above is equally likely as being below\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# same as above - take many samples, and then compute element-wise statistics\n",
+    "\n",
+    "# e.g., predict_var should give the same result as the variance of an infinitely large sample\n",
+    "y_pred_proba.var().head()"
+   ]
+  },
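+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "quantile and interval forecasts can be obtained directly via `predict_quantiles` and `predict_interval`\n",
+    "from the table above - a minimal sketch, with arbitrarily chosen quantile points and coverage:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# quantile forecasts at example quantile points 0.05, 0.5, 0.95\n",
+    "reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95]).head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# interval forecast at an example coverage of 90%\n",
+    "reg_proba.predict_interval(X_new, coverage=0.9).head()"
+   ]
+  },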
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* a **\"distribution forecast\"** or \"full probabilistic forecast\" is a prediction/estimate of the distribution of $y|x$, e.g., \"it's a normal distribution with mean 42 and variance 1\".\\\n",
+    "  **Intuition**: exhaustive description of the generating mechanism of many repetitions/worlds."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "note: the true distribution is unknown, and not easily accessible!\n",
+    "\n",
+    "`y_pred_proba` is a distribution, but in general not equal to the true one!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.3 Benchmark evaluation of probabilistic regressors "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "for quick evaluation and benchmarking, the `benchmarking.evaluate` utility can be used:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.datasets import load_diabetes\n",
+    "from sklearn.linear_model import LinearRegression\n",
+    "from sklearn.model_selection import KFold\n",
+    "\n",
+    "from skpro.benchmarking.evaluate import evaluate\n",
+    "from skpro.metrics import CRPS\n",
+    "from skpro.regression.residual import ResidualDouble\n",
+    "\n",
+    "# 1. specify dataset\n",
+    "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+    "\n",
+    "# 2. specify estimator\n",
+    "estimator = ResidualDouble(LinearRegression())\n",
+    "\n",
+    "# 3. specify cross-validation schema\n",
+    "cv = KFold(n_splits=3)\n",
+    "\n",
+    "# 4. specify evaluation metric\n",
+    "crps = CRPS()\n",
+    "\n",
+    "# 5. evaluate - run the benchmark\n",
+    "results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)\n",
+    "\n",
+    "# results are a pd.DataFrame\n",
+    "# each row is one fit/predict/evaluate repetition on one cross-validation fold\n",
+    "# columns report performance, runtime, and other optional information (see docstring)\n",
+    "results"
+   ]
+  },
  {
   "cell_type": "markdown",
   "metadata": {},