diff --git a/examples/01_skpro_intro.ipynb b/examples/01_skpro_intro.ipynb index aab1b8564..8f9b407ca 100644 --- a/examples/01_skpro_intro.ipynb +++ b/examples/01_skpro_intro.ipynb @@ -5,23 +5,48 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## skpro introduction notebook\n", + "## skpro introduction notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Set-up instructions:** On binder, this should run out-of-the-box.\n", + "\n", + "To run this notebook as intended, ensure that `skpro` with basic dependency requirements is installed in your Python environment." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`skpro` provides `scikit-learn`-like, `scikit-base` compatible interfaces to:\n", + "\n", + "* tabular **supervised regressors with probabilistic prediction modes** - interval, quantile and distribution predictions\n", + "* **performance metrics to evaluate probabilistic predictions**, e.g., pinball loss, empirical coverage, CRPS\n", + "* **reductions** to turn non-probabilistic, `scikit-learn` regressors into probabilistic `skpro` regressors, such as bootstrap or conformal\n", + "* tools for building **pipelines and composite machine learning models**, including tuning via probabilistic performance metrics\n", + "* symbolic and lazy **probability distributions** with a value domain of `pandas.DataFrame`-s and a `pandas`-like interface\n", + "\n", + "**Section 1** provides an overview of common **probabilistic supervised regression workflows** supported by `skpro`.\n", + "\n", + "**Section 2** gives a more detailed introduction to **prediction modes, performance metrics, and benchmarking tools**.\n", "\n", - "lists basic vignettes - currently incomplete and used for testing" + "**Section 3** discusses **advanced composition patterns**, including various ways to add probabilistic capability to any `sklearn` regressor, pipeline building, tuning, ensembling.\n", + "\n", + "**Section 4** gives 
an introduction to how to write **custom estimators** compliant with the `skpro` interface." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ + "# hide warnings\n", "import warnings\n", "\n", - "# import numpy as np\n", - "# import pandas as pd\n", - "\n", - "# hide warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, @@ -33,6 +58,20 @@ "## 1. Basic probabilistic supervised regression workflows " ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`skpro` revolves around supervised probabilistic regressors:\n", + "\n", + "* `fit(X, y)` with tabular features `X`, labels `y`, same rows, both `pd.DataFrame`\n", + "* `predict_interval(X_test)` for interval predictions of labels\n", + "* `predict_quantiles(X_test)` for quantile predictions of labels\n", + "* `predict_var(X_test)` for variance predictions of labels\n", + "* `predict(X_test)` for mean predictions\n", + "* `predict_proba(X_test)` for distributional prediction" + ] + }, { "attachments": {}, "cell_type": "markdown", @@ -41,13 +80,21 @@ "### 1.1 basic deployment workflow" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`skpro` regressors are used via `fit` then `predict_proba` etc.\n", + "\n", + "Same as `sklearn` regressors - `X` and `y` should be `pd.DataFrame` (`numpy` is also ok but not recommended)" + ] + }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ - "\n", "from sklearn.datasets import load_diabetes\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.linear_model import LinearRegression\n", @@ -61,7 +108,7 @@ "\n", "# step 2: specifying the regressor\n", "# example - random forest for mean prediction\n", - "# near regression for variance prediction\n", + "# linear regression for variance prediction\n", "reg_mean = RandomForestRegressor()\n", "reg_resid = LinearRegression()\n", "reg_proba = 
ResidualDouble(reg_mean, reg_resid)\n", @@ -70,8 +117,1052 @@ "reg_proba.fit(X_train, y_train)\n", "\n", "# step 4: predicting labels on new data\n", + "\n", + "# probabilistic prediction modes - pick any or multiple\n", + "# we show the return types in detail below\n", + "\n", + "# full distribution prediction\n", + "y_pred_proba = reg_proba.predict_proba(X_new)\n", + "\n", + "# interval prediction\n", + "y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)\n", + "\n", + "# quantile prediction\n", + "y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])\n", + "\n", + "# variance prediction\n", + "y_pred_var = reg_proba.predict_var(X_new)\n", + "\n", + "# mean prediction is same as \"classical\" sklearn predict, also available\n", + "y_pred_mean = reg_proba.predict(X_new)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.1.1 distribution predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`y_pred_proba` is an `skpro` distribution - it has index and columns like `pd.DataFrame`\n", + "\n", + "\"we predict that true labels are distributed according to `y_pred_proba`\"\n", + "\n", + "(here: distribution marginal by row/columns)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Normal(columns=Index(['target'], dtype='object'),\n",
+       "       index=Index([381, 139,  99, 272,  56, 349, 420, 337, 199,  72,\n",
+       "       ...\n",
+       "       300, 236, 160, 227, 241, 268, 388, 127, 412, 283],\n",
+       "      dtype='int64', length=111),\n",
+       "       mu=array([[118.85],\n",
+       "       [228.85],\n",
+       "       [145.66],\n",
+       "       [125.88],\n",
+       "       [205.65],\n",
+       "       [107.12],\n",
+       "       [130.02],\n",
+       "       [164.81],\n",
+       "       [185.76],\n",
+       "       [149.34],\n",
+       "       [217.09],\n",
+       "       [ 96.99],\n",
+       "       [104.97],\n",
+       "       [196.28],\n",
+       "       [208.39],\n",
+       "       [158.98],\n",
+       "       [145.25],\n",
+       "       [248.86],\n",
+       "       [ 89.01],\n",
+       "       [24...\n",
+       "       [16.29396088],\n",
+       "       [16.32129065],\n",
+       "       [15.3176468 ],\n",
+       "       [12.535319  ],\n",
+       "       [13.1500054 ],\n",
+       "       [27.31178894],\n",
+       "       [22.36422301],\n",
+       "       [21.83370339],\n",
+       "       [15.32004588],\n",
+       "       [18.12178421],\n",
+       "       [11.38451594],\n",
+       "       [11.4564265 ],\n",
+       "       [14.77505789],\n",
+       "       [12.47202459],\n",
+       "       [15.8887364 ],\n",
+       "       [20.78791316],\n",
+       "       [19.85426535],\n",
+       "       [19.96654621],\n",
+       "       [12.92540335],\n",
+       "       [11.64591954],\n",
+       "       [15.35777574],\n",
+       "       [23.883902  ],\n",
+       "       [14.26127797],\n",
+       "       [ 8.10192324],\n",
+       "       [24.10078003],\n",
+       "       [15.11159749]]))
Please rerun this cell to show the HTML repr or trust the notebook.
" + ], + "text/plain": [ + "Normal(columns=Index(['target'], dtype='object'),\n", + " index=Index([381, 139, 99, 272, 56, 349, 420, 337, 199, 72,\n", + " ...\n", + " 300, 236, 160, 227, 241, 268, 388, 127, 412, 283],\n", + " dtype='int64', length=111),\n", + " mu=array([[118.85],\n", + " [228.85],\n", + " [145.66],\n", + " [125.88],\n", + " [205.65],\n", + " [107.12],\n", + " [130.02],\n", + " [164.81],\n", + " [185.76],\n", + " [149.34],\n", + " [217.09],\n", + " [ 96.99],\n", + " [104.97],\n", + " [196.28],\n", + " [208.39],\n", + " [158.98],\n", + " [145.25],\n", + " [248.86],\n", + " [ 89.01],\n", + " [24...\n", + " [16.29396088],\n", + " [16.32129065],\n", + " [15.3176468 ],\n", + " [12.535319 ],\n", + " [13.1500054 ],\n", + " [27.31178894],\n", + " [22.36422301],\n", + " [21.83370339],\n", + " [15.32004588],\n", + " [18.12178421],\n", + " [11.38451594],\n", + " [11.4564265 ],\n", + " [14.77505789],\n", + " [12.47202459],\n", + " [15.8887364 ],\n", + " [20.78791316],\n", + " [19.85426535],\n", + " [19.96654621],\n", + " [12.92540335],\n", + " [11.64591954],\n", + " [15.35777574],\n", + " [23.883902 ],\n", + " [14.26127797],\n", + " [ 8.10192324],\n", + " [24.10078003],\n", + " [15.11159749]]))" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba = reg_proba.predict_proba(X_new)\n", + "y_pred_proba" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`skpro` distribution objects are pandas-like" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(111, 1)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index([381, 139, 99, 272, 56, 349, 420, 337, 199, 72,\n", + " ...\n", + " 300, 
236, 160, 227, 241, 268, 388, 127, 412, 283],\n", + " dtype='int64', length=111)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.index # same index as X_new" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['target'], dtype='object')" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.columns # same columns as y_train" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "distribution objects have a `sample` method, and methods such as `mean`, `var`:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  101.431079
139  209.907161
99   148.807803
272  151.779165
56   222.904201
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 101.431079\n", + "139 209.907161\n", + "99 148.807803\n", + "272 151.779165\n", + "56 222.904201" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.sample().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  118.85
139  228.85
99   145.66
272  125.88
56   205.65
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 118.85\n", + "139 228.85\n", + "99 145.66\n", + "272 125.88\n", + "56 205.65" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.mean().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  127.633421
139  472.467320
99   331.036284
272  233.990518
56   433.604575
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 127.633421\n", + "139 472.467320\n", + "99 331.036284\n", + "272 233.990518\n", + "56 433.604575" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.var().head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.1.2 interval predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "interval prediction `y_pred_interval` is a `pd.DataFrame`:\n", + "\n", + "* rows are the same as `X_new`\n", + "* columns indicate variables, nominal coverage, and lower/upper bound\n", + "\n", + "\"we predict that the value in each row falls between lower and upper with 90% probability\"" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
0.9
          lower       upper
381  100.267272  137.432728
139  193.096946  264.603054
99   115.732871  175.587129
272  100.719088  151.040912
56   171.398927  239.901073
\n", + "
" + ], + "text/plain": [ + " target \n", + " 0.9 \n", + " lower upper\n", + "381 100.267272 137.432728\n", + "139 193.096946 264.603054\n", + "99 115.732871 175.587129\n", + "272 100.719088 151.040912\n", + "56 171.398927 239.901073" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)\n", + "y_pred_interval.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.1.3 quantile predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "quantile prediction `y_pred_quantiles` is a `pd.DataFrame`:\n", + "\n", + "* rows are the same as `X_new`\n", + "* columns indicate variables, quantile points\n", + "\n", + "\"we predict the 5%, 50%, 95% quantile points for the row to be here\"" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
           0.05    0.50        0.95
381  100.267272  118.85  137.432728
139  193.096946  228.85  264.603054
99   115.732871  145.66  175.587129
272  100.719088  125.88  151.040912
56   171.398927  205.65  239.901073
\n", + "
" + ], + "text/plain": [ + " target \n", + " 0.05 0.50 0.95\n", + "381 100.267272 118.85 137.432728\n", + "139 193.096946 228.85 264.603054\n", + "99 115.732871 145.66 175.587129\n", + "272 100.719088 125.88 151.040912\n", + "56 171.398927 205.65 239.901073" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])\n", + "y_pred_quantiles.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.1.4 mean and variance predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "mean and variance predictions `y_pred_mean`, `y_pred_var` are `pd.DataFrame`-s:\n", + "\n", + "* rows are the same as `X_new`\n", + "* columns are the same as `y_train`\n", + "\n", + "entries are predictive mean and variance in row/column" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ "y_pred_mean = reg_proba.predict(X_new)\n", - "y_pred_proba = reg_proba.predict_proba(X_new)" + "y_pred_var = reg_proba.predict_var(X_new)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  118.85
139  228.85
99   145.66
272  125.88
56   205.65
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 118.85\n", + "139 228.85\n", + "99 145.66\n", + "272 125.88\n", + "56 205.65" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_mean.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  127.633421
139  472.467320
99   331.036284
272  233.990518
56   433.604575
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 127.633421\n", + "139 472.467320\n", + "99 331.036284\n", + "272 233.990518\n", + "56 433.604575" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_var.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "this is the same as taking the distribution prediction and taking mean/variance\n", + "\n", + "(for distribution objects that estimate these precisely)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  118.85
139  228.85
99   145.66
272  125.88
56   205.65
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 118.85\n", + "139 228.85\n", + "99 145.66\n", + "272 125.88\n", + "56 205.65" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.mean().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
381  127.633421
139  472.467320
99   331.036284
272  233.990518
56   433.604575
\n", + "
" + ], + "text/plain": [ + " target\n", + "381 127.633421\n", + "139 472.467320\n", + "99 331.036284\n", + "272 233.990518\n", + "56 433.604575" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba.var().head()" ] }, { @@ -79,14 +1170,41 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 1.2 simple evaluation workflow for probabilistic predictions" + "## 1.2 simple evaluation workflow for probabilistic predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "for simple evaluation:\n", + "\n", + "1. split the data into train/test set\n", + "2. make predictions of any type for test features\n", + "3. compute metric on test set, comparing test predictions to held-out test labels\n", + "\n", + "Note:\n", + "\n", + "* metrics will compare tabular ground truth to probabilistic prediction\n", + "* the metric needs to be of a compatible type, e.g., for proba predictions" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "30.429848226043294" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "from sklearn.datasets import load_diabetes\n", "from sklearn.ensemble import RandomForestRegressor\n", @@ -101,10 +1219,10 @@ "X_train, X_test, y_train, y_test = train_test_split(X, y)\n", "\n", "# step 2: specifying the regressor\n", - "# example - random forest for mean prediction\n", - "# near regression for variance prediction\n", - "reg_mean = RandomForestRegressor()\n", - "reg_resid = LinearRegression()\n", + "# example - linear regression for mean prediction\n", + "# random forest for variance prediction\n", + "reg_mean = LinearRegression()\n", + "reg_resid = RandomForestRegressor()\n", "reg_proba = ResidualDouble(reg_mean, reg_resid)\n", "\n", "# step 3: fitting the model to training 
data\n", @@ -123,7 +1241,3524 @@ { "cell_type": "markdown", "metadata": {}, - "source": [] + "source": [ + "how do we know that metric is of right type? Via `scitype:y_pred` tag" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'estimator_type': 'estimator',\n", + " 'object_type': 'metric',\n", + " 'reserved_params': ['multioutput', 'score_average'],\n", + " 'scitype:y_pred': 'pred_proba',\n", + " 'lower_is_better': True}" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "metric.get_tags()\n", + "# scitype:y_pred is pred_proba - for proba predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "how do we find metrics for a prediction type?" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name                 object                                              scitype:y_pred
0  CRPS                 <class 'skpro.metrics._classes.CRPS'>               pred_proba
1  ConstraintViolation  <class 'skpro.metrics._classes.ConstraintViola...   pred_interval
2  EmpiricalCoverage    <class 'skpro.metrics._classes.EmpiricalCovera...   pred_interval
3  LinearizedLogLoss    <class 'skpro.metrics._classes.LinearizedLogLo...   pred_proba
4  LogLoss              <class 'skpro.metrics._classes.LogLoss'>            pred_proba
5  PinballLoss          <class 'skpro.metrics._classes.PinballLoss'>        pred_quantiles
6  SquaredDistrLoss     <class 'skpro.metrics._classes.SquaredDistrLoss'>   pred_proba
\n", + "
" + ], + "text/plain": [ + " name object \\\n", + "0 CRPS <class 'skpro.metrics._classes.CRPS'> \n", + "1 ConstraintViolation <class 'skpro.metrics._classes.ConstraintViola... \n", + "2 EmpiricalCoverage <class 'skpro.metrics._classes.EmpiricalCovera... \n", + "3 LinearizedLogLoss <class 'skpro.metrics._classes.LinearizedLogLo... \n", + "4 LogLoss <class 'skpro.metrics._classes.LogLoss'> \n", + "5 PinballLoss <class 'skpro.metrics._classes.PinballLoss'> \n", + "6 SquaredDistrLoss <class 'skpro.metrics._classes.SquaredDistrLoss'> \n", + "\n", + " scitype:y_pred \n", + "0 pred_proba \n", + "1 pred_interval \n", + "2 pred_interval \n", + "3 pred_proba \n", + "4 pred_proba \n", + "5 pred_quantiles \n", + "6 pred_proba " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from skpro.registry import all_objects\n", + "\n", + "all_objects(\"metric\", as_dataframe=True, return_tags=\"scitype:y_pred\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "extra note: quantile metrics can be applied to interval predictions as well\n", + "\n", + "more details on metrics below" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1.3 diagnostic visualisations" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "some useful diagnostic visualisations: variants of crossplots for probabilistic predictions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A. crossplot ground truth vs prediction intervals.\n", + "\n", + "Works with both proba and interval predictions.\n", + "\n", + "What to look for: intervals should cut through the x = y line (green points)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_interval\n", + "\n", + "plot_crossplot_interval(y_test, y_pred_proba, coverage=0.9)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_interval\n", + "\n", + "y_pred_interval = reg_proba.predict_interval(X_test, coverage=0.9)\n", + "plot_crossplot_interval(y_test, y_pred_interval)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "B. crossplot residuals vs predictive standard deviation\n", + "\n", + "Works with both proba and variance predictions.\n", + "\n", + "What to look for: should be close to a line, high linear correlation" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_std\n", + "\n", + "plot_crossplot_std(y_test, y_pred_proba)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjcAAAG1CAYAAAAFuNXgAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAABPZklEQVR4nO3de1xUdf4/8NeAgNxmFEUQQcRkvV+xFG+ZsqJfs4vWt9ysNLyRd8LSfqvtWomrqVnfVdNQaTdrc83atYspoqYSiWJ5JbwQTAq6KjdRruf3xywjw82ZM2dmzjnzej4e8wDOnDnzng9n5rznc9UIgiCAiIiISCVcHB0AERERkZSY3BAREZGqMLkhIiIiVWFyQ0RERKrC5IaIiIhUhckNERERqQqTGyIiIlIVJjdERESkKs0cHYC9VVdX48qVK/D19YVGo3F0OERERGQGQRBQXFyMoKAguLg0XTfjdMnNlStXEBIS4ugwiIiISITc3FwEBwc3uY/TJTe+vr4ADIWj1WodHA0RERGZo6ioCCEhIcbreFOcLrmpaYrSarVMboiIiBTGnC4l7FBMREREqsLkhoiIiFSFyQ0RERGpCpMbIiIiUhUmN0RERKQqTG6IiIhIVZjcEBERkaowuSEiIiJVYXJDREREqsLkhoiIiFSFyQ0RERGpCpMbIiKyGb0eSEkx/CSyFyY3RERkE4mJQGgoMGKE4WdioqMjImfB5IaIiCSn1wPTpwPV1Ya/q6uBGTNYg0P2weSGiIgkl5V1L7GpUVUFXLjgmHjIuTC5ISIiyYWHAy51rjCurkCnTo6Jh5wLkxsiIpJccDCwaZMhoQEMPz/4wLCdyNZkldxUVVVhyZIlCAsLg6enJx544AG8+eabEATBuI8gCFi6dCnatm0LT09PREVFISsry4FRExFRQ2JigOxsw2ip7GzD30T20MzRAdT2l7/8BRs2bEBSUhK6d++O9PR0TJkyBTqdDnPnzgUArFy5Eu+99x6SkpIQFhaGJUuWIDo6GmfPnkXz5s0d/AqIiKi24GDW1pD9aYTa1SIO9uijjyIgIACJtcYLTpgwAZ6envj73/8OQRAQFBSEV155BfHx8QCAwsJCBAQEYNu2bXj22Wfv+xxFRUXQ6XQoLCyEVqu12WshIiIi6Vhy/ZZVs9SgQYOQnJyMX375BQDw008/4fDhwxgzZgwA4PLly8jLy0NUVJTxMTqdDgMGDEBqaqpDYiYiIiJ5kVWz1KJFi1BUVIQuXbrA1dUVVVVVePvtt/Hcc88BAPLy8gAAAQEBJo8LCAgw3ldXWVkZysrKjH8XFRXZKHoiIiKSA1nV3Hz22Wf4+OOPsX37dpw4cQJJSUl45513kJSUJPqYCQkJ0Ol0xltISIiEERMREZHcyCq5WbhwIRYtWoRnn30WPXv2xPPPP48FCxYgISEBABAYGAgAyM/PN3lcfn6+8b66Fi9ejMLCQuMtNzfXti+CiIiIHEpWyU1paSlc6sz65Orqiur/TnMZFhaGwMBAJCcnG+8vKipCWloaIiMjGzymh4cHtFqt
yY2IiIjUS1Z9bsaNG4e3334b7du3R/fu3ZGRkYE1a9bgpZdeAgBoNBrMnz8fb731FsLDw41DwYOCgvDEE084NngiIiKSBVklN++//z6WLFmCl19+GdeuXUNQUBBmzJiBpUuXGvd59dVXcfv2bUyfPh0FBQUYMmQIvv32W85xQ0RERABkNs+NPXCeGyIiIuVR7Dw3RERERNZickNERESqwuSGiIiIVIXJDREREakKkxsiIiJSFSY3REREpCpMboiIiEhVmNwQERGRqjC5ISIiIlVhckNERESqwuSGiIiIVIXJDREREakKkxsiIiJSFSY3REREpCpMboiIiEhVmNwQERGRqjC5ISIiIlVhckNERESqwuSGiIiIVIXJDREREakKkxsiIiJSFSY3REREpCpMboiIiEhVmNwQERGRqjC5ISIiIlVhckNERESqwuSGiIiIVIXJDREREakKkxsiIiJSFSY3REREpCpMboiIiEhVmNwQERGRqjC5ISIiIlVhckNERESqwuSGiIiIVIXJDREREamKVcnN2rVrAQBnzpxBVVWVJAERERERWaOZNQ/u06cPAOD111/H+fPn4enpie7du6Nnz57o0aMHHn30USliJCIiIjKbRhAEwZwdi4uL4evrCwDIzc1FSEhIvX1KSkpw5swZnDp1CqdPn8a7774rabBSKCoqgk6nQ2FhIbRaraPDISIiIjNYcv02u1lq3759xt+7dOmCpUuXorS01GQfHx8fDBgwAFOnTpVlYkNERETqZ3Zyk5+fb/x979692LNnD8LDw7Ft2zZbxEVEREQkilnJzenTp9GuXTvj34MGDUJaWhoSEhKwZMkSRERE4Pvvv7dZkERERETmMiu5+eabb/Dwww/X2/7CCy8gMzMTY8eOxZgxY/DUU0/h8uXLkgdJREREZC6zkpsRI0YgNTW10ftHjRqFqVOnYteuXejWrRteffVVlJSUSBYkERERkbnMSm4iIiJMamQ2btyImJgY9OrVCzqdDiNHjsT333+PmTNnYt26dUhPT0e3bt2Qnp5uUTAdOnSARqOpd5s1axYA4O7du5g1axZatWoFHx8fTJgwwaQvEBEREZHZQ8H/+c9/4qmnngIAhISEYMCAARg4cCAGDhyIiIgIeHp6muy/fPlybN++HadPnzY7mOvXr5tMBnj69Gn8/ve/R0pKCoYPH47Y2Fh89dVX2LZtG3Q6HWbPng0XFxccOXLE7OfgUHAikjO9HsjKAsLDgeBgR0dDJB+WXL/NTm5u3rwJPz8/s4PIz89HUFCQVTMXz58/H7t370ZWVhaKiorg7++P7du3G5Os8+fPo2vXrkhNTcXAgQPNOiaTGyKSq8REYPp0oLoacHEBNm0CYmIcHRWRPNhknhtLEhsAaNOmDfbv32/RY2orLy/H3//+d7z00kvQaDQ4fvw4KioqEBUVZdynS5cuaN++fZP9gcrKylBUVGRyIyKSG73+XmIDGH7OmGHYTkSWsdnCmRqNpsERVub64osvUFBQgMmTJwMA8vLy4O7ujhYtWpjsFxAQgLy8vEaPk5CQAJ1OZ7w1NLMyEZGjZWXdS2xqVFUBFy44Jh4iJZPtquCJiYkYM2YMgoKCrDrO4sWLUVhYaLzl5uZKFCERkXTCww1NUbW5ugKdOjkmHiIlk2Vy8+uvv2Lfvn2YOnWqcVtgYCDKy8tRUFBgsm9+fj4CAwMbPZaHhwe0Wq3JjYhIboKDDX1sXF0Nf7u6Ah98wE7FRGKYndz8/PPPqK5bZ2ojW7duRZs2bTB27FjjtoiICLi5uSE5Odm4LTMzEzk5OYiMjLRLXEREthQTA2RnAykphp/sTEwkTjNzd+zbty+uXr2KNm3aoGPHjjh27BhatWoleUDV1dXYunUrXnzxRTRrdi88nU6HmJgYxMXFwc/PD1qtFnPmzEFkZKTZI6WIiOQuOJi1NUTWMju5adGiBS5fvow2bdogOzvbZrU4+/btQ05ODl566aV6961duxYu
eOedd8w6vjnu93x1n7OhmImUjDU3RDKVmZmJH3/8ERMnTgQANGvWDM888wwSExNN9hs4cCA0Go3x78jISGRlZaGqqgq9e/fGyJEj0bNnTzz99NPYvHkzbt26BQC4ePEiKioqMHjwYONj3dzc8NBDD+HcuXOSvpYLFy6gtLQUv//97+Hj42O8ffTRR7h48aJVj4mIiGjw8bW3W/Ja6x6vqTI018CBA3Hu3DmUlJTg9u3beP311/HWW2/Bx8cHADBixAi88sor9R63aNEiY/NeY7fz589b/HxNPSeRGnC0FJFMJSYmorKyEkFBQcZtgiDAw8MD//d//wedTnffY7i6umLv3r04evQovvvuO7z//vv4f//v/yEtLU1UTC4uLvWaYyoqKu77uJKSEgDAV199hXbt2pnc5+HhYdVjvL29G3x8Y9vvp+7jmirDsLAws44ZEREBFxcXnDhxAvv27YO/vz+mTJlivP+xxx7D22+/jZ49e5o87pVXXsHkyZObPHbHjh0tfr6mnpNIDZjcEMlQZWUlPvroI6xevRqjRo0yue+JJ57AJ598gpkzZwJAvUTlhx9+QHh4OFxdXQEAGo0GgwcPxuDBg7F06VKEhoZi165dmDFjhrFvSU0/jIqKChw7dqzReW38/f1x9epV499FRUW4fPmyyT7u7u6oqqoy2datWzd4eHggJycHDz/8sFllIOYxjXnggQcsfq21NVaGcXFxZj2/l5cXevbsiZ07d2Lz5s34+uuv4eJyr+L8/Pnz6NKlS73H+fv7w9/f37wXacHzNfWcRGrA5IZIhnbv3o1bt24hJiamXg3NhAkTkJiYaExucnJyEBcXhxkzZuDEiRN4//33sXr1agCGxCc5ORmjRo1CmzZtkJaWhuvXr6Nr167w9vZGbGwsFi5cCD8/P7Rv3x4rV65EaWkpYmJiGoxrxIgR2LZtG8aNG4cWLVpg6dKlxiSqRocOHZCWlobs7Gz4+PjAz88Pvr6+iI+Px4IFC1BdXY0hQ4agsLAQR44cgVarxYsvvljvucQ8pjFiXmuNpsrQEgMHDsT777+Pxx9/HMOHDzduLy4uRvPmzeHm5mbR8cQ+ny2fk0gumNwQyVBiYiKioqIabHqaMGGCyXT6L7zwAu7cuYOHHnoIrq6umDdvHqZPnw4A0Gq1OHToEN59910UFRUhNDQUq1evxpgxYwAAK1asQHV1NZ5//nkUFxejf//+2LNnD1q2bNlgXIsXL8bly5fx6KOPQqfT4c0336xXcxMfH48XX3wR3bp1w507d3D58mV06NABb775Jvz9/ZGQkIBLly6hRYsW6NevH15//fVGy0HMYxpj6Wutcb8yNFfv3r3h5uaGVatWmWw/c+YMunfvbvHrEft8tnxOIrnQCHUb0ImISLQDBw5g8uTJxvl9ajzyyCPo16+fsVatxubNm3H9+nVRCVtTGnu+hp6zsZiJlIo1N0RENlJdXY3r168jMTERWVlZ+PLLL+vtc+rUKURFRdnt+aR+TiI5YnJDRGQjhw4dwogRI9ClSxfs3LkTWq223j7vvfeeXZ9P6uckkiMmN0REEurQoYNxBNbw4cNRXV1tt+cW+3y1YyZSA/a5ISIiIlXhDMVERESkKkxuiIiISFWY3BAREZGqMLkhIiIiVWFyQ0RERKrC5IaIiIhUhckNERERqQqTGyIiIlIVJjdERESkKkxuiIiISFWY3BAREZGq/H/vpqj7qGzfAAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_std\n", + "\n", + "y_pred_var = reg_proba.predict_var(X_test)\n", + "plot_crossplot_std(y_test, y_pred_var)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "C. crossplot ground truth vs loss values\n", + "\n", + "Loss and prediction type should agree.\n", + "\n", + "What to look for: association between accuracy and ground truth value\n", + "\n", + "Diagnostic of which values we can predict more accurately,\n", + "\n", + "e.g., to inform modelling or identify unusual outliers" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_loss\n", + "\n", + "crps_metric = CRPS()\n", + "plot_crossplot_loss(y_test, y_pred_proba, crps_metric)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.4 `skpro` objects - `scikit-base` interface, searching for regressors and metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.4.1 primer on `skpro` object interface " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "metrics and estimators are first-class citizens in `skpro`, with a `scikit-base` compatible interface" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "# example object 1: CRPS metric\n", + "from skpro.metrics import CRPS\n", + "\n", + "crps_metric = CRPS()\n", + "\n", + "# example object 2: ResidualDouble regressor\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "reg_mean = LinearRegression()\n", + "reg_resid = RandomForestRegressor()\n", + "reg_proba = ResidualDouble(reg_mean, reg_resid)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "e.g., all have `get_tags` interface" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'estimator_type': 'estimator',\n", + " 'object_type': 'metric',\n", + " 'reserved_params': ['multioutput', 'score_average'],\n", + " 'scitype:y_pred': 'pred_proba',\n", + " 'lower_is_better': True}" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "crps_metric.get_tags()" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + "{'estimator_type': 'regressor_proba',\n", + " 'object_type': 'regressor_proba',\n", + " 'capability:multioutput': False,\n", + " 'capability:missing': True}" + ] + }, + "execution_count": 27, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reg_proba.get_tags()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "the tag `object_type` indicates the type of object, e.g., metric or proba regressor" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "all objects also have the `get_params`/`set_params` interface known from `scikit-learn`\n", + "\n", + "= reading or setting hyper-parameters\n", + "\n", + "`get_params` returns `dict` `{paramname: paramvalue}`; `set_params` writes it" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'multioutput': 'uniform_average', 'multivariate': False}" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "crps_metric.get_params()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "composite objects have the nested param interface, keys `componentname__paramname`" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
ResidualDouble(estimator=LinearRegression(),\n",
+       "               estimator_resid=RandomForestRegressor())
" + ], + "text/plain": [ + "ResidualDouble(estimator=LinearRegression(),\n", + " estimator_resid=RandomForestRegressor())" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# note that reg_proba has components LinearRegression and RandomForestRegressor\n", + "# each with their own parameters\n", + "reg_proba" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "so `reg_proba` will have parameters coming from itself and from each of its components:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'cv': None,\n", + " 'distr_loc_scale_name': None,\n", + " 'distr_params': None,\n", + " 'distr_type': 'Normal',\n", + " 'estimator': LinearRegression(),\n", + " 'estimator_resid': RandomForestRegressor(),\n", + " 'min_scale': 1e-10,\n", + " 'residual_trafo': 'absolute',\n", + " 'use_y_pred': False,\n", + " 'estimator__copy_X': True,\n", + " 'estimator__fit_intercept': True,\n", + " 'estimator__n_jobs': None,\n", + " 'estimator__normalize': 'deprecated',\n", + " 'estimator__positive': False,\n", + " 'estimator_resid__bootstrap': True,\n", + " 'estimator_resid__ccp_alpha': 0.0,\n", + " 'estimator_resid__criterion': 'squared_error',\n", + " 'estimator_resid__max_depth': None,\n", + " 'estimator_resid__max_features': 1.0,\n", + " 'estimator_resid__max_leaf_nodes': None,\n", + " 'estimator_resid__max_samples': None,\n", + " 'estimator_resid__min_impurity_decrease': 0.0,\n", + " 'estimator_resid__min_samples_leaf': 1,\n", + " 'estimator_resid__min_samples_split': 2,\n", + " 'estimator_resid__min_weight_fraction_leaf': 0.0,\n", + " 'estimator_resid__n_estimators': 100,\n", + " 'estimator_resid__n_jobs': None,\n", + " 'estimator_resid__oob_score': False,\n", + " 'estimator_resid__random_state': None,\n", + " 'estimator_resid__verbose': 0,\n", + " 'estimator_resid__warm_start': False}" + ] + }, + "execution_count": 30, +
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reg_proba.get_params()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "further common interface points are `get_config`, `set_config`, and `get_fitted_params` (only fittable estimators)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1.4.2 searching for regressors and metrics " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "as first-class citizens, all objects in `skpro` are indexed via the `registry` utility `all_objects`.\n", + "\n", + "To find probabilistic supervised regressors, use `all_objects` with the type `regressor_proba`:" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name                object
0  BaggingRegressor    <class 'skpro.regression.ensemble.BaggingRegre...
1  BootstrapRegressor  <class 'skpro.regression.bootstrap.BootstrapRe...
2  GridSearchCV        <class 'skpro.model_selection._tuning.GridSear...
3  Pipeline            <class 'skpro.regression.compose._pipeline.Pip...
4  RandomizedSearchCV  <class 'skpro.model_selection._tuning.Randomiz...
\n", + "
" + ], + "text/plain": [ + " name object\n", + "0 BaggingRegressor \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name                 object                                             scitype:y_pred
0  CRPS                 <class 'skpro.metrics._classes.CRPS'>              pred_proba
1  ConstraintViolation  <class 'skpro.metrics._classes.ConstraintViola...  pred_interval
2  EmpiricalCoverage    <class 'skpro.metrics._classes.EmpiricalCovera...  pred_interval
3  LinearizedLogLoss    <class 'skpro.metrics._classes.LinearizedLogLo...  pred_proba
4  LogLoss              <class 'skpro.metrics._classes.LogLoss'>           pred_proba
5  PinballLoss          <class 'skpro.metrics._classes.PinballLoss'>       pred_quantiles
6  SquaredDistrLoss     <class 'skpro.metrics._classes.SquaredDistrLoss'>  pred_proba
\n", + "" + ], + "text/plain": [ + " name object \\\n", + "0 CRPS \n", + "1 ConstraintViolation \n", + "5 PinballLoss \n", + "6 SquaredDistrLoss \n", + "\n", + " scitype:y_pred \n", + "0 pred_proba \n", + "1 pred_interval \n", + "2 pred_interval \n", + "3 pred_proba \n", + "4 pred_proba \n", + "5 pred_quantiles \n", + "6 pred_proba " + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from skpro.registry import all_objects\n", + "\n", + "all_objects(\"metric\", as_dataframe=True, return_tags=\"scitype:y_pred\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "all tags can be printed by the `all_tags` utility:" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
   name             scitype  type  description
0  lower_is_better  metric   bool  whether lower (True) or higher (False) is better
1  scitype:y_pred   metric   str   expected input type for y_pred in performance ...
\n", + "
" + ], + "text/plain": [ + " name scitype type \\\n", + "0 lower_is_better metric bool \n", + "1 scitype:y_pred metric str \n", + "\n", + " description \n", + "0 whether lower (True) or higher (False) is better \n", + "1 expected input type for y_pred in performance ... " + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# all tags applicable to metrics\n", + "from skpro.registry import all_tags\n", + "\n", + "all_tags(\"metric\", as_dataframe=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
namescitypetypedescription
0capability:missingregressor_probaboolwhether estimator supports missing values
1capability:multioutputregressor_probaboolwhether estimator supports multioutput regression
\n", + "
" + ], + "text/plain": [ + " name scitype type \\\n", + "0 capability:missing regressor_proba bool \n", + "1 capability:multioutput regressor_proba bool \n", + "\n", + " description \n", + "0 whether estimator supports missing values \n", + "1 whether estimator supports multioutput regression " + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# all tags applicable to probabilistic regressors\n", + "from skpro.registry import all_tags\n", + "\n", + "all_tags(\"regressor_proba\", as_dataframe=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "filtering in search can be done with the `filter_tags` argument in `all_objects`, see docstring:" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nameobject
0CRPS<class 'skpro.metrics._classes.CRPS'>
1LinearizedLogLoss<class 'skpro.metrics._classes.LinearizedLogLo...
2LogLoss<class 'skpro.metrics._classes.LogLoss'>
3SquaredDistrLoss<class 'skpro.metrics._classes.SquaredDistrLoss'>
\n", + "
" + ], + "text/plain": [ + " name object\n", + "0 CRPS \n", + "1 LinearizedLogLoss \n", + "3 SquaredDistrLoss " + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from skpro.registry import all_objects\n", + "\n", + "# \"retrieve all genuinely probabilistic loss functions\"\n", + "all_objects(\"metric\", as_dataframe=True, filter_tags={\"scitype:y_pred\": \"pred_proba\"})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Prediction types, metrics, benchmarking " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This section gives more details on:\n", + "\n", + "* different prediction types, including a methodological primer\n", + "* the API of metrics to compare probabilistic predictions to non-probabilistic actuals\n", + "* utilities for batch benchmarking of estimators and metrics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.1 Probabilistic predictions - methodological primer " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**readers familiar with, or less interested in, the theory may like to skip section 2.1**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In supervised learning - probabilistic or not:\n", + "\n", + "* we fit an estimator to i.i.d. samples $(X_1, Y_1), \\dots, (X_N, Y_N) \\sim (X_*, Y_*)$\n", + "* and want to predict $y$ given $x$ accurately, for $(x, y) \\sim (X_*, Y_*)$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let $y$ be the (true) value, for an observed feature $x$\n", + "\n", + "(we consider $y$ a random variable)\n", + "\n", + "| Name | param | prediction/estimate of | `skpro` |\n", + "| ---- | ----- | ---------------------- | -------- |\n", + "| point prediction | | conditional expectation $\\mathbb{E}[y\\|x]$ | `predict` |\n", + "| variance prediction | | conditional variance $Var[y\\|x]$ | `predict_var` 
|\n", + "| quantile prediction | $\\alpha\\in (0,1)$ | $\\alpha$-quantile of $y\\|x$ | `predict_quantiles` |\n", + "| interval prediction | $c\\in (0,1)$| $[a,b]$ s.t. $P(a\\le y \\le b\\| x) = c$ | `predict_interval` |\n", + "| distribution prediction | | the law/distribution of $y\\|x$ | `predict_proba` |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### More formal details & intuition:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "let's consider the toy example again" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_new, y_train, _ = train_test_split(X, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_new, y_train, _ = train_test_split(X, y)\n", + "\n", + "\n", + "reg_mean = RandomForestRegressor()\n", + "reg_proba = ResidualDouble(reg_mean)\n", + "\n", + "reg_proba.fit(X_train, y_train)\n", + "y_pred_proba = reg_proba.predict_proba(X_new)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* a **\"point prediction\"** is a prediction/estimate of the conditional expectation $\\mathbb{E}[y|x]$.\\\n", + " **Intuition**: \"out of many repetitions/worlds, this value is the arithmetic average of all observations\"." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
9126.436327
142228.752373
117229.274334
154190.479808
199198.487125
\n", + "
" + ], + "text/plain": [ + " target\n", + "9 126.436327\n", + "142 228.752373\n", + "117 229.274334\n", + "154 190.479808\n", + "199 198.487125" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# if y_pred_proba were *true*, here's what many repetitions would look like:\n", + "\n", + "# each run of this line is \"one repetition\"\n", + "y_pred_proba.sample().head()" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
09154.654233
142168.793099
117205.445650
154185.737977
199172.479164
.........
99336227.567609
407133.807497
354194.368428
104149.853900
417159.109449
\n", + "

11100 rows × 1 columns

\n", + "
" + ], + "text/plain": [ + " target\n", + "0 9 154.654233\n", + " 142 168.793099\n", + " 117 205.445650\n", + " 154 185.737977\n", + " 199 172.479164\n", + "... ...\n", + "99 336 227.567609\n", + " 407 133.807497\n", + " 354 194.368428\n", + " 104 149.853900\n", + " 417 159.109449\n", + "\n", + "[11100 rows x 1 columns]" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "many_samples = y_pred_proba.sample(100)\n", + "many_samples" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
9149.404360
142187.103481
117207.415288
154181.764525
199169.965189
\n", + "
" + ], + "text/plain": [ + " target\n", + "9 149.404360\n", + "142 187.103481\n", + "117 207.415288\n", + "154 181.764525\n", + "199 169.965189" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# \"doing many times and taking the mean\" -> usual point prediction\n", + "mean_prediction = many_samples.groupby(level=1, sort=False).mean()\n", + "mean_prediction.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
9151.44
142188.04
117205.71
154183.37
199168.71
\n", + "
" + ], + "text/plain": [ + " target\n", + "9 151.44\n", + "142 188.04\n", + "117 205.71\n", + "154 183.37\n", + "199 168.71" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# if we did this infinitely many times instead of 100:\n", + "y_pred_proba.mean().head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* a **\"variance prediction\"** is a prediction/estimate of the conditional variance $Var[y|x]$.\\\n", + " **Intuition:** \"out of many repetitions/worlds, this value is the average squared distance of the observation to the perfect point prediction\".\n" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
9275.380423
142307.560818
117369.623033
154263.550387
199315.443840
\n", + "
" + ], + "text/plain": [ + " target\n", + "9 275.380423\n", + "142 307.560818\n", + "117 369.623033\n", + "154 263.550387\n", + "199 315.443840" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# same as above - take many samples, and then compute element-wise statistics\n", + "var_prediction = many_samples.groupby(level=1, sort=False).var()\n", + "var_prediction.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
9296.662638
142296.662638
117296.662638
154296.662638
199296.662638
\n", + "
" + ], + "text/plain": [ + " target\n", + "9 296.662638\n", + "142 296.662638\n", + "117 296.662638\n", + "154 296.662638\n", + "199 296.662638" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# e.g., predict_var should give the same result as the variance of an infinitely large sample\n", + "y_pred_proba.var().head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* a **\"quantile prediction\"**, at quantile point $\\alpha\\in (0,1)$ is a prediction/estimate of the $\\alpha$-quantile of $y|x$, i.e., of $F^{-1}_{y|x}(\\alpha)$, where $F^{-1}$ is the (generalized) inverse cdf = quantile function of the random variable $y|x$.\\\n", + " **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $\\alpha$ will be equal to or smaller than this value.\"\n", + "* an **\"interval prediction\"** or \"predictive interval\" with (symmetric) coverage $c\\in (0,1)$ is a prediction/estimate pair of lower bound $a$ and upper bound $b$ such that $P(a\\le y \\le b| x) = c$ and $P(y \\gneq b| x) = P(y \\lneq a| x) = (1 - c) /2$.\\\n", + " **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $c$ will be contained in the interval $[a,b]$, and being above is equally likely as being below\"." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(similar - exercise left to the reader)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* a **\"distribution prediction\"** or \"full probabilistic prediction\" is a prediction/estimate of the distribution of $y|x$, e.g., \"it's a normal distribution with mean 42 and variance 1\".\\\n", + "**Intuition**: exhaustive description of the generating mechanism of many repetitions/worlds."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "note: the true distribution is unknown, and not accessible easily!\n", + "\n", + "`y_pred_proba` is a distribution, but in general not equal to the true one!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "that is, there are:\n", + "\n", + "* *true* distribution `y_pred_proba_true` - unknown and unknowable but estimable\n", + "* `y_pred_proba` - our guess at `y_pred_proba_true`\n", + "* the actual data `y_true` is *one* `y_pred_proba_true.sample()`\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* `predict` produces guess of `y_pred_proba_true.mean()`\n", + "* `predict_var` produces guess of `y_pred_proba_true.var()`\n", + "* `predict_quantiles([0.05, 0.5, 0.95])` produces guess of `y_pred_proba_true.quantiles([0.05, 0.5, 0.95])`\n", + "* `predict_proba` produces guess of `y_pred_proba_true`\n", + "\n", + "the guesses are algorithm specific, and some algorithms are more accurate than others, given data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.2 probabilistic metrics - details " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "General usage pattern same as for `sklearn` metrics:\n", + "\n", + "1. get some actuals and predictions\n", + "2. specify the metric - similar to estimator specs\n", + "3. plug the actuals and predictions into metric to get metric values\n", + "\n", + "*but*: need to use dedicated metric for probabilistic predictions\n", + "\n", + "* ground truth: `y_true` samples\n", + "* prediction e.g., `y_predict_proba`, `y_predict_interval`\n", + "* so, match metric with type of prediction! 
`metric(y_true, y_predict_proba)`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Recall methods available for all probabilistic regressors:\n", + "\n", + "- `predict_interval` produces interval predictions.\n", + " Argument `coverage` (nominal interval coverage) must be provided.\n", + "- `predict_quantiles` produces quantile predictions.\n", + " Argument `alpha` (quantile values) must be provided.\n", + "- `predict_var` produces variance predictions. Same args as `predict`.\n", + "- `predict_proba` produces full distributional predictions. Same args as `predict`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "| Name | param | prediction/estimate of | `skpro` |\n", + "| ---- | ----- | ---------------------- | -------- |\n", + "| point prediction | | conditional expectation $\\mathbb{E}[y\\|x]$ | `predict` |\n", + "| variance prediction | | conditional variance $Var[y\\|x]$ | `predict_var` |\n", + "| quantile prediction | $\\alpha\\in (0,1)$ | $\\alpha$-quantile of $y\\|x$ | `predict_quantiles` |\n", + "| interval prediction | $c\\in (0,1)$| $[a,b]$ s.t. $P(a\\le y \\le b\\| x) = c$ | `predict_interval` |\n", + "| distribution prediction | | the law/distribution of $y\\|x$ | `predict_proba` |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "let's produce some probabilistic predictions!" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [], + "source": [ + "# 1. 
get some actuals and predictions\n", + "from sklearn.datasets import load_diabetes\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)\n", + "# actuals = y_test" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "reg_mean = RandomForestRegressor()\n", + "reg_proba = ResidualDouble(reg_mean)\n", + "\n", + "reg_proba.fit(X_train, y_train)\n", + "\n", + "# use any of the probabilistic methods, we have seen this\n", + "y_pred_int = reg_proba.predict_interval(X_test, coverage=0.95)\n", + "y_pred_q = reg_proba.predict_quantiles(X_test, alpha=[0.05, 0.95])\n", + "y_pred_proba = reg_proba.predict_proba(X_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "recall, all have their own output format:" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
0.95
lowerupper
38557.568213128.151787
4857.438213128.021787
8451.838213122.421787
248184.628213255.211787
33167.838213138.421787
.........
34054.788213125.371787
129188.558213259.141787
313195.668213266.251787
368188.288213258.871787
6061.178213131.761787
\n", + "

111 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " target \n", + " 0.95 \n", + " lower upper\n", + "385 57.568213 128.151787\n", + "48 57.438213 128.021787\n", + "84 51.838213 122.421787\n", + "248 184.628213 255.211787\n", + "331 67.838213 138.421787\n", + ".. ... ...\n", + "340 54.788213 125.371787\n", + "129 188.558213 259.141787\n", + "313 195.668213 266.251787\n", + "368 188.288213 258.871787\n", + "60 61.178213 131.761787\n", + "\n", + "[111 rows x 2 columns]" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_int # lower/upper intervals" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
0.050.95
38563.242199122.477801
4863.112199122.347801
8457.512199116.747801
248190.302199249.537801
33173.512199132.747801
.........
34060.462199119.697801
129194.232199253.467801
313201.342199260.577801
368193.962199253.197801
6066.852199126.087801
\n", + "

111 rows × 2 columns

\n", + "
" + ], + "text/plain": [ + " target \n", + " 0.05 0.95\n", + "385 63.242199 122.477801\n", + "48 63.112199 122.347801\n", + "84 57.512199 116.747801\n", + "248 190.302199 249.537801\n", + "331 73.512199 132.747801\n", + ".. ... ...\n", + "340 60.462199 119.697801\n", + "129 194.232199 253.467801\n", + "313 201.342199 260.577801\n", + "368 193.962199 253.197801\n", + "60 66.852199 126.087801\n", + "\n", + "[111 rows x 2 columns]" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_q # quantiles" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
Normal(columns=Index(['target'], dtype='object'),\n",
+       "       index=Index([385,  48,  84, 248, 331, 170, 230, 383, 309,  86,\n",
+       "       ...\n",
+       "       266, 346, 211, 171, 319, 340, 129, 313, 368,  60],\n",
+       "      dtype='int64', length=111),\n",
+       "       mu=array([[ 92.86],\n",
+       "       [ 92.73],\n",
+       "       [ 87.13],\n",
+       "       [219.92],\n",
+       "       [103.13],\n",
+       "       [ 69.04],\n",
+       "       [188.5 ],\n",
+       "       [ 93.05],\n",
+       "       [164.58],\n",
+       "       [ 95.23],\n",
+       "       [ 86.68],\n",
+       "       [107.67],\n",
+       "       [127.11],\n",
+       "       [116.98],\n",
+       "       [136.79],\n",
+       "       [185.75],\n",
+       "       [191.78],\n",
+       "       [132.92],\n",
+       "       [172.28],\n",
+       "       [116.68],\n",
+       "       [...\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441],\n",
+       "       [18.00634441]]))
Please rerun this cell to show the HTML repr or trust the notebook.
" + ], + "text/plain": [ + "Normal(columns=Index(['target'], dtype='object'),\n", + " index=Index([385, 48, 84, 248, 331, 170, 230, 383, 309, 86,\n", + " ...\n", + " 266, 346, 211, 171, 319, 340, 129, 313, 368, 60],\n", + " dtype='int64', length=111),\n", + " mu=array([[ 92.86],\n", + " [ 92.73],\n", + " [ 87.13],\n", + " [219.92],\n", + " [103.13],\n", + " [ 69.04],\n", + " [188.5 ],\n", + " [ 93.05],\n", + " [164.58],\n", + " [ 95.23],\n", + " [ 86.68],\n", + " [107.67],\n", + " [127.11],\n", + " [116.98],\n", + " [136.79],\n", + " [185.75],\n", + " [191.78],\n", + " [132.92],\n", + " [172.28],\n", + " [116.68],\n", + " [...\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441],\n", + " [18.00634441]]))" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_pred_proba # sktime/skpro BaseDistribution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "we now need to apply a suitable metric, `metric(y_test, y_pred)`\n", + "\n", + "IMPORTANT: sequence matters, `y_test` first; `y_pred` has very different type!" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "32.4116032777842" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# 2. 
specify metric\n", + "# CRPS = continuous ranked probability score, for distribution predictions\n", + "from skpro.metrics import CRPS\n", + "\n", + "crps = CRPS()\n", + "\n", + "# 3. evaluate metric\n", + "crps(y_test, y_pred_proba)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "how do we find a metric that fits the prediction type?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "answer: metrics are tagged\n", + "\n", + "important tag: `scitype:y_pred`\n", + "\n", + "* `\"pred_proba\"` - distributional, can be applied to distributions, `predict_proba` output\n", + "* `\"pred_quantiles\"` - quantile forecast metric, can be applied to quantile predictions, interval predictions, distributional predictions\n", + " * applicable to `predict_quantiles`, `predict_interval`, `predict_proba` outputs\n", + "* `\"pred_interval\"` - interval forecast metric, can be applied to interval predictions, distributional predictions\n", + " * applicable to `predict_interval`, `predict_proba` outputs" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'estimator_type': 'estimator',\n", + " 'object_type': 'metric',\n", + " 'reserved_params': ['multioutput', 'score_average'],\n", + " 'scitype:y_pred': 'pred_proba',\n", + " 'lower_is_better': True}" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "crps.get_tags()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "listing metrics with the tag, filtering for probabilistic tags:\n", + "\n", + "(let's try to find a quantile prediction metric!)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
nameobjectscitype:y_pred
0CRPS<class 'skpro.metrics._classes.CRPS'>pred_proba
1ConstraintViolation<class 'skpro.metrics._classes.ConstraintViola...pred_interval
2EmpiricalCoverage<class 'skpro.metrics._classes.EmpiricalCovera...pred_interval
3LinearizedLogLoss<class 'skpro.metrics._classes.LinearizedLogLo...pred_proba
4LogLoss<class 'skpro.metrics._classes.LogLoss'>pred_proba
5PinballLoss<class 'skpro.metrics._classes.PinballLoss'>pred_quantiles
6SquaredDistrLoss<class 'skpro.metrics._classes.SquaredDistrLoss'>pred_proba
\n", + "
" + ], + "text/plain": [ + " name object \\\n", + "0 CRPS \n", + "1 ConstraintViolation \n", + "5 PinballLoss \n", + "6 SquaredDistrLoss \n", + "\n", + " scitype:y_pred \n", + "0 pred_proba \n", + "1 pred_interval \n", + "2 pred_interval \n", + "3 pred_proba \n", + "4 pred_proba \n", + "5 pred_quantiles \n", + "6 pred_proba " + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from skpro.registry import all_objects\n", + "\n", + "all_objects(\n", + " \"metric\",\n", + " as_dataframe=True,\n", + " return_tags=\"scitype:y_pred\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`PinballLoss` is a quantile forecast metric:" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "10.256319541091692" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from skpro.metrics import PinballLoss\n", + "\n", + "pinball_loss = PinballLoss()\n", + "\n", + "pinball_loss(y_test, y_pred_q)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "... this is by default an average (grand average, float)\n", + "\n", + "* averages over samples in `y_pred` / `y_test` (rows)\n", + "* averages over variables (columns)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "what if we don't want these averages?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* variable (column) averaging is controlled by the `multioutput` arg.\n", + " * `\"raw_values\"` prevents averaging, `\"uniform_average\"` computes arithmetic mean.\n", + "* evaluation by row via the `evaluate_by_index` method\n", + " * can be useful for diagnostics or statistical tests" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
target
385115.981008
4810.660141
8413.874095
24828.144593
33185.711008
......
340115.761008
12934.074696
31357.801692
36890.421008
6025.642375
\n", + "

111 rows × 1 columns

\n", + "
" + ], + "text/plain": [ + " target\n", + "385 115.981008\n", + "48 10.660141\n", + "84 13.874095\n", + "248 28.144593\n", + "331 85.711008\n", + ".. ...\n", + "340 115.761008\n", + "129 34.074696\n", + "313 57.801692\n", + "368 90.421008\n", + "60 25.642375\n", + "\n", + "[111 rows x 1 columns]" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "crps.evaluate_by_index(y_test, y_pred_proba)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Caveat: not every metric is an average over samples (rows), e.g., RMSE\n", + "\n", + "In this case, `evaluate_by_index` computes jackknife pseudo-samples\n", + "\n", + "(for mean statistics, jackknife pseudo-samples are equal to individual samples)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3 Benchmark evaluation of probabilistic regressors " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "for quick evaluation and benchmarking,\n", + "\n", + "the `benchmarking.evaluate` utility can be used:" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
test_CRPSfit_timepred_timelen_y_train
031.7834510.0028860.000984294
133.5743290.0037540.001477295
229.9096550.0025800.001086295
\n", + "
" + ], + "text/plain": [ + " test_CRPS fit_time pred_time len_y_train\n", + "0 31.783451 0.002886 0.000984 294\n", + "1 33.574329 0.003754 0.001477 295\n", + "2 29.909655 0.002580 0.001086 295" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import KFold\n", + "\n", + "from skpro.benchmarking.evaluate import evaluate\n", + "from skpro.metrics import CRPS\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "# 1. specify dataset\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "\n", + "# 2. specify estimator\n", + "estimator = ResidualDouble(LinearRegression())\n", + "\n", + "# 3. specify cross-validation schema\n", + "cv = KFold(n_splits=3)\n", + "\n", + "# 4. specify evaluation metric\n", + "crps = CRPS()\n", + "\n", + "# 5. evaluate - run the benchmark\n", + "results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)\n", + "\n", + "# results are pd.DataFrame\n", + "# each row is one repetition of the cross-validation on one fold fit/predict/evaluate\n", + "# columns report performance, runtime, and other optional information (see docstring)\n", + "results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
Advanced composition patterns " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "we introduce a number of composition patterns available in `skpro`:\n", + "\n", + "* reducer-wrappers that turn `sklearn` regressors into probabilistic ones\n", + "* pipelines of `sklearn` transformers with `skpro` regressors\n", + "* tuning `skpro` probabilistic regressors via grid/random search, minimizing a probabilistic metric\n", + "* ensembling multiple `skpro` probabilistic regressors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "data used in this section:" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "evaluation metric used in this section:" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [], + "source": [ + "crps = CRPS()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.1 Reducers to turn `sklearn` regressors probabilistic " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "there are many common algorithms that turn a non-probabilistic tabular regressor probabilistic\n", + "\n", + "formally, this is a type of \"reduction\" - of probabilistic supervised tabular regression to non-probabilistic supervised tabular regression\n", + "\n", + "Examples:\n", + "\n", + "* predicting variance equal to training residual variance - `ResidualDouble` with standard settings\n", + " * or other unconditional distribution estimate for residuals\n", + "* \"squaring the residual\" two-step prediction - `ResidualDouble`\n", + "* bootstrap prediction intervals - `BootstrapRegressor`\n", + "* 
conformal prediction intervals - contributions appreciated :-)\n", + "* natural gradient boosting aka NGBoost - contributions appreciated :-)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.1.1 constant variance prediction " + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "31.18872502807047" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.ensemble import RandomForestRegressor\n", + "from sklearn.model_selection import KFold\n", + "\n", + "# estimator specification - use any sklearn regressor for reg_mean\n", + "reg_mean = RandomForestRegressor()\n", + "reg_proba = ResidualDouble(reg_mean, cv=KFold(5))\n", + "# cv is used to estimate out-of-sample residual variance via 5-fold CV\n", + "# note - in-sample predictions will usually underestimate the variance!\n", + "\n", + "# fit and predict\n", + "reg_proba.fit(X_train, y_train)\n", + "y_pred_proba = reg_proba.predict_proba(X_test)\n", + "\n", + "# evaluate\n", + "crps(y_test, y_pred_proba)" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_interval\n", + "\n", + "plot_crossplot_interval(y_test, y_pred_proba, coverage=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.1.2 two-step residual prediction " + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "31.76074641872546" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.ensemble import RandomForestRegressor\n", + "from sklearn.model_selection import KFold\n", + "\n", + "# estimator specification - use any sklearn regressor for reg_mean and reg_resid\n", + "reg_mean = RandomForestRegressor()\n", + "reg_resid = RandomForestRegressor()\n", + "reg_proba = ResidualDouble(reg_mean, estimator_resid=reg_resid, cv=KFold(5))\n", + "# cv is used to estimate out-of-sample residual variance via 5-fold CV\n", + "\n", + "# fit and predict\n", + "reg_proba.fit(X_train, y_train)\n", + "y_pred_proba = reg_proba.predict_proba(X_test)\n", + "\n", + "# evaluate\n", + "crps(y_test, y_pred_proba)" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_interval\n", + "\n", + "plot_crossplot_interval(y_test, y_pred_proba, coverage=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.1.3 bootstrap prediction intervals " + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "35.73832114895831" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "\n", + "from skpro.regression.bootstrap import BootstrapRegressor\n", + "\n", + "# estimator specification - use any sklearn regressor for reg_mean\n", + "reg_mean = LinearRegression()\n", + "reg_proba = BootstrapRegressor(reg_mean, n_bootstrap_samples=100)\n", + "\n", + "# fit and predict\n", + "reg_proba.fit(X_train, y_train)\n", + "y_pred_proba = reg_proba.predict_proba(X_test)\n", + "\n", + "# evaluate\n", + "crps(y_test, y_pred_proba)" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from skpro.utils.plotting import plot_crossplot_interval\n", + "\n", + "plot_crossplot_interval(y_test, y_pred_proba, coverage=0.9)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 Pipelines of `skpro` regressor and `sklearn` transformers " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`skpro` regressors can be pipelined with `sklearn` transformers, using the `skpro` pipeline.\n", + "\n", + "This ensures the presence of `predict_proba` etc. in the pipeline object.\n", + "\n", + "The syntax is exactly the same as for `sklearn`'s pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.impute import SimpleImputer as Imputer\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "from skpro.regression.compose import Pipeline\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "# estimator specification\n", + "reg_mean = LinearRegression()\n", + "reg_proba = ResidualDouble(reg_mean)\n", + "\n", + "# pipeline is specified as a list of tuples (name, estimator)\n", + "pipe = Pipeline(\n", + " steps=[\n", + " (\"imputer\", Imputer()), # an sklearn transformer\n", + " (\"scaler\", MinMaxScaler()), # an sklearn transformer\n", + " (\"regressor\", reg_proba), # an skpro regressor\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": 
[ + "
Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', MinMaxScaler()),\n",
+       "                ('regressor', ResidualDouble(estimator=LinearRegression()))])
Please rerun this cell to show the HTML repr or trust the notebook.
" + ], + "text/plain": [ + "Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', MinMaxScaler()),\n", + " ('regressor', ResidualDouble(estimator=LinearRegression()))])" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pipe" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [], + "source": [ + "# the pipeline behaves as any skpro regressor\n", + "pipe.fit(X_train, y_train)\n", + "y_pred = pipe.predict(X=X_test)\n", + "y_pred_proba = pipe.predict_proba(X=X_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "the pipeline provides the familiar nested `get_params`, `set_params` interface:\n", + "\n", + "nested parameters are keyed `componentname__parametername`" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'steps': [('imputer', SimpleImputer()),\n", + " ('scaler', MinMaxScaler()),\n", + " ('regressor', ResidualDouble(estimator=LinearRegression()))],\n", + " 'imputer': SimpleImputer(),\n", + " 'scaler': MinMaxScaler(),\n", + " 'regressor': ResidualDouble(estimator=LinearRegression()),\n", + " 'imputer__add_indicator': False,\n", + " 'imputer__copy': True,\n", + " 'imputer__fill_value': None,\n", + " 'imputer__missing_values': nan,\n", + " 'imputer__strategy': 'mean',\n", + " 'imputer__verbose': 'deprecated',\n", + " 'scaler__clip': False,\n", + " 'scaler__copy': True,\n", + " 'scaler__feature_range': (0, 1),\n", + " 'regressor__cv': None,\n", + " 'regressor__distr_loc_scale_name': None,\n", + " 'regressor__distr_params': None,\n", + " 'regressor__distr_type': 'Normal',\n", + " 'regressor__estimator': LinearRegression(),\n", + " 'regressor__estimator_resid': None,\n", + " 'regressor__min_scale': 1e-10,\n", + " 'regressor__residual_trafo': 'absolute',\n", + " 'regressor__use_y_pred': False,\n", + " 'regressor__estimator__copy_X': True,\n", + " 
'regressor__estimator__fit_intercept': True,\n", + " 'regressor__estimator__n_jobs': None,\n", + " 'regressor__estimator__normalize': 'deprecated',\n", + " 'regressor__estimator__positive': False}" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pipe.get_params()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "pipelines can also be created via simple lists of estimators,\n", + "\n", + "in this case names are generated automatically:" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [], + "source": [ + "# pipeline is specified as a plain list of estimators, step names are auto-generated\n", + "pipe = Pipeline(\n", + " steps=[\n", + " Imputer(), # an sklearn transformer\n", + " MinMaxScaler(), # an sklearn transformer\n", + " reg_proba, # an skpro regressor\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.3 Tuning of `skpro` regressors via grid and random search " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`skpro` provides grid and random search tuners to tune arbitrary probabilistic regressors,\n", + "\n", + "using probabilistic metrics. Besides this, they function as the `sklearn` tuners do." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import KFold\n", + "\n", + "from skpro.metrics import CRPS\n", + "from skpro.model_selection import GridSearchCV\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "# cross-validation specification for tuner\n", + "cv = KFold(n_splits=3)\n", + "\n", + "# estimator to be tuned\n", + "estimator = ResidualDouble(LinearRegression())\n", + "\n", + "# tuning grid - do we fit an intercept in the linear regression?\n", + "param_grid = {\"estimator__fit_intercept\": [True, False]}\n", + "\n", + "# metric to be optimized\n", + "crps_metric = CRPS()\n", + "\n", + "# specification of the grid search tuner\n", + "gscv = GridSearchCV(\n", + " estimator=estimator,\n", + " param_grid=param_grid,\n", + " cv=cv,\n", + " scoring=crps_metric,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
GridSearchCV(cv=KFold(n_splits=3, random_state=None, shuffle=False),\n",
+       "             estimator=ResidualDouble(estimator=LinearRegression()),\n",
+       "             param_grid={'estimator__fit_intercept': [True, False]},\n",
+       "             scoring=CRPS())
Please rerun this cell to show the HTML repr or trust the notebook.
" + ], + "text/plain": [ + "GridSearchCV(cv=KFold(n_splits=3, random_state=None, shuffle=False),\n", + " estimator=ResidualDouble(estimator=LinearRegression()),\n", + " param_grid={'estimator__fit_intercept': [True, False]},\n", + " scoring=CRPS())" + ] + }, + "execution_count": 71, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "gscv" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "the grid search tuner behaves like any `skpro` probabilistic regressor:" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "metadata": {}, + "outputs": [], + "source": [ + "gscv.fit(X_train, y_train)\n", + "y_pred = gscv.predict(X_test)\n", + "y_pred_proba = gscv.predict_proba(X_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "random search is similar, except that instead of a grid a parameter sampler should be specified:" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": {}, + "outputs": [], + "source": [ + "from skpro.model_selection import RandomizedSearchCV\n", + "\n", + "# only difference to GridSearchCV is the param_distributions argument\n", + "\n", + "# specification of the random search parameter sampler\n", + "param_distributions = {\"estimator__fit_intercept\": [True, False]}\n", + "\n", + "# specification of the random search tuner\n", + "rscv = RandomizedSearchCV(\n", + " estimator=estimator,\n", + " param_distributions=param_distributions,\n", + " cv=cv,\n", + " scoring=crps_metric,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.4 Bagging/mixture ensemble of probabilistic regressors " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Classical bagging does the following, for a wrapped estimator:\n", + "\n", + "In `fit`:\n", + "\n", + "1. subsample rows and/or columns of `X`, `y` to `X_subs`, `y_subs`\n", + "2. fit clone of wrapped estimator to `X_subs`, `y_subs`\n", + "3. 
Repeat 1-2 `n_estimators` times, store that many fitted clones.\n", + "\n", + "In `predict`, for `X_test`:\n", + "\n", + "1. for all fitted clones, obtain predictions on `X_test` - these are distributions\n", + "2. return the uniform mixture of these distributions, per test sample" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.datasets import load_diabetes\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "X, y = load_diabetes(return_X_y=True, as_frame=True)\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y)" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "\n", + "from skpro.regression.ensemble import BaggingRegressor\n", + "from skpro.regression.residual import ResidualDouble\n", + "\n", + "reg_mean = LinearRegression()\n", + "reg_proba = ResidualDouble(reg_mean)\n", + "\n", + "ens = BaggingRegressor(reg_proba, n_estimators=10)\n", + "ens.fit(X_train, y_train)\n", + "\n", + "y_pred = ens.predict_proba(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "\"Mixture(columns=Index(['target'], dtype='object'),\\n distributions=[Normal(columns=Index(['target'], dtype='object'),\\n index=Index([ 61, 354, 104, 3, 76, 318, 205, 389, 12, 193,\\n ...\\n 199, 175, 134, 280, 74, 181, 297, 350, 110, 32],\\n dtype='int64', length=111),\\n mu=array([[178.82396796],\\n [193.48352954],\\n [151.05555714],\\n [174.3113541 ],\\n [182.55152692],\\n [169.66476115],\\n [241.12041457],\\n [ 81.8989...\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n [39.20371626],\\n 
[39.20371626],\\n [39.20371626],\\n [39.20371626]]))],\\n index=Index([ 61, 354, 104, 3, 76, 318, 205, 389, 12, 193,\\n ...\\n 199, 175, 134, 280, 74, 181, 297, 350, 110, 32],\\n dtype='int64', length=111))\"" + ] + }, + "execution_count": 76, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# y_pred is a mixture distribution!\n", + "str(y_pred)" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal,\n", + " skpro.distributions.normal.Normal]" + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[type(x) for x in y_pred.distributions]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Extension guide - implementing your own probabilistic regressor " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "`skpro` is meant to be easily extensible, for direct contribution to `skpro` as well as for local/private extension with custom methods.\n", + "\n", + "To get started:\n", + "\n", + "* Follow the [\"implementing estimator\" developer guide](https://skpro.readthedocs.io/en/stable/developer_guide/add_estimators.html)\n", + "* Use the [probabilistic regressor template](https://github.com/sktime/skpro/blob/main/extension_templates/regression.py) to get started\n", + "\n", + "1. 
Read through the [probabilistic regression extension template](https://github.com/sktime/skpro/blob/main/extension_templates/regression.py) - this is a `python` file with `todo` blocks that mark the places in which changes need to be added.\n", + "2. Copy the probabilistic regressor extension template to a local folder in your own repository (local/private extension), or to a suitable location in your clone of the `skpro` or affiliated repository (if contributed extension), inside `skpro.regression`; rename the file and update the file docstring appropriately.\n", + "3. Address the \"todo\" parts. Usually, this means: changing the name of the class, setting the tag values, specifying hyper-parameters, filling in `__init__`, `_fit`, and at least one of the probabilistic prediction methods, preferably `_predict_proba`. You can add private methods as long as they do not override the default public interface. For more details, see the extension template.\n", + "4. To test your estimator manually: import your estimator and run it in the workflows in Section 1; then use it in the compositors in Section 3.\n", + "5. To test your estimator automatically: call `skpro.utils.check_estimator` on your estimator. You can call this on a class or object instance. 
Ensure you have specified test parameters in the `get_test_params` method, according to the extension template.\n", + "\n", + "In case of direct contribution to `skpro` or one of its affiliated packages, additionally:\n", + "\n", + "* Add yourself as an author to the code, and to the `CODEOWNERS` for the new estimator file(s).\n", + "* Create a pull request that contains only the new estimators (and their inheritance tree, if it's not just one class), as well as the automated tests as described above.\n", + "* In the pull request, describe the estimator and ideally provide a publication or other technical reference for the strategy it implements.\n", + "* Before making the pull request, ensure that you have all necessary permissions to contribute the code to a permissive license (BSD-3) open source project." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Summary\n", + "\n", + "* `skpro` is a unified interface toolbox for probabilistic supervised regression, that is, for prediction intervals, quantiles, and fully distributional predictions, in a tabular regression setting. The interface is fully interoperable with `scikit-learn` and `scikit-base` interface specifications.\n", + "\n", + "* `skpro` comes with rich composition functionality that makes it easy to build complex pipelines and to connect with other parts of the open source ecosystem, such as `scikit-learn` and individual algorithm libraries.\n", + "\n", + "* `skpro` is easy to extend, and comes with user-friendly tools to facilitate implementing and testing your own probabilistic regressors and composition principles." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "### Credits:\n", + "\n", + "notebook creation: fkiraly\n", + "\n", + "skpro: https://github.com/sktime/skpro/blob/main/CONTRIBUTORS.md" + ] } ], "metadata": { @@ -142,7 +4777,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.3" + "version": "3.11.4" }, "orig_nbformat": 4, "vscode": {