diff --git a/examples/01_skpro_intro.ipynb b/examples/01_skpro_intro.ipynb
index 7d8fc3ee9..08029a61c 100644
--- a/examples/01_skpro_intro.ipynb
+++ b/examples/01_skpro_intro.ipynb
@@ -373,7 +373,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.2 simple evaluation workflow for probabilistic predictions"
+ "## 1.2 simple evaluation workflow for probabilistic predictions"
]
},
{
@@ -478,7 +478,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.2.1 diagnostic visualisations"
+ "## 1.3 diagnostic visualisations"
]
},
{
@@ -587,14 +587,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.3 `skpro` objects - `scikit-base` interface, searching for regressors and metrics"
+ "### 1.4 `skpro` objects - `scikit-base` interface, searching for regressors and metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.3.1 primer on `skpro` object interface "
+ "### 1.4.1 primer on `skpro` object interface "
]
},
{
@@ -722,7 +722,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "### 1.3.2 searching for regressors and metrics "
+ "### 1.4.2 searching for regressors and metrics "
]
},
{
@@ -820,6 +820,237 @@
"all_objects(\"metric\", as_dataframe=True, filter_tags={\"scitype:y_pred\": \"pred_proba\"})"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Prediction types, metrics, benchmarking "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This section gives more details on:\n",
+ "\n",
+ "* different prediction types, including a methodological primer\n",
+ "* the API of metrics to compare probabilistic predictions to non-probabilistic actuals\n",
+ "* utilities for batch benchmarking of estimators and metrics"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.1 Probabilistic predictions - methodological primer "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**readers familir with, or less interested in theory, may like to skip section 2.1**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In supervised learning - probabilistic or not:\n",
+ "\n",
+ "* we fit estimator to i.i.d samples $(X_1, Y_1), \\dots, (X_N, Y_N) \\sim (X_*, Y_*)$\n",
+ "* and want to predict $y$ given $x$ accurately, for $(x, y) \\sim (X_*, Y_*)$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let $y$ be the (true) value, for an observed feature $x$\n",
+ "\n",
+ "(we consider $y$ a random variable)\n",
+ "\n",
+ "| Name | param | prediction/estimate of | `skpro` |\n",
+ "| ---- | ----- | ---------------------- | -------- |\n",
+ "| point forecast | | conditional expectation $\\mathbb{E}[y\\|x]$ | `predict` |\n",
+ "| variance forecast | | conditional variance $Var[y\\|x]$ | `predict_var` |\n",
+ "| quantile forecast | $\\alpha\\in (0,1)$ | $\\alpha$-quantile of $y\\|x$ | `predict_quantiles` |\n",
+ "| interval forecast | $c\\in (0,1)$| $[a,b]$ s.t. $P(a\\le y \\le b\\| x) = c$ | `predict_interval` |\n",
+ "| distribution forecast | | the law/distribution of $y\\|x$ | `predict_proba` |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "##### More formal details & intuition:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "let's consider the toy example again"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_diabetes\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+ "X_train, X_new, y_train, _ = train_test_split(X, y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_diabetes\n",
+ "from sklearn.ensemble import RandomForestRegressor\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "from skpro.regression.residual import ResidualDouble\n",
+ "\n",
+ "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+ "X_train, X_new, y_train, _ = train_test_split(X, y)\n",
+ "\n",
+ "\n",
+ "reg_mean = RandomForestRegressor()\n",
+ "reg_proba = ResidualDouble(reg_mean)\n",
+ "\n",
+ "reg_proba.fit(X_train, y_train)\n",
+ "y_pred_proba = reg_proba.predict_proba(X_new)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "* a **\"point forecast\"** is a prediction/estimate of the conditional expectation $\\mathbb{E}[y|x]$.\\\n",
+ " **Intuition**: \"out of many repetitions/worlds, this value is the arithmetic average of all observations\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# if y_pred_proba were *true*, here's how many repetitions would look like:\n",
+ "\n",
+ "# repeating this line is \"one repetition\"\n",
+ "y_pred_proba.sample().head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# \"doing many times and taking the mean\" -> usual point prediction\n",
+ "y_pred_proba.mean().head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "* a **\"variance forecast\"** is a prediction/estimate of the conditional expectation $Var[y|x]$.\\\n",
+ " **Intuition:** \"out of many repetitions/worlds, this value is the average squared distance of the observation to the perfect point forecast\".\n",
+ "* a **\"quantile forecast\"**, at quantile point $\\alpha\\in (0,1)$ is a prediction/estimate of the $\\alpha$-quantile of $y'|y$, i.e., of $F^{-1}_{y|x}(\\alpha)$, where $F^{-1}$ is the (generalized) inverse cdf = quantile function of the random variable y|x.\\\n",
+ " **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $\\alpha$ will have equal or smaller than this value.\"\n",
+ "* an **\"interval forecast\"** or \"predictive interval\" with (symmetric) coverage $c\\in (0,1)$ is a prediction/estimate pair of lower bound $a$ and upper bound $b$ such that $P(a\\le y \\le b| x) = c$ and $P(y \\gneq b| x) = P(y \\lneq a| x) = (1 - c) /2$.\\\n",
+ " **Intuition**: \"out of many repetitions/worlds, a fraction of exactly $c$ will be contained in the interval $[a,b]$, and being above is equally likely as being below\"."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# same as above - take many samples, and then compute element-wise statistics\n",
+ "\n",
+ "# e.g., predict_var should give the same result as infinite large sample's variance\n",
+ "y_pred_proba.var().head()"
+ ]
+ },
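+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To tie the quantile and interval forecasts above to code, here is a quick sketch using the `predict_quantiles` and `predict_interval` methods listed in the table; the `alpha` and `coverage` argument names below are assumptions, see the respective docstrings for the exact signatures."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# sketch: quantile forecasts from the fitted probabilistic regressor\n",
+    "# the alpha argument name is an assumption - check the predict_quantiles docstring\n",
+    "\n",
+    "# 0.05, 0.5, and 0.95-quantiles of y|x, one row per sample in X_new\n",
+    "y_pred_quantiles = reg_proba.predict_quantiles(X_new, alpha=[0.05, 0.5, 0.95])\n",
+    "y_pred_quantiles.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# sketch: symmetric predictive interval with 90% coverage, one row per sample in X_new\n",
+    "# the coverage argument name is an assumption - check the predict_interval docstring\n",
+    "y_pred_interval = reg_proba.predict_interval(X_new, coverage=0.9)\n",
+    "y_pred_interval.head()"
+   ]
+  },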
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "* a **\"distribution forecast\"** or \"full probabilistic forecast\" is a prediction/estimate of the distribution of $y|x$, e.g., \"it's a normal distribution with mean 42 and variance 1\".\\\n",
+ "**Intuition**: exhaustive description of the generating mechanism of many repetitions/worlds."
+ ]
+ },
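+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A small sketch for intuition: `y_pred_proba` is a distribution object, one (univariate) predicted distribution per row of `X_new`, which can be queried directly - the example below assumes the distribution object exposes a `cdf` method taking a `pd.DataFrame` of evaluation points."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# sketch, assuming a cdf method on the distribution object:\n",
+    "# predicted probability that the observation falls below the predicted conditional mean\n",
+    "# (close to 0.5 for roughly symmetric predictive distributions)\n",
+    "y_pred_proba.cdf(y_pred_proba.mean()).head()"
+   ]
+  },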
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "note: the true distribution is unknown, and not accessible easily!\n",
+ "\n",
+ "`y_pred_proba` is a distribution, but in general not equal to the true one!"
+ ]
+ },
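+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To round off the table above in code: point and variance forecasts can also be obtained directly from the fitted regressor, via `predict` and `predict_var` - a short sketch; these should closely match the `mean` and `var` of the distribution forecast computed earlier."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# point forecast (conditional mean) directly from the regressor\n",
+    "y_pred = reg_proba.predict(X_new)\n",
+    "\n",
+    "# variance forecast - should closely match y_pred_proba.var() above\n",
+    "y_pred_var = reg_proba.predict_var(X_new)\n",
+    "y_pred_var.head()"
+   ]
+  },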
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.3 Benchmark evaluation of probabilistic regressors "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "for quick evaluation and benchmarking,\n",
+ "\n",
+ "the `benchmarking.evaluate` utility can be used:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.datasets import load_diabetes\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "from sklearn.model_selection import KFold\n",
+ "\n",
+ "from skpro.benchmarking.evaluate import evaluate\n",
+ "from skpro.metrics import CRPS\n",
+ "from skpro.regression.residual import ResidualDouble\n",
+ "\n",
+ "# 1. specify dataset\n",
+ "X, y = load_diabetes(return_X_y=True, as_frame=True)\n",
+ "\n",
+ "# 2. specify estimator\n",
+ "estimator = ResidualDouble(LinearRegression())\n",
+ "\n",
+ "# 3. specify cross-validation schema\n",
+ "cv = KFold(n_splits=3)\n",
+ "\n",
+ "# 4. specify evaluation metric\n",
+ "crps = CRPS()\n",
+ "\n",
+ "# 5. evaluate - run the benchmark\n",
+ "results = evaluate(estimator=estimator, X=X, y=y, cv=cv, scoring=crps)\n",
+ "\n",
+ "# results are pd.DataFrame\n",
+ "# each row is one repetition of the cross-validation on one fold fit/predict/evaluate\n",
+ "# columns report performance, runtime, and other optional information (see docstring)\n",
+ "results"
+ ]
+ },
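+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a short follow-up sketch, the fold-wise results can be aggregated with plain pandas, e.g., by averaging all numeric columns (metric values and runtimes) across folds:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# average metric values and runtimes across the cross-validation folds\n",
+    "results.mean(numeric_only=True)"
+   ]
+  },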
{
"cell_type": "markdown",
"metadata": {},