diff --git a/README.md b/README.md index 723111c..1237e33 100644 --- a/README.md +++ b/README.md @@ -125,7 +125,7 @@ _Detailed notes at the link_ https://compphysics.github.io/MachineLearning/doc/L | Recommended readings | Hastie et al Chapter 3 | | | Lecture material at https://compphysics.github.io/MLErasmus/doc/web/course.html sessions 3 and 4 | | | Video of Lecture at https://youtu.be/iqRKUPJr_bY | -| | Handwritten notes at Handwritten notes at https://github.com/CompPhysics/MLErasmus/blob/master/doc/HandwrittenNotes/2023/NotesOct162023.pdf | +| | Handwritten notes at https://github.com/CompPhysics/MLErasmus/blob/master/doc/HandwrittenNotes/2023/NotesOct162023.pdf | | Monday October 23 | - _Lecture 815am-10am_: Resampling Methods and Bias-Variance tradeoff (MHJ) | | Recommended readings | Hastie et al chapter 7 | | | Lecture material at https://compphysics.github.io/MLErasmus/doc/web/course.html session 4 material | diff --git a/doc/pub/day3/html/day3-bs.html b/doc/pub/day3/html/day3-bs.html index 61f10b7..ab8ceb1 100644 --- a/doc/pub/day3/html/day3-bs.html +++ b/doc/pub/day3/html/day3-bs.html @@ -281,7 +281,11 @@ ('Exercise 2: Expectation values for Ridge regression', 2, None, - 'exercise-2-expectation-values-for-ridge-regression')]} + 'exercise-2-expectation-values-for-ridge-regression'), + ('Exercise 3: Bias-Variance tradeoff', + 2, + None, + 'exercise-3-bias-variance-tradeoff')]} end of tocinfo --> @@ -405,6 +409,7 @@
  • Overarching aims of the exercises this week
  • Exercise 1: Expectation values for ordinary least squares expressions
  • Exercise 2: Expectation values for Ridge regression
  • +
  • Exercise 3: Bias-Variance tradeoff
  • @@ -433,7 +438,7 @@

    Data Analysis and Machine Learning: Ridge and Lasso Regression and Resamplin
    -

    October 15 and 22, 2023

    +

    October 16 and 23, 2023


    @@ -447,8 +452,8 @@

    Plans for Sessions 4-6

  • More on Ridge and Lasso Regression
  • Statistics, probability theory and resampling methods
  • @@ -3571,6 +3576,74 @@

    Exerc

    and it is easy to see that if the parameter \( \lambda \) goes to infinity then the variance of the Ridge parameters \( \boldsymbol{\beta} \) goes to zero.

    + + + +

    Exercise 3: Bias-Variance tradeoff

    + +

    The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simpler function using the bootstrap method.

    + +

    Consider a +dataset \( \mathcal{L} \) consisting of the data +\( \mathbf{X}_\mathcal{L}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    + +

    We assume that the true data is generated from a noisy model

    + +$$ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}. +$$ + +

    Here \( \epsilon \) is normally distributed with mean zero and +variance \( \sigma^2 \). +

    + +

    In our derivation of the ordinary least squares method we defined +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \). +

    + +

    The parameters \( \boldsymbol{\beta} \) are in turn found by optimizing the mean +squared error via the so-called cost function +

    + +$$ +C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +$$ + +

    Here the expected value \( \mathbb{E} \) is the sample mean.

    + +

    Show that you can rewrite this as the sum of a term which contains the variance of the model itself (the so-called variance term), a +term which measures the deviation of the mean value of the model from the true data (the bias term), and finally the variance of the noise. +That is, show that +

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +$$ + +

    with

    +$$ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], +$$ + +

    and

    +$$ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. +$$ + +

    Explain what the terms mean and discuss their interpretations.

    + +

    Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by +studying the MSE as a function of the complexity of your model. Use ordinary least squares only. +

    + +

    Discuss the bias and variance trade-off as a function +of your model complexity (the degree of the polynomial) and the number +of data points, and possibly also your training and test data using the bootstrap resampling method. +You can follow the code example in the jupyter-book at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff. +
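    The numerical part of the exercise can be sketched as follows. This is a minimal NumPy-only illustration, not the code from the jupyter-book: the test function, the function name `bias_variance_bootstrap`, and all default parameter values are assumptions chosen for the example. It fits polynomials with ordinary least squares on bootstrap resamples of the training data and evaluates the error, bias and variance on a fixed test set, using the definitions stated in the exercise. Since the bias is computed against the noisy test data, the noise contribution is contained in the bias term here.

```python
import numpy as np


def bias_variance_bootstrap(degree, n=200, n_boot=100, sigma=0.1, seed=0):
    """Bootstrap estimate of error, bias and variance for a polynomial OLS fit.

    All names and default values are hypothetical choices for this sketch.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical one-dimensional data set: smooth function plus Gaussian noise.
    x = np.linspace(-1.0, 1.0, n)
    y = np.exp(-x**2) + 1.5 * np.exp(-(x - 0.5) ** 2) + sigma * rng.normal(size=n)

    # Fixed train/test split (20% test data).
    idx = rng.permutation(n)
    n_test = n // 5
    test, train = idx[:n_test], idx[n_test:]

    # Polynomial design matrices with columns 1, x, ..., x^degree.
    X_train = np.vander(x[train], degree + 1, increasing=True)
    X_test = np.vander(x[test], degree + 1, increasing=True)
    y_train, y_test = y[train], y[test]

    # One row of test predictions per bootstrap resample of the training data.
    preds = np.empty((n_boot, n_test))
    for b in range(n_boot):
        s = rng.integers(0, len(train), size=len(train))  # sample with replacement
        beta, *_ = np.linalg.lstsq(X_train[s], y_train[s], rcond=None)
        preds[b] = X_test @ beta

    mean_pred = preds.mean(axis=0)                   # E[y_tilde] at each test point
    error = np.mean((y_test[None, :] - preds) ** 2)  # E[(y - y_tilde)^2]
    bias = np.mean((y_test - mean_pred) ** 2)        # E[(y - E[y_tilde])^2]
    variance = np.mean(preds.var(axis=0))            # E[(y_tilde - E[y_tilde])^2]
    return error, bias, variance


# Scan the model complexity (polynomial degree).
for degree in range(1, 11):
    err, bias, var = bias_variance_bootstrap(degree)
    print(f"degree={degree:2d}  error={err:.5f}  bias={bias:.5f}  variance={var:.5f}")
```

    With these sample-based definitions the error equals the sum of the bias and variance terms up to floating-point rounding, which gives a quick consistency check on an implementation. Varying `n` and `n_boot` in the same loop addresses the dependence on the number of data points.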

    + diff --git a/doc/pub/day3/html/day3-reveal.html b/doc/pub/day3/html/day3-reveal.html index 759739b..2f7c403 100644 --- a/doc/pub/day3/html/day3-reveal.html +++ b/doc/pub/day3/html/day3-reveal.html @@ -184,7 +184,7 @@

    Data Analysis and Machine Learning: Ridge and La
    -

    October 15 and 22, 2023

    +

    October 16 and 23, 2023


    @@ -202,9 +202,9 @@

    Plans for Sessions 4-6

  • Statistics, probability theory and resampling methods
  • @@ -3667,6 +3667,84 @@

    Exercise 2: Expectat

    and it is easy to see that if the parameter \( \lambda \) goes to infinity then the variance of the Ridge parameters \( \boldsymbol{\beta} \) goes to zero.

    + + + +

    Exercise 3: Bias-Variance tradeoff

    + +

    The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simpler function using the bootstrap method.

    + +

    Consider a +dataset \( \mathcal{L} \) consisting of the data +\( \mathbf{X}_\mathcal{L}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    + +

    We assume that the true data is generated from a noisy model

    + +

     
    +$$ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}. +$$ +

     
    + +

    Here \( \epsilon \) is normally distributed with mean zero and +variance \( \sigma^2 \). +

    + +

    In our derivation of the ordinary least squares method we defined +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \). +

    + +

    The parameters \( \boldsymbol{\beta} \) are in turn found by optimizing the mean +squared error via the so-called cost function +

    + +

     
    +$$ +C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +$$ +

     
    + +

    Here the expected value \( \mathbb{E} \) is the sample mean.

    + +

    Show that you can rewrite this as the sum of a term which contains the variance of the model itself (the so-called variance term), a +term which measures the deviation of the mean value of the model from the true data (the bias term), and finally the variance of the noise. +That is, show that +

    +

     
    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +$$ +

     
    + +

    with

    +

     
    +$$ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], +$$ +

     
    + +

    and

    +

     
    +$$ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. +$$ +

     
    + +

    Explain what the terms mean and discuss their interpretations.

    + +

    Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by +studying the MSE as a function of the complexity of your model. Use ordinary least squares only. +

    + +

    Discuss the bias and variance trade-off as a function +of your model complexity (the degree of the polynomial) and the number +of data points, and possibly also your training and test data using the bootstrap resampling method. +You can follow the code example in the jupyter-book at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff. +

    + diff --git a/doc/pub/day3/html/day3-solarized.html b/doc/pub/day3/html/day3-solarized.html index 1fb9c0d..75423f1 100644 --- a/doc/pub/day3/html/day3-solarized.html +++ b/doc/pub/day3/html/day3-solarized.html @@ -308,7 +308,11 @@ ('Exercise 2: Expectation values for Ridge regression', 2, None, - 'exercise-2-expectation-values-for-ridge-regression')]} + 'exercise-2-expectation-values-for-ridge-regression'), + ('Exercise 3: Bias-Variance tradeoff', + 2, + None, + 'exercise-3-bias-variance-tradeoff')]} end of tocinfo --> @@ -346,7 +350,7 @@

    Data Analysis and Machine Learning: Ridge and Lasso Regression and Resamplin
    -

    October 15 and 22, 2023

    +

    October 16 and 23, 2023


    @@ -357,8 +361,8 @@

    Plans for Sessions 4-6

  • More on Ridge and Lasso Regression
  • Statistics, probability theory and resampling methods










  • @@ -3472,6 +3476,74 @@

    Exercise 2: Expectat

    and it is easy to see that if the parameter \( \lambda \) goes to infinity then the variance of the Ridge parameters \( \boldsymbol{\beta} \) goes to zero.

    + + + +

    Exercise 3: Bias-Variance tradeoff

    + +

    The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simpler function using the bootstrap method.

    + +

    Consider a +dataset \( \mathcal{L} \) consisting of the data +\( \mathbf{X}_\mathcal{L}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    + +

    We assume that the true data is generated from a noisy model

    + +$$ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}. +$$ + +

    Here \( \epsilon \) is normally distributed with mean zero and +variance \( \sigma^2 \). +

    + +

    In our derivation of the ordinary least squares method we defined +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \). +

    + +

    The parameters \( \boldsymbol{\beta} \) are in turn found by optimizing the mean +squared error via the so-called cost function +

    + +$$ +C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +$$ + +

    Here the expected value \( \mathbb{E} \) is the sample mean.

    + +

    Show that you can rewrite this as the sum of a term which contains the variance of the model itself (the so-called variance term), a +term which measures the deviation of the mean value of the model from the true data (the bias term), and finally the variance of the noise. +That is, show that +

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +$$ + +

    with

    +$$ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], +$$ + +

    and

    +$$ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. +$$ + +

    Explain what the terms mean and discuss their interpretations.

    + +

    Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by +studying the MSE as a function of the complexity of your model. Use ordinary least squares only. +

    + +

    Discuss the bias and variance trade-off as a function +of your model complexity (the degree of the polynomial) and the number +of data points, and possibly also your training and test data using the bootstrap resampling method. +You can follow the code example in the jupyter-book at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff. +

    +
    diff --git a/doc/pub/day3/html/day3.html b/doc/pub/day3/html/day3.html index 94818c6..def74b7 100644 --- a/doc/pub/day3/html/day3.html +++ b/doc/pub/day3/html/day3.html @@ -385,7 +385,11 @@ ('Exercise 2: Expectation values for Ridge regression', 2, None, - 'exercise-2-expectation-values-for-ridge-regression')]} + 'exercise-2-expectation-values-for-ridge-regression'), + ('Exercise 3: Bias-Variance tradeoff', + 2, + None, + 'exercise-3-bias-variance-tradeoff')]} end of tocinfo --> @@ -423,7 +427,7 @@

    Data Analysis and Machine Learning: Ridge and Lasso Regression and Resamplin


    -

    October 15 and 22, 2023

    +

    October 16 and 23, 2023


    @@ -434,8 +438,8 @@

    Plans for Sessions 4-6

  • More on Ridge and Lasso Regression
  • Statistics, probability theory and resampling methods










  • @@ -3549,6 +3553,74 @@

    Exercise 2: Expectat

    and it is easy to see that if the parameter \( \lambda \) goes to infinity then the variance of the Ridge parameters \( \boldsymbol{\beta} \) goes to zero.

    + + + +

    Exercise 3: Bias-Variance tradeoff

    + +

    The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simpler function using the bootstrap method.

    + +

    Consider a +dataset \( \mathcal{L} \) consisting of the data +\( \mathbf{X}_\mathcal{L}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    + +

    We assume that the true data is generated from a noisy model

    + +$$ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}. +$$ + +

    Here \( \epsilon \) is normally distributed with mean zero and +variance \( \sigma^2 \). +

    + +

    In our derivation of the ordinary least squares method we defined +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \). +

    + +

    The parameters \( \boldsymbol{\beta} \) are in turn found by optimizing the mean +squared error via the so-called cost function +

    + +$$ +C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +$$ + +

    Here the expected value \( \mathbb{E} \) is the sample mean.

    + +

    Show that you can rewrite this as the sum of a term which contains the variance of the model itself (the so-called variance term), a +term which measures the deviation of the mean value of the model from the true data (the bias term), and finally the variance of the noise. +That is, show that +

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +$$ + +

    with

    +$$ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], +$$ + +

    and

    +$$ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. +$$ + +

    Explain what the terms mean and discuss their interpretations.

    + +

    Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by +studying the MSE as a function of the complexity of your model. Use ordinary least squares only. +

    + +

    Discuss the bias and variance trade-off as a function +of your model complexity (the degree of the polynomial) and the number +of data points, and possibly also your training and test data using the bootstrap resampling method. +You can follow the code example in the jupyter-book at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff. +

    +
    diff --git a/doc/pub/day3/ipynb/day3.ipynb b/doc/pub/day3/ipynb/day3.ipynb index 8cdade7..96aee94 100644 --- a/doc/pub/day3/ipynb/day3.ipynb +++ b/doc/pub/day3/ipynb/day3.ipynb @@ -2,8 +2,10 @@ "cells": [ { "cell_type": "markdown", - "id": "7fcd4e2f", - "metadata": {}, + "id": "dcfb8aab", + "metadata": { + "editable": true + }, "source": [ "\n", @@ -12,19 +14,23 @@ }, { "cell_type": "markdown", - "id": "4a75243e", - "metadata": {}, + "id": "7befd095", + "metadata": { + "editable": true + }, "source": [ "# Data Analysis and Machine Learning: Ridge and Lasso Regression and Resampling Methods\n", "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway and Department of Physics and Astronomy and Facility for Rare Isotope Beams and National Superconducting Cyclotron Laboratory, Michigan State University, USA\n", "\n", - "Date: **October 15 and 22, 2023**" + "Date: **October 16 and 23, 2023**" ] }, { "cell_type": "markdown", - "id": "7c319be3", - "metadata": {}, + "id": "c2018078", + "metadata": { + "editable": true + }, "source": [ "## Plans for Sessions 4-6\n", "\n", @@ -32,15 +38,17 @@ "\n", "* Statistics, probability theory and resampling methods\n", "\n", - " * [Video of Lecture October 15 to be added](https://youtu.be/)\n", + " * [Video of Lecture October 16 to be added](https://youtu.be/iqRKUPJr_bY)\n", "\n", - " * [Video of Lecture October 22 to be added](https://youtu.be/)" + " * [Video of Lecture October 23 to be added](https://youtu.be/)" ] }, { "cell_type": "markdown", - "id": "5217a54a", - "metadata": {}, + "id": "db5859b5", + "metadata": { + "editable": true + }, "source": [ "## Ridge and LASSO Regression\n", "\n", @@ -50,8 +58,10 @@ }, { "cell_type": "markdown", - "id": "6b16a092", - "metadata": {}, + "id": "86774668", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\displaystyle \\min_{\\boldsymbol{\\beta}\\in 
{\\mathbb{R}}^{p}}}\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta}\\right)\\right\\}.\n", @@ -60,16 +70,20 @@ }, { "cell_type": "markdown", - "id": "9eaad472", - "metadata": {}, + "id": "3c2c57bc", + "metadata": { + "editable": true + }, "source": [ "or we can state it as" ] }, { "cell_type": "markdown", - "id": "72444a78", - "metadata": {}, + "id": "bbb2dfb8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\displaystyle \\min_{\\boldsymbol{\\beta}\\in\n", @@ -79,16 +93,20 @@ }, { "cell_type": "markdown", - "id": "891fdce0", - "metadata": {}, + "id": "148b386e", + "metadata": { + "editable": true + }, "source": [ "where we have used the definition of a norm-2 vector, that is" ] }, { "cell_type": "markdown", - "id": "79e19a07", - "metadata": {}, + "id": "ecb6c1e6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\vert\\vert \\boldsymbol{x}\\vert\\vert_2 = \\sqrt{\\sum_i x_i^2}.\n", @@ -97,8 +115,10 @@ }, { "cell_type": "markdown", - "id": "33387932", - "metadata": {}, + "id": "550bb691", + "metadata": { + "editable": true + }, "source": [ "## From OLS to Ridge and Lasso\n", "\n", @@ -110,8 +130,10 @@ }, { "cell_type": "markdown", - "id": "216ae801", - "metadata": {}, + "id": "e0166782", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\displaystyle \\min_{\\boldsymbol{\\beta}\\in\n", @@ -121,8 +143,10 @@ }, { "cell_type": "markdown", - "id": "a71c6474", - "metadata": {}, + "id": "869ba708", + "metadata": { + "editable": true + }, "source": [ "which leads to the Ridge regression minimization problem where we\n", "require that $\\vert\\vert \\boldsymbol{\\beta}\\vert\\vert_2^2\\le t$, where $t$ is\n", @@ -131,8 +155,10 @@ }, { "cell_type": "markdown", - "id": "29bc8526", - "metadata": {}, + "id": "420257c6", + "metadata": { + "editable": true + }, "source": [ "$$\n", 
"C(\\boldsymbol{X},\\boldsymbol{\\beta})=\\frac{1}{n}\\vert\\vert \\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta}\\vert\\vert_2^2+\\lambda\\vert\\vert \\boldsymbol{\\beta}\\vert\\vert_1,\n", @@ -141,16 +167,20 @@ }, { "cell_type": "markdown", - "id": "440d1dcd", - "metadata": {}, + "id": "80f0707b", + "metadata": { + "editable": true + }, "source": [ "we have a new optimization equation" ] }, { "cell_type": "markdown", - "id": "053066f3", - "metadata": {}, + "id": "a3b4e965", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\displaystyle \\min_{\\boldsymbol{\\beta}\\in\n", @@ -160,8 +190,10 @@ }, { "cell_type": "markdown", - "id": "ab631a74", - "metadata": {}, + "id": "6b3e8c06", + "metadata": { + "editable": true + }, "source": [ "which leads to Lasso regression. Lasso stands for least absolute shrinkage and selection operator. \n", "\n", @@ -170,8 +202,10 @@ }, { "cell_type": "markdown", - "id": "ee57164b", - "metadata": {}, + "id": "e5301748", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\vert\\vert \\boldsymbol{x}\\vert\\vert_1 = \\sum_i \\vert x_i\\vert.\n", @@ -180,8 +214,10 @@ }, { "cell_type": "markdown", - "id": "72acfab8", - "metadata": {}, + "id": "55d97304", + "metadata": { + "editable": true + }, "source": [ "## Deriving the Ridge Regression Equations\n", "\n", @@ -190,8 +226,10 @@ }, { "cell_type": "markdown", - "id": "e91dae68", - "metadata": {}, + "id": "1d40b7bb", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\beta})=\\left\\{(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})^T(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\right\\}+\\lambda\\boldsymbol{\\beta}^T\\boldsymbol{\\beta},\n", @@ -200,8 +238,10 @@ }, { "cell_type": "markdown", - "id": "960b85a2", - "metadata": {}, + "id": "154fbe54", + "metadata": { + "editable": true + }, "source": [ "and \n", "taking the derivatives with respect to $\\boldsymbol{\\beta}$ we obtain then\n", @@ -212,8 +252,10 @@ 
}, { "cell_type": "markdown", - "id": "bec7054f", - "metadata": {}, + "id": "301adcda", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}_{\\mathrm{Ridge}} = \\left(\\boldsymbol{X}^T\\boldsymbol{X}+\\lambda\\boldsymbol{I}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y},\n", @@ -222,16 +264,20 @@ }, { "cell_type": "markdown", - "id": "d9f7bf1e", - "metadata": {}, + "id": "a039a074", + "metadata": { + "editable": true + }, "source": [ "with $\\boldsymbol{I}$ being a $p\\times p$ identity matrix with the constraint that" ] }, { "cell_type": "markdown", - "id": "9eaa21b8", - "metadata": {}, + "id": "2343998c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sum_{i=0}^{p-1} \\beta_i^2 \\leq t,\n", @@ -240,8 +286,10 @@ }, { "cell_type": "markdown", - "id": "7ae4b47c", - "metadata": {}, + "id": "379978c2", + "metadata": { + "editable": true + }, "source": [ "with $t$ a finite positive number. \n", "\n", @@ -250,8 +298,10 @@ }, { "cell_type": "markdown", - "id": "b7761c1f", - "metadata": {}, + "id": "887c145a", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}_{\\mathrm{OLS}} = \\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y},\n", @@ -260,8 +310,10 @@ }, { "cell_type": "markdown", - "id": "13159e48", - "metadata": {}, + "id": "7146c525", + "metadata": { + "editable": true + }, "source": [ "which can lead to singular matrices. 
However, with the SVD, we can always compute the inverse of the matrix $\\boldsymbol{X}^T\\boldsymbol{X}$.\n", "\n", @@ -274,8 +326,10 @@ }, { "cell_type": "markdown", - "id": "199ba41c", - "metadata": {}, + "id": "3449285f", + "metadata": { + "editable": true + }, "source": [ "## SVD analysis\n", "\n", @@ -285,8 +339,10 @@ }, { "cell_type": "markdown", - "id": "a39fd7fb", - "metadata": {}, + "id": "dd7fafee", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\tilde{\\boldsymbol{y}}_{\\mathrm{OLS}}=\\boldsymbol{X}\\boldsymbol{\\beta} =\\boldsymbol{U}\\boldsymbol{U}^T\\boldsymbol{y}.\n", @@ -295,16 +351,20 @@ }, { "cell_type": "markdown", - "id": "8d703fef", - "metadata": {}, + "id": "07394ecb", + "metadata": { + "editable": true + }, "source": [ "For Ridge regression this becomes" ] }, { "cell_type": "markdown", - "id": "8d65d498", - "metadata": {}, + "id": "87d7bca7", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\tilde{\\boldsymbol{y}}_{\\mathrm{Ridge}}=\\boldsymbol{X}\\boldsymbol{\\beta}_{\\mathrm{Ridge}} = \\boldsymbol{U\\Sigma V^T}\\left(\\boldsymbol{V}\\boldsymbol{\\Sigma}^2\\boldsymbol{V}^T+\\lambda\\boldsymbol{I} \\right)^{-1}(\\boldsymbol{U\\Sigma V^T})^T\\boldsymbol{y}=\\sum_{j=0}^{p-1}\\boldsymbol{u}_j\\boldsymbol{u}_j^T\\frac{\\sigma_j^2}{\\sigma_j^2+\\lambda}\\boldsymbol{y},\n", @@ -313,16 +373,20 @@ }, { "cell_type": "markdown", - "id": "b115c851", - "metadata": {}, + "id": "762fe137", + "metadata": { + "editable": true + }, "source": [ "with the vectors $\\boldsymbol{u}_j$ being the columns of $\\boldsymbol{U}$ from the SVD of the matrix $\\boldsymbol{X}$." 
] }, { "cell_type": "markdown", - "id": "795d1be1", - "metadata": {}, + "id": "0a1f2b28", + "metadata": { + "editable": true + }, "source": [ "## Interpreting the Ridge results\n", "\n", @@ -331,8 +395,10 @@ }, { "cell_type": "markdown", - "id": "ead04830", - "metadata": {}, + "id": "01f4c52d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\sigma_j^2}{\\sigma_j^2+\\lambda} \\leq 1.\n", @@ -341,8 +407,10 @@ }, { "cell_type": "markdown", - "id": "cfb9f42f", - "metadata": {}, + "id": "c81e7d01", + "metadata": { + "editable": true + }, "source": [ "Ridge regression finds the coordinates of $\\boldsymbol{y}$ with respect to the\n", "orthonormal basis $\\boldsymbol{U}$, it then shrinks the coordinates by\n", @@ -355,8 +423,10 @@ }, { "cell_type": "markdown", - "id": "0b9fae34", - "metadata": {}, + "id": "b0ebc908", + "metadata": { + "editable": true + }, "source": [ "## More interpretations\n", "\n", @@ -365,8 +435,10 @@ }, { "cell_type": "markdown", - "id": "72dbba94", - "metadata": {}, + "id": "803e29ff", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{X}^T\\boldsymbol{X}=(\\boldsymbol{X}^T\\boldsymbol{X})^{-1} =\\boldsymbol{I}.\n", @@ -375,16 +447,20 @@ }, { "cell_type": "markdown", - "id": "ac72dbfb", - "metadata": {}, + "id": "e67d95f1", + "metadata": { + "editable": true + }, "source": [ "In this case the standard OLS results in" ] }, { "cell_type": "markdown", - "id": "d13f7879", - "metadata": {}, + "id": "477f02f4", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\beta}^{\\mathrm{OLS}} = \\boldsymbol{X}^T\\boldsymbol{y}=\\sum_{i=0}^{p-1}\\boldsymbol{u}_j\\boldsymbol{u}_j^T\\boldsymbol{y},\n", @@ -393,16 +469,20 @@ }, { "cell_type": "markdown", - "id": "53f823d6", - "metadata": {}, + "id": "d2c53221", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "f6500c4a", - "metadata": {}, + "id": "7514e7fa", + "metadata": { + "editable": true + }, 
"source": [ "$$\n", "\\boldsymbol{\\beta}^{\\mathrm{Ridge}} = \\left(\\boldsymbol{I}+\\lambda\\boldsymbol{I}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}=\\left(1+\\lambda\\right)^{-1}\\boldsymbol{\\beta}^{\\mathrm{OLS}},\n", @@ -411,8 +491,10 @@ }, { "cell_type": "markdown", - "id": "5f4aa2e3", - "metadata": {}, + "id": "9f1253ba", + "metadata": { + "editable": true + }, "source": [ "that is the Ridge estimator scales the OLS estimator by the inverse of a factor $1+\\lambda$, and\n", "the Ridge estimator converges to zero when the hyperparameter goes to\n", @@ -426,8 +508,10 @@ }, { "cell_type": "markdown", - "id": "0f0562cd", - "metadata": {}, + "id": "be0831cd", + "metadata": { + "editable": true + }, "source": [ "## Deriving the Lasso Regression Equations\n", "\n", @@ -436,8 +520,10 @@ }, { "cell_type": "markdown", - "id": "5184af64", - "metadata": {}, + "id": "62a26e89", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\beta})=\\left\\{(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})^T(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\right\\}+\\lambda\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_1,\n", @@ -446,16 +532,20 @@ }, { "cell_type": "markdown", - "id": "82b509c8", - "metadata": {}, + "id": "e3ae918e", + "metadata": { + "editable": true + }, "source": [ "Taking the derivative with respect to $\\boldsymbol{\\beta}$ and recalling that the derivative of the absolute value is (we drop the boldfaced vector symbol for simplicty)" ] }, { "cell_type": "markdown", - "id": "cb53949c", - "metadata": {}, + "id": "9ac2c6ad", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{d \\vert \\beta\\vert}{d \\boldsymbol{\\beta}}=\\mathrm{sgn}(\\boldsymbol{\\beta})=\\left\\{\\begin{array}{cc} 1 & \\beta > 0 \\\\-1 & \\beta < 0, \\end{array}\\right.\n", @@ -464,16 +554,20 @@ }, { "cell_type": "markdown", - "id": "98dffbc2", - "metadata": {}, + "id": "5c04e729", + "metadata": { + "editable": true + }, 
"source": [ "we have that the derivative of the cost function is" ] }, { "cell_type": "markdown", - "id": "f6b8e1f1", - "metadata": {}, + "id": "7f0d97a0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C(\\boldsymbol{X},\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}}=-2\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})+\\lambda sgn(\\boldsymbol{\\beta})=0,\n", @@ -482,16 +576,20 @@ }, { "cell_type": "markdown", - "id": "3340097f", - "metadata": {}, + "id": "30af9fc7", + "metadata": { + "editable": true + }, "source": [ "and reordering we have" ] }, { "cell_type": "markdown", - "id": "75b76a69", - "metadata": {}, + "id": "c03f3828", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{X}^T\\boldsymbol{X}\\boldsymbol{\\beta}+\\lambda sgn(\\boldsymbol{\\beta})=2\\boldsymbol{X}^T\\boldsymbol{y}.\n", @@ -500,16 +598,20 @@ }, { "cell_type": "markdown", - "id": "99446adc", - "metadata": {}, + "id": "0ea59c2e", + "metadata": { + "editable": true + }, "source": [ "This equation does not lead to a nice analytical equation as in Ridge regression or ordinary least squares. This equation can however be solved by using standard convex optimization algorithms using for example the Python package [CVXOPT](https://cvxopt.org/). We will discuss this later." 
] }, { "cell_type": "markdown", - "id": "ed2fbc69", - "metadata": {}, + "id": "c5f70b50", + "metadata": { + "editable": true + }, "source": [ "## Simple example to illustrate Ordinary Least Squares, Ridge and Lasso Regression\n", "\n", @@ -521,8 +623,10 @@ }, { "cell_type": "markdown", - "id": "f59ad285", - "metadata": {}, + "id": "98168425", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\sum_{i=0}^{p-1}(y_i-\\beta_i)^2,\n", @@ -531,16 +635,20 @@ }, { "cell_type": "markdown", - "id": "34c46d15", - "metadata": {}, + "id": "132b37ba", + "metadata": { + "editable": true + }, "source": [ "and minimizing we have that" ] }, { "cell_type": "markdown", - "id": "b58a39d0", - "metadata": {}, + "id": "9f5aeaa7", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\beta}_i^{\\mathrm{OLS}} = y_i.\n", @@ -549,8 +657,10 @@ }, { "cell_type": "markdown", - "id": "afbd5077", - "metadata": {}, + "id": "586753f9", + "metadata": { + "editable": true + }, "source": [ "## Ridge Regression\n", "\n", @@ -559,8 +669,10 @@ }, { "cell_type": "markdown", - "id": "22412c03", - "metadata": {}, + "id": "7a9a7e0f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\sum_{i=0}^{p-1}(y_i-\\beta_i)^2+\\lambda\\sum_{i=0}^{p-1}\\beta_i^2,\n", @@ -569,16 +681,20 @@ }, { "cell_type": "markdown", - "id": "8f6f22f0", - "metadata": {}, + "id": "c9937d2c", + "metadata": { + "editable": true + }, "source": [ "and minimizing we have that" ] }, { "cell_type": "markdown", - "id": "2a762fc5", - "metadata": {}, + "id": "7cc6e211", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\beta}_i^{\\mathrm{Ridge}} = \\frac{y_i}{1+\\lambda}.\n", @@ -587,8 +703,10 @@ }, { "cell_type": "markdown", - "id": "f745bcfc", - "metadata": {}, + "id": "abcbbea3", + "metadata": { + "editable": true + }, "source": [ "## Lasso Regression\n", "\n", @@ -597,8 +715,10 @@ }, { "cell_type": "markdown", - "id": "d19b1a18", - 
"metadata": {}, + "id": "d03cbdc2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\sum_{i=0}^{p-1}(y_i-\\beta_i)^2+\\lambda\\sum_{i=0}^{p-1}\\vert\\beta_i\\vert=\\sum_{i=0}^{p-1}(y_i-\\beta_i)^2+\\lambda\\sum_{i=0}^{p-1}\\sqrt{\\beta_i^2},\n", @@ -607,16 +727,20 @@ }, { "cell_type": "markdown", - "id": "83d1effc", - "metadata": {}, + "id": "8bca4327", + "metadata": { + "editable": true + }, "source": [ "and minimizing we have that" ] }, { "cell_type": "markdown", - "id": "3d4b494b", - "metadata": {}, + "id": "0da2fc2e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "-2\\sum_{i=0}^{p-1}(y_i-\\beta_i)+\\lambda \\sum_{i=0}^{p-1}\\frac{(\\beta_i)}{\\vert\\beta_i\\vert}=0,\n", @@ -625,16 +749,20 @@ }, { "cell_type": "markdown", - "id": "f256fb7a", - "metadata": {}, + "id": "da0627ac", + "metadata": { + "editable": true + }, "source": [ "which leads to" ] }, { "cell_type": "markdown", - "id": "e4385b4b", - "metadata": {}, + "id": "be38f7a8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}_i^{\\mathrm{Lasso}} = \\left\\{\\begin{array}{ccc}y_i-\\frac{\\lambda}{2} &\\mathrm{if} & y_i> \\frac{\\lambda}{2}\\\\\n", @@ -645,16 +773,20 @@ }, { "cell_type": "markdown", - "id": "bc046af0", - "metadata": {}, + "id": "51a6bfab", + "metadata": { + "editable": true + }, "source": [ "Plotting these results ([figure in handwritten notes for week 36](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2021/NotesSeptember9.pdf)) shows clearly that Lasso regression suppresses (sets to zero) values of $\\beta_i$ for specific values of $\\lambda$. Ridge regression reduces on the other hand the values of $\\beta_i$ as function of $\\lambda$." 
] }, { "cell_type": "markdown", - "id": "ef728521", - "metadata": {}, + "id": "3f05154c", + "metadata": { + "editable": true + }, "source": [ "## Yet another Example\n", "\n", @@ -663,8 +795,10 @@ }, { "cell_type": "markdown", - "id": "511c35c9", - "metadata": {}, + "id": "a3aa94ba", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{y}=\\begin{bmatrix}4 \\\\ 2 \\\\3\\end{bmatrix},\n", @@ -673,16 +807,20 @@ }, { "cell_type": "markdown", - "id": "a4a4d4cd", - "metadata": {}, + "id": "5e96cd27", + "metadata": { + "editable": true + }, "source": [ "and our inputs as a $3\\times 2$ design matrix" ] }, { "cell_type": "markdown", - "id": "c07e4afb", - "metadata": {}, + "id": "6b6056d1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{X}=\\begin{bmatrix}2 & 0\\\\ 0 & 1 \\\\ 0 & 0\\end{bmatrix},\n", @@ -691,16 +829,20 @@ }, { "cell_type": "markdown", - "id": "58b41942", - "metadata": {}, + "id": "da66af0a", + "metadata": { + "editable": true + }, "source": [ "meaning that we have two features and two unknown parameters $\\beta_0$ and $\\beta_1$ to be determined either by ordinary least squares, Ridge or Lasso regression." 
] }, { "cell_type": "markdown", - "id": "377f8025", - "metadata": {}, + "id": "991f82d7", + "metadata": { + "editable": true + }, "source": [ "## The OLS case\n", "\n", @@ -709,8 +851,10 @@ }, { "cell_type": "markdown", - "id": "d7447f1b", - "metadata": {}, + "id": "99f0d5f1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}=\\left( \\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}.\n", @@ -719,16 +863,20 @@ }, { "cell_type": "markdown", - "id": "8f4785f7", - "metadata": {}, + "id": "3ace3ef6", + "metadata": { + "editable": true + }, "source": [ "Inserting the above values we obtain that" ] }, { "cell_type": "markdown", - "id": "99c7045b", - "metadata": {}, + "id": "a19a7f99", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}=\\begin{bmatrix}2 \\\\ 2\\end{bmatrix},\n", @@ -737,16 +885,20 @@ }, { "cell_type": "markdown", - "id": "059838a5", - "metadata": {}, + "id": "ff0e54e2", + "metadata": { + "editable": true + }, "source": [ "The code which implements this simpler case is presented after the discussion of Ridge and Lasso." 
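As a quick numerical check of the result above, the following sketch evaluates the OLS expression for the given design matrix and outputs with NumPy. It solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly; that is a choice made here for illustration.

```python
import numpy as np

# Design matrix and outputs from the example above
X = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
y = np.array([4.0, 2.0, 3.0])

# Solve (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Training mean squared error
residual = y - X @ beta_ols
mse_train = np.mean(residual**2)

print(beta_ols)
print(mse_train)
```

The third row of the design matrix is zero, so the third output cannot be fitted by any choice of parameters; this leaves a training MSE of 3.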
] }, { "cell_type": "markdown", - "id": "3e012182", - "metadata": {}, + "id": "431feca6", + "metadata": { + "editable": true + }, "source": [ "## The Ridge case\n", "\n", @@ -755,8 +907,10 @@ }, { "cell_type": "markdown", - "id": "b10085df", - "metadata": {}, + "id": "e912f4d2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}=\\left( \\boldsymbol{X}^T\\boldsymbol{X}+\\lambda\\boldsymbol{I}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}.\n", @@ -765,16 +919,20 @@ }, { "cell_type": "markdown", - "id": "932e05a5", - "metadata": {}, + "id": "2731c827", + "metadata": { + "editable": true + }, "source": [ "Inserting the above values we obtain that" ] }, { "cell_type": "markdown", - "id": "0eec6072", - "metadata": {}, + "id": "8b6e6b07", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}=\\begin{bmatrix}\\frac{8}{4+\\lambda} \\\\ \\frac{2}{1+\\lambda}\\end{bmatrix},\n", @@ -783,8 +941,10 @@ }, { "cell_type": "markdown", - "id": "b2e38999", - "metadata": {}, + "id": "ea774406", + "metadata": { + "editable": true + }, "source": [ "There is normally a constraint on the value of $\\vert\\vert \\boldsymbol{\\beta}\\vert\\vert_2$ via the parameter $\\lambda$.\n", "Let us for simplicity assume that $\\beta_0^2+\\beta_1^2=1$ as constraint. 
This will allow us to find an expression for the optimal values of $\\beta$ and $\\lambda$.\n", @@ -794,8 +954,10 @@ }, { "cell_type": "markdown", - "id": "31fe413e", - "metadata": {}, + "id": "7554f52c", + "metadata": { + "editable": true + }, "source": [ "## Writing the Cost Function\n", "\n", @@ -804,8 +966,10 @@ }, { "cell_type": "markdown", - "id": "35b1242e", - "metadata": {}, + "id": "d2a7275f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{X}\\boldsymbol{\\beta}=\\begin{bmatrix} 2\\beta_0 \\\\ \\beta_1 \\\\0 \\end{bmatrix},\n", @@ -814,8 +978,10 @@ }, { "cell_type": "markdown", - "id": "77dbd497", - "metadata": {}, + "id": "e8a2669d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=(4-2\\beta_0)^2+(2-\\beta_1)^2+\\lambda(\\beta_0^2+\\beta_1^2),\n", @@ -824,16 +990,20 @@ }, { "cell_type": "markdown", - "id": "53267765", - "metadata": {}, + "id": "2d892e13", + "metadata": { + "editable": true + }, "source": [ "and taking the derivative with respect to $\\beta_0$ we get" ] }, { "cell_type": "markdown", - "id": "6a498c50", - "metadata": {}, + "id": "450f74e0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\beta_0=\\frac{8}{4+\\lambda},\n", @@ -842,16 +1012,20 @@ }, { "cell_type": "markdown", - "id": "80f46853", - "metadata": {}, + "id": "dae76b52", + "metadata": { + "editable": true + }, "source": [ "and for $\\beta_1$ we obtain" ] }, { "cell_type": "markdown", - "id": "0c51bb9b", - "metadata": {}, + "id": "3c4d3b7c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\beta_1=\\frac{2}{1+\\lambda},\n", @@ -860,16 +1034,20 @@ }, { "cell_type": "markdown", - "id": "f78ca610", - "metadata": {}, + "id": "d7a158ed", + "metadata": { + "editable": true + }, "source": [ "Using the constraint for $\\beta_0^2+\\beta_1^2=1$ we can constrain $\\lambda$ by solving" ] }, { "cell_type": "markdown", - "id": "ad42b2e8", - "metadata": {}, + "id": "91ef4b72", + "metadata": { + "editable": 
true + }, "source": [ "$$\n", "\\left(\\frac{8}{4+\\lambda}\\right)^2+\\left(\\frac{2}{1+\\lambda}\\right)^2=1,\n", @@ -878,16 +1056,20 @@ }, { "cell_type": "markdown", - "id": "630f551a", - "metadata": {}, + "id": "a5a766b2", + "metadata": { + "editable": true + }, "source": [ "which gives $\\lambda=4.571$ and $\\beta_0=0.933$ and $\\beta_1=0.359$." ] }, { "cell_type": "markdown", - "id": "625f9a79", - "metadata": {}, + "id": "ca4c8552", + "metadata": { + "editable": true + }, "source": [ "## Lasso case\n", "\n", @@ -897,8 +1079,10 @@ }, { "cell_type": "markdown", - "id": "4d3a230c", - "metadata": {}, + "id": "49650ce3", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=(4-2\\beta_0)^2+(2-\\beta_1)^2+\\lambda(\\vert\\beta_0\\vert+\\vert\\beta_1\\vert),\n", @@ -907,8 +1091,10 @@ }, { "cell_type": "markdown", - "id": "28f81c66", - "metadata": {}, + "id": "a3f92b3e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C(\\boldsymbol{\\beta})}{\\partial \\beta_0}=-4(4-2\\beta_0)+\\lambda\\mathrm{sgn}(\\beta_0)=0,\n", @@ -917,16 +1103,20 @@ }, { "cell_type": "markdown", - "id": "240943b2", - "metadata": {}, + "id": "1036291d", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "69a63eb2", - "metadata": {}, + "id": "75f30dff", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C(\\boldsymbol{\\beta})}{\\partial \\beta_1}=-2(2-\\beta_1)+\\lambda\\mathrm{sgn}(\\beta_1)=0.\n", @@ -935,8 +1125,10 @@ }, { "cell_type": "markdown", - "id": "27d7b28a", - "metadata": {}, + "id": "1e1e7506", + "metadata": { + "editable": true + }, "source": [ "We have now four cases to solve besides the trivial cases $\\beta_0$ and/or $\\beta_1$ are zero, namely\n", "1. 
$\\beta_0 > 0$ and $\\beta_1 > 0$,\n", @@ -950,8 +1142,10 @@ }, { "cell_type": "markdown", - "id": "d1419711", - "metadata": {}, + "id": "38a11a8f", + "metadata": { + "editable": true + }, "source": [ "## The first Case\n", "\n", @@ -960,8 +1154,10 @@ }, { "cell_type": "markdown", - "id": "56fdbb72", - "metadata": {}, + "id": "8346dd0f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "-4(4-2\\beta_0)+\\lambda=0,\n", @@ -970,16 +1166,20 @@ }, { "cell_type": "markdown", - "id": "de38ca69", - "metadata": {}, + "id": "068a7b14", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "94f54723", - "metadata": {}, + "id": "d74af879", + "metadata": { + "editable": true + }, "source": [ "$$\n", "-2(2-\\beta_1)+\\lambda=0,\n", @@ -988,16 +1188,20 @@ }, { "cell_type": "markdown", - "id": "faffc4f4", - "metadata": {}, + "id": "75eb5818", + "metadata": { + "editable": true + }, "source": [ "which yields" ] }, { "cell_type": "markdown", - "id": "d54aed47", - "metadata": {}, + "id": "82a6ca01", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\beta_0=\\frac{16-\\lambda}{8},\n", @@ -1006,16 +1210,20 @@ }, { "cell_type": "markdown", - "id": "717b2d10", - "metadata": {}, + "id": "3df03602", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "d0e516c1", - "metadata": {}, + "id": "4dc31587", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\beta_1=\\frac{4-\\lambda}{2}.\n", @@ -1024,16 +1232,20 @@ }, { "cell_type": "markdown", - "id": "45cd8d70", - "metadata": {}, + "id": "39d594b7", + "metadata": { + "editable": true + }, "source": [ "Using the constraint on $\\beta_0$ and $\\beta_1$ we can then find the optimal value of $\\lambda$ for the different cases. We leave this as an exercise to you."
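The two derivative conditions for the first case can be checked numerically. The sketch below minimizes the Lasso cost of this example on a grid restricted to positive parameters, for one arbitrarily chosen value `lmb = 1.0`, and then evaluates both conditions at the minimizer. The grid resolution and variable names are illustrative assumptions.

```python
import numpy as np

lmb = 1.0  # arbitrary illustrative value of the regularization parameter

# Grid over the quadrant beta0 >= 0, beta1 >= 0 with step 0.005
grid = np.linspace(0.0, 3.0, 601)
b0, b1 = np.meshgrid(grid, grid, indexing="ij")

# Lasso cost function of the example
cost = (4 - 2 * b0)**2 + (2 - b1)**2 + lmb * (np.abs(b0) + np.abs(b1))

# Locate the grid point with the smallest cost
i, j = np.unravel_index(np.argmin(cost), cost.shape)
beta0, beta1 = grid[i], grid[j]

# Derivative conditions for beta0 > 0 and beta1 > 0; both should be close to zero
cond0 = -4 * (4 - 2 * beta0) + lmb
cond1 = -2 * (2 - beta1) + lmb
print(beta0, beta1, cond0, cond1)
```

For this value of `lmb` the minimizer lies near `beta0 = 1.875` and `beta1 = 1.5`, both below the OLS values of 2, which illustrates the shrinkage introduced by the penalty.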
] }, { "cell_type": "markdown", - "id": "4c701337", - "metadata": {}, + "id": "da4b6d3e", + "metadata": { + "editable": true + }, "source": [ "## Simple code for solving the above problem\n", "\n", @@ -1044,30 +1256,13 @@ }, { "cell_type": "code", - "execution_count": 5, - "id": "1086f536", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[2. 2.]\n", - "Training MSE for OLS\n", - "3.0\n" - ] - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": 1, + "id": "2e3cf910", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ "%matplotlib inline\n", "\n", @@ -1104,7 +1299,7 @@ "# Decide which values of lambda to use\n", "nlambdas = 100\n", "MSEPredict = np.zeros(nlambdas)\n", - "lambdas = np.logspace(-4, 6, nlambdas)\n", + "lambdas = np.logspace(-4, 4, nlambdas)\n", "for i in range(nlambdas):\n", " lmb = lambdas[i]\n", " Ridgebeta = np.linalg.inv(X.T @ X+lmb*I) @ X.T @ y\n", @@ -1124,246 +1319,33 @@ }, { "cell_type": "markdown", - "id": "d72ebdaa", - "metadata": {}, + "id": "03cc93ef", + "metadata": { + "editable": true + }, "source": [ "We see here that we reach a plateau. What is actually happening?" ] }, { "cell_type": "markdown", - "id": "59d5a66e", - "metadata": {}, + "id": "49b48768", + "metadata": { + "editable": true + }, "source": [ "## With Lasso Regression" ] }, { "cell_type": "code", - "execution_count": 3, - "id": "0477b499", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[2. 
2.]\n", - "Training MSE for OLS\n", - "3.0\n", - "[1.99995 1.99980002]\n", - "[1.999925 1.9997 ]\n", - "[1.99993978 1.99975913]\n", - "[1.99990966 1.99963865]\n", - "[1.99992746 1.99970988]\n", - "[1.99989119 1.99956475]\n", - "[1.99991263 1.99965056]\n", - "[1.99986894 1.99947574]\n", - "[1.99989476 1.99957911]\n", - "[1.99984213 1.99936853]\n", - "[1.99987324 1.99949306]\n", - "[1.99980985 1.99923939]\n", - "[1.99984732 1.99938942]\n", - "[1.99977096 1.99908384]\n", - "[1.9998161 1.99926459]\n", - "[1.99972412 1.99889649]\n", - "[1.99977849 1.99911427]\n", - "[1.9996677 1.99867081]\n", - "[1.9997332 1.99893323]\n", - "[1.99959975 1.99839899]\n", - "[1.99967865 1.99871521]\n", - "[1.99951789 1.99807158]\n", - "[1.99961294 1.99845267]\n", - "[1.9994193 1.99767721]\n", - "[1.99953381 1.99813653]\n", - "[1.99930055 1.99720219]\n", - "[1.9994385 1.99775587]\n", - "[1.99915751 1.99663003]\n", - "[1.9993237 1.99729756]\n", - "[1.99898521 1.99594086]\n", - "[1.99918546 1.9967458 ]\n", - "[1.99877769 1.99511075]\n", - "[1.99901896 1.99608161]\n", - "[1.99852772 1.99411088]\n", - "[1.99881845 1.99528218]\n", - "[1.99822663 1.99290653]\n", - "[1.998577 1.9943201]\n", - "[1.99786397 1.99145589]\n", - "[1.99828624 1.99316252]\n", - "[1.99742715 1.98970859]\n", - "[1.99793613 1.99176998]\n", - "[1.99690099 1.98760396]\n", - "[1.99751458 1.99009525]\n", - "[1.99626723 1.98506893]\n", - "[1.99700706 1.98808176]\n", - "[1.99550387 1.98201547]\n", - "[1.9963961 1.98566191]\n", - "[1.99458439 1.97833757]\n", - "[1.99566069 1.98275501]\n", - "[1.99347688 1.97390753]\n", - "[1.9947756 1.97926491]\n", - "[1.99214288 1.96857153]\n", - "[1.99371056 1.97507735]\n", - "[1.99053607 1.96214429]\n", - "[1.99242921 1.97005689]\n", - "[1.98860067 1.95440267]\n", - "[1.99088801 1.9640435 ]\n", - "[1.98626946 1.94507785]\n", - "[1.9890348 1.95684892]\n", - "[1.98346152 1.93384608]\n", - "[1.98680716 1.9482527 ]\n", - "[1.98007934 1.92031737]\n", - "[1.98413059 1.93799826]\n", - "[1.9760055 
1.90402199]\n", - "[1.98091621 1.92578916]\n", - "[1.97109854 1.88439414]\n", - "[1.97705827 1.91128596]\n", - "[1.96518808 1.86075233]\n", - "[1.97243128 1.89410423]\n", - "[1.95806892 1.83227569]\n", - "[1.96688672 1.87381451]\n", - "[1.94949387 1.79797548]\n", - "[1.96024953 1.84994524]\n", - "[1.93916519 1.75666075]\n", - "[1.95231424 1.82198978]\n", - "[1.92672425 1.70689701]\n", - "[1.94284104 1.78941903]\n", - "[1.9117391 1.64695641]\n", - "[1.93155188 1.75170092]\n", - "[1.89368944 1.57475775]\n", - "[1.91812702 1.70832814]\n", - "[1.87194855 1.48779421]\n", - "[1.90220243 1.65885453]\n", - "[1.84576158 1.38304631]\n", - "[1.88336879 1.60293962]\n", - "[1.81421927 1.25687709]\n", - "[1.86117291 1.54039921]\n", - "[1.77622646 1.10490583]\n", - "[1.83512277 1.47125748]\n", - "[1.73046398 0.9218559 ]\n", - "[1.80469739 1.39579407]\n", - "[1.6753429 0.70137162]\n", - "[1.76936315 1.31457796]\n", - "[1.60894938 0.43579751]\n", - "[1.72859758 1.22847924]\n", - "[1.52897814 0.11591257]\n", - "[1.68192193 1.13865173]\n", - "[1.4326525 0. ]\n", - "[1.62894215 1.04648335]\n", - "[1.31662793 0. ]\n", - "[1.56939714 0.95351665]\n", - "[1.17687593 0. ]\n", - "[1.50321091 0.86134827]\n", - "[1.00854414 0. ]\n", - "[1.43054282 0.77152076]\n", - "[0.8057879 0. ]\n", - "[1.35182854 0.68542204]\n", - "[0.5615673 0. ]\n", - "[1.26780278 0.60420593]\n", - "[0.26740272 0. ]\n", - "[1.17949575 0.52874252]\n", - "[0. 0.]\n", - "[1.0881981 0.45960079]\n", - "[0. 0.]\n", - "[0.99539415 0.39706038]\n", - "[0. 0.]\n", - "[0.90266948 0.34114547]\n", - "[0. 0.]\n", - "[0.81160425 0.29167186]\n", - "[0. 0.]\n", - "[0.7236674 0.24829908]\n", - "[0. 0.]\n", - "[0.64012627 0.21058097]\n", - "[0. 0.]\n", - "[0.56198284 0.17801022]\n", - "[0. 0.]\n", - "[0.48994188 0.15005476]\n", - "[0. 0.]\n", - "[0.42441033 0.12618549]\n", - "[0. 0.]\n", - "[0.3655222 0.10589577]\n", - "[0. 0.]\n", - "[0.31318084 0.08871404]\n", - "[0. 0.]\n", - "[0.26710969 0.07421084]\n", - "[0. 
0.]\n", - "[0.22690428 0.06200174]\n", - "[0. 0.]\n", - "[0.19207979 0.0517473 ]\n", - "[0. 0.]\n", - "[0.16211139 0.04315108]\n", - "[0. 0.]\n", - "[0.13646574 0.0359565 ]\n", - "[0. 0.]\n", - "[0.11462415 0.02994311]\n", - "[0. 0.]\n", - "[0.09609807 0.02492265]\n", - "[0. 0.]\n", - "[0.08043851 0.02073509]\n", - "[0. 0.]\n", - "[0.06724062 0.01724499]\n", - "[0. 0.]\n", - "[0.05614483 0.01433809]\n", - "[0. 0.]\n", - "[0.04683565 0.01191824]\n", - "[0. 0.]\n", - "[0.039039 0.00990475]\n", - "[0. 0.]\n", - "[0.03251863 0.00823002]\n", - "[0. 0.]\n", - "[0.02707227 0.00683748]\n", - "[0. 0.]\n", - "[0.02252765 0.0056799 ]\n", - "[0. 0.]\n", - "[0.01873869 0.00471782]\n", - "[0. 0.]\n", - "[0.01558197 0.00391839]\n", - "[0. 0.]\n", - "[0.01295356 0.0032542 ]\n", - "[0. 0.]\n", - "[0.01076611 0.00270244]\n", - "[0. 0.]\n", - "[0.00894639 0.00224413]\n", - "[0. 0.]\n", - "[0.0074331 0.00186347]\n", - "[0. 0.]\n", - "[0.00617499 0.00154733]\n", - "[0. 0.]\n", - "[0.00512927 0.00128479]\n", - "[0. 0.]\n", - "[0.00426027 0.00106677]\n", - "[0. 0.]\n", - "[0.00353823 0.00088573]\n", - "[0. 0.]\n", - "[0.00293838 0.00073541]\n", - "[0. 0.]\n", - "[0.0024401 0.00061058]\n", - "[0. 0.]\n", - "[0.00202624 0.00050694]\n", - "[0. 0.]\n", - "[0.00168251 0.00042089]\n", - "[0. 0.]\n", - "[0.00139705 0.00034944]\n", - "[0. 0.]\n", - "[0.00115999 0.00029012]\n", - "[0. 0.]\n", - "[0.00096314 0.00024087]\n", - "[0. 0.]\n", - "[0.00079968 0.00019998]\n", - "[0. 0.]\n" - ] - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": 2, + "id": "8401e900", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ "import os\n", "import numpy as np\n", @@ -1408,7 +1390,7 @@ " # and then make the prediction\n", " ypredictRidge = X @ Ridgebeta\n", " MSERidgePredict[i] = MSE(y,ypredictRidge)\n", - " RegLasso = linear_model.Lasso(lmb,fit_intercept=False)\n", + " RegLasso = linear_model.Lasso(lmb)\n", " RegLasso.fit(X,y)\n", " ypredictLasso = RegLasso.predict(X)\n", " print(RegLasso.coef_)\n", @@ -1425,40 +1407,23 @@ }, { "cell_type": "markdown", - "id": "4cb5ab6a", - "metadata": {}, + "id": "32a34305", + "metadata": { + "editable": true + }, "source": [ "## Another Example, now with a polynomial fit" ] }, { "cell_type": "code", - "execution_count": 8, - "id": "53eb2ec4", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[2.0000000e+00 7.1942452e-14 5.0000000e+00]\n", - "Training MSE for OLS\n", - "9.647251180883923e-28\n", - "Test MSE OLS\n", - "1.2167390602128887e-27\n" - ] - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "execution_count": 3, + "id": "de2c4052", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ "import os\n", "import numpy as np\n", @@ -1479,7 +1444,7 @@ "np.random.seed(3155)\n", "\n", "x = np.random.rand(100)\n", - "y = 2.0+5*x*x#+0.1*np.random.randn(100)\n", + "y = 2.0+5*x*x+0.1*np.random.randn(100)\n", "\n", "# number of features p (here degree of polynomial\n", "p = 3\n", @@ -1510,12 +1475,12 @@ "MSETrain = np.zeros(nlambdas)\n", "MSELassoPredict = np.zeros(nlambdas)\n", "MSELassoTrain = np.zeros(nlambdas)\n", - "lambdas = np.logspace(-4, 0, nlambdas)\n", + "lambdas = np.logspace(-4, 4, nlambdas)\n", "for i in range(nlambdas):\n", " lmb = lambdas[i]\n", " Ridgebeta = np.linalg.inv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", " # include lasso using Scikit-Learn\n", - " RegLasso = linear_model.Lasso(lmb,fit_intercept=False)\n", + " RegLasso = linear_model.Lasso(lmb)\n", " RegLasso.fit(X_train,y_train)\n", " # and then make the prediction\n", " ytildeRidge = X_train @ Ridgebeta\n", @@ -1542,8 +1507,10 @@ }, { "cell_type": "markdown", - "id": "b9ef9e8f", - "metadata": {}, + "id": "9cce830e", + "metadata": { + "editable": true + }, "source": [ "## Linking the regression analysis with a statistical interpretation\n", "\n", @@ -1569,8 +1536,10 @@ }, { "cell_type": "markdown", - "id": "ecc76b71", - "metadata": {}, + "id": "5161a229", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{align*} \n", @@ -1583,8 +1552,10 @@ }, { "cell_type": "markdown", - "id": "55711f96", - "metadata": {}, + "id": "7bba88f0", + "metadata": { + "editable": true + }, "source": [ "The randomness of $\\varepsilon_i$ implies that\n", "$\\mathbf{y}_i$ is also a random variable. 
In particular,\n", @@ -1600,8 +1571,10 @@ }, { "cell_type": "markdown", - "id": "c308d8ab", - "metadata": {}, + "id": "2a2a2293", + "metadata": { + "editable": true + }, "source": [ "## Assumptions made\n", "\n", @@ -1612,8 +1585,10 @@ }, { "cell_type": "markdown", - "id": "8947c746", - "metadata": {}, + "id": "16cbf75c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", @@ -1622,8 +1597,10 @@ }, { "cell_type": "markdown", - "id": "f8b2eeba", - "metadata": {}, + "id": "175c5b85", + "metadata": { + "editable": true + }, "source": [ "We approximate this function with our model from the solution of the linear regression equations, that is our\n", "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" @@ -1631,8 +1608,10 @@ }, { "cell_type": "markdown", - "id": "ef7bd7b3", - "metadata": {}, + "id": "c6f44f70", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\beta}.\n", @@ -1641,8 +1620,10 @@ }, { "cell_type": "markdown", - "id": "45571ed5", - "metadata": {}, + "id": "7b55c677", + "metadata": { + "editable": true + }, "source": [ "## Expectation value and variance\n", "\n", @@ -1651,8 +1632,10 @@ }, { "cell_type": "markdown", - "id": "50d372c4", - "metadata": {}, + "id": "e6fc4a97", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{align*} \n", @@ -1665,8 +1648,10 @@ }, { "cell_type": "markdown", - "id": "1e17772d", - "metadata": {}, + "id": "829c23cf", + "metadata": { + "editable": true + }, "source": [ "while\n", "its variance is" @@ -1674,8 +1659,10 @@ }, { "cell_type": "markdown", - "id": "54d7619a", - "metadata": {}, + "id": "b76ecfb6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", @@ -1695,8 +1682,10 @@ }, { "cell_type": "markdown", 
- "id": "510d4042", - "metadata": {}, + "id": "7d7a418d", + "metadata": { + "editable": true + }, "source": [ "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\beta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", "mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$ (not to be confused with the singular values of the SVD)." @@ -1704,8 +1693,10 @@ }, { "cell_type": "markdown", - "id": "c1b2cec7", - "metadata": {}, + "id": "c773b5a1", + "metadata": { + "editable": true + }, "source": [ "## Expectation value and variance for $\\boldsymbol{\\beta}$\n", "\n", @@ -1714,8 +1705,10 @@ }, { "cell_type": "markdown", - "id": "5229aab4", - "metadata": {}, + "id": "83da9dfe", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E}(\\boldsymbol{\\beta}) = \\mathbb{E}[ (\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\beta}=\\boldsymbol{\\beta}.\n", @@ -1724,8 +1717,10 @@ }, { "cell_type": "markdown", - "id": "d64c421e", - "metadata": {}, + "id": "c7133c36", + "metadata": { + "editable": true + }, "source": [ "This means that the estimator of the regression parameters is unbiased.\n", "\n", @@ -1736,8 +1731,10 @@ }, { "cell_type": "markdown", - "id": "fb80e64f", - "metadata": {}, + "id": "5b471cd7", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{eqnarray*}\n", @@ -1765,8 +1762,10 @@ }, { "cell_type": "markdown", - "id": "3883988a", - "metadata": {}, + "id": "b1c59e2f", + "metadata": { + "editable": true + }, "source": [ "where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", "\\mathbf{X} \\, \\boldsymbol{\\beta} \\, \\boldsymbol{\\beta}^{T} \\, \\mathbf{X}^{T} +\n", @@ -1785,8 +1784,10 @@ }, { "cell_type": "markdown", - "id": "d3b9fe91", - "metadata": {}, + "id": "0bc822be", +
"metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E} \\big[ \\boldsymbol{\\beta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}^{\\mathrm{OLS}}.\n", @@ -1795,8 +1796,10 @@ }, { "cell_type": "markdown", - "id": "1cc565ef", - "metadata": {}, + "id": "78f3bc26", + "metadata": { + "editable": true + }, "source": [ "We see clearly that \n", "$\\mathbb{E} \\big[ \\boldsymbol{\\beta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\beta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", @@ -1806,8 +1809,10 @@ }, { "cell_type": "markdown", - "id": "d45602c7", - "metadata": {}, + "id": "d0cf1c6e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mbox{Var}[\\boldsymbol{\\beta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", @@ -1816,8 +1821,10 @@ }, { "cell_type": "markdown", - "id": "18b63cb2", - "metadata": {}, + "id": "8e6e0610", + "metadata": { + "editable": true + }, "source": [ "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\beta}$ goes to zero. 
\n", "\n", @@ -1826,8 +1833,10 @@ }, { "cell_type": "markdown", - "id": "a5fcd345", - "metadata": {}, + "id": "ba0e0b34", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mbox{Var}[\\boldsymbol{\\beta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\beta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", @@ -1836,8 +1845,10 @@ }, { "cell_type": "markdown", - "id": "1820cc18", - "metadata": {}, + "id": "55705cba", + "metadata": { + "editable": true + }, "source": [ "The difference is non-negative definite since each component of the\n", "matrix product is non-negative definite. \n", @@ -1846,8 +1857,10 @@ }, { "cell_type": "markdown", - "id": "b9568e86", - "metadata": {}, + "id": "abf4bfdf", + "metadata": { + "editable": true + }, "source": [ "## Deriving OLS from a probability distribution\n", "\n", @@ -1867,8 +1880,10 @@ }, { "cell_type": "markdown", - "id": "5b51dd32", - "metadata": {}, + "id": "e77e798c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}.\n", @@ -1877,8 +1892,10 @@ }, { "cell_type": "markdown", - "id": "446c1afd", - "metadata": {}, + "id": "26d4690d", + "metadata": { + "editable": true + }, "source": [ "## Independent and Identically Distributed (iid)\n", "\n", @@ -1888,8 +1905,10 @@ }, { "cell_type": "markdown", - "id": "8965e119", - "metadata": {}, + "id": "22462772", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\beta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]},\n", @@ -1898,8 +1917,10 @@ }, {
"cell_type": "markdown", - "id": "c49af092", - "metadata": {}, + "id": "905d65a6", + "metadata": { + "editable": true + }, "source": [ "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\beta}$.\n", "\n", @@ -1908,8 +1929,10 @@ }, { "cell_type": "markdown", - "id": "ccd43fe6", - "metadata": {}, + "id": "4cf9d56d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\beta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\beta}).\n", @@ -1918,8 +1941,10 @@ }, { "cell_type": "markdown", - "id": "047a9630", - "metadata": {}, + "id": "c100c74c", + "metadata": { + "editable": true + }, "source": [ "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. That is\n", "in case we have a simple one-dimensional input and output case" @@ -1927,8 +1952,10 @@ }, { "cell_type": "markdown", - "id": "a5e47ffd", - "metadata": {}, + "id": "36f8ccf2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", @@ -1937,8 +1964,10 @@ }, { "cell_type": "markdown", - "id": "aea090b7", - "metadata": {}, + "id": "60b59fd2", + "metadata": { + "editable": true + }, "source": [ "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. 
\n", "We can now rewrite the above probability as" @@ -1946,8 +1975,10 @@ }, { "cell_type": "markdown", - "id": "8b499343", - "metadata": {}, + "id": "ea63b747", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}.\n", @@ -1956,16 +1987,20 @@ }, { "cell_type": "markdown", - "id": "91844422", - "metadata": {}, + "id": "5ca95669", + "metadata": { + "editable": true + }, "source": [ "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\beta}$." ] }, { "cell_type": "markdown", - "id": "2eab09fd", - "metadata": {}, + "id": "4712ec27", + "metadata": { + "editable": true + }, "source": [ "## Maximum Likelihood Estimation (MLE)\n", "\n", @@ -1993,8 +2028,10 @@ }, { "cell_type": "markdown", - "id": "e8d5d443", - "metadata": {}, + "id": "7d52ddc4", + "metadata": { + "editable": true + }, "source": [ "## A new Cost Function\n", "\n", @@ -2003,8 +2040,10 @@ }, { "cell_type": "markdown", - "id": "c8b99cce", - "metadata": {}, + "id": "1c32ecc2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\beta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\beta})},\n", @@ -2013,16 +2052,20 @@ }, { "cell_type": "markdown", - "id": "12497fc6", - "metadata": {}, + "id": "5249db96", + "metadata": { + "editable": true + }, "source": [ "which becomes" ] }, { "cell_type": "markdown", - "id": "6cf6ea28", - "metadata": {}, + "id": "0712fbb0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}.\n", @@ -2031,16 +2074,20
@@ }, { "cell_type": "markdown", - "id": "b2cb5aa1", - "metadata": {}, + "id": "5e4c844b", + "metadata": { + "editable": true + }, "source": [ "Taking the derivative of the *new* cost function with respect to the parameters $\\beta$ we recognize our familiar OLS equation, namely" ] }, { "cell_type": "markdown", - "id": "ebf349ab", - "metadata": {}, + "id": "5b56b5bb", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta}\\right) =0,\n", @@ -2049,16 +2096,20 @@ }, { "cell_type": "markdown", - "id": "6f257fe9", - "metadata": {}, + "id": "e7c21df5", + "metadata": { + "editable": true + }, "source": [ "which leads to the well-known OLS equation for the optimal paramters $\\beta$" ] }, { "cell_type": "markdown", - "id": "145222d0", - "metadata": {}, + "id": "7f35e661", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}!\n", @@ -2067,16 +2118,20 @@ }, { "cell_type": "markdown", - "id": "62ff20c8", - "metadata": {}, + "id": "72dc84b0", + "metadata": { + "editable": true + }, "source": [ "Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics." 
] }, { "cell_type": "markdown", - "id": "d0c36c09", - "metadata": {}, + "id": "0071468f", + "metadata": { + "editable": true + }, "source": [ "## More basic Statistics and Bayes' theorem\n", "\n", @@ -2093,8 +2148,10 @@ }, { "cell_type": "markdown", - "id": "fbe75b0a", - "metadata": {}, + "id": "6f471644", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X \\cup Y)= p(X)+p(Y)-p(X \\cap Y).\n", @@ -2103,16 +2160,20 @@ }, { "cell_type": "markdown", - "id": "1631de7c", - "metadata": {}, + "id": "4a31022d", + "metadata": { + "editable": true + }, "source": [ "**The product rule (aka joint probability) is given by.**" ] }, { "cell_type": "markdown", - "id": "c94e2fdd", - "metadata": {}, + "id": "6b7d8e8f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X \\cup Y)= p(X,Y)= p(X\\vert Y)p(Y)=p(Y\\vert X)p(X),\n", @@ -2121,8 +2182,10 @@ }, { "cell_type": "markdown", - "id": "1444e890", - "metadata": {}, + "id": "9b67f32e", + "metadata": { + "editable": true + }, "source": [ "where we read $p(X\\vert Y)$ as the likelihood of obtaining $X$ given $Y$.\n", "\n", @@ -2131,8 +2194,10 @@ }, { "cell_type": "markdown", - "id": "13cccaf3", - "metadata": {}, + "id": "1deaa595", + "metadata": { + "editable": true + }, "source": [ "## Marginal Probability\n", "\n", @@ -2141,8 +2206,10 @@ }, { "cell_type": "markdown", - "id": "b0ea5ddc", - "metadata": {}, + "id": "67ae1f0a", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X)=\\sum_{i=0}^{n-1}p(X,Y=y_i)=\\sum_{i=0}^{n-1}p(X\\vert Y=y_i)p(Y=y_i)=\\sum_{i=0}^{n-1}p(X\\vert y_i)p(y_i).\n", @@ -2151,8 +2218,10 @@ }, { "cell_type": "markdown", - "id": "0d807fa1", - "metadata": {}, + "id": "645b3b06", + "metadata": { + "editable": true + }, "source": [ "## Conditional Probability\n", "\n", @@ -2161,8 +2230,10 @@ }, { "cell_type": "markdown", - "id": "5204e804", - "metadata": {}, + "id": "899643ce", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X\\vert Y)= 
\\frac{p(X,Y)}{p(Y)}=\\frac{p(X,Y)}{\\sum_{i=0}^{n-1}p(Y\\vert X=x_i)p(x_i)}.\n", @@ -2171,8 +2242,10 @@ }, { "cell_type": "markdown", - "id": "4b4e8c55", - "metadata": {}, + "id": "4ef190c8", + "metadata": { + "editable": true + }, "source": [ "## Bayes' Theorem\n", "\n", @@ -2181,8 +2254,10 @@ }, { "cell_type": "markdown", - "id": "b60be092", - "metadata": {}, + "id": "ea14b40c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X\\vert Y)= \\frac{p(X,Y)}{p(Y)},\n", @@ -2191,16 +2266,20 @@ }, { "cell_type": "markdown", - "id": "9d9acb12", - "metadata": {}, + "id": "17d256d9", + "metadata": { + "editable": true + }, "source": [ "which we can rewrite as" ] }, { "cell_type": "markdown", - "id": "318ad87c", - "metadata": {}, + "id": "07483acd", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X\\vert Y)= \\frac{p(X,Y)}{\\sum_{i=0}^{n-1}p(Y\\vert X=x_i)p(x_i)}=\\frac{p(Y\\vert X)p(X)}{\\sum_{i=0}^{n-1}p(Y\\vert X=x_i)p(x_i)},\n", @@ -2209,16 +2288,20 @@ }, { "cell_type": "markdown", - "id": "cdf2921d", - "metadata": {}, + "id": "b5a3ee7e", + "metadata": { + "editable": true + }, "source": [ "which is Bayes' theorem. It allows us to evaluate the uncertainty in in $X$ after we have observed $Y$. We can easily interchange $X$ with $Y$." 
] }, { "cell_type": "markdown", - "id": "83bd31f6", - "metadata": {}, + "id": "1116169a", + "metadata": { + "editable": true + }, "source": [ "## Interpretations of Bayes' Theorem\n", "\n", @@ -2234,8 +2317,10 @@ }, { "cell_type": "markdown", - "id": "48d7057c", - "metadata": {}, + "id": "88861a9a", + "metadata": { + "editable": true + }, "source": [ "## Example of Usage of Bayes' theorem\n", "\n", @@ -2253,8 +2338,10 @@ }, { "cell_type": "markdown", - "id": "2b19e099", - "metadata": {}, + "id": "469f8261", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X=1\\vert Y=1) =0.8.\n", @@ -2263,8 +2350,10 @@ }, { "cell_type": "markdown", - "id": "ed04f0b1", - "metadata": {}, + "id": "829b1728", + "metadata": { + "editable": true + }, "source": [ "This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of $80\\%$ for having cancer.\n", "It is however not correct, as the following Bayesian analysis shows." @@ -2272,8 +2361,10 @@ }, { "cell_type": "markdown", - "id": "2260fa5e", - "metadata": {}, + "id": "4d99133c", + "metadata": { + "editable": true + }, "source": [ "## Doing it correctly\n", "\n", @@ -2283,8 +2374,10 @@ }, { "cell_type": "markdown", - "id": "c9c48a17", - "metadata": {}, + "id": "fb649782", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(Y=1) =0.004.\n", @@ -2293,16 +2386,20 @@ }, { "cell_type": "markdown", - "id": "f49df4a7", - "metadata": {}, + "id": "8079bdf4", + "metadata": { + "editable": true + }, "source": [ "We need also to account for the fact that the test may produce a false positive result (false alarm). 
Let us here assume that we have" ] }, { "cell_type": "markdown", - "id": "f34c4bc8", - "metadata": {}, + "id": "6b8d6139", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X=1\\vert Y=0) =0.1.\n", @@ -2311,16 +2408,20 @@ }, { "cell_type": "markdown", - "id": "44347f5c", - "metadata": {}, + "id": "7fea340f", + "metadata": { + "editable": true + }, "source": [ "Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute" ] }, { "cell_type": "markdown", - "id": "64a31efc", - "metadata": {}, + "id": "30609da9", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(Y=1\\vert X=1)=\\frac{p(X=1\\vert Y=1)p(Y=1)}{p(X=1\\vert Y=1)p(Y=1)+p(X=1\\vert Y=0)p(Y=0)}=\\frac{0.8\\times 0.004}{0.8\\times 0.004+0.1\\times 0.996}=0.031.\n", @@ -2329,16 +2430,20 @@ }, { "cell_type": "markdown", - "id": "d2a13b09", - "metadata": {}, + "id": "9f9352c3", + "metadata": { + "editable": true + }, "source": [ "That is, in case of a positive test, there is only a $3\\%$ chance of having breast cancer!" ] }, { "cell_type": "markdown", - "id": "0f1aec4a", - "metadata": {}, + "id": "b9a9af20", + "metadata": { + "editable": true + }, "source": [ "## Bayes' Theorem and Ridge and Lasso Regression\n", "\n", @@ -2352,8 +2457,10 @@ }, { "cell_type": "markdown", - "id": "1c278acf", - "metadata": {}, + "id": "c17cc808", + "metadata": { + "editable": true + }, "source": [ "## Test Function for what happens with OLS, Ridge and Lasso\n", "\n", @@ -2369,8 +2476,11 @@ { "cell_type": "code", "execution_count": 4, - "id": "692eede0", - "metadata": {}, + "id": "da52ef4c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -2440,16 +2550,20 @@ }, { "cell_type": "markdown", - "id": "f7333256", - "metadata": {}, + "id": "baf1d84c", + "metadata": { + "editable": true + }, "source": [ "How can we understand this?" 
] }, { "cell_type": "markdown", - "id": "d1b2354d", - "metadata": {}, + "id": "e4fd5c75", + "metadata": { + "editable": true + }, "source": [ "## Invoking Bayes' theorem\n", "\n", @@ -2460,8 +2574,10 @@ }, { "cell_type": "markdown", - "id": "03e1739d", - "metadata": {}, + "id": "908fb3e8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})],\n", @@ -2470,16 +2586,20 @@ }, { "cell_type": "markdown", - "id": "b6a41543", - "metadata": {}, + "id": "42a4fecc", + "metadata": { + "editable": true + }, "source": [ "is given by" ] }, { "cell_type": "markdown", - "id": "9bf01ae9", - "metadata": {}, + "id": "1838ea96", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}.\n", @@ -2488,16 +2608,20 @@ }, { "cell_type": "markdown", - "id": "b8a5b2b2", - "metadata": {}, + "id": "4a0ea7f0", + "metadata": { + "editable": true + }, "source": [ "In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question what is the posterior probability of a parameter set $\\boldsymbol{\\beta}$ given a domain of events $\\boldsymbol{D}$? 
That is, how can we define the posterior probability" ] }, { "cell_type": "markdown", - "id": "e217276b", - "metadata": {}, + "id": "4cb46e3c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta}\\vert\\boldsymbol{D}).\n", @@ -2506,16 +2630,20 @@ }, { "cell_type": "markdown", - "id": "26a0e9e6", - "metadata": {}, + "id": "db9d993c", + "metadata": { + "editable": true + }, "source": [ "Bayes' theorem comes to our rescue here since (omitting the normalization constant)" ] }, { "cell_type": "markdown", - "id": "48f76275", - "metadata": {}, + "id": "f6efeb5c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta}\\vert\\boldsymbol{D})\\propto p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})p(\\boldsymbol{\\beta}).\n", @@ -2524,16 +2652,20 @@ }, { "cell_type": "markdown", - "id": "89908237", - "metadata": {}, + "id": "acaf9f2d", + "metadata": { + "editable": true + }, "source": [ "We have a model for $p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})$ but need one for the **prior** $p(\\boldsymbol{\\beta}$!" 
] }, { "cell_type": "markdown", - "id": "488d1d81", - "metadata": {}, + "id": "a73cf73a", + "metadata": { + "editable": true + }, "source": [ "## Ridge and Bayes\n", "\n", @@ -2546,8 +2678,10 @@ }, { "cell_type": "markdown", - "id": "26057db7", - "metadata": {}, + "id": "6bdfe707", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta})=\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\beta_j^2}{2\\tau^2}\\right)}.\n", @@ -2556,16 +2690,20 @@ }, { "cell_type": "markdown", - "id": "b532ac2d", - "metadata": {}, + "id": "6a71e2d7", + "metadata": { + "editable": true + }, "source": [ "Our posterior probability becomes then (omitting the normalization factor which is just a constant)" ] }, { "cell_type": "markdown", - "id": "f2ea1634", - "metadata": {}, + "id": "ae0f27b9", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta\\vert\\boldsymbol{D})}=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\beta_j^2}{2\\tau^2}\\right)}.\n", @@ -2574,8 +2712,10 @@ }, { "cell_type": "markdown", - "id": "66de1d0b", - "metadata": {}, + "id": "1dd65d98", + "metadata": { + "editable": true + }, "source": [ "We can now optimize this quantity with respect to $\\boldsymbol{\\beta}$. 
As we\n", "did for OLS, this is most conveniently done by taking the negative\n", @@ -2585,8 +2725,10 @@ }, { "cell_type": "markdown", - "id": "2b931fbc", - "metadata": {}, + "id": "8a4d9e5b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\frac{1}{2\\tau^2}\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_2^2,\n", @@ -2595,16 +2737,20 @@ }, { "cell_type": "markdown", - "id": "ddec27fc", - "metadata": {}, + "id": "61ba5bcd", + "metadata": { + "editable": true + }, "source": [ "and replacing $1/2\\tau^2$ with $\\lambda$ we have" ] }, { "cell_type": "markdown", - "id": "1ddbe4ec", - "metadata": {}, + "id": "23b69c58", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\lambda\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_2^2,\n", @@ -2613,16 +2759,20 @@ }, { "cell_type": "markdown", - "id": "341422d4", - "metadata": {}, + "id": "899f2235", + "metadata": { + "editable": true + }, "source": [ "which is our Ridge cost function! Nice, isn't it?" 
] }, { "cell_type": "markdown", - "id": "388e8810", - "metadata": {}, + "id": "dbfeab4a", + "metadata": { + "editable": true + }, "source": [ "## Lasso and Bayes\n", "\n", @@ -2631,8 +2781,10 @@ }, { "cell_type": "markdown", - "id": "93fe5a38", - "metadata": {}, + "id": "7242122e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta})=\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\vert\\beta_j\\vert}{\\tau}\\right)}.\n", @@ -2641,16 +2793,20 @@ }, { "cell_type": "markdown", - "id": "71aa863c", - "metadata": {}, + "id": "5d8302f0", + "metadata": { + "editable": true + }, "source": [ "Our posterior probability becomes then (omitting the normalization factor which is just a constant)" ] }, { "cell_type": "markdown", - "id": "a8c9f8be", - "metadata": {}, + "id": "0ab7dd23", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta}\\vert\\boldsymbol{D})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\vert\\beta_j\\vert}{\\tau}\\right)}.\n", @@ -2659,8 +2815,10 @@ }, { "cell_type": "markdown", - "id": "44397b9e", - "metadata": {}, + "id": "319acfe8", + "metadata": { + "editable": true + }, "source": [ "Taking the negative\n", "logarithm of the posterior probability and leaving out the\n", @@ -2669,8 +2827,10 @@ }, { "cell_type": "markdown", - "id": "328f04d9", - "metadata": {}, + "id": "f22117ce", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta}=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\frac{1}{\\tau}\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_1,\n", @@ -2679,16 +2839,20 @@ }, { "cell_type": "markdown", - "id": "a84bc3c7", - "metadata": {}, + "id": "54f622e6", + "metadata": { + "editable": true + }, "source": [ "and replacing $1/\\tau$ with $\\lambda$ we have" ] }, { "cell_type": 
"markdown", - "id": "56346230", - "metadata": {}, + "id": "0adbb936", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta}=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\lambda\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_1,\n", @@ -2697,16 +2861,20 @@ }, { "cell_type": "markdown", - "id": "d1b52def", - "metadata": {}, + "id": "6c067f7f", + "metadata": { + "editable": true + }, "source": [ "which is our Lasso cost function!" ] }, { "cell_type": "markdown", - "id": "66cdc40f", - "metadata": {}, + "id": "3960583d", + "metadata": { + "editable": true + }, "source": [ "## Deriving OLS from a probability distribution\n", "\n", @@ -2726,8 +2894,10 @@ }, { "cell_type": "markdown", - "id": "bf51888d", - "metadata": {}, + "id": "b5519ef3", + "metadata": { + "editable": true + }, "source": [ "$$\n", "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}.\n", @@ -2736,8 +2906,10 @@ }, { "cell_type": "markdown", - "id": "a855c7a1", - "metadata": {}, + "id": "ad965079", + "metadata": { + "editable": true + }, "source": [ "## Independent and Identically Distrubuted (iid)\n", "\n", @@ -2747,8 +2919,10 @@ }, { "cell_type": "markdown", - "id": "3664ccdb", - "metadata": {}, + "id": "e786100f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\beta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]},\n", @@ -2757,8 +2931,10 @@ }, { "cell_type": "markdown", - "id": "7893fe17", - "metadata": {}, + "id": "c620ff98", + "metadata": { + "editable": true + }, "source": [ "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) 
$\\boldsymbol{\\beta}$.\n", "\n", @@ -2767,8 +2943,10 @@ }, { "cell_type": "markdown", - "id": "4179b956", - "metadata": {}, + "id": "b8636581", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\beta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\beta}).\n", @@ -2777,8 +2955,10 @@ }, { "cell_type": "markdown", - "id": "1e53d72b", - "metadata": {}, + "id": "c95c3402", + "metadata": { + "editable": true + }, "source": [ "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. That is\n", "in case we have a simple one-dimensional input and output case" @@ -2786,8 +2966,10 @@ }, { "cell_type": "markdown", - "id": "c54d0c81", - "metadata": {}, + "id": "8a1f4160", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", @@ -2796,8 +2978,10 @@ }, { "cell_type": "markdown", - "id": "fbecd0ad", - "metadata": {}, + "id": "85444cf8", + "metadata": { + "editable": true + }, "source": [ "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. 
\n", "We can now rewrite the above probability as" @@ -2805,8 +2989,10 @@ }, { "cell_type": "markdown", - "id": "0abd00aa", - "metadata": {}, + "id": "a51ad674", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}.\n", @@ -2815,16 +3001,20 @@ }, { "cell_type": "markdown", - "id": "71478f61", - "metadata": {}, + "id": "27de9788", + "metadata": { + "editable": true + }, "source": [ "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\beta}$." ] }, { "cell_type": "markdown", - "id": "6352fa5a", - "metadata": {}, + "id": "fec2d139", + "metadata": { + "editable": true + }, "source": [ "## Maximum Likelihood Estimation (MLE)\n", "\n", @@ -2852,8 +3042,10 @@ }, { "cell_type": "markdown", - "id": "51af15f4", - "metadata": {}, + "id": "d28cabe5", + "metadata": { + "editable": true + }, "source": [ "## A new Cost Function\n", "\n", @@ -2862,8 +3054,10 @@ }, { "cell_type": "markdown", - "id": "cebbeb23", - "metadata": {}, + "id": "853b3797", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta}=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\beta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\beta})},\n", @@ -2872,16 +3066,20 @@ }, { "cell_type": "markdown", - "id": "fb31577d", - "metadata": {}, + "id": "42952aa8", + "metadata": { + "editable": true + }, "source": [ "which becomes" ] }, { "cell_type": "markdown", - "id": "1e57fdc1", - "metadata": {}, + "id": "6f9b202f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta}=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}.\n", @@ -2890,16 +3088,20 
@@ }, { "cell_type": "markdown", - "id": "d47da897", - "metadata": {}, + "id": "c531bc45", + "metadata": { + "editable": true + }, "source": [ "Taking the derivative of the *new* cost function with respect to the parameters $\\beta$ we recognize our familiar OLS equation, namely" ] }, { "cell_type": "markdown", - "id": "cafa1620", - "metadata": {}, + "id": "83d5af93", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta}\\right) =0,\n", @@ -2908,16 +3110,20 @@ }, { "cell_type": "markdown", - "id": "1d0a4b40", - "metadata": {}, + "id": "10a67fb3", + "metadata": { + "editable": true + }, "source": [ "which leads to the well-known OLS equation for the optimal paramters $\\beta$" ] }, { "cell_type": "markdown", - "id": "f19df38b", - "metadata": {}, + "id": "8eaa8af1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}!\n", @@ -2926,8 +3132,10 @@ }, { "cell_type": "markdown", - "id": "7e66279f", - "metadata": {}, + "id": "191ae18b", + "metadata": { + "editable": true + }, "source": [ "## Bayes' Theorem\n", "\n", @@ -2936,8 +3144,10 @@ }, { "cell_type": "markdown", - "id": "dd2a6dce", - "metadata": {}, + "id": "9f9aa209", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X\\vert Y)= \\frac{p(X,Y)}{p(Y)},\n", @@ -2946,16 +3156,20 @@ }, { "cell_type": "markdown", - "id": "5553ddfd", - "metadata": {}, + "id": "ec245678", + "metadata": { + "editable": true + }, "source": [ "which we can rewrite as" ] }, { "cell_type": "markdown", - "id": "bbc1dc01", - "metadata": {}, + "id": "3cb237fb", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(X\\vert Y)= \\frac{p(X,Y)}{\\sum_{i=0}^{n-1}p(Y\\vert X=x_i)p(x_i)}=\\frac{p(Y\\vert X)p(X)}{\\sum_{i=0}^{n-1}p(Y\\vert X=x_i)p(x_i)},\n", @@ -2964,16 +3178,20 @@ }, { "cell_type": "markdown", - 
"id": "1d934976", - "metadata": {}, + "id": "91a6a9a2", + "metadata": { + "editable": true + }, "source": [ "which is Bayes' theorem. It allows us to evaluate the uncertainty in in $X$ after we have observed $Y$. We can easily interchange $X$ with $Y$." ] }, { "cell_type": "markdown", - "id": "af91acff", - "metadata": {}, + "id": "a194c6bd", + "metadata": { + "editable": true + }, "source": [ "## Interpretations of Bayes' Theorem\n", "\n", @@ -2987,8 +3205,10 @@ }, { "cell_type": "markdown", - "id": "6cd0aef9", - "metadata": {}, + "id": "f6da60ea", + "metadata": { + "editable": true + }, "source": [ "## Test Function for what happens with OLS, Ridge and Lasso\n", "\n", @@ -3004,8 +3224,11 @@ { "cell_type": "code", "execution_count": 5, - "id": "138aefde", - "metadata": {}, + "id": "9f0dc5f6", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -3075,16 +3298,20 @@ }, { "cell_type": "markdown", - "id": "f2f0e4e2", - "metadata": {}, + "id": "e51aac1d", + "metadata": { + "editable": true + }, "source": [ "How can we understand this?" 
] }, { "cell_type": "markdown", - "id": "70a740db", - "metadata": {}, + "id": "24853f82", + "metadata": { + "editable": true + }, "source": [ "## Rerunning the above code\n", "\n", @@ -3111,8 +3338,11 @@ { "cell_type": "code", "execution_count": 6, - "id": "d78d3856", - "metadata": {}, + "id": "d75c01b1", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -3157,8 +3387,10 @@ }, { "cell_type": "markdown", - "id": "56b5bd43", - "metadata": {}, + "id": "ba2e83ae", + "metadata": { + "editable": true + }, "source": [ "## Invoking Bayes' theorem\n", "\n", @@ -3169,8 +3401,10 @@ }, { "cell_type": "markdown", - "id": "6323ead1", - "metadata": {}, + "id": "831bdde3", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})],\n", @@ -3179,16 +3413,20 @@ }, { "cell_type": "markdown", - "id": "6c5a9d05", - "metadata": {}, + "id": "de3da0a6", + "metadata": { + "editable": true + }, "source": [ "is given by" ] }, { "cell_type": "markdown", - "id": "c44868f3", - "metadata": {}, + "id": "091f76a6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}.\n", @@ -3197,16 +3435,20 @@ }, { "cell_type": "markdown", - "id": "32c5a08a", - "metadata": {}, + "id": "763e3b26", + "metadata": { + "editable": true + }, "source": [ "In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question what is the posterior probability of a parameter set $\\boldsymbol{\\beta}$ given a domain of events $\\boldsymbol{D}$? 
That is, how can we define the posterior probability" ] }, { "cell_type": "markdown", - "id": "6aee2553", - "metadata": {}, + "id": "5d412a98", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta}\\vert\\boldsymbol{D}).\n", @@ -3215,16 +3457,20 @@ }, { "cell_type": "markdown", - "id": "ae17b4da", - "metadata": {}, + "id": "0ba3e207", + "metadata": { + "editable": true + }, "source": [ "Bayes' theorem comes to our rescue here since (omitting the normalization constant)" ] }, { "cell_type": "markdown", - "id": "07edfc9d", - "metadata": {}, + "id": "75c3c8dd", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta}\\vert\\boldsymbol{D})\\propto p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})p(\\boldsymbol{\\beta}).\n", @@ -3233,16 +3479,20 @@ }, { "cell_type": "markdown", - "id": "168fa7f1", - "metadata": {}, + "id": "8c04ebfb", + "metadata": { + "editable": true + }, "source": [ "We have a model for $p(\\boldsymbol{D}\\vert\\boldsymbol{\\beta})$ but need one for the **prior** $p(\\boldsymbol{\\beta}$!" 
] }, { "cell_type": "markdown", - "id": "ba943592", - "metadata": {}, + "id": "a1971089", + "metadata": { + "editable": true + }, "source": [ "## Ridge and Bayes\n", "\n", @@ -3255,8 +3505,10 @@ }, { "cell_type": "markdown", - "id": "2febbf21", - "metadata": {}, + "id": "9a5608dc", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta})=\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\beta_j^2}{2\\tau^2}\\right)}.\n", @@ -3265,16 +3517,20 @@ }, { "cell_type": "markdown", - "id": "e72040ed", - "metadata": {}, + "id": "03ea7796", + "metadata": { + "editable": true + }, "source": [ "Our posterior probability becomes then (omitting the normalization factor which is just a constant)" ] }, { "cell_type": "markdown", - "id": "d9c1b70a", - "metadata": {}, + "id": "e4261565", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta\\vert\\boldsymbol{D})}=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\beta_j^2}{2\\tau^2}\\right)}.\n", @@ -3283,8 +3539,10 @@ }, { "cell_type": "markdown", - "id": "c241b580", - "metadata": {}, + "id": "e10cbbc6", + "metadata": { + "editable": true + }, "source": [ "We can now optimize this quantity with respect to $\\boldsymbol{\\beta}$. 
As we\n", "did for OLS, this is most conveniently done by taking the negative\n", @@ -3294,8 +3552,10 @@ }, { "cell_type": "markdown", - "id": "04e7c00b", - "metadata": {}, + "id": "c5daa671", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\frac{1}{2\\tau^2}\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_2^2,\n", @@ -3304,16 +3564,20 @@ }, { "cell_type": "markdown", - "id": "6bc23829", - "metadata": {}, + "id": "ec6d384a", + "metadata": { + "editable": true + }, "source": [ "and replacing $1/2\\tau^2$ with $\\lambda$ we have" ] }, { "cell_type": "markdown", - "id": "b4110e51", - "metadata": {}, + "id": "f810360e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta})=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\lambda\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_2^2,\n", @@ -3322,16 +3586,20 @@ }, { "cell_type": "markdown", - "id": "c9945279", - "metadata": {}, + "id": "694d4a3b", + "metadata": { + "editable": true + }, "source": [ "which is our Ridge cost function! Nice, isn't it?" 
] }, { "cell_type": "markdown", - "id": "03964466", - "metadata": {}, + "id": "84fa0f29", + "metadata": { + "editable": true + }, "source": [ "## Lasso and Bayes\n", "\n", @@ -3340,8 +3608,10 @@ }, { "cell_type": "markdown", - "id": "4ab9227c", - "metadata": {}, + "id": "0a9bc6c3", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta})=\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\vert\\beta_j\\vert}{\\tau}\\right)}.\n", @@ -3350,16 +3620,20 @@ }, { "cell_type": "markdown", - "id": "d1677169", - "metadata": {}, + "id": "426a688b", + "metadata": { + "editable": true + }, "source": [ "Our posterior probability becomes then (omitting the normalization factor which is just a constant)" ] }, { "cell_type": "markdown", - "id": "7a3792ad", - "metadata": {}, + "id": "4f32b202", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(\\boldsymbol{\\beta}\\vert\\boldsymbol{D})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\beta})^2}{2\\sigma^2}\\right]}\\prod_{j=0}^{p-1}\\exp{\\left(-\\frac{\\vert\\beta_j\\vert}{\\tau}\\right)}.\n", @@ -3368,8 +3642,10 @@ }, { "cell_type": "markdown", - "id": "f187678a", - "metadata": {}, + "id": "faea2c8f", + "metadata": { + "editable": true + }, "source": [ "Taking the negative\n", "logarithm of the posterior probability and leaving out the\n", @@ -3378,8 +3654,10 @@ }, { "cell_type": "markdown", - "id": "681825df", - "metadata": {}, + "id": "c7f21d34", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta}=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\frac{1}{\\tau}\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_1,\n", @@ -3388,16 +3666,20 @@ }, { "cell_type": "markdown", - "id": "bedf4fc9", - "metadata": {}, + "id": "ab07e217", + "metadata": { + "editable": true + }, "source": [ "and replacing $1/\\tau$ with $\\lambda$ we have" ] }, { "cell_type": 
"markdown", - "id": "edd7db8f", - "metadata": {}, + "id": "63f80d0b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{\\beta}=\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\beta})\\vert\\vert_2^2}{2\\sigma^2}+\\lambda\\vert\\vert\\boldsymbol{\\beta}\\vert\\vert_1,\n", @@ -3406,16 +3688,20 @@ }, { "cell_type": "markdown", - "id": "6617f9bb", - "metadata": {}, + "id": "a2fba07c", + "metadata": { + "editable": true + }, "source": [ "which is our Lasso cost function!" ] }, { "cell_type": "markdown", - "id": "77e9e54e", - "metadata": {}, + "id": "43fdf691", + "metadata": { + "editable": true + }, "source": [ "## Why resampling methods\n", "\n", @@ -3431,8 +3717,10 @@ }, { "cell_type": "markdown", - "id": "935d4fe8", - "metadata": {}, + "id": "087509a5", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods\n", "Resampling methods are an indispensable tool in modern\n", @@ -3457,8 +3745,10 @@ }, { "cell_type": "markdown", - "id": "508f611a", - "metadata": {}, + "id": "0df9fa6e", + "metadata": { + "editable": true + }, "source": [ "## Resampling approaches can be computationally expensive\n", "\n", @@ -3481,8 +3771,10 @@ }, { "cell_type": "markdown", - "id": "fce867a0", - "metadata": {}, + "id": "f7b93afa", + "metadata": { + "editable": true + }, "source": [ "## Why resampling methods ?\n", "**Statistical analysis.**\n", @@ -3496,8 +3788,10 @@ }, { "cell_type": "markdown", - "id": "0701c4d6", - "metadata": {}, + "id": "a14b4f27", + "metadata": { + "editable": true + }, "source": [ "## Statistical analysis\n", "\n", @@ -3514,8 +3808,10 @@ }, { "cell_type": "markdown", - "id": "7db598fa", - "metadata": {}, + "id": "eb00cc6a", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods\n", "\n", @@ -3541,8 +3837,10 @@ }, { "cell_type": "markdown", - "id": "0fcb4a86", - "metadata": {}, + "id": "2e9ced9e", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: Jackknife and 
Bootstrap\n", "\n", @@ -3564,8 +3862,10 @@ }, { "cell_type": "markdown", - "id": "a29c2835", - "metadata": {}, + "id": "d0696d2a", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: Jackknife\n", "\n", @@ -3576,8 +3876,10 @@ }, { "cell_type": "markdown", - "id": "9b3e5c68", - "metadata": {}, + "id": "31962112", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{x}_i = (x_1,x_2,\\cdots,x_{i-1},x_{i+1},\\cdots,x_n),\n", @@ -3586,8 +3888,10 @@ }, { "cell_type": "markdown", - "id": "c17dce67", - "metadata": {}, + "id": "dc02cc7f", + "metadata": { + "editable": true + }, "source": [ "which equals the vector $\\boldsymbol{x}$ with the exception that observation\n", "number $i$ is left out. Using this notation, define\n", @@ -3597,8 +3901,10 @@ }, { "cell_type": "markdown", - "id": "388f2cab", - "metadata": {}, + "id": "0faa5a99", + "metadata": { + "editable": true + }, "source": [ "## Jackknife code example" ] @@ -3606,8 +3912,11 @@ { "cell_type": "code", "execution_count": 7, - "id": "2fd22f69", - "metadata": {}, + "id": "9bc3ec1f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from numpy import *\n", @@ -3642,8 +3951,10 @@ }, { "cell_type": "markdown", - "id": "bca52c67", - "metadata": {}, + "id": "4be7d2c2", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: Bootstrap\n", "Bootstrapping is a non-parametric approach to statistical inference\n", @@ -3665,8 +3976,10 @@ }, { "cell_type": "markdown", - "id": "d56da2f1", - "metadata": {}, + "id": "c1e61d5b", + "metadata": { + "editable": true + }, "source": [ "## The Central Limit Theorem\n", "\n", @@ -3683,8 +3996,10 @@ }, { "cell_type": "markdown", - "id": "b3237b5a", - "metadata": {}, + "id": "25922aa6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", @@ -3693,16 +4008,20 @@ }, { "cell_type": "markdown", - "id": "105a5395", - "metadata": {}, + "id": 
"cfad9318", + "metadata": { + "editable": true + }, "source": [ "the question we pose is which is the PDF of the new variable $z$." ] }, { "cell_type": "markdown", - "id": "3fe83209", - "metadata": {}, + "id": "eff78e0a", + "metadata": { + "editable": true + }, "source": [ "## Finding the Limit\n", "\n", @@ -3714,8 +4033,10 @@ }, { "cell_type": "markdown", - "id": "c39470af", - "metadata": {}, + "id": "b06c20d2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", @@ -3725,8 +4046,10 @@ }, { "cell_type": "markdown", - "id": "5dfcba40", - "metadata": {}, + "id": "1d6f749b", + "metadata": { + "editable": true + }, "source": [ "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", "All measurements that lead to each individual $x_i$ are expected to\n", @@ -3736,8 +4059,10 @@ }, { "cell_type": "markdown", - "id": "d0bf9a86", - "metadata": {}, + "id": "4e969b8d", + "metadata": { + "editable": true + }, "source": [ "## Rewriting the $\\delta$-function\n", "\n", @@ -3746,8 +4071,10 @@ }, { "cell_type": "markdown", - "id": "e28b1469", - "metadata": {}, + "id": "8bde7f7f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", @@ -3757,8 +4084,10 @@ }, { "cell_type": "markdown", - "id": "66bf195d", - "metadata": {}, + "id": "3754744e", + "metadata": { + "editable": true + }, "source": [ "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", "we arrive at" @@ -3766,8 +4095,10 @@ }, { "cell_type": "markdown", - "id": "a0c7411a", - "metadata": {}, + "id": "e2d07bda", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", @@ -3778,16 +4109,20 @@ }, { "cell_type": "markdown", - "id": "7a60c876", - "metadata": {}, + "id": "7e7f68ca", + "metadata": { + "editable": true + }, "source": [ "with the 
integral over $x$ resulting in" ] }, { "cell_type": "markdown", - "id": "f8d92005", - "metadata": {}, + "id": "7094c944", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", @@ -3798,8 +4133,10 @@ }, { "cell_type": "markdown", - "id": "de3350f6", - "metadata": {}, + "id": "3f94fc13", + "metadata": { + "editable": true + }, "source": [ "## Identifying Terms\n", "\n", @@ -3809,8 +4146,10 @@ }, { "cell_type": "markdown", - "id": "23267d7e", - "metadata": {}, + "id": "9437e2c7", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", @@ -3820,16 +4159,20 @@ }, { "cell_type": "markdown", - "id": "226eb6f6", - "metadata": {}, + "id": "bb47dde2", + "metadata": { + "editable": true + }, "source": [ "resulting in" ] }, { "cell_type": "markdown", - "id": "5137d145", - "metadata": {}, + "id": "e4831f7c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", @@ -3839,16 +4182,20 @@ }, { "cell_type": "markdown", - "id": "e6ff93f6", - "metadata": {}, + "id": "f99710c8", + "metadata": { + "editable": true + }, "source": [ "and in the limit $m\\rightarrow \\infty$ we obtain" ] }, { "cell_type": "markdown", - "id": "da8b2b1a", - "metadata": {}, + "id": "c28fdadf", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", @@ -3858,8 +4205,10 @@ }, { "cell_type": "markdown", - "id": "b3c5c051", - "metadata": {}, + "id": "5912be3b", + "metadata": { + "editable": true + }, "source": [ "which is the normal distribution with variance\n", "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", @@ -3868,8 +4217,10 @@ }, { "cell_type": "markdown", - "id": "06dd6575", - "metadata": {}, + "id": "719d7288", + "metadata": { + "editable": true + }, 
"source": [ "## Wrapping it up\n", "\n", @@ -3885,8 +4236,10 @@ }, { "cell_type": "markdown", - "id": "bac1f242", - "metadata": {}, + "id": "3de2aead", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sigma_m=\n", @@ -3896,8 +4249,10 @@ }, { "cell_type": "markdown", - "id": "96b515b5", - "metadata": {}, + "id": "2e6028b6", + "metadata": { + "editable": true + }, "source": [ "The latter is true only if the average value is known exactly. This is obtained in the limit\n", "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", @@ -3906,8 +4261,10 @@ }, { "cell_type": "markdown", - "id": "3e0528cc", - "metadata": {}, + "id": "91809444", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sigma_m\\approx \n", @@ -3917,8 +4274,10 @@ }, { "cell_type": "markdown", - "id": "8b1ceed8", - "metadata": {}, + "id": "ebc4cced", + "metadata": { + "editable": true + }, "source": [ "In many cases however the above estimate for the standard deviation,\n", "in particular if correlations are strong, may be too simplistic. Keep\n", @@ -3935,8 +4294,10 @@ }, { "cell_type": "markdown", - "id": "267c90d6", - "metadata": {}, + "id": "927fa786", + "metadata": { + "editable": true + }, "source": [ "## Confidence Intervals\n", "\n", @@ -3956,8 +4317,10 @@ }, { "cell_type": "markdown", - "id": "3a9aef51", - "metadata": {}, + "id": "e165b45a", + "metadata": { + "editable": true + }, "source": [ "## Standard Approach based on the Normal Distribution\n", "\n", @@ -3969,8 +4332,10 @@ }, { "cell_type": "markdown", - "id": "bd85a32f", - "metadata": {}, + "id": "a2d17c0f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\left(\\mu_{\\beta}\\pm \\frac{z\\sigma_{\\beta}}{\\sqrt{n}}\\right),\n", @@ -3979,8 +4344,10 @@ }, { "cell_type": "markdown", - "id": "5cafffd4", - "metadata": {}, + "id": "b1f0a225", + "metadata": { + "editable": true + }, "source": [ "where $z$ defines the level of certainty (or confidence). 
For a normal\n", "distribution typical parameters are $z=2.576$ which corresponds to a\n", @@ -3997,8 +4364,10 @@ }, { "cell_type": "markdown", - "id": "3547420c", - "metadata": {}, + "id": "1670f8f5", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: Bootstrap background\n", "\n", @@ -4015,8 +4384,10 @@ }, { "cell_type": "markdown", - "id": "57d59b35", - "metadata": {}, + "id": "a23b34bd", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: More Bootstrap background\n", "\n", @@ -4037,8 +4408,10 @@ }, { "cell_type": "markdown", - "id": "2009b293", - "metadata": {}, + "id": "b60714c3", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: Bootstrap approach\n", "\n", @@ -4056,8 +4429,10 @@ }, { "cell_type": "markdown", - "id": "3b3f7c23", - "metadata": {}, + "id": "2797c438", + "metadata": { + "editable": true + }, "source": [ "## Resampling methods: Bootstrap steps\n", "\n", @@ -4084,8 +4459,10 @@ }, { "cell_type": "markdown", - "id": "e315f571", - "metadata": {}, + "id": "8aa53514", + "metadata": { + "editable": true + }, "source": [ "## Code example for the Bootstrap method\n", "\n", @@ -4106,8 +4483,11 @@ { "cell_type": "code", "execution_count": 8, - "id": "e97da267", - "metadata": {}, + "id": "7e395f53", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -4140,16 +4520,20 @@ }, { "cell_type": "markdown", - "id": "7bfbf631", - "metadata": {}, + "id": "5ef4e750", + "metadata": { + "editable": true + }, "source": [ "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." 
] }, { "cell_type": "markdown", - "id": "5a1c8639", - "metadata": {}, + "id": "f7e08e78", + "metadata": { + "editable": true + }, "source": [ "## Plotting the Histogram" ] @@ -4157,8 +4541,11 @@ { "cell_type": "code", "execution_count": 9, - "id": "a86d297a", - "metadata": {}, + "id": "9b80e633", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# the histogram of the bootstrapped data (normalized data if density = True)\n", @@ -4174,8 +4561,10 @@ }, { "cell_type": "markdown", - "id": "4c024047", - "metadata": {}, + "id": "9a8b0f53", + "metadata": { + "editable": true + }, "source": [ "## The bias-variance tradeoff\n", "\n", @@ -4190,8 +4579,10 @@ }, { "cell_type": "markdown", - "id": "a7a56e18", - "metadata": {}, + "id": "8196e56f", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", @@ -4200,8 +4591,10 @@ }, { "cell_type": "markdown", - "id": "3e18c5de", - "metadata": {}, + "id": "73f47045", + "metadata": { + "editable": true + }, "source": [ "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", "\n", @@ -4215,8 +4608,10 @@ }, { "cell_type": "markdown", - "id": "bcce7420", - "metadata": {}, + "id": "6270c25d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", @@ -4225,16 +4620,20 @@ }, { "cell_type": "markdown", - "id": "121433f0", - "metadata": {}, + "id": "38f50b48", + "metadata": { + "editable": true + }, "source": [ "We can rewrite this as" ] }, { "cell_type": "markdown", - "id": "94d035af", - "metadata": {}, + "id": "ea869e50", + "metadata": { + "editable": true + }, "source": [ "$$\n", 
"\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", @@ -4243,8 +4642,10 @@ }, { "cell_type": "markdown", - "id": "b6502ac4", - "metadata": {}, + "id": "779e7714", + "metadata": { + "editable": true + }, "source": [ "The three terms represent the square of the bias of the learning\n", "method, which can be thought of as the error caused by the simplifying\n", @@ -4258,8 +4659,10 @@ }, { "cell_type": "markdown", - "id": "66910ad4", - "metadata": {}, + "id": "0e3bc25d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", @@ -4268,16 +4671,20 @@ }, { "cell_type": "markdown", - "id": "af07a8d0", - "metadata": {}, + "id": "dd6346d4", + "metadata": { + "editable": true + }, "source": [ "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" ] }, { "cell_type": "markdown", - "id": "56e8518d", - "metadata": {}, + "id": "a85c7fe2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", @@ -4286,16 +4693,20 @@ }, { "cell_type": "markdown", - "id": "259c8f56", - "metadata": {}, + "id": "a474a09e", + "metadata": { + "editable": true + }, "source": [ "which, using the abovementioned expectation values can be rewritten as" ] }, { "cell_type": "markdown", - "id": "96fcb975", - "metadata": {}, + "id": "f8392d4c", + "metadata": { + "editable": true + }, "source": [ "$$\n", 
"\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", @@ -4304,16 +4715,20 @@ }, { "cell_type": "markdown", - "id": "ba665645", - "metadata": {}, + "id": "88d30a5d", + "metadata": { + "editable": true + }, "source": [ "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." ] }, { "cell_type": "markdown", - "id": "0d692042", - "metadata": {}, + "id": "9aa0a436", + "metadata": { + "editable": true + }, "source": [ "## A way to Read the Bias-Variance Tradeoff\n", "\n", @@ -4326,8 +4741,10 @@ }, { "cell_type": "markdown", - "id": "149e2573", - "metadata": {}, + "id": "23654b68", + "metadata": { + "editable": true + }, "source": [ "## Example code for Bias-Variance tradeoff" ] @@ -4335,8 +4752,11 @@ { "cell_type": "code", "execution_count": 10, - "id": "49178e41", - "metadata": {}, + "id": "fcb04b3d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", @@ -4397,8 +4817,10 @@ }, { "cell_type": "markdown", - "id": "7572388e", - "metadata": {}, + "id": "c4f13eaa", + "metadata": { + "editable": true + }, "source": [ "## Understanding what happens" ] @@ -4406,8 +4828,11 @@ { "cell_type": "code", "execution_count": 11, - "id": "27bde540", - "metadata": {}, + "id": "b4212a78", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", @@ -4460,8 +4885,10 @@ }, { "cell_type": "markdown", - "id": "23ae6594", - "metadata": {}, + "id": "15e01f77", + "metadata": { + "editable": true + }, "source": [ "## Summing up\n", "\n", @@ -4496,8 +4923,10 @@ }, { "cell_type": "markdown", - "id": "168bc9ba", - "metadata": {}, + "id": "eec42478", + "metadata": { + "editable": 
true + }, "source": [ "## Another Example from Scikit-Learn's Repository" ] @@ -4505,8 +4934,11 @@ { "cell_type": "code", "execution_count": 12, - "id": "0b5340dc", - "metadata": {}, + "id": "8fcab853", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "\"\"\"\n", @@ -4584,8 +5016,10 @@ }, { "cell_type": "markdown", - "id": "837fbf8e", - "metadata": {}, + "id": "49f69f02", + "metadata": { + "editable": true + }, "source": [ "## Various steps in cross-validation\n", "\n", @@ -4607,8 +5041,10 @@ }, { "cell_type": "markdown", - "id": "b3ea4b1f", - "metadata": {}, + "id": "2a43c500", + "metadata": { + "editable": true + }, "source": [ "## How to set up the cross-validation for Ridge and/or Lasso\n", "\n", @@ -4621,8 +5057,10 @@ }, { "cell_type": "markdown", - "id": "cce015ce", - "metadata": {}, + "id": "3bf76366", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{align*}\n", @@ -4635,8 +5073,10 @@ }, { "cell_type": "markdown", - "id": "61403da6", - "metadata": {}, + "id": "2865226e", + "metadata": { + "editable": true + }, "source": [ "* Evaluate the prediction performance of these models on the test set by $[y_i, \\boldsymbol{X}_{i, \\ast}; \\boldsymbol{\\beta}_{-i}(\\lambda), \\boldsymbol{\\sigma}_{-i}^2(\\lambda)]$. 
Or, by the prediction error $|y_i - \\boldsymbol{X}_{i, \\ast} \\boldsymbol{\\beta}_{-i}(\\lambda)|$, the relative error, the error squared or the R2 score function.\n", "\n", @@ -4647,8 +5087,10 @@ }, { "cell_type": "markdown", - "id": "bbd26e8b", - "metadata": {}, + "id": "c28a4bda", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{align*}\n", @@ -4659,8 +5101,10 @@ }, { "cell_type": "markdown", - "id": "56cf97f5", - "metadata": {}, + "id": "4328f2a6", + "metadata": { + "editable": true + }, "source": [ "## Cross-validation in brief\n", "\n", @@ -4685,8 +5129,10 @@ }, { "cell_type": "markdown", - "id": "d2161943", - "metadata": {}, + "id": "f8ab4371", + "metadata": { + "editable": true + }, "source": [ "## Code Example for Cross-validation and $k$-fold Cross-validation\n", "\n", @@ -4696,8 +5142,11 @@ { "cell_type": "code", "execution_count": 13, - "id": "280eff45", - "metadata": {}, + "id": "8a7f8126", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -4793,8 +5242,10 @@ }, { "cell_type": "markdown", - "id": "6c93e890", - "metadata": {}, + "id": "e5d7cb66", + "metadata": { + "editable": true + }, "source": [ "## More examples on bootstrap and cross-validation and errors" ] @@ -4802,8 +5253,11 @@ { "cell_type": "code", "execution_count": 14, - "id": "8ac1b77b", - "metadata": {}, + "id": "03d23637", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# Common imports\n", @@ -4888,16 +5342,20 @@ }, { "cell_type": "markdown", - "id": "501003fd", - "metadata": {}, + "id": "211fa442", + "metadata": { + "editable": true + }, "source": [ "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." 
] }, { "cell_type": "markdown", - "id": "fe6913fe", - "metadata": {}, + "id": "e42d4d92", + "metadata": { + "editable": true + }, "source": [ "## The same example but now with cross-validation\n", "\n", @@ -4907,8 +5365,11 @@ { "cell_type": "code", "execution_count": 15, - "id": "cddd5a5c", - "metadata": {}, + "id": "9cba801d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# Common imports\n", @@ -4982,8 +5443,10 @@ }, { "cell_type": "markdown", - "id": "00dbf855", - "metadata": {}, + "id": "c0142fc4", + "metadata": { + "editable": true + }, "source": [ "## Overarching aims of the exercises this week\n", "\n", @@ -5005,8 +5468,10 @@ }, { "cell_type": "markdown", - "id": "96a5f8dd", - "metadata": {}, + "id": "80c71c8c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", @@ -5015,8 +5480,10 @@ }, { "cell_type": "markdown", - "id": "77a692f8", - "metadata": {}, + "id": "960da1af", + "metadata": { + "editable": true + }, "source": [ "We then approximate this function $f(\\boldsymbol{x})$ with our model $\\boldsymbol{\\tilde{y}}$ from the solution of the linear regression equations (ordinary least squares OLS), that is our\n", "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we minimized $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, with" @@ -5024,8 +5491,10 @@ }, { "cell_type": "markdown", - "id": "bbbedeba", - "metadata": {}, + "id": "3cc81e1c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\beta}.\n", @@ -5034,16 +5503,20 @@ }, { "cell_type": "markdown", - "id": "3a58e150", - "metadata": {}, + "id": "e09d6ec4", + "metadata": { + "editable": true + }, "source": [ "The matrix $\\boldsymbol{X}$ is the so-called design or feature matrix." 
] }, { "cell_type": "markdown", - "id": "792c0c41", - "metadata": {}, + "id": "b4c428ce", + "metadata": { + "editable": true + }, "source": [ "## Exercise 1: Expectation values for ordinary least squares expressions\n", "\n", @@ -5052,8 +5525,10 @@ }, { "cell_type": "markdown", - "id": "be05a547", - "metadata": {}, + "id": "76925bba", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E}(y_i) =\\sum_{j}x_{ij} \\beta_j=\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\beta},\n", @@ -5062,8 +5537,10 @@ }, { "cell_type": "markdown", - "id": "a3362a89", - "metadata": {}, + "id": "661303fc", + "metadata": { + "editable": true + }, "source": [ "and that\n", "its variance is" @@ -5071,8 +5548,10 @@ }, { "cell_type": "markdown", - "id": "6c741120", - "metadata": {}, + "id": "4dfa5687", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mbox{Var}(y_i) = \\sigma^2.\n", @@ -5081,8 +5560,10 @@ }, { "cell_type": "markdown", - "id": "261662ab", - "metadata": {}, + "id": "9773149b", + "metadata": { + "editable": true + }, "source": [ "Hence, $y_i \\sim N( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\beta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", "mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n", @@ -5092,8 +5573,10 @@ }, { "cell_type": "markdown", - "id": "98f68e9b", - "metadata": {}, + "id": "48801ec9", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}}) = \\boldsymbol{\\beta}.\n", @@ -5102,16 +5585,20 @@ }, { "cell_type": "markdown", - "id": "62d36b90", - "metadata": {}, + "id": "a57119a4", + "metadata": { + "editable": true + }, "source": [ "Show finally that the variance of $\\boldsymbol{\\boldsymbol{\\beta}}$ is" ] }, { "cell_type": "markdown", - "id": "cdd7640b", - "metadata": {}, + "id": "98bd6f19", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mbox{Var}(\\boldsymbol{\\hat{\\beta}}) = \\sigma^2 \\, (\\mathbf{X}^{T} 
\\mathbf{X})^{-1}.\n", @@ -5120,8 +5607,10 @@ }, { "cell_type": "markdown", - "id": "85f6cbfd", - "metadata": {}, + "id": "bd33f144", + "metadata": { + "editable": true + }, "source": [ "We can use the last expression when we define a [so-called confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\beta$. \n", "A given parameter $\\beta_j$ is given by the diagonal matrix element of the above matrix." @@ -5129,8 +5618,10 @@ }, { "cell_type": "markdown", - "id": "9ddba3ca", - "metadata": {}, + "id": "b35f3e96", + "metadata": { + "editable": true + }, "source": [ "## Exercise 2: Expectation values for Ridge regression\n", "\n", @@ -5139,8 +5630,10 @@ }, { "cell_type": "markdown", - "id": "45aa8dd2", - "metadata": {}, + "id": "d3c4c45a", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}.\n", @@ -5149,8 +5642,10 @@ }, { "cell_type": "markdown", - "id": "32a9f4d1", - "metadata": {}, + "id": "9c302b69", + "metadata": { + "editable": true + }, "source": [ "We see clearly that\n", "$\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n", @@ -5160,8 +5655,10 @@ }, { "cell_type": "markdown", - "id": "5a3613f1", - "metadata": {}, + "id": "bbcfa633", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mbox{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", @@ -5170,32 +5667,165 @@ }, { "cell_type": "markdown", - "id": "6760b10e", - "metadata": {}, + "id": "af584cdb", + "metadata": { + "editable": true + }, "source": [ "and it is easy to 
see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero." ] + }, + { + "cell_type": "markdown", + "id": "3c9ef45f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 3: Bias-Variance tradeoff\n", + "\n", + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simpler function using the bootstrap method. \n", + "\n", + "Consider a\n", + "dataset $\\mathcal{L}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", + "\n", + "We assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "3459f52e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "061f9980", + "metadata": { + "editable": true + }, + "source": [ + "Here $\\epsilon$ is normally distributed with mean zero and\n", + "variance $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined \n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\beta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\beta}$.\n", + "\n", + "The parameters $\\boldsymbol{\\beta}$ are in turn found by minimizing the mean\n", + "squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "65fe1ada", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cbea2b54", + "metadata": { + 
"editable": true + }, + "source": [ + "Here the expected value $\\mathbb{E}$ is the sample mean. \n", + "\n", + "Show that you can rewrite this in terms of a term which contains the variance of the model itself (the so-called variance term), a\n", + "term which measures the deviation of the mean value of the model from the true data (the bias term) and finally the variance of the noise.\n", + "That is, show that" + ] + }, + { + "cell_type": "markdown", + "id": "b1d514f5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e14d012c", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "88c42b43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "035dd127", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e594908a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e6d9419", + "metadata": { + "editable": true + }, + "source": [ + "Explain what the terms mean and discuss their interpretations.\n", + "\n", + "Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by\n", + "studying the MSE as a function of the complexity of your model. 
Use ordinary least squares only.\n", + "\n", + "Discuss the bias and variance trade-off as a function\n", + "of your model complexity (the degree of the polynomial) and the number\n", + "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", + "You can follow the code example in the jupyter-book at <https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff>." + ] } ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.10" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/pub/day3/ipynb/ipynb-day3-src.tar.gz b/doc/pub/day3/ipynb/ipynb-day3-src.tar.gz index ef39b82..c27df2b 100644 Binary files a/doc/pub/day3/ipynb/ipynb-day3-src.tar.gz and b/doc/pub/day3/ipynb/ipynb-day3-src.tar.gz differ diff --git a/doc/pub/day3/pdf/day3.pdf b/doc/pub/day3/pdf/day3.pdf index fa1bf66..13be624 100644 Binary files a/doc/pub/day3/pdf/day3.pdf and b/doc/pub/day3/pdf/day3.pdf differ diff --git a/doc/src/Day3/Day3.do.txt b/doc/src/Day3/Day3.do.txt index 9f56e1d..60d8068 100644 --- a/doc/src/Day3/Day3.do.txt +++ b/doc/src/Day3/Day3.do.txt @@ -1,14 +1,14 @@ TITLE: Data Analysis and Machine Learning: Ridge and Lasso Regression and Resampling Methods AUTHOR: Morten Hjorth-Jensen {copyright, 1999-present|CC BY-NC} at Department of Physics and Center for Computing in Science Education, University of Oslo, Norway & Department of Physics and Astronomy and Facility for Rare Isotope Beams and National Superconducting Cyclotron Laboratory, Michigan State University, USA -DATE: October 15 and 22, 2023 +DATE: October 16 and 23, 2023 !split ===== Plans for Sessions 4-6 ===== * More on Ridge and Lasso Regression * Statistics, probability theory and resampling 
methods - * "Video of Lecture October 15 to be added":"https://youtu.be/" - * "Video of Lecture October 22 to be added":"https://youtu.be/" + * "Video of Lecture October 16":"https://youtu.be/iqRKUPJr_bY" + * "Video of Lecture October 23 to be added":"https://youtu.be/" !split @@ -2972,4 +2972,73 @@ and it is easy to see that if the parameter $\lambda$ goes to infinity then the +===== Exercise: Bias-Variance tradeoff ===== + +The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1, and to test them on a simpler function using the bootstrap method. + +Consider a +dataset $\mathcal{L}$ consisting of the data +$\mathbf{X}_\mathcal{L}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\}$. + +We assume that the true data is generated from a noisy model + +!bt +\[ +\bm{y}=f(\boldsymbol{x}) + \bm{\epsilon}. +\] +!et + +Here $\epsilon$ is normally distributed with mean zero and +variance $\sigma^2$. + +In our derivation of the ordinary least squares method we defined +an approximation to the function $f$ in terms of the parameters +$\bm{\beta}$ and the design matrix $\bm{X}$ which embody our model, +that is $\bm{\tilde{y}}=\bm{X}\bm{\beta}$. + +The parameters $\bm{\beta}$ are in turn found by minimizing the mean +squared error via the so-called cost function + +!bt +\[ +C(\bm{X},\bm{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\bm{y}-\bm{\tilde{y}})^2\right]. +\] +!et +Here the expected value $\mathbb{E}$ is the sample mean. + +Show that you can rewrite this in terms of a term which contains the variance of the model itself (the so-called variance term), a +term which measures the deviation of the mean value of the model from the true data (the bias term) and finally the variance of the noise. 
+That is, show that +!bt +\[ +\mathbb{E}\left[(\bm{y}-\bm{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +\] +!et +with +!bt +\[ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\bm{y}-\mathbb{E}\left[\bm{\tilde{y}}\right]\right)^2\right], +\] +!et +and +!bt +\[ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\bm{y}}-\mathbb{E}\left[\bm{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\bm{\tilde{y}}\right])^2. +\] +!et + + + +Explain what the terms mean and discuss their interpretations. + +Then perform a bias-variance analysis of a simple one-dimensional function (or another model of your choice) by +studying the MSE as a function of the complexity of your model. Use ordinary least squares only. + +Discuss the bias and variance trade-off as a function +of your model complexity (the degree of the polynomial) and the number +of data points, and possibly also your training and test data using the _bootstrap_ resampling method. +You can follow the code example in the jupyter-book at URL:"https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter3.html#the-bias-variance-tradeoff". + + + 
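The bootstrap analysis asked for in the exercise can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference solution: the test function, the noise level, the sample sizes and the helper names `design_matrix` and `bias_variance_bootstrap` are all hypothetical choices made here. The bias term is computed against the noisy test targets, following the definition of $\mathrm{Bias}[\tilde{y}]$ in the exercise text, so the noise variance is absorbed into it and the error equals bias plus variance up to round-off.

```python
import numpy as np


def design_matrix(x, degree):
    # Polynomial design matrix with columns 1, x, x^2, ..., x^degree.
    return np.vander(x, degree + 1, increasing=True)


def bias_variance_bootstrap(x_train, y_train, x_test, y_test, degree, n_boot, rng):
    # Fit OLS on n_boot bootstrap resamples of the training data and
    # evaluate every fitted model on the same fixed test set.
    X_train = design_matrix(x_train, degree)
    X_test = design_matrix(x_test, degree)
    n = len(x_train)
    preds = np.empty((n_boot, len(x_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n indices with replacement
        beta, *_ = np.linalg.lstsq(X_train[idx], y_train[idx], rcond=None)
        preds[b] = X_test @ beta
    mean_pred = preds.mean(axis=0)  # bootstrap estimate of E[y_tilde] per test point
    error = np.mean((y_test[None, :] - preds) ** 2)
    bias = np.mean((y_test - mean_pred) ** 2)  # measured against noisy y_test
    variance = np.mean(preds.var(axis=0))
    return error, bias, variance


# Hypothetical one-dimensional test problem (an assumption for illustration).
rng = np.random.default_rng(2023)
n = 200
x = rng.uniform(-3.0, 3.0, n)
y = np.exp(-x**2) + 1.5 * np.exp(-(x - 2.0) ** 2) + rng.normal(0.0, 0.1, n)

# Simple random split into test and training data.
perm = rng.permutation(n)
test, train = perm[:40], perm[40:]

results = []
for degree in range(1, 11):
    err, bias, var = bias_variance_bootstrap(
        x[train], y[train], x[test], y[test], degree, 100, rng
    )
    results.append((degree, err, bias, var))
    print(f"degree {degree:2d}: error {err:.4f}  bias {bias:.4f}  variance {var:.4f}")
```

Plotting the three columns of `results` against the polynomial degree gives the kind of bias-variance curve the exercise asks you to discuss. Replacing `y_test` by the noise-free function values in the bias term would separate out the $\sigma^2$ contribution, but that is only possible for synthetic data where $f$ is known.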