Skip to content

Commit

Permalink
day 2 labs
Browse files Browse the repository at this point in the history
  • Loading branch information
justheuristic committed Jan 17, 2018
1 parent 7a27ec1 commit 20acf8a
Show file tree
Hide file tree
Showing 2 changed files with 866 additions and 0 deletions.
382 changes: 382 additions & 0 deletions 02_lab/lab1_regression_faces.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,382 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Today's data\n",
"\n",
"400 fotos of human faces. Each face is a 2d array [64x64] of pixel brightness."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_olivetti_faces\n",
"data = fetch_olivetti_faces().images"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# @this code showcases matplotlib subplots. The syntax is: plt.subplot(height, width, index_starting_from_1)\n",
"plt.subplot(2,2,1)\n",
"plt.imshow(data[0],cmap='gray')\n",
"plt.subplot(2,2,2)\n",
"plt.imshow(data[1],cmap='gray')\n",
"plt.subplot(2,2,3)\n",
"plt.imshow(data[2],cmap='gray')\n",
"plt.subplot(2,2,4)\n",
"plt.imshow(data[3],cmap='gray')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Face reconstruction problem\n",
"\n",
"Let's solve the face reconstruction problem: given left halves of facex __(X)__, our algorithm shall predict the right half __(y)__. Our first step is to slice the photos into X and y using slices.\n",
"\n",
"* In regular python, slice looks roughly like this: `a[2:5]` _(select elements from 2 to 5)_\n",
"* Numpy allows you to slice N-dimensional arrays along each dimension: [image_index, height, width]\n",
" * `data[:10]` - Select first 10 images\n",
" * `data[:, :10]` - For all images, select a horizontal stripe 10 pixels high at the top of the image\n",
" * `data[10:20, :, -25:-15]` - Take images [10, 11, ..., 19], for each image select a _vetrical stripe_ of width 10 pixels, 15 pixels away from the _right_ side.\n",
"\n",
"\n",
"Let's use slices to select all __left image halves as X__ and all __right halves as y__."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# select left half of each face as X, right half as Y\n",
"X = <Slice left half-images>\n",
"y = <Slice right half-images>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you did everything right, you're gonna see left half-image and right half-image drawn separately in natural order\n",
"plt.subplot(1,2,1)\n",
"plt.imshow(X[0],cmap='gray')\n",
"plt.subplot(1,2,2)\n",
"plt.imshow(y[0],cmap='gray')\n",
"\n",
"assert X.shape == y.shape == (len(data), 64, 32), \"Please slice exactly the left half-face to X and right half-face to Y\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def glue(left_half,right_half):\n",
" # merge photos back together\n",
" left_half = left_half.reshape([-1,64,32])\n",
" right_half = right_half.reshape([-1,64,32])\n",
" return np.concatenate([left_half,right_half],axis=-1)\n",
"\n",
"\n",
"# if you did everything right, you're gonna see a valid face\n",
"plt.imshow(glue(X,y)[99],cmap='gray')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Machine learning stuff"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train,X_test,Y_train,Y_test = train_test_split(X.reshape([len(X),-1]),\n",
" y.reshape([len(y),-1]),\n",
" test_size=0.05,random_state=42)\n",
"\n",
"print(X_test.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"model = LinearRegression()\n",
"model.fit(X_train,Y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"measure mean squared error"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import mean_squared_error\n",
"\n",
"print(\"Train MSE:\", mean_squared_error(Y_train,model.predict(X_train)))\n",
"print(\"Test MSE:\", mean_squared_error(Y_test,model.predict(X_test)))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train predictions\n",
"pics = glue(X_train,model.predict(X_train))\n",
"plt.figure(figsize=[16,12])\n",
"for i in range(20):\n",
" plt.subplot(4,5,i+1)\n",
" plt.imshow(pics[i],cmap='gray')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test predictions\n",
"pics = glue(X_test,model.predict(X_test))\n",
"plt.figure(figsize=[16,12])\n",
"for i in range(20):\n",
" plt.subplot(4,5,i+1)\n",
" plt.imshow(pics[i],cmap='gray')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"\n",
"# Ridge regression\n",
"RidgeRegression is just a LinearRegression, with l2 regularization - penalized for $ \\alpha \\cdot \\sum _i w_i^2$\n",
"\n",
"Let's train such a model with alpha=0.5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import Ridge\n",
"\n",
"ridge = Ridge(alpha=0.5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"<YOUR CODE: fit the model on training set>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"<YOUR CODE: predict and measure MSE on train and test>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Test predictions\n",
"pics = glue(X_test,ridge.predict(X_test))\n",
"plt.figure(figsize=[16,12])\n",
"for i in range(20):\n",
" plt.subplot(4,5,i+1)\n",
" plt.imshow(pics[i],cmap='gray')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"```\n",
"\n",
"# Grid search\n",
"\n",
"Train model with diferent $\\alpha$ and find one that has minimal test MSE. It's okay to use loops or any other python stuff here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"<YOUR CODE>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Test predictions\n",
"pics = glue(X_test,<predict with your best model>)\n",
"plt.figure(figsize=[16,12])\n",
"for i in range(20):\n",
" plt.subplot(4,5,i+1)\n",
" plt.imshow(pics[i],cmap='gray')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Loading

0 comments on commit 20acf8a

Please sign in to comment.