From 4f84b213592d03591324a3fd61db37ced9064f40 Mon Sep 17 00:00:00 2001 From: SounakPal212 <44639229+SounakPal212@users.noreply.github.com> Date: Wed, 31 Oct 2018 19:26:03 +0530 Subject: [PATCH 1/7] Create Sounak.c --- Sounak.c | 5 +++++ 1 file changed, 5 insertions(+) create mode 100644 Sounak.c diff --git a/Sounak.c b/Sounak.c new file mode 100644 index 0000000..1296fbc --- /dev/null +++ b/Sounak.c @@ -0,0 +1,5 @@ +#include +void main() +{ + cout<<"Hello World"; + } From 137929a6f603a0d2e79b5724e9cf0e8a64ae0cd9 Mon Sep 17 00:00:00 2001 From: Sounak Pal <44639229+SounakPal212@users.noreply.github.com> Date: Fri, 14 Aug 2020 20:37:29 +0530 Subject: [PATCH 2/7] Created using Colaboratory --- Neural_Networks.ipynb | 826 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 826 insertions(+) create mode 100644 Neural_Networks.ipynb diff --git a/Neural_Networks.ipynb b/Neural_Networks.ipynb new file mode 100644 index 0000000..f0ec0cd --- /dev/null +++ b/Neural_Networks.ipynb @@ -0,0 +1,826 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Neural Networks.ipynb", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jqVqT_Cxh4Ho", + "colab_type": "text" + }, + "source": [ + "#Introduction to Neural Networks\n", + "In this notebook you will learn how to create and use a neural network to classify articles of clothing. To achieve this, we will use a sub module of TensorFlow called *keras*.\n", + "\n", + "*This guide is based on the following TensorFlow documentation.*\n", + "\n", + "https://www.tensorflow.org/tutorials/keras/classification\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZFQqW9r-ikJb", + "colab_type": "text" + }, + "source": [ + "##Keras\n", + "Before we dive in and start discussing neural networks, I'd like to give a breif introduction to keras.\n", + "\n", + "From the keras official documentation (https://keras.io/) keras is described as follows.\n", + "\n", + "\"Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. \n", + "\n", + "Use Keras if you need a deep learning library that:\n", + "\n", + "- Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility).\n", + "- Supports both convolutional networks and recurrent networks, as well as combinations of the two.\n", + "- Runs seamlessly on CPU and GPU.\"\n", + "\n", + "Keras is a very powerful module that allows us to avoid having to build neural networks from scratch. It also hides a lot of mathematical complexity (that otherwise we would have to implement) inside of helpful packages, modules and methods.\n", + "\n", + "In this guide we will use keras to quickly develop neural networks.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hivk879ZQhxU", + "colab_type": "text" + }, + "source": [ + "##What is a Neural Network\n", + "So, what are these magical things that have been beating chess grandmasters, driving cars, detecting cancer cells and winning video games? \n", + "\n", + "A deep neural network is a layered representation of data. 
The term \"deep\" refers to the presence of multiple layers. Recall that in our core learning algorithms (like linear regression) data was not transformed or modified within the model, it simply existed in one layer. We passed some features to our model, some math was done, an answer was returned. The data was not changed or transformed throughout this process. A neural network processes our data differently. It attempts to represent our data in different ways and in different dimensions by applying specific operations to transform our data at each layer. Another way to express this is that at each layer our data is transformed in order to learn more about it. By performing these transformations, the model can better understand our data and therefore provide a better prediction. \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GOqUCZ2klTAq", + "colab_type": "text" + }, + "source": [ + "##How it Works\n", + "Before going into too much detail I will provide a very surface level explination of how neural networks work on a mathematical level. All the terms and concepts I discuss will be defined and explained in more detail below.\n", + "\n", + "On a lower level neural networks are simply a combination of elementry math operations and some more advanced linear algebra. Each neural network consists of a sequence of layers in which data passes through. These layers are made up on neurons and the neurons of one layer are connected to the next (see below). These connections are defined by what we call a weight (some numeric value). Each layer also has something called a bias, this is simply an extra neuron that has no connections and holds a single numeric value. Data starts at the input layer and is trasnformed as it passes through subsequent layers. The data at each subsequent neuron is defined as the following.\n", + "\n", + "> $Y =(\\sum_{i=0}^n w_i x_i) + b$\n", + "\n", + "> $w$ stands for the weight of each connection to the neuron\n", + "\n", + "> $x$ stands for the value of the connected neuron from the previous value\n", + "\n", + "> $b$ stands for the bias at each layer, this is a constant\n", + "\n", + "> $n$ is the number of connections\n", + "\n", + "> $Y$ is the output of the current neuron\n", + "\n", + "> $\\sum$ stands for sum\n", + "\n", + "The equation you just read is called a weighed sum. We will take this weighted sum at each and every neuron as we pass information through the network. Then we will add what's called a bias to this sum. The bias allows us to shift the network up or down by a constant value. It is like the y-intercept of a line.\n", + "\n", + "But that equation is the not complete one! We forgot a crucial part, **the activation function**. This is a function that we apply to the equation seen above to add complexity and dimensionality to our network. Our new equation with the addition of an activation function $F(x)$ is seen below.\n", + "\n", + "> $Y =F((\\sum_{i=0}^n w_i x_i) + b)$\n", + "\n", + "Our network will start with predefined activation functions (they may be different at each layer) but random weights and biases. As we train the network by feeding it data it will learn the correct weights and biases and adjust the network accordingly using a technqiue called **backpropagation** (explained below). Once the correct weights and biases have been learned our network will hopefully be able to give us meaningful predictions. We get these predictions by observing the values at our final layer, the output layer. 
\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o-oMh18_j5kl", + "colab_type": "text" + }, + "source": [ + "##Breaking Down The Neural Network!\n", + "\n", + "Before we dive into any code lets break down how a neural network works and what it does.\n", + "\n", + "![alt text](http://www.extremetech.com/wp-content/uploads/2015/07/NeuralNetwork.png)\n", + "*Figure 1*\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-9hd-R1ulSdp", + "colab_type": "text" + }, + "source": [ + "###Data\n", + "The type of data a neural network processes varies drastically based on the problem being solved. When we build a neural network, we define what shape and kind of data it can accept. It may sometimes be neccessary to modify our dataset so that it can be passed to our neural network. \n", + "\n", + "Some common types of data a neural network uses are listed below.\n", + "- Vector Data (2D)\n", + "- Timeseries or Sequence (3D)\n", + "- Image Data (4D)\n", + "- Video Data (5D)\n", + "\n", + "There are of course many different types or data, but these are the main categories.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Xyxxs7oMlWtz", + "colab_type": "text" + }, + "source": [ + "###Layers\n", + "As we mentioned earlier each neural network consists of multiple layers. At each layer a different transformation of data occurs. Our initial input data is fed through the layers and eventually arrives at the output layer where we will obtain the result.\n", + "####Input Layer\n", + "The input layer is the layer that our initial data is passed to. It is the first layer in our neural network.\n", + "####Output Layer\n", + "The output layer is the layer that we will retrive our results from. Once the data has passed through all other layers it will arrive here.\n", + "####Hidden Layer(s)\n", + "All the other layers in our neural network are called \"hidden layers\". This is because they are hidden to us, we cannot observe them. Most neural networks consist of at least one hidden layer but can have an unlimited amount. Typically, the more complex the model the more hidden layers.\n", + "####Neurons\n", + "Each layer is made up of what are called neurons. Neurons have a few different properties that we will discuss later. The important aspect to understand now is that each neuron is responsible for generating/holding/passing ONE numeric value. \n", + "\n", + "This means that in the case of our input layer it will have as many neurons as we have input information. For example, say we want to pass an image that is 28x28 pixels, thats 784 pixels. We would need 784 neurons in our input layer to capture each of these pixels. \n", + "\n", + "This also means that our output layer will have as many neurons as we have output information. The output is a little more complicated to understand so I'll refrain from an example right now but hopefully you're getting the idea.\n", + "\n", + "But what about our hidden layers? Well these have as many neurons as we decide. We'll discuss how we can pick these values later but understand a hidden layer can have any number of neurons.\n", + "####Connected Layers\n", + "So how are all these layers connected? Well the neurons in one layer will be connected to neurons in the subsequent layer. However, the neurons can be connected in a variety of different ways. \n", + "\n", + "Take for example *Figure 1* (look above). Each neuron in one layer is connected to every neuron in the next layer. 
This is called a **dense** layer. There are many other ways of connecting layers but well discuss those as we see them. \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "a_bM6nQ-PZBY", + "colab_type": "text" + }, + "source": [ + "###Weights\n", + "Weights are associated with each connection in our neural network. Every pair of connected nodes will have one weight that denotes the strength of the connection between them. These are vital to the inner workings of a neural network and will be tweaked as the neural network is trained. The model will try to determine what these weights should be to achieve the best result. Weights start out at a constant or random value and will change as the network sees training data." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XwYq9doXeIl-", + "colab_type": "text" + }, + "source": [ + "###Biases\n", + "Biases are another important part of neural networks and will also be tweaked as the model is trained. A bias is simply a constant value associated with each layer. It can be thought of as an extra neuron that has no connections. The purpose of a bias is to shift an entire activation function by a constant value. This allows a lot more flexibllity when it comes to choosing an activation and training the network. There is one bias for each layer." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F92rhvd6PcRI", + "colab_type": "text" + }, + "source": [ + "###Activation Function\n", + "Activation functions are simply a function that is applied to the weighed sum of a neuron. They can be anything we want but are typically higher order/degree functions that aim to add a higher dimension to our data. We would want to do this to introduce more comolexity to our model. By transforming our data to a higher dimension, we can typically make better, more complex predictions.\n", + "\n", + "A list of some common activation functions and their graphs can be seen below.\n", + "\n", + "- Relu (Rectified Linear Unit)\n", + "\n", + "![alt text](https://yashuseth.files.wordpress.com/2018/02/relu-function.png?w=309&h=274)\n", + "- Tanh (Hyperbolic Tangent)\n", + "\n", + "![alt text](http://mathworld.wolfram.com/images/interactive/TanhReal.gif)\n", + "- Sigmoid \n", + "\n", + "![alt text](https://miro.medium.com/max/970/1*Xu7B5y9gp0iL5ooBj7LtWw.png)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Q2xNjpctlBUM", + "colab_type": "text" + }, + "source": [ + "###Backpropagation\n", + "Backpropagation is the fundemental algorithm behind training neural networks. It is what changes the weights and biases of our network. To fully explain this process, we need to start by discussing something called a cost/loss function.\n", + "\n", + "####Loss/Cost Function\n", + "As we now know our neural network feeds information through the layers until it eventually reaches an output layer. This layer contains the results that we look at to determine the prediciton from our network. In the training phase it is likely that our network will make many mistakes and poor predicitions. In fact, at the start of training our network doesn't know anything (it has random weights and biases)! \n", + "\n", + "We need some way of evaluating if the network is doing well and how well it is doing. For our training data we have the features (input) and the labels (expected output), because of this we can compare the output from our network to the expected output. 
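For example (with made-up numbers rather than a real network), that comparison might look like this:

```python
import numpy as np

expected = np.array([1.0, 0.0, 0.0])   # label: the first class is the correct one
predicted = np.array([0.6, 0.3, 0.1])  # what an (untrained) network might output

# one simple way to turn the comparison into a single number: mean squared error
loss = np.mean((expected - predicted) ** 2)
print(loss)  # the bigger this is, the worse the prediction
```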
Based on the difference between these values we can determine if our network has done a good job or poor job. If the network has done a good job, we'll make minor changes to the weights and biases. If it has done a poor job our changes may be more drastic.\n", + "\n", + "So, this is where the cost/loss function comes in. This function is responsible for determining how well the network did. We pass it the output and the expected output, and it returns to us some value representing the cost/loss of the network. This effectively makes the networks job to optimize this cost function, trying to make it as low as possible. \n", + "\n", + "Some common loss/cost functions include.\n", + "- Mean Squared Error\n", + "- Mean Absolute Error\n", + "- Hinge Loss\n", + "\n", + "####Gradient Descent\n", + "Gradient descent and backpropagation are closely related. Gradient descent is the algorithm used to find the optimal paramaters (weights and biases) for our network, while backpropagation is the process of calculating the gradient that is used in the gradient descent step. \n", + "\n", + "Gradient descent requires some pretty advanced calculus and linear algebra to understand so we'll stay away from that for now. Let's just read the formal definition for now.\n", + "\n", + "\"Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters of our model.\" (https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html)\n", + "\n", + "And that's all we really need to know for now. I'll direct you to the video for a more in depth explination.\n", + "\n", + "![alt text](https://cdn-images-1.medium.com/max/1000/1*iU1QCnSTKrDjIPjSAENLuQ.png)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0KiTMDCKlBI7", + "colab_type": "text" + }, + "source": [ + "###Optimizer\n", + "You may sometimes see the term optimizer or optimization function. This is simply the function that implements the backpropagation algorithm described above. Here's a list of a few common ones.\n", + "- Gradient Descent\n", + "- Stochastic Gradient Descent\n", + "- Mini-Batch Gradient Descent\n", + "- Momentum\n", + "- Nesterov Accelerated Gradient\n", + "\n", + "*This article explains them quite well is where I've pulled this list from.*\n", + "\n", + "(https://medium.com/@sdoshi579/optimizers-for-training-neural-network-59450d71caf6)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Kc5hFCLSiDNr", + "colab_type": "text" + }, + "source": [ + "##Creating a Neural Network\n", + "Okay now you have reached the exciting part of this tutorial! No more math and complex explinations. 
Time to get hands on and train a very basic neural network.\n", + "\n", + "*As stated earlier this guide is based off of the following TensorFlow tutorial.*\n", + "https://www.tensorflow.org/tutorials/keras/classification\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3io6gbUrjOQY", + "colab_type": "text" + }, + "source": [ + "###Imports" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "y8t_EdO8jEHz", + "colab_type": "code", + "colab": {} + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook\n", + "# TensorFlow and tf.keras\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "\n", + "# Helper libraries\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "p_iFN10li6V1", + "colab_type": "text" + }, + "source": [ + "###Dataset\n", + "For this tutorial we will use the MNIST Fashion Dataset. This is a dataset that is included in keras.\n", + "\n", + "This dataset includes 60,000 images for training and 10,000 images for validation/testing." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eQmVmgOxjCOV", + "colab_type": "code", + "colab": {} + }, + "source": [ + "fashion_mnist = keras.datasets.fashion_mnist # load dataset\n", + "\n", + "(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data() # split into tetsing and training" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AcIall2njfn1", + "colab_type": "text" + }, + "source": [ + "Let's have a look at this data to see what we are working with." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WhLXRxOdjisI", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_images.shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D2npdFHwjsLS", + "colab_type": "text" + }, + "source": [ + "So we've got 60,000 images that are made up of 28x28 pixels (784 in total)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "m280zyPqj3ws", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_images[0,23,23] # let's have a look at one pixel" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GUciblEwkBe4", + "colab_type": "text" + }, + "source": [ + "Our pixel values are between 0 and 255, 0 being black and 255 being white. This means we have a grayscale image as there are no color channels." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Rn78KO7fkQPJ", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_labels[:10] # let's have a look at the first 10 training labels" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r90qZKsnkaW7", + "colab_type": "text" + }, + "source": [ + "Our labels are integers ranging from 0 - 9. Each integer represents a specific article of clothing. We'll create an array of label names to indicate which is which." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pBiICD2tkne8", + "colab_type": "code", + "colab": {} + }, + "source": [ + "class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n", + " 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4rv06eD8krMR", + "colab_type": "text" + }, + "source": [ + "Fianlly let's look at what some of these images look like!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Nfc8LV4Pkq0X", + "colab_type": "code", + "colab": {} + }, + "source": [ + "plt.figure()\n", + "plt.imshow(train_images[1])\n", + "plt.colorbar()\n", + "plt.grid(False)\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n_DC1b0grL1N", + "colab_type": "text" + }, + "source": [ + "##Data Preprocessing\n", + "The last step before creating our model is to *preprocess* our data. This simply means applying some prior transformations to our data before feeding it the model. In this case we will simply scale all our greyscale pixel values (0-255) to be between 0 and 1. We can do this by dividing each value in the training and testing sets by 255.0. We do this because smaller values will make it easier for the model to process our values. \n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "wHde8MYW0OQo", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_images = train_images / 255.0\n", + "\n", + "test_images = test_images / 255.0" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dHOX6GqR0QuD", + "colab_type": "text" + }, + "source": [ + "##Building the Model\n", + "Now it's time to build the model! We are going to use a keras *sequential* model with three different layers. This model represents a feed-forward neural network (one that passes values from left to right). We'll break down each layer and its architecture below." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XDxodHMv0xgG", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model = keras.Sequential([\n", + " keras.layers.Flatten(input_shape=(28, 28)), # input layer (1)\n", + " keras.layers.Dense(128, activation='relu'), # hidden layer (2)\n", + " keras.layers.Dense(10, activation='softmax') # output layer (3)\n", + "])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c-bL-I5w0414", + "colab_type": "text" + }, + "source": [ + "**Layer 1:** This is our input layer and it will conist of 784 neurons. We use the flatten layer with an input shape of (28,28) to denote that our input should come in in that shape. The flatten means that our layer will reshape the shape (28,28) array into a vector of 784 neurons so that each pixel will be associated with one neuron.\n", + "\n", + "**Layer 2:** This is our first and only hidden layer. The *dense* denotes that this layer will be fully connected and each neuron from the previous layer connects to each neuron of this layer. It has 128 neurons and uses the rectify linear unit activation function.\n", + "\n", + "**Layer 3:** This is our output later and is also a dense layer. It has 10 neurons that we will look at to determine our models output. Each neuron represnts the probabillity of a given image being one of the 10 different classes. 
The activation function *softmax* is used on this layer to calculate a probabillity distribution for each class. This means the value of any neuron in this layer will be between 0 and 1, where 1 represents a high probabillity of the image being that class." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-j1UF9QH21Ex", + "colab_type": "text" + }, + "source": [ + "###Compile the Model\n", + "The last step in building the model is to define the loss function, optimizer and metrics we would like to track. I won't go into detail about why we chose each of these right now." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Msigq4Ja29QX", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.compile(optimizer='adam',\n", + " loss='sparse_categorical_crossentropy',\n", + " metrics=['accuracy'])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7YYW5V_53OXV", + "colab_type": "text" + }, + "source": [ + "##Training the Model\n", + "Now it's finally time to train the model. Since we've already done all the work on our data this step is as easy as calling a single method." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "XmAtc4uI3_C7", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.fit(train_images, train_labels, epochs=10) # we pass the data, labels and epochs and watch the magic!" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y6SRtNcF4K1O", + "colab_type": "text" + }, + "source": [ + "##Evaluating the Model\n", + "Now it's time to test/evaluate the model. We can do this quite easily using another builtin method from keras.\n", + "\n", + "The *verbose* argument is defined from the keras documentation as:\n", + "\"verbose: 0 or 1. Verbosity mode. 0 = silent, 1 = progress bar.\"\n", + "(https://keras.io/models/sequential/)" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WqI0FEO54XN1", + "colab_type": "code", + "colab": {} + }, + "source": [ + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1) \n", + "\n", + "print('Test accuracy:', test_acc)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nb4_EtfK5DuW", + "colab_type": "text" + }, + "source": [ + "You'll likely notice that the accuracy here is lower than when training the model. This difference is reffered to as **overfitting**.\n", + "\n", + "And now we have a trained model that's ready to use to predict some values!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pv0XpgwJ7GlW", + "colab_type": "text" + }, + "source": [ + "##Making Predictions\n", + "To make predictions we simply need to pass an array of data in the form we've specified in the input layer to ```.predict()``` method." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BMAkNWii7Ufj", + "colab_type": "code", + "colab": {} + }, + "source": [ + "predictions = model.predict(test_images)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LmRgxuEc7Xjc", + "colab_type": "text" + }, + "source": [ + "This method returns to us an array of predictions for each image we passed it. Let's have a look at the predictions for image 1." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "4y2eQtCr7fnd", + "colab_type": "code", + "colab": {} + }, + "source": [ + "predictions[0]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eiRNg9Yr7lCt", + "colab_type": "text" + }, + "source": [ + "If we wan't to get the value with the highest score we can use a useful function from numpy called ```argmax()```. This simply returns the index of the maximium value from a numpy array. " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "NaagMfi671ci", + "colab_type": "code", + "colab": {} + }, + "source": [ + "np.argmax(predictions[0])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aWY4SKYm8h93", + "colab_type": "text" + }, + "source": [ + "And we can check if this is correct by looking at the value of the cooresponding test label." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xVNepduo8nEy", + "colab_type": "code", + "colab": {} + }, + "source": [ + "test_labels[0]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y8I1EqJu8qRl", + "colab_type": "text" + }, + "source": [ + "##Verifying Predictions\n", + "I've written a small function here to help us verify predictions with some simple visuals." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-HJV4JF789aC", + "colab_type": "code", + "colab": {} + }, + "source": [ + "COLOR = 'white'\n", + "plt.rcParams['text.color'] = COLOR\n", + "plt.rcParams['axes.labelcolor'] = COLOR\n", + "\n", + "def predict(model, image, correct_label):\n", + " class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n", + " 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\n", + " prediction = model.predict(np.array([image]))\n", + " predicted_class = class_names[np.argmax(prediction)]\n", + "\n", + " show_image(image, class_names[correct_label], predicted_class)\n", + "\n", + "\n", + "def show_image(img, label, guess):\n", + " plt.figure()\n", + " plt.imshow(img, cmap=plt.cm.binary)\n", + " plt.title(\"Excpected: \" + label)\n", + " plt.xlabel(\"Guess: \" + guess)\n", + " plt.colorbar()\n", + " plt.grid(False)\n", + " plt.show()\n", + "\n", + "\n", + "def get_number():\n", + " while True:\n", + " num = input(\"Pick a number: \")\n", + " if num.isdigit():\n", + " num = int(num)\n", + " if 0 <= num <= 1000:\n", + " return int(num)\n", + " else:\n", + " print(\"Try again...\")\n", + "\n", + "num = get_number()\n", + "image = test_images[num]\n", + "label = test_labels[num]\n", + "predict(model, image, label)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1HRzP5hCAijM", + "colab_type": "text" + }, + "source": [ + "And that's pretty much it for an introduction to neural networks!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PmbcLZZ0lo_2", + "colab_type": "text" + }, + "source": [ + "##Sources\n", + "\n", + "1. Doshi, Sanket. “Various Optimization Algorithms For Training Neural Network.” Medium, Medium, 10 Mar. 2019, www.medium.com/@sdoshi579/optimizers-for-training-neural-network-59450d71caf6.\n", + "\n", + "2. “Basic Classification: Classify Images of Clothing  :   TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/keras/classification.\n", + "\n", + "3. 
“Gradient Descent¶.” Gradient Descent - ML Glossary Documentation, www.ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html.\n", + "\n", + "4. Chollet François. Deep Learning with Python. Manning Publications Co., 2018.\n", + "\n", + "5. “Keras: The Python Deep Learning Library.” Home - Keras Documentation, www.keras.io/." + ] + } + ] +} \ No newline at end of file From 53c25807ee1e78030438a44b90d0f017e71558dc Mon Sep 17 00:00:00 2001 From: Sounak Pal <44639229+SounakPal212@users.noreply.github.com> Date: Fri, 14 Aug 2020 20:37:47 +0530 Subject: [PATCH 3/7] Created using Colaboratory --- Computer_Vision.ipynb | 1142 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1142 insertions(+) create mode 100644 Computer_Vision.ipynb diff --git a/Computer_Vision.ipynb b/Computer_Vision.ipynb new file mode 100644 index 0000000..4c379b4 --- /dev/null +++ b/Computer_Vision.ipynb @@ -0,0 +1,1142 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Computer Vision.ipynb", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "J_o2L3Io9t4c", + "colab_type": "text" + }, + "source": [ + "#Deep Computer Vision\n", + "\n", + "In this guide we will learn how to peform *image classification and object detection/recognition* using deep computer vision with something called a **convolutional neural network**.\n", + "\n", + "The goal of our convolutional neural networks will be to classify and detect images or specific objects from within the image. We will be using image data as our features and a label for those images as our label or output.\n", + "\n", + "We already know how neural networks work so we can skip through the basics and move right into explaining the following concepts.\n", + "- Image Data\n", + "- Convolutional Layer\n", + "- Pooling Layer\n", + "- CNN Architectures\n", + "\n", + "The major differences we are about to see in these types of neural networks are the layers that make them up." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tdqlqfhLCHZl", + "colab_type": "text" + }, + "source": [ + "##Image Data\n", + "So far, we have dealt with pretty straight forward data that has 1 or 2 dimensions. Now we are about to deal with image data that is usually made up of 3 dimensions. These 3 dimensions are as follows:\n", + "- image height\n", + "- image width\n", + "- color channels\n", + "\n", + "The only item in the list above you may not understand is **color channels**. The number of color channels represents the depth of an image and coorelates to the colors used in it. For example, an image with three channels is likely made up of rgb (red, green, blue) pixels. So, for each pixel we have three numeric values in the range 0-255 that define its color. For an image of color depth 1 we would likely have a greyscale image with one value defining each pixel, again in the range of 0-255.\n", + "\n", + "![alt text](http://xrds.acm.org/blog/wp-content/uploads/2016/06/Figure1.png)\n", + "\n", + "Keep this in mind as we discuss how our network works and the input/output of each layer. 
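As a quick sketch (the pixel values below are just random placeholders), this is what those dimensions look like as NumPy arrays:

```python
import numpy as np

# a made-up 32x32 color image: height x width x 3 color channels (RGB)
color_image = np.random.randint(0, 256, size=(32, 32, 3))

# a made-up 28x28 greyscale image: one value per pixel, so no channel axis is needed
grey_image = np.random.randint(0, 256, size=(28, 28))

print(color_image.shape)  # (32, 32, 3)
print(grey_image.shape)   # (28, 28)
```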
\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9mqznmTh--v2", + "colab_type": "text" + }, + "source": [ + "##Convolutional Neural Network\n", + "**Note:** I will use the term *convnet* and convolutional neural network interchangably.\n", + "\n", + "Each convolutional neural network is made up of one or many convolutional layers. These layers are different than the *dense* layers we have seen previously. Their goal is to find patterns from within images that can be used to classify the image or parts of it. But this may sound familiar to what our densly connected neural network in the previous section was doing, well that's becasue it is. \n", + "\n", + "The fundemental difference between a dense layer and a convolutional layer is that dense layers detect patterns globally while convolutional layers detect patterns locally. When we have a densly connected layer each node in that layer sees all the data from the previous layer. This means that this layer is looking at all the information and is only capable of analyzing the data in a global capacity. Our convolutional layer however will not be densly connected, this means it can detect local patterns using part of the input data to that layer.\n", + "\n", + "*Let's have a look at how a densly connected layer would look at an image vs how a convolutional layer would.*\n", + "\n", + "This is our image; the goal of our network will be to determine whether this image is a cat or not.\n", + "![alt text](https://img.webmd.com/dtmcms/live/webmd/consumer_assets/site_images/article_thumbnails/reference_guide/cat_weight_ref_guide/1800x1200_cat_weight_ref_guide.jpg)\n", + "\n", + "**Dense Layer:** A dense layer will consider the ENTIRE image. It will look at all the pixels and use that information to generate some output.\n", + "\n", + "**Convolutional Layer:** The convolutional layer will look at specific parts of the image. In this example let's say it analyzes the highlighted parts below and detects patterns there.\n", + "![alt text](https://drive.google.com/uc?export=view&id=1M7v7S-b-zisFLI_G4ZY_RdUJQrGpJ3zt)\n", + "\n", + "Can you see why this might make these networks more useful?\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CIQvxFu_FB3h", + "colab_type": "text" + }, + "source": [ + "###How They Work\n", + "A dense neural network learns patterns that are present in one specific area of an image. This means if a pattern that the network knows is present in a different area of the image it will have to learn the pattern again in that new area to be able to detect it. \n", + "\n", + "*Let's use an example to better illustrate this.*\n", + "\n", + "We'll consider that we have a dense neural network that has learned what an eye looks like from a sample of dog images.\n", + "\n", + "![alt text](https://drive.google.com/uc?export=view&id=16FJKkVS_lZToQOCOOy6ohUpspWgtoQ-c)\n", + "\n", + "Let's say it's determined that an image is likely to be a dog if an eye is present in the boxed off locations of the image above.\n", + "\n", + "Now let's flip the image.\n", + "![alt text](https://drive.google.com/uc?export=view&id=1V7Dh7BiaOvMq5Pm_jzpQfJTZcpPNmN0W)\n", + "\n", + "Since our densly connected network has only recognized patterns globally it will look where it thinks the eyes should be present. Clearly it does not find them there and therefore would likely determine this image is not a dog. 
Even though the pattern of the eyes is present, it's just in a different location.\n", + "\n", + "Since convolutional layers learn and detect patterns from different areas of the image, they don't have problems with the example we just illustrated. They know what an eye looks like and by analyzing different parts of the image can find where it is present. \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "20J29gz-NroA", + "colab_type": "text" + }, + "source": [ + "###Multiple Convolutional Layers\n", + "In our models it is quite common to have more than one convolutional layer. Even the basic example we will use in this guide will be made up of 3 convolutional layers. These layers work together by increasing complexity and abstraction at each subsequent layer. The first layer might be responsible for picking up edges and short lines, while the second layer will take as input these lines and start forming shapes or polygons. Finally, the last layer might take these shapes and determine which combiantions make up a specific image.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ii-a9rXzRwNi", + "colab_type": "text" + }, + "source": [ + "##Feature Maps\n", + "You may see me use the term *feature map* throughout this tutorial. This term simply stands for a 3D tensor with two spacial axes (width and height) and one depth axis. Our convolutional layers take feature maps as their input and return a new feature map that reprsents the prescence of spcific filters from the previous feature map. These are what we call *response maps*." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OScABB-ScXHx", + "colab_type": "text" + }, + "source": [ + "##Layer Parameters\n", + "A convolutional layer is defined by two key parameters.\n", + "\n", + "####**Filters**\n", + "A filter is a m x n pattern of pixels that we are looking for in an image. The number of filters in a convolutional layer reprsents how many patterns each layer is looking for and what the depth of our response map will be. If we are looking for 32 different patterns/filters than our output feature map (aka the response map) will have a depth of 32. Each one of the 32 layers of depth will be a matrix of some size containing values indicating if the filter was present at that location or not.\n", + "\n", + "Here's a great illustration from the book \"Deep Learning with Python\" by Francois Chollet (pg 124).\n", + "![alt text](https://drive.google.com/uc?export=view&id=1HcLvvLKvLCCGuGZPMvKYz437FbbCC2eB)\n", + "\n", + "####**Sample Size**\n", + "This isn't really the best term to describe this, but each convolutional layer is going to examine n x m blocks of pixels in each image. Typically, we'll consider 3x3 or 5x5 blocks. In the example above we use a 3x3 \"sample size\". This size will be the same as the size of our filter. \n", + "\n", + "Our layers work by sliding these filters of n x m pixels over every possible position in our image and populating a new feature map/response map indicating whether the filter is present at each location. \n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vnzqr8Dzjchd", + "colab_type": "text" + }, + "source": [ + "##Borders and Padding\n", + "The more mathematical of you may have realized that if we slide a filter of let's say size 3x3 over our image well consider less positions for our filter than pixels in our input. Look at the example below. 
\n", + "\n", + "*Image from \"Deep Learning with Python\" by Francois Chollet (pg 126).*\n", + "![alt text](https://drive.google.com/uc?export=view&id=1OEfXrV16NBjwAafgBfYYcWOyBCHqaZ5M)\n", + "\n", + "This means our response map will have a slightly smaller width and height than our original image. This is fine but sometimes we want our response map to have the same dimensions. We can accomplish this by using something called *padding*.\n", + "\n", + "**Padding** is simply the addition of the appropriate number of rows and/or columns to your input data such that each pixel can be centered by the filter." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "yDwH2eOMmt_N", + "colab_type": "text" + }, + "source": [ + "##Strides\n", + "In the previous sections we assumed that the filters would be slid continously through the image such that it covered every possible position. This is common but sometimes we introduce the idea of a **stride** to our convolutional layer. The stride size reprsents how many rows/cols we will move the filter each time. These are not used very frequently so we'll move on." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nCsVC-4UnfC8", + "colab_type": "text" + }, + "source": [ + "##Pooling\n", + "You may recall that our convnets are made up of a stack of convolution and pooling layers.\n", + "\n", + "The idea behind a pooling layer is to downsample our feature maps and reduce their dimensions. They work in a similar way to convolutional layers where they extract windows from the feature map and return a response map of the max, min or average values of each channel. Pooling is usually done using windows of size 2x2 and a stride of 2. This will reduce the size of the feature map by a factor of two and return a response map that is 2x smaller." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9qo85O0LsxbB", + "colab_type": "text" + }, + "source": [ + "##A More Detailed Look\n", + "Please refer to the video to learn how all of this happens at the lower level! " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xqLsm2XzNQSE", + "colab_type": "text" + }, + "source": [ + "##Creating a Convnet\n", + "\n", + "Now it is time to create our first convnet! This example is for the purpose of getting familiar with CNN architectures, we will talk about how to improves its performance later.\n", + "\n", + "*This tutorial is based on the following guide from the TensorFlow documentation: https://www.tensorflow.org/tutorials/images/cnn*\n", + "\n", + "###Dataset\n", + "The problem we will consider here is classifying 10 different everyday objects. The dataset we will use is built into tensorflow and called the [**CIFAR Image Dataset.**](https://www.cs.toronto.edu/~kriz/cifar.html) It contains 60,000 32x32 color images with 6000 images of each class. 
\n", + "\n", + "The labels in this dataset are the following:\n", + "- Airplane\n", + "- Automobile\n", + "- Bird\n", + "- Cat\n", + "- Deer\n", + "- Dog\n", + "- Frog\n", + "- Horse\n", + "- Ship\n", + "- Truck\n", + "\n", + "We'll load the dataset and have a look at some of the images below.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bnIbwiK7Ohv2", + "colab_type": "code", + "colab": {} + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook\n", + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "49wbEaM1PCCR", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# LOAD AND SPLIT DATASET\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1\n", + "train_images, test_images = train_images / 255.0, test_images / 255.0\n", + "\n", + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Bp0yAAcuPHFN", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Let's look at a one image\n", + "IMG_INDEX = 7 # change this to look at other images\n", + "\n", + "plt.imshow(train_images[IMG_INDEX] ,cmap=plt.cm.binary)\n", + "plt.xlabel(class_names[train_labels[IMG_INDEX][0]])\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aPqeddhcPwpc", + "colab_type": "text" + }, + "source": [ + "##CNN Architecture\n", + "A common architecture for a CNN is a stack of Conv2D and MaxPooling2D layers followed by a few denesly connected layers. To idea is that the stack of convolutional and maxPooling layers extract the features from the image. Then these features are flattened and fed to densly connected layers that determine the class of an image based on the presence of features.\n", + "\n", + "We will start by building the **Convolutional Base**." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ibuJZqAXQrWJ", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tybTBoi_Qtxl", + "colab_type": "text" + }, + "source": [ + "**Layer 1**\n", + "\n", + "The input shape of our data will be 32, 32, 3 and we will process 32 filters of size 3x3 over our input data. We will also apply the activation function relu to the output of each convolution operation.\n", + "\n", + "**Layer 2**\n", + "\n", + "This layer will perform the max pooling operation using 2x2 samples and a stride of 2.\n", + "\n", + "**Other Layers**\n", + "\n", + "The next set of layers do very similar things but take as input the feature map from the previous layer. They also increase the frequency of filters from 32 to 64. 
We can do this as our data shrinks in spacial dimensions as it passed through the layers, meaning we can afford (computationally) to add more depth." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_QahwuduSEDG", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.summary() # let's have a look at our model so far" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZXw-sreaSzTW", + "colab_type": "text" + }, + "source": [ + "After looking at the summary you should notice that the depth of our image increases but the spacial dimensions reduce drastically." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zjtADcfmSI9q", + "colab_type": "text" + }, + "source": [ + "##Adding Dense Layers\n", + "So far, we have just completed the **convolutional base**. Now we need to take these extracted features and add a way to classify them. This is why we add the following layers to our model.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "A9TMZH_oSULo", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "fEzHX-7ESeCl", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.summary()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dxfqtdDbSf4W", + "colab_type": "text" + }, + "source": [ + "We can see that the flatten layer changes the shape of our data so that we can feed it to the 64-node dense layer, follwed by the final output layer of 10 neurons (one for each class).\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wdPxFvHdTLRK", + "colab_type": "text" + }, + "source": [ + "##Training\n", + "Now we will train and compile the model using the recommended hyper paramaters from tensorflow.\n", + "\n", + "*Note: This will take much longer than previous models!*" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5loIug93TW1E", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=4, \n", + " validation_data=(test_images, test_labels))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JkdRKQnETgLv", + "colab_type": "text" + }, + "source": [ + "##Evaluating the Model\n", + "We can determine how well the model performed by looking at it's performance on the test data set." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6I2vJFiiTkQE", + "colab_type": "code", + "colab": {} + }, + "source": [ + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "print(test_acc)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-lKwDlvvUbIm", + "colab_type": "text" + }, + "source": [ + "You should be getting an accuracy of about 70%. 
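If you want to see how that accuracy evolved over the epochs, here is an optional sketch using the `history` object returned by `model.fit` above (it assumes the training cell has already been run):

```python
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```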
This isn't bad for a simple model like this, but we'll dive into some better approaches for computer vision below.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cstpZFVaY7YH", + "colab_type": "text" + }, + "source": [ + "##Working with Small Datasets\n", + "In the situation where you don't have millions of images it is difficult to train a CNN from scratch that performs very well. This is why we will learn about a few techniques we can use to train CNN's on small datasets of just a few thousand images. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8D4iWJ17ZRt_", + "colab_type": "text" + }, + "source": [ + "###Data Augmentation\n", + "To avoid overfitting and create a larger dataset from a smaller one we can use a technique called data augmentation. This is simply performing random transofrmations on our images so that our model can generalize better. These transformations can be things like compressions, rotations, stretches and even color changes. \n", + "\n", + "Fortunately, keras can help us do this. Look at the code below to an example of data augmentation.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "_sOet0hQZ-gR", + "colab_type": "code", + "colab": {} + }, + "source": [ + "from keras.preprocessing import image\n", + "from keras.preprocessing.image import ImageDataGenerator\n", + "\n", + "# creates a data generator object that transforms images\n", + "datagen = ImageDataGenerator(\n", + "rotation_range=40,\n", + "width_shift_range=0.2,\n", + "height_shift_range=0.2,\n", + "shear_range=0.2,\n", + "zoom_range=0.2,\n", + "horizontal_flip=True,\n", + "fill_mode='nearest')\n", + "\n", + "# pick an image to transform\n", + "test_img = train_images[20]\n", + "img = image.img_to_array(test_img) # convert image to numpy arry\n", + "img = img.reshape((1,) + img.shape) # reshape image\n", + "\n", + "i = 0\n", + "\n", + "for batch in datagen.flow(img, save_prefix='test', save_format='jpeg'): # this loops runs forever until we break, saving images to current directory with specified prefix\n", + " plt.figure(i)\n", + " plot = plt.imshow(image.img_to_array(batch[0]))\n", + " i += 1\n", + " if i > 4: # show 4 images\n", + " break\n", + "\n", + "plt.show()\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nc9RyHPYUnSK", + "colab_type": "text" + }, + "source": [ + "###Pretrained Models\n", + "You would have noticed that the model above takes a few minutes to train in the NoteBook and only gives an accuaracy of ~70%. This is okay but surely there is a way to improve on this. \n", + "\n", + "In this section we will talk about using a pretrained CNN as apart of our own custom network to improve the accuracy of our model. We know that CNN's alone (with no dense layers) don't do anything other than map the presence of features from our input. This means we can use a pretrained CNN, one trained on millions of images, as the start of our model. This will allow us to have a very good convolutional base before adding our own dense layered classifier at the end. In fact, by using this techique we can train a very good classifier for a realtively small dataset (< 10,000 images). This is because the convnet already has a very good idea of what features to look for in an image and can find them very effectively. 
So, if we can determine the presence of features all the rest of the model needs to do is determine which combination of features makes a specific image.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u10oZO1oXT6Y", + "colab_type": "text" + }, + "source": [ + "###Fine Tuning\n", + "When we employ the technique defined above, we will often want to tweak the final layers in our convolutional base to work better for our specific problem. This involves not touching or retraining the earlier layers in our convolutional base but only adjusting the final few. We do this because the first layers in our base are very good at extracting low level features lile lines and edges, things that are similar for any kind of image. Where the later layers are better at picking up very specific features like shapes or even eyes. If we adjust the final layers than we can look for only features relevant to our very specific problem.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XolyariNdj5p", + "colab_type": "text" + }, + "source": [ + "##Using a Pretrained Model\n", + "In this section we will combine the tecniques we learned above and use a pretrained model and fine tuning to classify images of dogs and cats using a small dataset.\n", + "\n", + "*This tutorial is based on the following guide from the TensorFlow documentation: https://www.tensorflow.org/tutorials/images/transfer_learning*\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2nRe9qWmgxm7", + "colab_type": "code", + "colab": {} + }, + "source": [ + "#Imports\n", + "import os\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "keras = tf.keras" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lUx4I_4jg2Tc", + "colab_type": "text" + }, + "source": [ + "###Dataset\n", + "We will load the *cats_vs_dogs* dataset from the modoule tensorflow_datatsets.\n", + "\n", + "This dataset contains (image, label) pairs where images have different dimensions and 3 color channels.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PuGu50NlgreO", + "colab_type": "code", + "colab": {} + }, + "source": [ + "import tensorflow_datasets as tfds\n", + "tfds.disable_progress_bar()\n", + "\n", + "# split the data manually into 80% training, 10% testing, 10% validation\n", + "(raw_train, raw_validation, raw_test), metadata = tfds.load(\n", + " 'cats_vs_dogs',\n", + " split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],\n", + " with_info=True,\n", + " as_supervised=True,\n", + ")" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Mk_MpiQyh-as", + "colab_type": "code", + "colab": {} + }, + "source": [ + "get_label_name = metadata.features['label'].int2str # creates a function object that we can use to get labels\n", + "\n", + "# display 2 images from the dataset\n", + "for image, label in raw_train.take(5):\n", + " plt.figure()\n", + " plt.imshow(image)\n", + " plt.title(get_label_name(label))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XCdodmcYiPOF", + "colab_type": "text" + }, + "source": [ + "###Data Preprocessing\n", + "Since the sizes of our images are all different, we need to convert them all to the same size. 
We can create a function that will do that for us below.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tcoKn1VUieqx", + "colab_type": "code", + "colab": {} + }, + "source": [ + "IMG_SIZE = 160 # All images will be resized to 160x160\n", + "\n", + "def format_example(image, label):\n", + " \"\"\"\n", + " returns an image that is reshaped to IMG_SIZE\n", + " \"\"\"\n", + " image = tf.cast(image, tf.float32)\n", + " image = (image/127.5) - 1\n", + " image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))\n", + " return image, label" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wwIB21lailXh", + "colab_type": "text" + }, + "source": [ + "Now we can apply this function to all our images using ```.map()```." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "0E8iqYOAipdU", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train = raw_train.map(format_example)\n", + "validation = raw_validation.map(format_example)\n", + "test = raw_test.map(format_example)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QORLTVNaiqym", + "colab_type": "text" + }, + "source": [ + "Let's have a look at our images now." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dU5JIa2Jiv9U", + "colab_type": "code", + "colab": {} + }, + "source": [ + "for image, label in train.take(2):\n", + " plt.figure()\n", + " plt.imshow(image)\n", + " plt.title(get_label_name(label))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iFnFVaNQi7Vq", + "colab_type": "text" + }, + "source": [ + "Finally we will shuffle and batch the images." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "v5ZIhkFPi_Pb", + "colab_type": "code", + "colab": {} + }, + "source": [ + "BATCH_SIZE = 32\n", + "SHUFFLE_BUFFER_SIZE = 1000\n", + "\n", + "train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)\n", + "validation_batches = validation.batch(BATCH_SIZE)\n", + "test_batches = test.batch(BATCH_SIZE)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6QxI-fOAjDzC", + "colab_type": "text" + }, + "source": [ + "Now if we look at the shape of an original image vs the new image we will see it has been changed." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "zyqrCYNOjY9v", + "colab_type": "code", + "colab": {} + }, + "source": [ + "for img, label in raw_train.take(2):\n", + " print(\"Original shape:\", img.shape)\n", + "\n", + "for img, label in train.take(2):\n", + " print(\"New shape:\", img.shape)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NMpKJ3Xbj4BW", + "colab_type": "text" + }, + "source": [ + "###Picking a Pretrained Model\n", + "The model we are going to use as the convolutional base for our model is the **MobileNet V2** developed at Google. This model is trained on 1.4 million images and has 1000 different classes.\n", + "\n", + "We want to use this model but only its convolutional base. So, when we load in the model, we'll specify that we don't want to load the top (classification) layer. 
We'll tell the model what input shape to expect and to use the predetermined weights from *imagenet* (Googles dataset).\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "2a09os6dkokI", + "colab_type": "code", + "colab": {} + }, + "source": [ + "IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)\n", + "\n", + "# Create the base model from the pre-trained model MobileNet V2\n", + "base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,\n", + " include_top=False,\n", + " weights='imagenet')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "uRvMuWoFR2CO", + "colab_type": "code", + "colab": {} + }, + "source": [ + "base_model.summary()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ckYqfl7Vky3S", + "colab_type": "text" + }, + "source": [ + "At this point this *base_model* will simply output a shape (32, 5, 5, 1280) tensor that is a feature extraction from our original (1, 160, 160, 3) image. The 32 means that we have 32 layers of differnt filters/features." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "yojo6ONzlFGF", + "colab_type": "code", + "colab": {} + }, + "source": [ + "for image, _ in train_batches.take(1):\n", + " pass\n", + "\n", + "feature_batch = base_model(image)\n", + "print(feature_batch.shape)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oQ2kn1P_lhsg", + "colab_type": "text" + }, + "source": [ + "###Freezing the Base\n", + "The term **freezing** refers to disabling the training property of a layer. It simply means we won’t make any changes to the weights of any layers that are frozen during training. This is important as we don't want to change the convolutional base that already has learned weights.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "6hXctqtYl8o5", + "colab_type": "code", + "colab": {} + }, + "source": [ + "base_model.trainable = False" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1jIGFXOrl9wc", + "colab_type": "code", + "colab": {} + }, + "source": [ + "base_model.summary()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b7UJLbJ7mJzw", + "colab_type": "text" + }, + "source": [ + "###Adding our Classifier\n", + "Now that we have our base layer setup, we can add the classifier. Instead of flattening the feature map of the base layer we will use a global average pooling layer that will average the entire 5x5 area of each 2D feature map and return to us a single 1280 element vector per filter. \n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3uUwG5wrnFD6", + "colab_type": "code", + "colab": {} + }, + "source": [ + "global_average_layer = tf.keras.layers.GlobalAveragePooling2D()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ejxd7rjInIRp", + "colab_type": "text" + }, + "source": [ + "Finally, we will add the predicition layer that will be a single dense neuron. 
We can do this because we only have two classes to predict for.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GA-iVZj9nH_N", + "colab_type": "code", + "colab": {} + }, + "source": [ + "prediction_layer = keras.layers.Dense(1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dn9G9KiFnXu6", + "colab_type": "text" + }, + "source": [ + "Now we will combine these layers together in a model." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "E_IJucQNnXBK", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model = tf.keras.Sequential([\n", + " base_model,\n", + " global_average_layer,\n", + " prediction_layer\n", + "])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "fLYdAL2uSt_a", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.summary()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NHepCsPXnpYZ", + "colab_type": "text" + }, + "source": [ + "###Training the Model\n", + "Now we will train and compile the model. We will use a very small learning rate to ensure that the model does not have any major changes made to it." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "GQhg2WxHnxra", + "colab_type": "code", + "colab": {} + }, + "source": [ + "base_learning_rate = 0.0001\n", + "model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),\n", + " loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "8Fx9nySdoZuL", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# We can evaluate the model right now to see how it does before training it on our new images\n", + "initial_epochs = 3\n", + "validation_steps=20\n", + "\n", + "loss0,accuracy0 = model.evaluate(validation_batches, steps = validation_steps)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "edMXObctojl6", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Now we can train it on our images\n", + "history = model.fit(train_batches,\n", + " epochs=initial_epochs,\n", + " validation_data=validation_batches)\n", + "\n", + "acc = history.history['accuracy']\n", + "print(acc)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "VUUt3AxA2lf2", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.save(\"dogs_vs_cats.h5\") # we can save the model and reload it at anytime in the future\n", + "new_model = tf.keras.models.load_model('dogs_vs_cats.h5')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2095EQ4Y3qJk", + "colab_type": "text" + }, + "source": [ + "And that's it for this section on computer vision!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "m8YcdmWUvYae", + "colab_type": "text" + }, + "source": [ + "##Object Detection\n", + "If you'd like to learn how you can perform object detection and recognition with tensorflow check out the guide below.\n", + "\n", + "https://github.com/tensorflow/models/tree/master/research/object_detection" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oEiX-D2f2tvI", + "colab_type": "text" + }, + "source": [ + "##Sources\n", + "1. 
“Convolutional Neural Network (CNN)  :   TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/images/cnn.\n", + "2. “Transfer Learning with a Pretrained ConvNet  :   TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/images/transfer_learning.\n", + "3. Chollet François. Deep Learning with Python. Manning Publications Co., 2018.\n", + "\n" + ] + } + ] +} \ No newline at end of file From 1a1efbcf5043d2aa5e766604b06dddd2b454e484 Mon Sep 17 00:00:00 2001 From: Sounak Pal <44639229+SounakPal212@users.noreply.github.com> Date: Fri, 14 Aug 2020 20:38:19 +0530 Subject: [PATCH 4/7] Created using Colaboratory --- Natural_Language_Processing_with_RNNs_.ipynb | 1381 ++++++++++++++++++ 1 file changed, 1381 insertions(+) create mode 100644 Natural_Language_Processing_with_RNNs_.ipynb diff --git a/Natural_Language_Processing_with_RNNs_.ipynb b/Natural_Language_Processing_with_RNNs_.ipynb new file mode 100644 index 0000000..4d4d307 --- /dev/null +++ b/Natural_Language_Processing_with_RNNs_.ipynb @@ -0,0 +1,1381 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Natural Language Processing with RNNs .ipynb", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "h5cjtsHP8t5Y", + "colab_type": "text" + }, + "source": [ + "#Natural Language Processing \n", + "Natural Language Processing (or NLP for short) is a discipline in computing that deals with the communication between natural (human) languages and computer languages. A common example of NLP is something like spellcheck or autocomplete. Essentially NLP is the field that focuses on how computers can understand and/or process natural/human languages. \n", + "\n", + "###Recurrent Neural Networks\n", + "\n", + "In this tutorial we will introduce a new kind of neural network that is much more capable of processing sequential data such as text or characters called a **recurrent neural network** (RNN for short). \n", + "\n", + "We will learn how to use a reccurent neural network to do the following:\n", + "- Sentiment Analysis\n", + "- Character Generation \n", + "\n", + "RNN's are complex and come in many different forms so in this tutorial we wil focus on how they work and the kind of problems they are best suited for.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ur_FQq-Q-fxC", + "colab_type": "text" + }, + "source": [ + "## Sequence Data\n", + "In the previous tutorials we focused on data that we could represent as one static data point where the notion of time or step was irrelevant. Take for example our image data, it was simply a tensor of shape (width, height, channels). That data doesn't change or care about the notion of time. \n", + "\n", + "In this tutorial we will look at sequences of text and learn how we can encode them in a meaningful way. Unlike images, sequence data such as long chains of text, weather patterns, videos and really anything where the notion of a step or time is relevant needs to be processed and handled in a special way. \n", + "\n", + "But what do I mean by sequences and why is text data a sequence? Well that's a good question. 
Since textual data contains many words that follow in a very specific and meaningful order, we need to be able to keep track of each word and when it occurs in the data. Simply encoding say an entire paragraph of text into one data point wouldn't give us a very meaningful picture of the data and would be very difficult to do anything with. This is why we treat text as a sequence and process one word at a time. We will keep track of where each of these words appear and use that information to try to understand the meaning of peices of text.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8gQHK4V4e2wl", + "colab_type": "text" + }, + "source": [ + "##Encoding Text\n", + "As we know machine learning models and neural networks don't take raw text data as an input. This means we must somehow encode our textual data to numeric values that our models can understand. There are many different ways of doing this and we will look at a few examples below. \n", + "\n", + "Before we get into the different encoding/preprocessing methods let's understand the information we can get from textual data by looking at the following two movie reviews.\n", + "\n", + "```I thought the movie was going to be bad, but it was actually amazing!```\n", + "\n", + "```I thought the movie was going to be amazing, but it was actually bad!```\n", + "\n", + "Although these two setences are very similar we know that they have very different meanings. This is because of the **ordering** of words, a very important property of textual data.\n", + "\n", + "Now keep that in mind while we consider some different ways of encoding our textual data.\n", + "\n", + "###Bag of Words\n", + "The first and simplest way to encode our data is to use something called **bag of words**. This is a pretty easy technique where each word in a sentence is encoded with an integer and thrown into a collection that does not maintain the order of the words but does keep track of the frequency. Have a look at the python function below that encodes a string of text into bag of words. " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5KiCCBsIkMHi", + "colab_type": "code", + "colab": {} + }, + "source": [ + "vocab = {} # maps word to integer representing it\n", + "word_encoding = 1\n", + "def bag_of_words(text):\n", + " global word_encoding\n", + "\n", + " words = text.lower().split(\" \") # create a list of all of the words in the text, well assume there is no grammar in our text for this example\n", + " bag = {} # stores all of the encodings and their frequency\n", + "\n", + " for word in words:\n", + " if word in vocab:\n", + " encoding = vocab[word] # get encoding from vocab\n", + " else:\n", + " vocab[word] = word_encoding\n", + " encoding = word_encoding\n", + " word_encoding += 1\n", + " \n", + " if encoding in bag:\n", + " bag[encoding] += 1\n", + " else:\n", + " bag[encoding] = 1\n", + " \n", + " return bag\n", + "\n", + "text = \"this is a test to see if this test will work is is test a a\"\n", + "bag = bag_of_words(text)\n", + "print(bag)\n", + "print(vocab)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4hEvstSBl1gy", + "colab_type": "text" + }, + "source": [ + "This isn't really the way we would do this in practice, but I hope it gives you an idea of how bag of words works. Notice that we've lost the order in which words appear. 
In fact, let's look at how this encoding works for the two sentences we showed above.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "miYshfvzmJ0H", + "colab_type": "code", + "colab": {} + }, + "source": [ + "positive_review = \"I thought the movie was going to be bad but it was actually amazing\"\n", + "negative_review = \"I thought the movie was going to be amazing but it was actually bad\"\n", + "\n", + "pos_bag = bag_of_words(positive_review)\n", + "neg_bag = bag_of_words(negative_review)\n", + "\n", + "print(\"Positive:\", pos_bag)\n", + "print(\"Negative:\", neg_bag)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Pl7Fw9s3mkfK", + "colab_type": "text" + }, + "source": [ + "We can see that even though these sentences have a very different meaning they are encoded exaclty the same way. Obviously, this isn't going to fly. Let's look at some other methods.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DUKTycffmu1k", + "colab_type": "text" + }, + "source": [ + "###Integer Encoding\n", + "The next technique we will look at is called **integer encoding**. This involves representing each word or character in a sentence as a unique integer and maintaining the order of these words. This should hopefully fix the problem we saw before were we lost the order of words.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "MKY4y_tjnUEW", + "colab_type": "code", + "colab": {} + }, + "source": [ + "vocab = {} \n", + "word_encoding = 1\n", + "def one_hot_encoding(text):\n", + " global word_encoding\n", + "\n", + " words = text.lower().split(\" \") \n", + " encoding = [] \n", + "\n", + " for word in words:\n", + " if word in vocab:\n", + " code = vocab[word] \n", + " encoding.append(code) \n", + " else:\n", + " vocab[word] = word_encoding\n", + " encoding.append(word_encoding)\n", + " word_encoding += 1\n", + " \n", + " return encoding\n", + "\n", + "text = \"this is a test to see if this test will work is is test a a\"\n", + "encoding = one_hot_encoding(text)\n", + "print(encoding)\n", + "print(vocab)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TOrLG9Bin0Zv", + "colab_type": "text" + }, + "source": [ + "And now let's have a look at one hot encoding on our movie reviews." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "1S-GNjotn-Br", + "colab_type": "code", + "colab": {} + }, + "source": [ + "positive_review = \"I thought the movie was going to be bad but it was actually amazing\"\n", + "negative_review = \"I thought the movie was going to be amazing but it was actually bad\"\n", + "\n", + "pos_encode = one_hot_encoding(positive_review)\n", + "neg_encode = one_hot_encoding(negative_review)\n", + "\n", + "print(\"Positive:\", pos_encode)\n", + "print(\"Negative:\", neg_encode)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jC9UYV4vpq6Y", + "colab_type": "text" + }, + "source": [ + "Much better, now we are keeping track of the order of words and we can tell where each occurs. But this still has a few issues with it. Ideally when we encode words, we would like similar words to have similar labels and different words to have very different labels. For example, the words happy and joyful should probably have very similar labels so we can determine that they are similar. 
While words like horrible and amazing should probably have very different labels. The method we looked at above won't be able to do something like this for us. This could mean that the model will have a very difficult time determing if two words are similar or not which could result in some pretty drastic performace impacts.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JRZ73YCqqiw9", + "colab_type": "text" + }, + "source": [ + "###Word Embeddings\n", + "Luckily there is a third method that is far superior, **word embeddings**. This method keeps the order of words intact as well as encodes similar words with very similar labels. It attempts to not only encode the frequency and order of words but the meaning of those words in the sentence. It encodes each word as a dense vector that represents its context in the sentence.\n", + "\n", + "Unlike the previous techniques word embeddings are learned by looking at many different training examples. You can add what's called an *embedding layer* to the beggining of your model and while your model trains your embedding layer will learn the correct embeddings for words. You can also use pretrained embedding layers.\n", + "\n", + "This is the technique we will use for our examples and its implementation will be showed later on.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ehig3qliuUzk", + "colab_type": "text" + }, + "source": [ + "##Recurrent Neural Networks (RNN's)\n", + "Now that we've learned a little bit about how we can encode text it's time to dive into recurrent neural networks. Up until this point we have been using something called **feed-forward** neural networks. This simply means that all our data is fed forwards (all at once) from left to right through the network. This was fine for the problems we considered before but won't work very well for processing text. After all, even we (humans) don't process text all at once. We read word by word from left to right and keep track of the current meaning of the sentence so we can understand the meaning of the next word. Well this is exaclty what a recurrent neural network is designed to do. When we say recurrent neural network all we really mean is a network that contains a loop. A RNN will process one word at a time while maintaining an internal memory of what it's already seen. This will allow it to treat words differently based on their order in a sentence and to slowly build an understanding of the entire input, one word at a time.\n", + "\n", + "This is why we are treating our text data as a sequence! So that we can pass one word at a time to the RNN.\n", + "\n", + "Let's have a look at what a recurrent layer might look like.\n", + "\n", + "![alt text](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)\n", + "*Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/*\n", + "\n", + "Let's define what all these variables stand for before we get into the explination.\n", + "\n", + "**ht** output at time t\n", + "\n", + "**xt** input at time t\n", + "\n", + "**A** Recurrent Layer (loop)\n", + "\n", + "What this diagram is trying to illustrate is that a recurrent layer processes words or input one at a time in a combination with the output from the previous iteration. So, as we progress further in the input sequence, we build a more complex understanding of the text as a whole.\n", + "\n", + "What we've just looked at is called a **simple RNN layer**. 
It can be effective at processing shorter sequences of text for simple problems but has many downfalls associated with it. One of them being the fact that as text sequences get longer it gets increasingly difficult for the network to understand the text properly.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fo3WY-e86zX2", + "colab_type": "text" + }, + "source": [ + "##LSTM\n", + "The layer we dicussed in depth above was called a *simpleRNN*. However, there does exist some other recurrent layers (layers that contain a loop) that work much better than a simple RNN layer. The one we will talk about here is called LSTM (Long Short-Term Memory). This layer works very similarily to the simpleRNN layer but adds a way to access inputs from any timestep in the past. Whereas in our simple RNN layer input from previous timestamps gradually disappeared as we got further through the input. With a LSTM we have a long-term memory data structure storing all the previously seen inputs as well as when we saw them. This allows for us to access any previous value we want at any point in time. This adds to the complexity of our network and allows it to discover more useful relationships between inputs and when they appear. \n", + "\n", + "For the purpose of this course we will refrain from going any further into the math or details behind how these layers work.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CRGOx6_v4eZ_", + "colab_type": "text" + }, + "source": [ + "##Sentiment Analysis\n", + "And now time to see a recurrent neural network in action. For this example, we are going to do something called sentiment analysis.\n", + "\n", + "The formal definition of this term from Wikipedia is as follows:\n", + "\n", + "*the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.*\n", + "\n", + "The example we’ll use here is classifying movie reviews as either postive, negative or neutral.\n", + "\n", + "*This guide is based on the following tensorflow tutorial: https://www.tensorflow.org/tutorials/text/text_classification_rnn*\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RACGE5Ypt5u9", + "colab_type": "text" + }, + "source": [ + "###Movie Review Dataset\n", + "Well start by loading in the IMDB movie review dataset from keras. This dataset contains 25,000 reviews from IMDB where each one is already preprocessed and has a label as either positive or negative. Each review is encoded by integers that represents how common a word is in the entire dataset. 
For example, a word encoded by the integer 3 means that it is the 3rd most common word in the dataset.\n", + " \n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "pdsus1kyXWC8", + "colab_type": "code", + "colab": {} + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook\n", + "from keras.datasets import imdb\n", + "from keras.preprocessing import sequence\n", + "import keras\n", + "import tensorflow as tf\n", + "import os\n", + "import numpy as np\n", + "\n", + "VOCAB_SIZE = 88584\n", + "\n", + "MAXLEN = 250\n", + "BATCH_SIZE = 64\n", + "\n", + "(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = VOCAB_SIZE)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Wh6lOpcQ9sIZ", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Lets look at one review\n", + "train_data[1]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EAtZHE9-eQ07", + "colab_type": "text" + }, + "source": [ + "###More Preprocessing\n", + "If we have a look at some of our loaded in reviews, we'll notice that they are different lengths. This is an issue. We cannot pass different length data into our neural network. Therefore, we must make each review the same length. To do this we will follow the procedure below:\n", + "- if the review is greater than 250 words then trim off the extra words\n", + "- if the review is less than 250 words add the necessary amount of 0's to make it equal to 250.\n", + "\n", + "Luckily for us keras has a function that can do this for us:\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Z3qQ83sNeog6", + "colab_type": "code", + "colab": {} + }, + "source": [ + "train_data = sequence.pad_sequences(train_data, MAXLEN)\n", + "test_data = sequence.pad_sequences(test_data, MAXLEN)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mDm_0RTVir7I", + "colab_type": "text" + }, + "source": [ + "###Creating the Model\n", + "Now it's time to create the model. We'll use a word embedding layer as the first layer in our model and add a LSTM layer afterwards that feeds into a dense node to get our predicted sentiment. \n", + "\n", + "32 stands for the output dimension of the vectors generated by the embedding layer. We can change this value if we'd like!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "OWGGcBIpjrMu", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model = tf.keras.Sequential([\n", + " tf.keras.layers.Embedding(VOCAB_SIZE, 32),\n", + " tf.keras.layers.LSTM(32),\n", + " tf.keras.layers.Dense(1, activation=\"sigmoid\")\n", + "])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "O8_jPL_Kkr-a", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.summary()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eyeQCk3LlK6V", + "colab_type": "text" + }, + "source": [ + "###Training\n", + "Now it's time to compile and train the model. 
" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KKEMjaIulPBe", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.compile(loss=\"binary_crossentropy\",optimizer=\"rmsprop\",metrics=['acc'])\n", + "\n", + "history = model.fit(train_data, train_labels, epochs=10, validation_split=0.2)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3buYlkkhoK93", + "colab_type": "text" + }, + "source": [ + "And we'll evaluate the model on our training data to see how well it performs." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KImNMWTDoJaQ", + "colab_type": "code", + "colab": {} + }, + "source": [ + "results = model.evaluate(test_data, test_labels)\n", + "print(results)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "N1RRGcr9CFCW", + "colab_type": "text" + }, + "source": [ + "So we're scoring somewhere in the mid-high 80's. Not bad for a simple recurrent network." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lGrBRC4YCObV", + "colab_type": "text" + }, + "source": [ + "###Making Predictions\n", + "Now let’s use our network to make predictions on our own reviews. \n", + "\n", + "Since our reviews are encoded well need to convert any review that we write into that form so the network can understand it. To do that well load the encodings from the dataset and use them to encode our own data.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Onu8leY4Cn9z", + "colab_type": "code", + "colab": {} + }, + "source": [ + "word_index = imdb.get_word_index()\n", + "\n", + "def encode_text(text):\n", + " tokens = keras.preprocessing.text.text_to_word_sequence(text)\n", + " tokens = [word_index[word] if word in word_index else 0 for word in tokens]\n", + " return sequence.pad_sequences([tokens], MAXLEN)[0]\n", + "\n", + "text = \"that movie was just amazing, so amazing\"\n", + "encoded = encode_text(text)\n", + "print(encoded)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "PKna3vxmFwrB", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# while were at it lets make a decode function\n", + "\n", + "reverse_word_index = {value: key for (key, value) in word_index.items()}\n", + "\n", + "def decode_integers(integers):\n", + " PAD = 0\n", + " text = \"\"\n", + " for num in integers:\n", + " if num != PAD:\n", + " text += reverse_word_index[num] + \" \"\n", + "\n", + " return text[:-1]\n", + " \n", + "print(decode_integers(encoded))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "L8nyrr00HPZF", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# now time to make a prediction\n", + "\n", + "def predict(text):\n", + " encoded_text = encode_text(text)\n", + " pred = np.zeros((1,250))\n", + " pred[0] = encoded_text\n", + " result = model.predict(pred) \n", + " print(result[0])\n", + "\n", + "positive_review = \"That movie was! really loved it and would great watch it again because it was amazingly great\"\n", + "predict(positive_review)\n", + "\n", + "negative_review = \"that movie really sucked. I hated it and wouldn't watch it again. 
Was one of the worst things I've ever watched\"\n", + "predict(negative_review)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "01BJLcGb4ZqK", + "colab_type": "text" + }, + "source": [ + "##RNN Play Generator\n", + "\n", + "Now time for one of the coolest examples we've seen so far. We are going to use a RNN to generate a play. We will simply show the RNN an example of something we want it to recreate and it will learn how to write a version of it on its own. We'll do this using a character predictive model that will take as input a variable length sequence and predict the next character. We can use the model many times in a row with the output from the last predicition as the input for the next call to generate a sequence.\n", + "\n", + "\n", + "*This guide is based on the following: https://www.tensorflow.org/tutorials/text/text_generation*" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fju7i1FKrK_G", + "colab_type": "code", + "colab": {} + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook\n", + "from keras.preprocessing import sequence\n", + "import keras\n", + "import tensorflow as tf\n", + "import os\n", + "import numpy as np" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F48c-EctQ378", + "colab_type": "text" + }, + "source": [ + "###Dataset\n", + "For this example, we only need one peice of training data. In fact, we can write our own poem or play and pass that to the network for training if we'd like. However, to make things easy we'll use an extract from a shakesphere play.\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IdRcVIhtRGlF", + "colab_type": "code", + "colab": {} + }, + "source": [ + "path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NlSVGd5ACkZe", + "colab_type": "text" + }, + "source": [ + "###Loading Your Own Data\n", + "To load your own data, you'll need to upload a file from the dialog below. Then you'll need to follow the steps from above but load in this new file instead.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CFYFwbJOC3bP", + "colab_type": "code", + "colab": {} + }, + "source": [ + "from google.colab import files\n", + "path_to_file = list(files.upload().keys())[0]" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KtJMEqQyRhAk", + "colab_type": "text" + }, + "source": [ + "###Read Contents of File\n", + "Let's look at the contents of the file." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-n4oovOMRnP7", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Read, then decode for py2 compat.\n", + "text = open(path_to_file, 'rb').read().decode(encoding='utf-8')\n", + "# length of text is the number of characters in it\n", + "print ('Length of text: {} characters'.format(len(text)))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "KHUxQVl7Rt10", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Take a look at the first 250 characters in text\n", + "print(text[:250])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5vt8Vpe0RvaJ", + "colab_type": "text" + }, + "source": [ + "###Encoding\n", + "Since this text isn't encoded yet well need to do that ourselves. We are going to encode each unique character as a different integer.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "C7AZNI7aRz6y", + "colab_type": "code", + "colab": {} + }, + "source": [ + "vocab = sorted(set(text))\n", + "# Creating a mapping from unique characters to indices\n", + "char2idx = {u:i for i, u in enumerate(vocab)}\n", + "idx2char = np.array(vocab)\n", + "\n", + "def text_to_int(text):\n", + " return np.array([char2idx[c] for c in text])\n", + "\n", + "text_as_int = text_to_int(text)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_i5kvmX_SLW4", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# lets look at how part of our text is encoded\n", + "print(\"Text:\", text[:13])\n", + "print(\"Encoded:\", text_to_int(text[:13]))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mDvD5kqTWwOn", + "colab_type": "text" + }, + "source": [ + "And here we will make a function that can convert our numeric values to text.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Af52YChSW5hX", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def int_to_text(ints):\n", + " try:\n", + " ints = ints.numpy()\n", + " except:\n", + " pass\n", + " return ''.join(idx2char[ints])\n", + "\n", + "print(int_to_text(text_as_int[:13]))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T_49cl6uS0r-", + "colab_type": "text" + }, + "source": [ + "###Creating Training Examples\n", + "Remember our task is to feed the model a sequence and have it return to us the next character. This means we need to split our text data from above into many shorter sequences that we can pass to the model as training examples. \n", + "\n", + "The training examples we will prepapre will use a *seq_length* sequence as input and a *seq_length* sequence as the output where that sequence is the original sequence shifted one letter to the right. For example:\n", + "\n", + "```input: Hell | output: ello```\n", + "\n", + "Our first step will be to create a stream of characters from our text data." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "xBkXz9fjUQHW", + "colab_type": "code", + "colab": {} + }, + "source": [ + "seq_length = 100 # length of sequence for a training example\n", + "examples_per_epoch = len(text)//(seq_length+1)\n", + "\n", + "# Create training examples / targets\n", + "char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pqmxfT7gVGlr", + "colab_type": "text" + }, + "source": [ + "Next we can use the batch method to turn this stream of characters into batches of desired length." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Xi0xaPB_VOJl", + "colab_type": "code", + "colab": {} + }, + "source": [ + "sequences = char_dataset.batch(seq_length+1, drop_remainder=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fxo1Dig_VvV1", + "colab_type": "text" + }, + "source": [ + "Now we need to use these sequences of length 101 and split them into input and output." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "03zKVHTvV0Km", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def split_input_target(chunk): # for the example: hello\n", + " input_text = chunk[:-1] # hell\n", + " target_text = chunk[1:] # ello\n", + " return input_text, target_text # hell, ello\n", + "\n", + "dataset = sequences.map(split_input_target) # we use map to apply the above function to every entry" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "9p_y2YmgWbnc", + "colab_type": "code", + "colab": {} + }, + "source": [ + "for x, y in dataset.take(2):\n", + " print(\"\\n\\nEXAMPLE\\n\")\n", + " print(\"INPUT\")\n", + " print(int_to_text(x))\n", + " print(\"\\nOUTPUT\")\n", + " print(int_to_text(y))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "v6OxuFKVXpwK", + "colab_type": "text" + }, + "source": [ + "Finally we need to make training batches." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "cRsKcjhXXuoD", + "colab_type": "code", + "colab": {} + }, + "source": [ + "BATCH_SIZE = 64\n", + "VOCAB_SIZE = len(vocab) # vocab is number of unique characters\n", + "EMBEDDING_DIM = 256\n", + "RNN_UNITS = 1024\n", + "\n", + "# Buffer size to shuffle the dataset\n", + "# (TF data is designed to work with possibly infinite sequences,\n", + "# so it doesn't attempt to shuffle the entire sequence in memory. Instead,\n", + "# it maintains a buffer in which it shuffles elements).\n", + "BUFFER_SIZE = 10000\n", + "\n", + "data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "E6YRmZLtX0d0", + "colab_type": "text" + }, + "source": [ + "###Building the Model\n", + "Now it is time to build the model. We will use an embedding layer a LSTM and one dense layer that contains a node for each unique character in our training data. The dense layer will give us a probability distribution over all nodes." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "5v_P2dEic4qt", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def build_model(vocab_size, embedding_dim, rnn_units, batch_size):\n", + " model = tf.keras.Sequential([\n", + " tf.keras.layers.Embedding(vocab_size, embedding_dim,\n", + " batch_input_shape=[batch_size, None]),\n", + " tf.keras.layers.LSTM(rnn_units,\n", + " return_sequences=True,\n", + " stateful=True,\n", + " recurrent_initializer='glorot_uniform'),\n", + " tf.keras.layers.Dense(vocab_size)\n", + " ])\n", + " return model\n", + "\n", + "model = build_model(VOCAB_SIZE,EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)\n", + "model.summary()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8gfnHBUOvPqE", + "colab_type": "text" + }, + "source": [ + "###Creating a Loss Function\n", + "Now we are going to create our own loss function for this problem. This is because our model will output a (64, sequence_length, 65) shaped tensor that represents the probability distribution of each character at each timestep for every sequence in the batch. \n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g_ERM4F15v_S", + "colab_type": "text" + }, + "source": [ + "However, before we do that let's have a look at a sample input and the output from our untrained model. This is so we can understand what the model is giving us.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "KdvEqlwc6_q0", + "colab_type": "code", + "colab": {} + }, + "source": [ + "for input_example_batch, target_example_batch in data.take(1):\n", + " example_batch_predictions = model(input_example_batch) # ask our model for a prediction on our first batch of training data (64 entries)\n", + " print(example_batch_predictions.shape, \"# (batch_size, sequence_length, vocab_size)\") # print out the output shape" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "RQS5KXwi7_NX", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# we can see that the predicition is an array of 64 arrays, one for each entry in the batch\n", + "print(len(example_batch_predictions))\n", + "print(example_batch_predictions)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sA1Zhop28V9n", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# lets examine one prediction\n", + "pred = example_batch_predictions[0]\n", + "print(len(pred))\n", + "print(pred)\n", + "# notice this is a 2d array of length 100, where each interior array is the prediction for the next character at each time step" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "UbIoe7Ei8q3q", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# and finally well look at a prediction at the first timestep\n", + "time_pred = pred[0]\n", + "print(len(time_pred))\n", + "print(time_pred)\n", + "# and of course its 65 values representing the probabillity of each character occuring next" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "qlEYM1H995gR", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# If we want to determine the predicted character we need to sample the output distribution (pick a value based on probabillity)\n", + "sampled_indices = tf.random.categorical(pred, num_samples=1)\n", + "\n", + "# now we can reshape that array 
and convert all the integers to numbers to see the actual characters\n", + "sampled_indices = np.reshape(sampled_indices, (1, -1))[0]\n", + "predicted_chars = int_to_text(sampled_indices)\n", + "\n", + "predicted_chars # and this is what the model predicted for training sequence 1" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qcCBfPjN9Cnp", + "colab_type": "text" + }, + "source": [ + "So now we need to create a loss function that can compare that output to the expected output and give us some numeric value representing how close the two were. " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "ZOw23fWq9D9O", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def loss(labels, logits):\n", + " return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "kcg75GwXgW81", + "colab_type": "text" + }, + "source": [ + "###Compiling the Model\n", + "At this point we can think of our problem as a classification problem where the model predicts the probabillity of each unique letter coming next. \n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9g6o7zA_hAiS", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.compile(optimizer='adam', loss=loss)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YgDKr4yvjLPI", + "colab_type": "text" + }, + "source": [ + "###Creating Checkpoints\n", + "Now we are going to setup and configure our model to save checkpoinst as it trains. This will allow us to load our model from a checkpoint and continue training it." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "v7aMushYjSpy", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Directory where the checkpoints will be saved\n", + "checkpoint_dir = './training_checkpoints'\n", + "# Name of the checkpoint files\n", + "checkpoint_prefix = os.path.join(checkpoint_dir, \"ckpt_{epoch}\")\n", + "\n", + "checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(\n", + " filepath=checkpoint_prefix,\n", + " save_weights_only=True)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0p7acPvGja5c", + "colab_type": "text" + }, + "source": [ + "###Training\n", + "Finally, we will start training the model. \n", + "\n", + "**If this is taking a while go to Runtime > Change Runtime Type and choose \"GPU\" under hardware accelerator.**\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "R4PAgrwMjZ4_", + "colab_type": "code", + "colab": {} + }, + "source": [ + "history = model.fit(data, epochs=50, callbacks=[checkpoint_callback])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9GhoHJVtmTsz", + "colab_type": "text" + }, + "source": [ + "###Loading the Model\n", + "We'll rebuild the model from a checkpoint using a batch_size of 1 so that we can feed one peice of text to the model and have it make a prediction." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TPSto3uimSKp", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "boEJvy_vjLJQ", + "colab_type": "text" + }, + "source": [ + "Once the model is finished training, we can find the **lastest checkpoint** that stores the models weights using the following line.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "PZIEZWE4mNKl", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))\n", + "model.build(tf.TensorShape([1, None]))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CmPPtbaTKF8d", + "colab_type": "text" + }, + "source": [ + "We can load **any checkpoint** we want by specifying the exact file to load." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YQ_5p0ehKFDn", + "colab_type": "code", + "colab": {} + }, + "source": [ + "checkpoint_num = 10\n", + "model.load_weights(tf.train.load_checkpoint(\"./training_checkpoints/ckpt_\" + str(checkpoint_num)))\n", + "model.build(tf.TensorShape([1, None]))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KaZWalEeAxQN", + "colab_type": "text" + }, + "source": [ + "###Generating Text\n", + "Now we can use the lovely function provided by tensorflow to generate some text using any starting string we'd like." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "oPSALdQXA3l3", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def generate_text(model, start_string):\n", + " # Evaluation step (generating text using the learned model)\n", + "\n", + " # Number of characters to generate\n", + " num_generate = 800\n", + "\n", + " # Converting our start string to numbers (vectorizing)\n", + " input_eval = [char2idx[s] for s in start_string]\n", + " input_eval = tf.expand_dims(input_eval, 0)\n", + "\n", + " # Empty string to store our results\n", + " text_generated = []\n", + "\n", + " # Low temperatures results in more predictable text.\n", + " # Higher temperatures results in more surprising text.\n", + " # Experiment to find the best setting.\n", + " temperature = 1.0\n", + "\n", + " # Here batch size == 1\n", + " model.reset_states()\n", + " for i in range(num_generate):\n", + " predictions = model(input_eval)\n", + " # remove the batch dimension\n", + " \n", + " predictions = tf.squeeze(predictions, 0)\n", + "\n", + " # using a categorical distribution to predict the character returned by the model\n", + " predictions = predictions / temperature\n", + " predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()\n", + "\n", + " # We pass the predicted character as the next input to the model\n", + " # along with the previous hidden state\n", + " input_eval = tf.expand_dims([predicted_id], 0)\n", + "\n", + " text_generated.append(idx2char[predicted_id])\n", + "\n", + " return (start_string + ''.join(text_generated))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "cAJqhD9AA5mF", + "colab_type": "code", + "colab": {} + }, + "source": [ + "inp = input(\"Type a starting string: \")\n", + "print(generate_text(model, inp))" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": 
"markdown", + "metadata": { + "id": "CBjHrzzyOBVr", + "colab_type": "text" + }, + "source": [ + "*And* that's pretty much it for this module! I highly reccomend messing with the model we just created and seeing what you can get it to do!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cw-1eDE54yQo", + "colab_type": "text" + }, + "source": [ + "##Sources\n", + "\n", + "1. Chollet François. Deep Learning with Python. Manning Publications Co., 2018.\n", + "2. “Text Classification with an RNN  :   TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/text/text_classification_rnn.\n", + "3. “Text Generation with an RNN  :   TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/text/text_generation.\n", + "4. “Understanding LSTM Networks.” Understanding LSTM Networks -- Colah's Blog, https://colah.github.io/posts/2015-08-Understanding-LSTMs/." + ] + } + ] +} \ No newline at end of file From 355db5a2d780be38efa4a5262b3119888727e7d3 Mon Sep 17 00:00:00 2001 From: Sounak Pal <44639229+SounakPal212@users.noreply.github.com> Date: Fri, 14 Aug 2020 20:38:38 +0530 Subject: [PATCH 5/7] Created using Colaboratory --- Reinforcement_Learning.ipynb | 533 +++++++++++++++++++++++++++++++++++ 1 file changed, 533 insertions(+) create mode 100644 Reinforcement_Learning.ipynb diff --git a/Reinforcement_Learning.ipynb b/Reinforcement_Learning.ipynb new file mode 100644 index 0000000..ac9ff61 --- /dev/null +++ b/Reinforcement_Learning.ipynb @@ -0,0 +1,533 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Reinforcement Learning.ipynb", + "provenance": [], + "collapsed_sections": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-ADWvu7NKN2r", + "colab_type": "text" + }, + "source": [ + "##Reinforcement Learning\n", + "The next and final topic in this course covers *Reinforcement Learning*. This technique is different than many of the other machine learning techniques we have seen earlier and has many applications in training agents (an AI) to interact with enviornments like games. Rather than feeding our machine learning model millions of examples we let our model come up with its own examples by exploring an enviornemt. The concept is simple. Humans learn by exploring and learning from mistakes and past experiences so let's have our computer do the same.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HGCR3JWQLaQb", + "colab_type": "text" + }, + "source": [ + "###Terminology\n", + "Before we dive into explaining reinforcement learning we need to define a few key peices of terminology.\n", + "\n", + "**Enviornemt** In reinforcement learning tasks we have a notion of the enviornment. This is what our *agent* will explore. An example of an enviornment in the case of training an AI to play say a game of mario would be the level we are training the agent on.\n", + "\n", + "**Agent** an agent is an entity that is exploring the enviornment. Our agent will interact and take different actions within the enviornment. In our mario example the mario character within the game would be our agent. \n", + "\n", + "**State** always our agent will be in what we call a *state*. The state simply tells us about the status of the agent. 
The most common example of a state is the location of the agent within the enviornment. Moving locations would change the agents state.\n", + "\n", + "**Action** any interaction between the agent and enviornment would be considered an action. For example, moving to the left or jumping would be an action. An action may or may not change the current *state* of the agent. In fact, the act of doing nothing is an action as well! The action of say not pressing a key if we are using our mario example.\n", + "\n", + "**Reward** every action that our agent takes will result in a reward of some magnitude (positive or negative). The goal of our agent will be to maximize its reward in an enviornment. Sometimes the reward will be clear, for example if an agent performs an action which increases their score in the enviornment we could say they've recieved a positive reward. If the agent were to perform an action which results in them losing score or possibly dying in the enviornment then they would recieve a negative reward. \n", + "\n", + "The most important part of reinforcement learning is determing how to reward the agent. After all, the goal of the agent is to maximize its rewards. This means we should reward the agent appropiatly such that it reaches the desired goal.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "AoOJy9s4ZJJt", + "colab_type": "text" + }, + "source": [ + "###Q-Learning\n", + "Now that we have a vague idea of how reinforcement learning works it's time to talk about a specific technique in reinforcement learning called *Q-Learning*.\n", + "\n", + "Q-Learning is a simple yet quite powerful technique in machine learning that involves learning a matrix of action-reward values. This matrix is often reffered to as a Q-Table or Q-Matrix. The matrix is in shape (number of possible states, number of possible actions) where each value at matrix[n, m] represents the agents expected reward given they are in state n and take action m. The Q-learning algorithm defines the way we update the values in the matrix and decide what action to take at each state. The idea is that after a succesful training/learning of this Q-Table/matrix we can determine the action an agent should take in any state by looking at that states row in the matrix and taking the maximium value column as the action.\n", + "\n", + "**Consider this example.**\n", + "\n", + "Let's say A1-A4 are the possible actions and we have 3 states represented by each row (state 1 - state 3).\n", + "\n", + "| A1 | A2 | A3 | A4 |\n", + "|:--: |:--: |:--: |:--: |\n", + "| 0 | 0 | 10 | 5 |\n", + "| 5 | 10 | 0 | 0 |\n", + "| 10 | 5 | 0 | 0 |\n", + "\n", + "If that was our Q-Table/matrix then the following would be the preffered actions in each state.\n", + "\n", + "> State 1: A3\n", + "\n", + "> State 2: A2\n", + "\n", + "> State 3: A1\n", + "\n", + "We can see that this is because the values in each of those columns are the highest for those states!\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u5uLpN1yemTx", + "colab_type": "text" + }, + "source": [ + "###Learning the Q-Table\n", + "So that's simple, right? Now how do we create this table and find those values. Well this is where we will dicuss how the Q-Learning algorithm updates the values in our Q-Table. \n", + "\n", + "I'll start by noting that our Q-Table starts of with all 0 values. This is because the agent has yet to learn anything about the enviornment. 
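Concretely, a Q-Table can be stored as a simple 2D array. The snippet below is only an illustrative sketch (the sizes and variable names are made up for this illustration, they are not the environment we use later): it creates an all-zero table and shows how the preferred action for a state is read off as the column with the largest value, exactly like the table example above.

```
import numpy as np

NUM_STATES = 3   # rows: one per state (like state 1 - state 3 above)
NUM_ACTIONS = 4  # columns: one per action (like A1 - A4 above)

# before any learning the table is all zeros
Q = np.zeros((NUM_STATES, NUM_ACTIONS))

# after training, the preferred action in a given state is simply
# the column with the highest value in that state's row
state = 0
best_action = np.argmax(Q[state, :])
print(Q)
print(best_action)
```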
\n", + "\n", + "Our agent learns by exploring the enviornment and observing the outcome/reward from each action it takes in each state. But how does it know what action to take in each state? There are two ways that our agent can decide on which action to take.\n", + "1. Randomly picking a valid action\n", + "2. Using the current Q-Table to find the best action.\n", + "\n", + "Near the beginning of our agents learning it will mostly take random actions in order to explore the enviornment and enter many different states. As it starts to explore more of the enviornment it will start to gradually rely more on it's learned values (Q-Table) to take actions. This means that as our agent explores more of the enviornment it will develop a better understanding and start to take \"correct\" or better actions more often. It's important that the agent has a good balance of taking random actions and using learned values to ensure it does get trapped in a local maximum. \n", + "\n", + "After each new action our agent wil record the new state (if any) that it has entered and the reward that it recieved from taking that action. These values will be used to update the Q-Table. The agent will stop taking new actions only once a certain time limit is reached or it has acheived the goal or reached the end of the enviornment. \n", + "\n", + "####Updating Q-Values\n", + "The formula for updating the Q-Table after each action is as follows:\n", + "> $ Q[state, action] = Q[state, action] + \\alpha * (reward + \\gamma * max(Q[newState, :]) - Q[state, action]) $\n", + "\n", + "- $\\alpha$ stands for the **Learning Rate**\n", + "\n", + "- $\\gamma$ stands for the **Discount Factor**\n", + "\n", + "####Learning Rate $\\alpha$\n", + "The learning rate $\\alpha$ is a numeric constant that defines how much change is permitted on each QTable update. A high learning rate means that each update will introduce a large change to the current state-action value. A small learning rate means that each update has a more subtle change. Modifying the learning rate will change how the agent explores the enviornment and how quickly it determines the final values in the QTable.\n", + "\n", + "####Discount Factor $\\gamma$\n", + "Discount factor also know as gamma ($\\gamma$) is used to balance how much focus is put on the current and future reward. A high discount factor means that future rewards will be considered more heavily.\n", + "\n", + "
\n", + "

To perform updates on this table we will let the agent explore the environment for a certain period of time and use each of its actions to make an update. Slowly we should start to notice the agent learning and choosing better actions.
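The balance between random and learned actions described above is usually written as an epsilon-greedy rule. A minimal sketch is below; the full example later in this notebook builds the same check inline, so the `Q`, `env` and `epsilon` arguments here are just placeholders for those objects.

```python
import numpy as np

def choose_action(Q, env, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit the Q-Table."""
    if np.random.uniform(0, 1) < epsilon:
        return env.action_space.sample()    # random valid action (explore)
    return int(np.argmax(Q[state, :]))      # best known action for this state (exploit)
```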

\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rwIl0sJgmu4D", + "colab_type": "text" + }, + "source": [ + "##Q-Learning Example\n", + "For this example we will use the Q-Learning algorithm to train an agent to navigate a popular enviornment from the [Open AI Gym](https://gym.openai.com/). The Open AI Gym was developed so programmers could practice machine learning using unique enviornments. Intersting fact, Elon Musk is one of the founders of OpenAI!\n", + "\n", + "Let's start by looking at what Open AI Gym is. " + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "rSETF0zqokYr", + "colab_type": "code", + "colab": {} + }, + "source": [ + "import gym # all you have to do to import and use open ai gym!" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8cH3AmCzotO1", + "colab_type": "text" + }, + "source": [ + "Once you import gym you can load an enviornment using the line ```gym.make(\"enviornment\")```." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "UKN1ScBco3dp", + "colab_type": "code", + "colab": {} + }, + "source": [ + "env = gym.make('FrozenLake-v0') # we are going to use the FrozenLake enviornment" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3SvSlmVwo8cY", + "colab_type": "text" + }, + "source": [ + "There are a few other commands that can be used to interact and get information about the enviornment." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "FF3icIeapFct", + "colab_type": "code", + "colab": {} + }, + "source": [ + "print(env.observation_space.n) # get number of states\n", + "print(env.action_space.n) # get number of actions" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "lc9cwp03pQVn", + "colab_type": "code", + "colab": {} + }, + "source": [ + "env.reset() # reset enviornment to default state" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "sngyjPDapUt7", + "colab_type": "code", + "colab": {} + }, + "source": [ + "action = env.action_space.sample() # get a random action " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "HeEfi8xypXya", + "colab_type": "code", + "colab": {} + }, + "source": [ + "new_state, reward, done, info = env.step(action) # take action, notice it returns information about the action" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "_1W3D81ipdaS", + "colab_type": "code", + "colab": {} + }, + "source": [ + "env.render() # render the GUI for the enviornment " + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vmW6HAbQp01f", + "colab_type": "text" + }, + "source": [ + "###Frozen Lake Enviornment\n", + "Now that we have a basic understanding of how the gym enviornment works it's time to discuss the specific problem we will be solving.\n", + "\n", + "The enviornment we loaded above ```FrozenLake-v0``` is one of the simplest enviornments in Open AI Gym. 
The goal of the agent is to navigate a frozen lake and find the Goal without falling through the ice (render the enviornment above to see an example).\n", + "\n", + "There are:\n", + "- 16 states (one for each square) \n", + "- 4 possible actions (LEFT, RIGHT, DOWN, UP)\n", + "- 4 different types of blocks (F: frozen, H: hole, S: start, G: goal)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YlWoK75ZrK2b", + "colab_type": "text" + }, + "source": [ + "###Building the Q-Table\n", + "The first thing we need to do is build an empty Q-Table that we can use to store and update our values." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "r767K4s0rR2p", + "colab_type": "code", + "colab": {} + }, + "source": [ + "import gym\n", + "import numpy as np\n", + "import time\n", + "\n", + "env = gym.make('FrozenLake-v0')\n", + "STATES = env.observation_space.n\n", + "ACTIONS = env.action_space.n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "UAzMWGatrVIk", + "colab_type": "code", + "colab": {} + }, + "source": [ + "Q = np.zeros((STATES, ACTIONS)) # create a matrix with all 0 values \n", + "Q" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vc_h8tLSrpmc", + "colab_type": "text" + }, + "source": [ + "###Constants\n", + "As we discussed we need to define some constants that will be used to update our Q-Table and tell our agent when to stop training." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-FQapdnnr6P1", + "colab_type": "code", + "colab": {} + }, + "source": [ + "EPISODES = 2000 # how many times to run the enviornment from the beginning\n", + "MAX_STEPS = 100 # max number of steps allowed for each run of enviornment\n", + "\n", + "LEARNING_RATE = 0.81 # learning rate\n", + "GAMMA = 0.96" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NxrAj91rsMfm", + "colab_type": "text" + }, + "source": [ + "###Picking an Action\n", + "Remember that we can pick an action using one of two methods:\n", + "1. Randomly picking a valid action\n", + "2. Using the current Q-Table to find the best action.\n", + "\n", + "Here we will define a new value $\\epsilon$ that will tell us the probabillity of selecting a random action. This value will start off very high and slowly decrease as the agent learns more about the enviornment." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "YUAQVyX0sWDb", + "colab_type": "code", + "colab": {} + }, + "source": [ + "epsilon = 0.9 # start with a 90% chance of picking a random action\n", + "\n", + "# code to pick action\n", + "if np.random.uniform(0, 1) < epsilon: # we will check if a randomly selected value is less than epsilon.\n", + " action = env.action_space.sample() # take random action\n", + "else:\n", + " action = np.argmax(Q[state, :]) # use Q table to pick best action based on current values" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5n-i0B7Atige", + "colab_type": "text" + }, + "source": [ + "###Updating Q Values\n", + "The code below implements the formula discussed above." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "9r7R1W6Qtnh8", + "colab_type": "code", + "colab": {} + }, + "source": [ + "Q[state, action] = Q[state, action] + LEARNING_RATE * (reward + GAMMA * np.max(Q[new_state, :]) - Q[state, action])" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "__afaD62uh8G", + "colab_type": "text" + }, + "source": [ + "###Putting it Together\n", + "Now that we know how to do some basic things we can combine these together to create our Q-Learning algorithm," + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "AGiYCiNuutHz", + "colab_type": "code", + "colab": {} + }, + "source": [ + "import gym\n", + "import numpy as np\n", + "import time\n", + "\n", + "env = gym.make('FrozenLake-v0')\n", + "STATES = env.observation_space.n\n", + "ACTIONS = env.action_space.n\n", + "\n", + "Q = np.zeros((STATES, ACTIONS))\n", + "\n", + "EPISODES = 1500 # how many times to run the enviornment from the beginning\n", + "MAX_STEPS = 100 # max number of steps allowed for each run of enviornment\n", + "\n", + "LEARNING_RATE = 0.81 # learning rate\n", + "GAMMA = 0.96\n", + "\n", + "RENDER = False # if you want to see training set to true\n", + "\n", + "epsilon = 0.9\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "jFRtn5dUu5ZI", + "colab_type": "code", + "colab": {} + }, + "source": [ + "rewards = []\n", + "for episode in range(EPISODES):\n", + "\n", + " state = env.reset()\n", + " for _ in range(MAX_STEPS):\n", + " \n", + " if RENDER:\n", + " env.render()\n", + "\n", + " if np.random.uniform(0, 1) < epsilon:\n", + " action = env.action_space.sample() \n", + " else:\n", + " action = np.argmax(Q[state, :])\n", + "\n", + " next_state, reward, done, _ = env.step(action)\n", + "\n", + " Q[state, action] = Q[state, action] + LEARNING_RATE * (reward + GAMMA * np.max(Q[next_state, :]) - Q[state, action])\n", + "\n", + " state = next_state\n", + "\n", + " if done: \n", + " rewards.append(reward)\n", + " epsilon -= 0.001\n", + " break # reached goal\n", + "\n", + "print(Q)\n", + "print(f\"Average reward: {sum(rewards)/len(rewards)}:\")\n", + "# and now we can see our Q values!" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Zo-tNznd65US", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# we can plot the training progress and see how the agent improved\n", + "import matplotlib.pyplot as plt\n", + "\n", + "def get_average(values):\n", + " return sum(values)/len(values)\n", + "\n", + "avg_rewards = []\n", + "for i in range(0, len(rewards), 100):\n", + " avg_rewards.append(get_average(rewards[i:i+100])) \n", + "\n", + "plt.plot(avg_rewards)\n", + "plt.ylabel('average reward')\n", + "plt.xlabel('episodes (100\\'s)')\n", + "plt.show()" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gy4YH2m9s1ww", + "colab_type": "text" + }, + "source": [ + "##Sources\n", + "1. Violante, Andre. “Simple Reinforcement Learning: Q-Learning.” Medium, Towards Data Science, 1 July 2019, https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56.\n", + "\n", + "2. Openai. “Openai/Gym.” GitHub, https://github.com/openai/gym/wiki/FrozenLake-v0." 
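One way to sanity-check the table produced by the training loop above is to play a few episodes greedily (no random actions) and watch how often the agent reaches the goal. This is only a minimal sketch that reuses the `env`, `Q` and `MAX_STEPS` objects defined in the example above.

```python
# Roll out episodes using only the learned values (pure exploitation).
eval_episodes = 100
wins = 0

for _ in range(eval_episodes):
    state = env.reset()
    for _ in range(MAX_STEPS):
        action = np.argmax(Q[state, :])            # always pick the best known action
        state, reward, done, _ = env.step(action)
        if done:
            wins += reward                          # FrozenLake gives reward 1 only at the goal
            break

print(f"Greedy success rate: {wins / eval_episodes}")
```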
+ ] + } + ] +} \ No newline at end of file From 8f108d45c8c80d37f9868f1b85f8e0c759695031 Mon Sep 17 00:00:00 2001 From: Sounak Pal <44639229+SounakPal212@users.noreply.github.com> Date: Fri, 14 Aug 2020 20:38:56 +0530 Subject: [PATCH 6/7] Created using Colaboratory --- Core_Learning_Algorithms.ipynb | 2415 ++++++++++++++++++++++++++++++++ 1 file changed, 2415 insertions(+) create mode 100644 Core_Learning_Algorithms.ipynb diff --git a/Core_Learning_Algorithms.ipynb b/Core_Learning_Algorithms.ipynb new file mode 100644 index 0000000..522492b --- /dev/null +++ b/Core_Learning_Algorithms.ipynb @@ -0,0 +1,2415 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Core Learning Algorithms.ipynb", + "provenance": [], + "collapsed_sections": [ + "sIBZww6kOIAp", + "UQlXWErlbhsG" + ], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tUgsCvCHLksw", + "colab_type": "text" + }, + "source": [ + "#TensorFlow Core Learning Algorithms\n", + "In this notebook we will walk through 4 fundemental machine learning algorithms. We will apply each of these algorithms to unique problems and datasets before highlighting the use cases of each.\n", + "\n", + "The algorithms we will focus on include:\n", + "- Linear Regression\n", + "- Classification\n", + "- Clustering\n", + "- Hidden Markov Models\n", + "\n", + "It is worth noting that there are many tools within TensorFlow that could be used to solve the problems we will see below. I have chosen the tools that I belive give the most variety and are easiest to use." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mbdnKlcuMM4u", + "colab_type": "text" + }, + "source": [ + "##Linear Regression\n", + "Linear regression is one of the most basic forms of machine learning and is used to predict numeric values. \n", + "\n", + "In this tutorial we will use a linear model to predict the survival rate of passangers from the titanic dataset.\n", + "\n", + "*This section is based on the following documentation: https://www.tensorflow.org/tutorials/estimator/linear*\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wsJ-Iov1P4RK", + "colab_type": "text" + }, + "source": [ + "###How it Works\n", + "Before we dive in, I will provide a very surface level explination of the linear regression algorithm.\n", + "\n", + "Linear regression follows a very simple concept. 
If data points are related linearly, we can generate a line of best fit for these points and use it to predict future values.\n", + "\n", + "Let's take an example of a data set with one feature and one label.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "t4Hp-dYmR2jf", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 286 + }, + "outputId": "bba6a7c7-9dba-4fa0-9409-e4f692155be9" + }, + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "\n", + "x = [1, 2, 2.5, 3, 4]\n", + "y = [1, 4, 7, 9, 15]\n", + "plt.plot(x, y, 'ro')\n", + "plt.axis([0, 6, 0, 20])" + ], + "execution_count": 1, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(0.0, 6.0, 0.0, 20.0)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 1 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAASt0lEQVR4nO3df4xlZ33f8fdnvSbtLG5s6qljbO8OSiwjgopxrpYgKDK/HNuxMK1Qa2tKTIo0SQQVqJVSkpVCS7QSVRVStURYU9vFtDeGBHBiNQa8SpAMEj88u1njn8Su5V3vxngXlti4EwWZfPvHPQvj8R3Pj3Nn7syc90u6Ouc857nnfK8sf+bsc59zbqoKSVI37Bh3AZKkjWPoS1KHGPqS1CGGviR1iKEvSR1i6EtShywb+kkuSvLlJA8meSDJB5r2lyU5kOSRZnnOEu+/oenzSJIbRv0BJEkrl+Xm6Sc5Hzi/qg4lOQs4CLwTeA9wqqo+muRDwDlV9R8WvfdlwBzQA6p57y9U1fdH/kkkScta9kq/qp6sqkPN+g+Ah4ALgGuBW5tutzL4Q7DYLwEHqupUE/QHgCtHUbgkafV2rqZzkingtcA3gPOq6slm13eA84a85QLgiQXbx5q2YceeAWYAdu3a9QuvfOUrV1OaJHXawYMHv1tVk8v1W3HoJ3kp8Dngg1X1TJIf76uqStLqeQ5VNQvMAvR6vZqbm2tzOEnqlCRHVtJvRbN3kpzJIPD7VfX5pvmpZrz/9Lj/iSFvPQ5ctGD7wqZNkjQGK5m9E+Bm4KGq+tiCXXcAp2fj3AD86ZC3fwm4Isk5zeyeK5o2SdIYrORK/w3Au4G3JDncvK4GPgq8PckjwNuabZL0ktwEUFWngN8F7mleH2naJEljsOyUzXFwTF+SVifJwarqLdfPO3IlqUMMfUnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA4x9CWpQwx9SeoQQ1+SOsTQl6QOMfQlqUMMfUnqEENfkjrE0JekDjH0JalDDH1J6pCdy3VIcgtwDXCiql7dtH0GuKTpcjbwN1V16ZD3Pg78APgR8NxKfspLkrR+lg194JPAx4FPnW6oqn91ej3J7wFPv8j731xV311rgZKk0Vk29Kvq7iRTw/YlCfAvgbeMtixJ0npoO6b/z4CnquqRJfYXcFeSg0lmWp5LktTSSoZ3Xsz1wG0vsv+NVXU8yT8BDiR5uKruHtax+aMwA7B79+6WZUmShlnzlX6SncC/AD6zVJ+qOt4sTwC3A3tfpO9sVfWqqjc5ObnWsiRJL6LN8M7bgIer6tiwnUl2JTnr9DpwBXB/i/NJklpaNvST3AZ8DbgkybEk7212XceioZ0kL09yZ7N5HvDVJPcC3wT+rKq+OLrSJUmrtZLZO9cv0f6eIW1/DVzdrD8GvKZlfZKkEfKOXEnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA4x9CWpQwx9SeoQQ1+SOsTQl6QOMfQlqUMMfUnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA5ZyQ+j35LkRJL7F7T9xyTHkxxuXlcv8d4rk3w7yaNJPjTKwiWNUL8PU1OwY8dg2e+PuyKtk5Vc6X8SuHJI++9X1aXN687FO5OcAfwBcBXwKuD6JK9qU6ykddDvw8wMHDkCVYPlzIzBv00tG/pVdTdwag3H3gs8WlWPVdUPgU8D167hOJLW0759MD///Lb5+UG7tp02Y/rvT/KtZvjnnCH7LwCeWLB9rGkbKslMkrkkcydPnmxRlqRVOXp0de3a0tYa+p8Afha4FHgS+L22hVTVbFX1qqo3OTnZ9nCSVmr37tW1a0tbU+hX1VNV9aOq+nvgfzAYylnsOHDRgu0LmzZJm8n+/TAx8fy2iYlBu7adNYV+kvMXbP5z4P4h3e4BLk7yiiQvAa4D7ljL+SSto+lpmJ2FPXsgGSxnZwft2nZ2LtchyW3A5cC5SY4BHwYuT3IpUMDjwK81fV8O3FRVV1fVc0neD3wJOAO4paoeWJdPIamd6WlDviNSVeOu4QV6vV7Nzc2NuwxJ2jKSHKyq3nL9vCNXkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA4x9CWpQwx9SeoQQ1+SOsTQl6QOMfQlqUMMfUnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA5ZNvST3JLkRJL7F7T9lyQPJ/lWktuTnL3Eex9Pcl+Sw0n8/UNJGrOVXOl/ErhyUdsB4NVV9U+BvwJ+60Xe/+aqunQlv90oSVpfy4Z+Vd0NnFrUdldVPddsfh24cB1qkySN2CjG9P8N8IUl9hVwV5KDSWZe7CBJZpLMJZk7efLkCMqSJC3WKvST7AOeA/pLdHljVV0GXAW8L8mbljpWVc1WVa+qepOTk23KkiQtYc2hn+Q9wDXAdFXVsD5VdbxZngBuB/au9XySpPbWFPpJrgR+E3hHVc0v0WdXkrNOrwNXAPcP6ytJ2hgrmbJ5G/A14JIkx5K8F/g4cBZwoJmOeWPT9+VJ7mzeeh7w1ST3At8E/qyqvrgun0KStCI7l+tQVdcPab55ib5/DVzdrD8GvKZ
VdZKkkfKOXEnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA4x9CWpQwx9SeoQQ1+SOsTQl6QOMfQlqUMMfUnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6ZEWhn+SWJCeS3L+g7WVJDiR5pFmes8R7b2j6PJLkhlEVLklavZVe6X8SuHJR24eAP6+qi4E/b7afJ8nLgA8DrwP2Ah9e6o+DJGn9rSj0q+pu4NSi5muBW5v1W4F3DnnrLwEHqupUVX0fOMAL/3hIkjZImzH986rqyWb9O8B5Q/pcADyxYPtY0/YCSWaSzCWZO3nyZIuyJElLGckXuVVVQLU8xmxV9aqqNzk5OYqyJEmLtAn9p5KcD9AsTwzpcxy4aMH2hU2bJGkM2oT+HcDp2Tg3AH86pM+XgCuSnNN8gXtF0yZJGoOVTtm8DfgacEmSY0neC3wUeHuSR4C3Ndsk6SW5CaCqTgG/C9zTvD7StEmSxiCD4fjNpdfr1dzc3LjLkKQtI8nBquot1887ciWpQwx9SeoQQ19aiX4fpqZgx47Bst8fd0XSmuwcdwHSptfvw8wMzM8Pto8cGWwDTE+Pry5pDbzSl5azb99PAv+0+flBu7TFGPrSco4eXV27tIkZ+tJydu9eXbu0iRn60nL274eJiee3TUwM2qUtxtCXljM9DbOzsGcPJIPl7Kxf4mpLcvaOtBLT04a8tgWv9CWpQwx9SeoQQ1+SOsTQl6QOMfQlqUMMfUnqEENfkjrE0JekDllz6Ce5JMnhBa9nknxwUZ/Lkzy9oM/vtC9ZkrRWa74jt6q+DVwKkOQM4Dhw+5CuX6mqa9Z6HknS6IxqeOetwP+tqiMjOp4kaR2MKvSvA25bYt/rk9yb5AtJfn6pAySZSTKXZO7kyZMjKkuStFDr0E/yEuAdwB8P2X0I2FNVrwH+O/AnSx2nqmarqldVvcnJybZlSZKGGMWV/lXAoap6avGOqnqmqp5t1u8Ezkxy7gjOKUlag1GE/vUsMbST5GeSpFnf25zveyM4p7qu34epKdixY7Ds98ddkbQltHqefpJdwNuBX1vQ9usAVXUj8C7gN5I8B/wtcF1VVZtzSvT7MDPzkx8rP3JksA0+815aRjZjBvd6vZqbmxt3GdqspqYGQb/Ynj3w+OMbXY20KSQ5WFW95fp5R662nqNHV9cu6ccMfW09u3evrl3Sjxn62nr274eJiee3TUwM2iW9KENfW8/0NMzODsbwk8FydtYvcaUVaDV7Rxqb6WlDXloDr/QlqUMMfUnqEENfkjrE0JekDjH0JalDDH1J6hBDX5I6xNCXpA4x9CWpQwx9SeoQQ1+SOsTQl6QOMfQlqUNah36Sx5Pcl+Rwkhf8xmEG/luSR5N8K8llbc8pSVqbUT1a+c1V9d0l9l0FXNy8Xgd8ollKkjbYRgzvXAt8qga+Dpyd5PwNOK8kaZFRhH4BdyU5mGRmyP4LgCcWbB9r2p4nyUySuSRzJ0+eHEFZkqTFRhH6b6yqyxgM47wvyZvWcpCqmq2qXlX1JicnR1CWJGmx1qFfVceb5QngdmDvoi7HgYsWbF/YtEmSNlir0E+yK8lZp9eBK4D7F3W7A/iVZhbPLwJPV9WTbc4rSVqbtrN3zgNuT3L6WH9YVV9M8usAVXUjcCdwNfAoMA/8astzSpLWqFXoV9VjwGuGtN+4YL2A97U5jyRpNLwjV5I6xNCXpA4x9DUa/T5MTcGOHYNlvz/uiiQNMarHMKjL+n2YmYH5+cH2kSODbYDp6fHVJekFvNJXe/v2/STwT5ufH7RL2lQMfbV39Ojq2iWNjaGv9nbvXl27pLEx9NXe/v0wMfH8tomJQbukTcXQV3vT0zA7C3v2QDJYzs76Ja60CTl7R6MxPW3IS1uAV/qS1CGGviR1iKEvSR1i6EtShxj6ktQhhr4kdYihL0kdYuhLUoesOfSTXJTky0keTPJAkg8M6XN5kqeTHG5ev9OuXElSG23uyH0O+PdVdSjJWcDBJAeq6sFF/b5SVde0OI8kaUTWfKVfVU9W1aFm/QfAQ8AFoypMkjR6IxnTTzIFvBb4xpDdr09yb5IvJPn5UZxPkrQ2rR+4luSlwOeAD1bVM4t2HwL2VNWzSa4G/gS4eInjzAAzALt9DrskrYtWV/pJzmQQ+P2q+vzi/VX1TFU926zfCZyZ5Nxhx6qq2arqVVVvcnKyTVmSpCW0mb0T4Gbgoar62BJ9fqbpR5K9zfm+t9ZzSpLaaTO88wbg3cB9SQ43bb8N7AaoqhuBdwG/keQ54G+B66qqWpxTktTCmkO/qr4KZJk+Hwc+vtZzSJJGyztyJalDDH1J6hBDX5I6xNCXpA4x9CWpQwx9SeoQQ1+SOsTQl6QOMfQ3Sr8PU1OwY8dg2e+PuyJJHdT6KZtagX4fZmZgfn6wfeTIYBtgenp8dUnqHK/0N8K+fT8J/NPm5wftkrSBDP2NcPTo6tolaZ0Y+hthqR+F8cdiJG0wQ38j7N8PExPPb5uYGLRL0gYy9DfC9DTMzsKePZAMlrOzfokracM5e2ejTE8b8pLGzit9SeoQQ1+SOsTQl6QOaRX6Sa5M8u0kjyb50JD9P5XkM83+bySZanM+SVI7aw79JGcAfwBcBbwKuD7JqxZ1ey/w/ar6OeD3gf+81vNJktprc6W/F3i0qh6rqh8CnwauXdTnWuDWZv2zwFuTpMU5JUkttJmyeQHwxILtY8DrlupTVc8leRr4x8B3Fx8syQzQPIWMv0tyf4vaNrNzGfL5txE/39bm59u6LllJp00zT7+qZoFZgCRzVdUbc0nrYjt/NvDzbXV+vq0rydxK+rUZ3jkOXLRg+8KmbWifJDuBnwa+1+KckqQW2oT+PcDFSV6R5CXAdcAdi/rcAdzQrL8L+IuqqhbnlCS1sObhnWaM/v3Al4AzgFuq6oEkHwHmquoO4GbgfyV5FDjF4A/DSsyuta4tYDt/NvDzbXV+vq1rRZ8tXnhLUnd4R64kdYihL0kdsqlCf7nHOmxlSW5JcmK73n+Q5KIkX07yYJIHknxg3DWNUpJ/kOSbSe5tPt9/GndNo5bkjCR/meT/jLuWUUvyeJL7khxe6dTGrSTJ2Uk+m+ThJA8lef2SfTfLmH7zWIe/At7O4Eave4Drq+rBsRY2IkneBDwLfKqqXj3uekYtyfnA+VV1KMlZwEHgndvov1+AXVX1bJIzga8CH6iqr4+5tJFJ8u+AHvCPquqacdczSkkeB3pVtS1vzEpyK/CVqrqpmU05UVV/M6zvZrrSX8ljHbasqrqbwQymbamqnqyqQ836D4CHGNyRvS3UwLPN5pnNa3NcMY1AkguBXwZuGnctWp0kPw28icFsSarqh0sFPmyu0B/2WIdtExpd0jxN9bXAN8ZbyWg1wx+HgRPAgaraTp/vvwK/Cfz9uAtZJwXcleRg88iX7eQVwEngfzbDczcl2bVU580U+toGkrwU+Bzwwap6Ztz1jFJV/aiqLmVw9/neJNtimC7JNcCJqjo47lrW0Rur6jIGTwV+XzPcul3sBC4DPlFVrwX+H7Dkd6KbKf
RX8lgHbWLNWPfngH5VfX7c9ayX5p/OXwauHHctI/IG4B3NuPengbck+d/jLWm0qup4szwB3M5gOHm7OAYcW/Avz88y+CMw1GYK/ZU81kGbVPNF583AQ1X1sXHXM2pJJpOc3az/QwYTDh4eb1WjUVW/VVUXVtUUg//v/qKq/vWYyxqZJLuayQU0wx5XANtmFl1VfQd4Isnpp2y+FVhyAsVmesrm0Mc6jLmskUlyG3A5cG6SY8CHq+rm8VY1Um8A3g3c14x7A/x2Vd05xppG6Xzg1maW2Q7gj6pq201t3KbOA25vfspjJ/CHVfXF8ZY0cv8W6DcXzI8Bv7pUx00zZVOStP420/COJGmdGfqS1CGGviR1iKEvSR1i6EtShxj6ktQhhr4kdcj/B+lpOrDm/pWSAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2m1DTp22SLo3", + "colab_type": "text" + }, + "source": [ + "We can see that this data has a linear coorespondence. When the x value increases, so does the y. Because of this relation we can create a line of best fit for this dataset. In this example our line will only use one input variable, as we are working with two dimensions. In larger datasets with more features our line will have more features and inputs.\n", + "\n", + "\"Line of best fit refers to a line through a scatter plot of data points that best expresses the relationship between those points.\" (https://www.investopedia.com/terms/l/line-of-best-fit.asp)\n", + "\n", + "Here's a refresher on the equation of a line in 2D.\n", + "\n", + "$ y = mx + b $\n", + "\n", + "Here's an example of a line of best fit for this graph.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Kv5eKLP_UYZi", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 269 + }, + "outputId": "ced4a7f7-8684-4cd5-968e-adf1e19fab6e" + }, + "source": [ + "plt.plot(x, y, 'ro')\n", + "plt.axis([0, 6, 0, 20])\n", + "plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))\n", + "plt.show()" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAf/klEQVR4nO3deXxU9bnH8c/DpgRBQRCRXYsoorKEAGqtdStSq9a6QCO7Bmxt1dtNa622vfR6721te2srpuwQQBGttqUqbbXqlSUJArLKIkvYEgirYUvy3D9muIWYkJCZ5EzmfN+vV14z8zu/M/PMy/Y7h9+c84y5OyIiEg71gi5ARERqj0JfRCREFPoiIiGi0BcRCRGFvohIiCj0RURCpNLQN7P2Zva2ma00sxVm9nB0vIWZzTOztdHb5hXsPyw6Z62ZDYv3GxARkaqzys7TN7M2QBt3X2xmTYFc4A5gOFDo7s+Y2WNAc3f/QZl9WwA5QCrg0X17u/ueuL8TERGpVKVH+u6+3d0XR+8fAFYBbYHbgSnRaVOIfBCU9SVgnrsXRoN+HjAgHoWLiMjpa3A6k82sE9ATWAi0dvft0U07gNbl7NIW2HLC47zoWHnPnQFkADRp0qT3JZdccjqliYiEWm5u7i53b1XZvCqHvpmdBcwBHnH3/Wb2/9vc3c0spn4O7p4JZAKkpqZ6Tk5OLE8nIhIqZrapKvOqdPaOmTUkEvhZ7v5KdHhndL3/+Lp/fjm7bgXan/C4XXRMREQCUJWzdwyYAKxy92dP2PQ6cPxsnGHAa+Xs/iZws5k1j57dc3N0TEREAlCVI/2rgSHA9Wa2JPo3EHgGuMnM1gI3Rh9jZqlmNh7A3QuBnwHZ0b+fRsdERCQAlZ6yGQSt6YuInB4zy3X31Mrm6YpcEZEQUeiLiISIQl9EJEQU+iIiIaLQFxEJEYW+iEiIKPRFREJEoS8iEiIKfRGREFHoi4iEiEJfRCREFPoiIiGi0BcRCRGFvohIiCj0RURCRKEvIhIiCn0RkRBR6IuIhEiDyiaY2UTgViDf3btHx14EukannAPsdfce5ey7ETgAlADFVfkpLxERqTmVhj4wGXgOmHp8wN3vPX7fzH4J7DvF/l90913VLVBEROKn0tB393fNrFN528zMgHuA6+NbloiI1IRY1/Q/D+x097UVbHfgLTPLNbOMGF9LRERiVJXlnVMZDMw8xfZr3H2rmZ0HzDOz1e7+bnkTox8KGQAdOnSIsSwRESlPtY/0zawBcCfwYkVz3H1r9DYfeBVIO8XcTHdPdffUVq1aVbcsERE5hViWd24EVrt7XnkbzayJmTU9fh+4GVgew+uJiEiMKg19M5sJzAe6mlmemY2KbhpEmaUdM7vAzOZGH7YG3jezpcAi4C/u/kb8ShcRkdNVlbN3BlcwPrycsW3AwOj9DcCVMdYnIiJxpCtyRURCRKEvIhIiCn0RkRBR6IuIhIhCX0QkRBT6IiIhotAXEQkRhb6ISIgo9EVEQkShLyISIgp9EZEQUeiLiISIQl9EJEQU+iIiIaLQFxEJEYW+iEiIKPRFREJEoS8iEiIKfRGREKnKD6NPNLN8M1t+wtjTZrbVzJZE/wZWsO8AM1tjZuvM7LF4Fi4icZSVBZ06Qb16kdusrKArkhpSlSP9ycCAcsZ/5e49on9zy240s/rA74BbgG7AYDPrFkuxIlIDsrIgIwM2bQL3yG1GhoI/SVUa+u7+LlBYjedOA9a5+wZ3PwrMAm6vxvOISE164gkoKjp5rKgoMi5JJ5Y1/YfMbFl0+ad5OdvbAltOeJwXHSuXmWWYWY6Z5RQUFMRQloicls2bT29c6rTqhv7zwEVAD2A78MtYC3H3THdPdffUVq1axfp0IlJVHTqc3rjUadUKfXff6e4l7l4K/IHIUk5ZW4H2JzxuFx0TkUQydiykpJw8lpISGZekU63QN7M2Jzz8KrC8nGnZQBcz62xmjYBBwOvVeT0RqUHp6ZCZCR07glnkNjMzMi5Jp0FlE8xsJnAd0NLM8oCngOvMrAfgwEZgdHTuBcB4dx/o7sVm9hDwJlAfmOjuK2rkXYhIbNLTF
fIhYe4edA2fkZqa6jk5OUGXISJSZ5hZrrunVjZPV+SKCACHj5UEXYLUAoW+iLBj32Fue+59xr+3IehSpIZVuqYvIsltfcFBhk5YxN6io3Rr0yzocqSGKfRFQmxZ3l6GT8rGgFkZ/bm83dlBlyQ1TKEvElLvr93F6Gk5nJPSiGmj0riw1VlBlyS1QKEvEkJ/WbadR19cQueWTZg6Ko3Wzc4MuiSpJQp9kZCZvmATT762nF4dmjNxWB/OTmkYdElSixT6IiHh7vz2H+t4dt7HXH/Jefzu671o3Kh+0GVJLVPoi4RAaanzkz+tYMr8TdzZsy3/edcVNKyvM7bDSKEvkuSOFpfy3dlLeX3pNkZd05knBl5KvXoWdFkSEIW+SBIrOlrMmOmLeffjAr4/oCsPfuEizBT4YabQF0lSez49yojJ2SzL28szd17OoDT1xxeFvkhS2rb3EEMnLmJzYRHP39ebL112ftAlSYJQ6IskmXX5Bxk6YSH7DxczZUQa/S86N+iSJIEo9EWSyJItexkxaRH16xmzMvrRva3aKsjJFPoiSeK9tQWMnpbLuWc1YtrIvnRq2STokiQBKfRFksCflm7j315awkWtzmLqyDTOU1sFqYBCX6SOmzZ/Iz9+fQWpHZszflgfzm6stgpSsar8Ru5E4FYg3927R8f+G/gKcBRYD4xw973l7LsROACUAMVV+SkvEakad+fXf1vLb/6+lhsvPY/nvt6LMxuqrYKcWlWuw54MDCgzNg/o7u5XAB8Dj59i/y+6ew8Fvkj8lJQ6P35tBb/5+1q+1qsd4+7rrcCXKqk09N39XaCwzNhb7l4cfbgAaFcDtYlIOY4Ul/DtWR8ybcEmMq69kF/cfQUN1EdHqige/0sZCfy1gm0OvGVmuWaWcaonMbMMM8sxs5yCgoI4lCWSfD49UsyoyTn8Zdl2Hr/lEn448FK1VZDTEtMXuWb2BFAMZFUw5Rp332pm5wHzzGx19F8On+HumUAmQGpqqsdSl0gyKvz0KCMmLWL5tv38111XcE9q+6BLkjqo2qFvZsOJfMF7g7uXG9LuvjV6m29mrwJpQLmhLyIV27r3EEMmLGTrnkOMu683N3VrHXRJUkdVa3nHzAYA3wduc/eiCuY0MbOmx+8DNwPLq1uoSFit3XmAu57/gIL9R5g6Mk2BLzGpNPTNbCYwH+hqZnlmNgp4DmhKZMlmiZmNi869wMzmRndtDbxvZkuBRcBf3P2NGnkXIklq8eY93P3CfI6VOC+O7k/fC9VHR2JT6fKOuw8uZ3hCBXO3AQOj9zcAV8ZUnUiIvbMmnwenL6ZV0zOYPqovHc5NCbokSQK6IlckAb22ZCvfeWkpXVo3ZcrIPpzXVG0VJD4U+iIJZvL/fsLTf1pJWucWjB+WSrMz1VZB4kehL5Ig3J1fzfuY//nHOm7q1prfDu6pq2wl7hT6IgmgpNR58rXlzFi4mXtS2/Hzr16uq2ylRij0RQJ2pLiER19cwtyPdjDmCxfxgwFddZWt1BiFvkiADh4pJmNqDh+s380TAy/lgWsvDLokSXIKfZGA7D54hOGTslm5fT+/vPtKvtZbfQul5in0RQKQt6eIoRMWsXXvITKH9OaGS3WVrdQOhb5ILft45wGGTFjIoaMlTL+/L306tQi6JAkRhb5ILcrdVMjIyTmc0aAeL47uz6VtmgVdkoSMQl+klry9Jp8Hp+dyfrMzmTaqL+1bqK2C1D6FvkgtePXDPL43exldz2/KlJFptDzrjKBLkpBS6IvUsAnvf8LP/ryS/heeS+bQ3jRVWwUJkEJfpIa4O794aw2/e3s9Ay47n18P6qG2ChI4hb5IDSgpdX70x4+YuWgLg9Pa8+93XE79errKVoKn0BeJs8PHSnhk1hLeWLGDb37xIr57s9oqSOJQ6IvE0YHDx8iYmsv8Dbt58tZujLqmc9AliZxEoS8SJwUHjjB80iLW7DjAr+/twR092wZdkshnVKl3q5lNNLN8M1t+wlgLM5tnZmujt80r2HdYdM5aMxsWr8JFEsmWwiLuHvcB6wsO8odhqQp8SVhVbdg9GRhQZuwx4O/u3gX4e/TxScysBfAU0BdIA56q6MNBpK5avWM/X3v+A/YUHSPr/n58set5QZckUqEqhb67vwsUlhm+HZgSvT8FuKOcXb8EzHP3QnffA8zjsx8eInVW9sZC7hk3HzOYPaY/vTvqmEYSWyw/zdPa3bdH7+8AymsT2BbYcsLjvOjYZ5hZhpnlmFlOQUFBDGWJ1I6/r9rJfeMX0vKsM5jz4FVc3Lpp0CWJVCouv8fm7g54jM+R6e6p7p7aqlWreJQlUmPm5OaRMS2Xi1s3ZfaY/rRrrj46UjfEEvo7zawNQPQ2v5w5W4H2JzxuFx0TqbP+8O4GvjN7Kf0ubMHMjH6cqz46UofEEvqvA8fPxhkGvFbOnDeBm82sefQL3JujYyJ1jrvzzF9XM3buKgZefj4Th/fhrDN01rPULVU9ZXMmMB/oamZ5ZjYKeAa4yczWAjdGH2NmqWY2HsDdC4GfAdnRv59Gx0TqlOKSUn4wZxnj/rme9L4d+O3gXpzRQH10pO6xyHJ8YklNTfWcnJygyxABIm0VvjXzQ+at3Mm3r/8cj950sdoqSMIxs1x3T61snv5tKnIK+w8f4/4pOSz6pJCnv9KN4VerrYLUbQp9kQrkHzjMsInZrN15gN8M6sHtPXSVrdR9cTllUyTZbN5dxN3j5rNx16dMGN6H21e8A506Qb16kdusrIArFKkeHemLlLFy236GTVrEsZJSZjzQl57vzYWMDCgqikzYtCnyGCA9PbhCRapBR/oiJ1j0SSH3Zs6nQT1j9uj+9OzQHJ544l+Bf1xRUWRcpI7Rkb5I1LyVO3loxmLaNm/MtFF9aXtO48iGzZvL36GicZEEpiN9EWB2zhbGTM/lkvOb8vKYq/4V+AAdOpS/U0XjIglMoS+h98I/1/O9l5dx1UXnMuOBfrRo0ujkCWPHQkqZ3jopKZFxkTpGoS+h5e78fO4q/uOvq7n1ijaMH5ZKk/LaKqSnQ2YmdOwIZpHbzEx9iSt1ktb0JZSKS0p57JWPeDk3jyH9OvL0bZdRv94prrJNT1fIS1JQ6EvoHD5WwkMzFvO3Vfk8cmMXHr6hi9oqSGgo9CVU9h06xgNTcsjeVMjPbr+MIf07BV2SSK1S6Eto5O8/zNCJi1hfcJD/GdSTr1x5QdAlidQ6hb6EwqbdnzJkwiJ2HTzCxOF9+HwX/TqbhJNCX5Le8q37GD4pm5LSUmY+0I8r258TdEkigdEpm5LUFmzYzeDMBTSqb8wec5UCX0JPR/qStN5csYNvzfyQDi1SmDYqjTZnN658J5Ekp9CXpPRi9mYef+Ujrmh3DpOG96F52atsRUKq2ss7ZtbVzJac8LffzB4pM+c6M9t3wpwfx16ySMXcneffWc8P5nzENV1aMeOBvgp8kRNU+0jf3dcAPQDMrD6wFXi1nKnvufut1X0dkaoqLY20VRj//ifcduUF/OLuK2nUQF9biZwoXss7NwDr3X1TnJ5P5LQcKynl
B3OW8crirQy/qhM/vrUb9U7VVkEkpOJ1GDQImFnBtv5mttTM/mpml1X0BGaWYWY5ZpZTUFAQp7IkDA4dLWH0tFxeWbyV79x0MU99RYEvUhFz99iewKwRsA24zN13ltnWDCh194NmNhD4jbt3qew5U1NTPScnJ6a6JBz2FR1j1JRscjfv4d/v6E56345BlyQSCDPLdffUyubF40j/FmBx2cAHcPf97n4wen8u0NDMWsbhNUXYuf8w97wwn2V5+/jd13sp8EWqIB5r+oOpYGnHzM4Hdrq7m1kakQ+Z3XF4TQm5TybOZMjio+xplMKk/x3P1d2GwuVqfSxSmZhC38yaADcBo08YGwPg7uOAu4AHzawYOAQM8ljXkyT0lo+fxbBljtdryMyZP+SKHesg4/3IRvW8FzmlmNf0a4LW9KUiH6zfRcbv3+Hsov1Me/FJLtyz7V8bO3aEjRsDq00kSFVd09cVuVJnvLF8O9+euYROe/OZ+tKPOf9gmZXCzZuDKUykDtGVK1InzFy0mW9kLaZ722a89N5znw18gA4dar8wkTpGoS8Jzd353dvrePyVj/jCxa3Iur8f5zz1BKSknDwxJQXGjg2mSJE6RKEvCau01Pnpn1fy32+u4as925I5NJXGjepHvqzNzIys4ZtFbjMz9SWuSBVoTV8S0rGSUr43eyl/XLKNkVd35kdfvvTkq2zT0xXyItWg0JeEU3S0mG9kLeadNQV870td+cZ1F2Gmtgoi8aDQl4Syt+goIydns2TLXp6583IGpenLWZF4UuhLwtix7zBDJy5k464ifp/eiwHd2wRdkkjSUehLQlhfcJChExax79AxJo/sw1UXqUWTSE1Q6EvgluXtZfikbAyYldGP7m3PDrokkaSl0JdAvb92F6On5dC8SSOmjepL55ZNgi5JJKkp9CUwf1m2nUdfXMKFrZowZWQarZudGXRJIklPoS+BmL5gE0++tpzeHZozYVgfzk5pGHRJIqGg0Jda5e789h/reHbex9xwyXk89/VekatsRaRWKPSl1pSWOj/50wqmzN/Enb3a8p9fu4KG9dUJRKQ2KfSlVhwtLuW7s5fy+tJtPPD5zjx+y6X68XKRACj0pcYVHS1mzPTFvPtxAY/dcgljvnBR0CWJhJZCX2rUnk+PMmJyNsvy9vJfX7uCe/q0D7okkVCLOfTNbCNwACgBisv+XJdFOmX9BhgIFAHD3X1xrK8riW/b3kMMnbiIzYVFjLuvNzdfdn7QJYmEXryO9L/o7rsq2HYL0CX61xd4PnorSWxd/kGGTljIgcPFTBuZRt8Lzw26JBGhdn5E5XZgqkcsAM4xM3XSSmJLtuzl7nEfcLTEmTW6nwJfJIHEI/QdeMvMcs0so5ztbYEtJzzOi46dxMwyzCzHzHIKCgriUJYE4b21BXz9DwtoemZD5jzYn8suUB8dkUQSj9C/xt17EVnG+aaZXVudJ3H3THdPdffUVq1axaEsqW1/WrqNkZOz6XhuE14e05+O56qPjkiiiTn03X1r9DYfeBVIKzNlK3DiKRvtomOSRKbN38i3Z31Iz/bNmZXRj/PUR0ckIcUU+mbWxMyaHr8P3AwsLzPtdWCoRfQD9rn79lheVxKHu/OreR/z5GsruOGS1kwdlcbZjdVHRyRRxXr2Tmvg1ejvlzYAZrj7G2Y2BsDdxwFziZyuuY7IKZsjYnxNSRAlpc7Tr69g2oJN3N27Hf9x5+U0UFsFkYQWU+i7+wbgynLGx51w34FvxvI6kniOFJfwby8t5S/LtjP6Cxfy2IBL9OPlInWArsiV03bwSDFjpuXy/rpd/HDgJWRcq7YKInWFQl9OS+GnRxkxaRHLt+3nF3dfyV292wVdkoicBi3ASpVt3XuIu8Z9wOodB3jhvt4nB35WFnTqBPXqRW6zsoIqU0ROQUf6UiVrdx5gyIRFfHq0mOn396VPpxb/2piVBRkZUFQUebxpU+QxQHp67RcrIhXSkb5UavHmPdz9wnxK3HlpdP+TAx/giSf+FfjHFRVFxkUkoehIX07pnTX5PDh9Mec1O4Ppo/rSvkXKZydt3lz+zhWNi0hgdKQvFXptyVbun5JD55ZNeHnMVeUHPkCHDqc3LiKBUehLuSb97yc8PGsJvTs2Z9bofrRqekbFk8eOhZQyHwgpKZFxEUkoCn05ibvzy7fW8JM/reTmbq2ZMjKNZmdW0lYhPR0yM6FjRzCL3GZm6ktckQSkNX35fyWlzpOvLWfGws3cm9qesV/tXvW2CunpCnmROkChL0CkrcKjLy5h7kc7+MZ1F/G9L3VVWwWRJKTQFw4eKSZjag4frN/Nj758Kfd//sKgSxKRGqLQD7ldB48wYlI2K7fv59l7ruTOXmqrIJLMFPohtqWwiGETF7Ft3yH+MLQ311/SOuiSRKSGKfRDas2OAwyduJBDR0uYPqovqWWvshWRpKTQD6HcTYWMmJRN40b1mT3mKrqe3zTokkSklij0Q+bt1fk8mJVLm7MbM3VkWsVX2YpIUlLoh8irH+bx3dnLuLRNUyaPSKPlWae4ylZEklK1r8g1s/Zm9raZrTSzFWb2cDlzrjOzfWa2JPr349jKleqa8P4nPPriUvp2bsHMB/op8EVCKpYj/WLgO+6+2MyaArlmNs/dV5aZ95673xrD60gM3J3/fnMNv39nPbd0P59f3duDMxvWD7osEQlItUPf3bcD26P3D5jZKqAtUDb0JSDFJaX86I/LmZW9hcFpHfj3O7pTv56ushUJs7g0XDOzTkBPYGE5m/ub2VIz+6uZXRaP15PKHT5WwjdnLGZW9ha+df3n+PlXFfgiEocvcs3sLGAO8Ii77y+zeTHQ0d0PmtlA4I9AlwqeJwPIAOigPuwxOXD4GA9MzWHBhkKe+ko3RlzdOeiSRCRBxHSkb2YNiQR+lru/Una7u+9394PR+3OBhmbWsrzncvdMd09199RWrVrFUlaoFRw4wqDMBeRs3MOv7+2hwBeRk1T7SN8iLRgnAKvc/dkK5pwP7HR3N7M0Ih8yu6v7mnJqWwqLGDJhITv3H2H8sFSu63pe0CWJSIKJZXnnamAI8JGZLYmO/RDoAODu44C7gAfNrBg4BAxyd4/hNaUCq7bvZ+jERRwtLmX6/X3p3bF50CWJSAKK5eyd94FTfjPo7s8Bz1X3NaRqsjcWMnJyNk0aNWD2mP5c3FptFUSkfLoit47728qdfHPGYtqe05ipo9Jo11xtFUSkYgr9Ouzl3Dx+MGcZl13QjEnD+3CurrIVkUoo9OuoP7y7gbFzV3H1587lhSGpnHWG/lOKSOWUFHWMu/PMG6t54Z8b+PLlbXj23is5o4HaKohI1Sj065DiklIef+UjZufmkd63Az+9XVfZisjpUejXEYePlfDQjA/526qdPHxDFx65sQuRSyVERKpOoV8H7D98jPun5JC9sZCf3HYZw67qFHRJIlJHKfQTXP6BwwybmM26/AP8ZlBPbrvygqBLEpE6LC5dNqUKsrKgUyeoVy9ym5VV6S6bdn/KXc/PZ9PuT5kwrI8CX0RipiP92pCVBRkZUFQ
UebxpU+QxQHp6ubus2LaPYROzKS4tJev+vvTsoLYKIhI7HenXhiee+FfgH1dUFBkvx4INuxn0wgIa1jdeHtNfgS8icaMj/dqweXOVx99asYOHZn5I++aNmTaqLxec07iGixORMNGRfm2o6Edhyoy/lL2FMdNzubRNM2aPuUqBLyJxp9CvDWPHQkqZRmgpKZHxqHH/XM/35yzj6s+1ZMb9fWnRpFEtFykiYaDQrw3p6ZCZCR07glnkNjMT0tMpLXV+PncVz/x1Nbde0YYJw/rQRH10RKSGKF1qS3r6Z87UOVZSymNzPmLO4jyG9u/I01+5jHpqqyAiNUihH5BDR0t4aMZi/r46n0dvvJhv3/A5tVUQkRqn0A/AvqJj3D81m5xNe/jZHd0Z0q9j0CWJSEgo9GvZzv2HGTZxEesLDvLbwT259QpdZSsitSemL3LNbICZrTGzdWb2WDnbzzCzF6PbF5pZp1her677ZNenfO35D9hcWMSk4WkKfBGpddUOfTOrD/wOuAXoBgw2s25lpo0C9rj754BfAf9Z3ddLBiu37efwsRJmPtCPa7q0DLocEQmhWJZ30oB17r4BwMxmAbcDK0+YczvwdPT+y8BzZmbu7jG8bp315SvacO3FLWl6ZsOgSxGRkIol9NsCW054nAf0rWiOuxeb2T7gXGBX2Sczswwg2oWMI2a2PIbaEllLynn/SUTvr27T+6u7ulZlUsJ8kevumUAmgJnluHtqwCXViGR+b6D3V9fp/dVdZpZTlXmxfJG7FWh/wuN20bFy55hZA+BsYHcMrykiIjGIJfSzgS5m1tnMGgGDgNfLzHkdGBa9fxfwj7Cu54uIJIJqL+9E1+gfAt4E6gMT3X2Fmf0UyHH314EJwDQzWwcUEvlgqIrM6tZVByTzewO9v7pO76/uqtJ7Mx14i4iEh7psioiEiEJfRCREEir0K2vrUJeZ2UQzy0/W6w/MrL2ZvW1mK81shZk9HHRN8WRmZ5rZIjNbGn1/Pwm6pngzs/pm9qGZ/TnoWuLNzDaa2UdmtqSqpzbWJWZ2jpm9bGarzWyVmfWvcG6irOlH2zp8DNxE5EKvbGCwu6885Y51hJldCxwEprp796DriTczawO0cffFZtYUyAXuSKL/fgY0cfeDZtYQeB942N0XBFxa3JjZvwGpQDN3vzXoeuLJzDYCqe6elBdmmdkU4D13Hx89mzLF3feWNzeRjvT/v62Dux8Fjrd1SAru/i6RM5iSkrtvd/fF0fsHgFVErshOCh5xMPqwYfQvMY6Y4sDM2gFfBsYHXYucHjM7G7iWyNmSuPvRigIfEiv0y2vrkDShESbRbqo9gYXBVhJf0eWPJUA+MM/dk+n9/Rr4PlAadCE1xIG3zCw32vIlmXQGCoBJ0eW58WbWpKLJiRT6kgTM7CxgDvCIu+8Pup54cvcSd+9B5OrzNDNLimU6M7sVyHf33KBrqUHXuHsvIl2Bvxldbk0WDYBewPPu3hP4FKjwO9FECv2qtHWQBBZd654DZLn7K0HXU1Oi/3R+GxgQdC1xcjVwW3TdexZwvZlND7ak+HL3rdHbfOBVIsvJySIPyDvhX54vE/kQKFcihX5V2jpIgop+0TkBWOXuzwZdT7yZWSszOyd6vzGREw5WB1tVfLj74+7ezt07Efn/3T/c/b6Ay4obM2sSPbmA6LLHzUDSnEXn7juALWZ2vMvmDZzc4v4kidRls9y2DgGXFTdmNhO4DmhpZnnAU+4+Idiq4upqYAjwUXTdG+CH7j43wJriqQ0wJXqWWT3gJXdPulMbk1Rr4NXIcQkNgBnu/kawJcXdt4Cs6AHzBmBERRMT5pRNERGpeYm0vCMiIjVMoS8iEiIKfRGREFHoi4iEiEJfRCREFPoiIiGi0BcRCZH/A23rN3YTPtt3AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Hd3T50PLB4lV", + "colab_type": "text" + }, + "source": [ + "Once we've generated this line for our dataset, we can use its equation to predict future values. We just pass the features of the data point we would like to predict into the equation of the line and use the output as our prediction." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "y02FbC56Nbx0", + "colab_type": "text" + }, + "source": [ + "### Setup and Imports\n", + "Before we get started we must install *sklearn* and import the following modules." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "p4xpuxwKHOGL", + "colab_type": "code", + "colab": {} + }, + "source": [ + "!pip install -q sklearn" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "iI3zi2ZhQ3WB", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 101 + }, + "outputId": "a566a88f-93ea-4f70-9003-39b654b63447" + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook" + ], + "execution_count": 3, + "outputs": [ + { + "output_type": "stream", + "text": [ + "`%tensorflow_version` only switches the major version: 1.x or 2.x.\n", + "You set: `2.x # this line is not required unless you are in a notebook`. This will be interpreted as: `2.x`.\n", + "\n", + "\n", + "TensorFlow 2.x selected.\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "qcII_xj9Ntyo", + "colab_type": "code", + "colab": {} + }, + "source": [ + "from __future__ import absolute_import, division, print_function, unicode_literals\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from IPython.display import clear_output\n", + "from six.moves import urllib\n", + "\n", + "import tensorflow.compat.v2.feature_column as fc\n", + "\n", + "import tensorflow as tf" + ], + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GltdTjiERfWi", + "colab_type": "text" + }, + "source": [ + "### Data\n", + "So, if you haven't realized by now a major part of machine learning is data! In fact, it's so important that most of what we do in this tutorial will focus on exploring, cleaning and selecting appropriate data.\n", + "\n", + "The dataset we will be focusing on here is the titanic dataset. It has tons of information about each passanger on the ship. Our first step is always to understand the data and explore it. So, let's do that!\n", + "\n", + "**Below we will load a dataset and learn how we can explore it using some built-in tools. **\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "CpllWsKIOGOy", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Load dataset.\n", + "dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv') # training data\n", + "dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv') # testing data\n", + "y_train = dftrain.pop('survived')\n", + "y_eval = dfeval.pop('survived')" + ], + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "1PJr5GosQBeY", + "colab_type": "text" + }, + "source": [ + "The ```pd.read_csv()``` method will return to us a new pandas *dataframe*. You can think of a dataframe like a table. 
In fact, we can actually have a look at the table representation.\n", + "\n", + "We've decided to pop the \"survived\" column from our dataset and store it in a new variable. This column simply tells us if the person survived our not.\n", + "\n", + "To look at the data we'll use the ```.head()``` method from pandas. This will show us the first 5 items in our dataframe." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mKsXeWinQiVR", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 195 + }, + "outputId": "37edaaa0-da85-47c3-b8cc-fb4a1f78a1d1" + }, + "source": [ + "dftrain.head()" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sexagen_siblings_spousesparchfareclassdeckembark_townalone
0male22.0107.2500ThirdunknownSouthamptonn
1female38.01071.2833FirstCCherbourgn
2female26.0007.9250ThirdunknownSouthamptony
3female35.01053.1000FirstCSouthamptonn
4male28.0008.4583ThirdunknownQueenstowny
\n", + "
" + ], + "text/plain": [ + " sex age n_siblings_spouses parch ... class deck embark_town alone\n", + "0 male 22.0 1 0 ... Third unknown Southampton n\n", + "1 female 38.0 1 0 ... First C Cherbourg n\n", + "2 female 26.0 0 0 ... Third unknown Southampton y\n", + "3 female 35.0 1 0 ... First C Southampton n\n", + "4 male 28.0 0 0 ... Third unknown Queenstown y\n", + "\n", + "[5 rows x 9 columns]" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eeVQGdpCRC1P", + "colab_type": "text" + }, + "source": [ + "And if we want a more statistical analysis of our data we can use the ```.describe()``` method." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IkL2G42GRMf8", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 293 + }, + "outputId": "ffb42606-8f18-48d2-aacf-120578212af7" + }, + "source": [ + "dftrain.describe()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
agen_siblings_spousesparchfare
count627.000000627.000000627.000000627.000000
mean29.6313080.5454550.37958534.385399
std12.5118181.1510900.79299954.597730
min0.7500000.0000000.0000000.000000
25%23.0000000.0000000.0000007.895800
50%28.0000000.0000000.00000015.045800
75%35.0000001.0000000.00000031.387500
max80.0000008.0000005.000000512.329200
\n", + "
" + ], + "text/plain": [ + " age n_siblings_spouses parch fare\n", + "count 627.000000 627.000000 627.000000 627.000000\n", + "mean 29.631308 0.545455 0.379585 34.385399\n", + "std 12.511818 1.151090 0.792999 54.597730\n", + "min 0.750000 0.000000 0.000000 0.000000\n", + "25% 23.000000 0.000000 0.000000 7.895800\n", + "50% 28.000000 0.000000 0.000000 15.045800\n", + "75% 35.000000 1.000000 0.000000 31.387500\n", + "max 80.000000 8.000000 5.000000 512.329200" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 17 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "aX7FSzrQRXeC", + "colab_type": "text" + }, + "source": [ + "And since we talked so much about shapes in the previous tutorial let's have a look at that too!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "tR1Oy1dISdjn", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "94fb254b-0222-4b4b-d47c-73562f37571c" + }, + "source": [ + "dftrain.shape" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(627, 9)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 18 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iIfLiSRMTeW4", + "colab_type": "text" + }, + "source": [ + "So have have 627 entries and 9 features, nice!\n", + "\n", + "Now let's have a look at our survival information." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "aX1lKW7TTh-E", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 120 + }, + "outputId": "6b2efa46-40a8-4fae-bdd1-b828283daa0c" + }, + "source": [ + "y_train.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 0\n", + "1 1\n", + "2 1\n", + "3 1\n", + "4 0\n", + "Name: survived, dtype: int64" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 19 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ahqIyYPoTnRh", + "colab_type": "text" + }, + "source": [ + "Notice that each entry is either a 0 or 1. Can you guess which stands for survival? 
\n", + "\n", + "**And now because visuals are always valuable let's generate a few graphs of the data.**" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Edndbw4sU5Wd", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 282 + }, + "outputId": "42a09890-5513-4785-84d7-98af53216446" + }, + "source": [ + "dftrain.age.hist(bins=20)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 20 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAVdklEQVR4nO3df7Bcd13/8efbFhFzmYTaeifftHphjHVKI5Hs1DowzL3UH6E4FBynttPBRqoXZuqI2hlN0RGUYabf75cf4qBosLVFMbdIW6hp/VFjrxXHgrm1NiltoYWAzTcm0KYJtzAMKW//2HO/XS97c+/u2b177qfPx8zO3f2cc/a8srt53b2fPbsbmYkkqSzfMeoAkqTBs9wlqUCWuyQVyHKXpAJZ7pJUoNNHHQDgzDPPzImJiZ62efrpp1m3bt1wAtVgrt41NVtTc0FzszU1FzQ3W51cc3NzX8nMs7ouzMxTnoBzgLuBzwAPAm+txs8A7gI+V/18UTUewB8AjwIPAC9fbh/btm3LXt199909b7MazNW7pmZraq7M5mZraq7M5markwvYl0v06kqmZU4C12TmecCFwNURcR6wE9ibmZuBvdVlgNcAm6vTNPDBHn4RSZIGYNlyz8zDmXlfdf6rwEPAJuAS4KZqtZuA11fnLwE+XP1iuRfYEBEbB55ckrSkyB7eoRoRE8A9wPnAlzJzQzUewLHM3BARe4DrMvOT1bK9wG9m5r5F1zVN+5k94+Pj22ZmZnoKPj8/z9jYWE/brAZz9a6p2ZqaC5qbram5oLnZ6uSampqay8xW14VLzdcsPgFjwBzwM9XlpxYtP1b93AO8smN8L9A61XU75z58Tc2V2dxsTc2V2dxsTc2V2dxso5xzJyKeB9wCfCQzb62GjyxMt1Q/j1bjh2i/CLvg7GpMkrRKli33asrleuChzHxvx6LbgSur81cCn+gY//louxA4npmHB5hZkrSMlRzn/grgjcD+iLi/GnsbcB3w0Yi4CvgicGm17E7gYtqHQn4N+IWBJpYkLWvZcs/2C6OxxOKLuqyfwNU1c0mSavDjBySpQI34+AGtHRM77+h724PXvXaASSSdis/cJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFWskXZN8QEUcj4kDH2M0RcX91Orjw3aoRMRERX+9Y9sfDDC9J6m4l38R0I/AB4MMLA5n5cwvnI+I9wPGO9R/LzK2DCihJ6t1KviD7noiY6LYsIgK4FHj1YGNJkuqIzFx+pXa578nM8xeNvwp4b2a2OtZ7EPgscAL47cz85yWucxqYBhgfH982MzPTU/D5+XnGxsZ62mY1lJ5r/6Hjy6+0hC2b1ncdL/02G4amZmtqLmhutjq5pqam5hb6d7G6X5B9ObC74/Jh4Psy84mI2AZ8PCJempknFm+YmbuAXQCtVisnJyd72vHs7Cy9brMaSs+1o84XZF/Rff+l32bD0NRsTc0Fzc02rFx9Hy0TEacDPwPcvDCWmd/IzCeq83PAY8AP1g0pSepNnUMhfxx4ODMfXxiIiLMi4rTq/EuAzcDn60WUJPVqJYdC7gb+FTg3Ih6PiKuqRZfxP6dkAF4FPFAdGvkx4C2Z+eQgA0uSlreSo2UuX2J8R5exW4Bb6seSJNXhO1QlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqkOUuSQWy3CWpQJa7JBVoJd+hekNEHI2IAx1j74iIQxFxf3W6uGPZtRHxaEQ8EhE/NazgkqSlreSZ+43A9i7j78vMrdXpToCIOI/2F2e/tNrmjyLitEGFlSStzLLlnpn3AE+u8PouAWYy8xuZ+QXgUeCCGvkkSX2IzFx+pYgJYE9mnl9dfgewAzgB7AOuycxjEfEB4N7M/ItqveuBv8nMj3W5zmlgGmB8fHzbzMxMT8Hn5+cZGxvraZvVUHqu/YeO973tlk3ru46XfpsNQ1OzNTUXNDdbnVxTU1Nzmdnqtuz0PvN8EHgnkNXP9wBv6uUKMnMXsAug1Wrl5ORkTwFmZ2fpdZvVUHquHTvv6Hvbg1d033/pt9kwNDVbU3NBc7MNK1dfR8tk5pHMfCYzvwV8iGenXg4B53SsenY1JklaRX2Ve0Rs7Lj4BmDhSJrbgcsi4vkR8WJgM/DpehElSb1adlomInYDk8CZEfE48HZgMiK20p6WOQi8GSAzH4yIjwKfAU4CV2fmM8OJLklayrLlnpmXdxm+/hTrvwt4V51QkqR6fIeqJBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFWjZco+IGyLiaEQc6Bj7vxHxcEQ8EBG3RcSGanwiIr4eEfdXpz8eZnhJUncreeZ+I7B90dhdwPmZ+cPAZ4FrO5Y9lplbq9NbBhNTktSLZcs9M+8Bnlw09veZebK6eC9w9hCySZL6FJm5/EoRE8CezDy/y7K/Bm7OzL+o1nuQ9rP5E8BvZ+Y/L3Gd08A0wPj4+LaZmZmegs/PzzM2NtbTNquh9Fz7Dx3ve9stm9Z3HS/9NhuGpmZrai5obrY6uaampuYys9Vt2el1QkXEbwEngY9UQ4eB78vMJyJiG/DxiHhpZp5YvG1m7gJ2AbRarZycnOxp37Ozs/S6zWooPdeOnXf0ve3BK7rvv/TbbBiamq2puaC52YaVq++jZSJiB/DTwBVZPf3PzG9k5hPV+TngMeAHB5BTktSDvso9IrYDvwG8LjO/1j
F+VkScVp1/CbAZ+PwggkqSVm7ZaZmI2A1MAmdGxOPA22kfHfN84K6IALi3OjLmVcDvRcQ3gW8Bb8nMJ7tesSRpaJYt98y8vMvw9UusewtwS91QkqR6fIeqJBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCrajcI+KGiDgaEQc6xs6IiLsi4nPVzxdV4xERfxARj0bEAxHx8mGFlyR1t9Jn7jcC2xeN7QT2ZuZmYG91GeA1wObqNA18sH5MSVIvVlTumXkP8OSi4UuAm6rzNwGv7xj/cLbdC2yIiI2DCCtJWpnIzJWtGDEB7MnM86vLT2Xmhup8AMcyc0NE7AGuy8xPVsv2Ar+ZmfsWXd807Wf2jI+Pb5uZmekp+Pz8PGNjYz1tsxpKz7X/0PG+t92yaX3X8dJvs2Foaram5oLmZquTa2pqai4zW92WnV4rVSUzMyJW9lvi2W12AbsAWq1WTk5O9rTP2dlZet1mNZSea8fOO/re9uAV3fdf+m02DE3N1tRc0Nxsw8pV52iZIwvTLdXPo9X4IeCcjvXOrsYkSaukTrnfDlxZnb8S+ETH+M9XR81cCBzPzMM19iNJ6tGKpmUiYjcwCZwZEY8DbweuAz4aEVcBXwQurVa/E7gYeBT4GvALA84sSVrGiso9My9fYtFFXdZN4Oo6oSRJ9fgOVUkqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgy12SCmS5S1KBLHdJKpDlLkkFstwlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBVrR1+x1ExHnAjd3DL0E+B1gA/BLwJer8bdl5p19J5Qk9azvcs/MR4CtABFxGnAIuI32F2K/LzPfPZCEkqSeDWpa5iLgscz84oCuT5JUQ2Rm/SuJuAG4LzM/EBHvAHYAJ4B9wDWZeazLNtPANMD4+Pi2mZmZnvY5Pz/P2NhYzeSDV3qu/YeO973tlk3ru46XfpsNQ1OzNTUXNDdbnVxTU1Nzmdnqtqx2uUfEdwL/D3hpZh6JiHHgK0AC7wQ2ZuabTnUdrVYr9+3b19N+Z2dnmZyc7C/0EJWea2LnHX1ve/C613YdL/02G4amZmtqLmhutjq5ImLJch/EtMxraD9rPwKQmUcy85nM/BbwIeCCAexDktSDQZT75cDuhQsRsbFj2RuAAwPYhySpB30fLQMQEeuAnwDe3DH8fyJiK+1pmYOLlkmSVkGtcs/Mp4HvWTT2xlqJJEm1+Q5VSSqQ5S5JBbLcJalAlrskFajWC6pam+q8EUnS2uAzd0kqkOUuSQWy3CWpQJa7JBXIcpekAlnuklQgD4XUqlnqEMxrtpxkx5APz1zqs+SlUvnMXZIKZLlLUoEsd0kqkOUuSQXyBdU1qJ/PhlmNFy0lNUftco+Ig8BXgWeAk5nZiogzgJuBCdpftXdpZh6ruy9J0soMalpmKjO3ZmarurwT2JuZm4G91WVJ0ioZ1pz7JcBN1fmbgNcPaT+SpC4iM+tdQcQXgGNAAn+Smbsi4qnM3FAtD+DYwuWO7aaBaYDx8fFtMzMzPe13fn6esbGxWtmHYTVy7T90vOdtxl8AR74+hDADsBrZtmxa3/M2TX2MQXOzNTUXNDdbnVxTU1NzHTMm/8MgXlB9ZWYeiojvBe6KiIc7F2ZmRsS3/QbJzF3ALoBWq5WTk5M97XR2dpZet1kNq5GrnxdGr9lykvfsb+br56uR7eAVkz1v09THGDQ3W1NzQXOzDStX7WmZzDxU/TwK3AZcAByJiI0A1c+jdfcjSVq5WuUeEesi4oUL54GfBA4AtwNXVqtdCXyizn4kSb2p+7fwOHBbe1qd04G/zMy/jYh/Az4aEVcBXwQurbkfSVIPapV7Zn4eeFmX8SeAi+pctySpf378gCQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAlrskFchyl6QCWe6SVCDLXZIKZLlLUoEsd0kqUDO/d00asIk+v5pwx847OHjda4eQSBoun7lLUoEsd0kqkOUuSQXqe849Is4BPkz7e1QT2JWZ74+IdwC/BHy5WvVtmXln3aDSWtTPXP8C5/pVR50XVE8C12TmfRHxQmAuIu6qlr0vM99dP54kqR99l3tmHgYOV+e/GhEPAZsGFUyS1L/IzPpXEjEB3AOcD/w6sAM4Aeyj/ez+WJdtpoFpgPHx8W0zMzM97XN+fp6xsbE6sYdiNXLtP3S8523GXwBHvj6EMAPQ1GwLubZsWt/3dfRzXy041X6fy4//fjU1W51cU1NTc5nZ6rasdrlHxBjwT8C7MvPWiBgHvkJ7Hv6dwMbMfNOprqPVauW+fft62u/s7CyTk5NAs+Y1O3MNS7/HbL9nfzPf1tDUbAu56jxGhvXYXI3HWT+amguam61OrohYstxr/Y+KiOcBtwAfycxbATLzSMfyDwF76uxDeq461S+GhTdYLcUXY9X3oZAREcD1wEOZ+d6O8Y0dq70BONB/PElSP+o8c38F8EZgf0TcX429Dbg8IrbSnpY5CLy5VsJC1flzXavL+0prUZ2jZT4JRJdFHtMuSSPmO1QlqUCWuyQVyHKXpAJZ7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalAzfsovjWk29vSl/tAJ2kt6PcjF67ZcpLJwUZRn3zmLkkFstwlqUCWuyQV6Dk/5+7HuUoq0XO+3CUNVpO+9vK5zGkZSSqQ5S5JBXJaRirQc/G1pOX+zad6D0qJ00FDK/eI2A68HzgN+NPMvG5Y+5JUhufiL6VhGcq0TEScBvwh8BrgPNpfmn3eMPYlSfp2w3rmfgHwaGZ+HiAiZoBLgM8MaX+SNDJ1/uK4cfu6ASZ5VmTm4K804meB7Zn5i9XlNwI/mpm/3LHONDBdXTwXeKTH3ZwJfGUAcQfNXL1raram5oLmZmtqLmhutjq5vj8zz+q2YGQvqGbmLmBXv9tHxL7MbA0w0kCYq3dNzdbUXNDcbE3NBc3NNqxcwzoU8hBwTsfls6sxSdIqGFa5/xuwOSJeHBHfCVwG3D6kfUmSFhnKtExmnoyIXwb+jvahkDdk5oMD3k3fUzpDZq7eNTVbU3NBc7M1NRc0N9tQcg3lBVVJ0mj58QOSVCDLXZIKtObKPSK2R8QjEfFoROwccZYbIuJoRBzoGDsjIu6KiM9VP180glznRMTdEfGZiHgwIt7ahGwR8V0R8emI+I8q1+9W4y+OiE9V9+nN1Yvwqy4iTouIf4+IPQ3LdTAi9kfE/RGxrxob+eOsyrEhIj4WEQ9HxEMR8WOjzhYR51a31cLpRET86qhzVdl+rXrsH4iI3dX/iaE8ztZUuTfwYw1uBLYvGtsJ7M3MzcDe6vJqOwlck5nnARcCV1e306izfQN4dWa+DNgKbI+IC4H/DbwvM38AOAZctcq5FrwVeKjjclNyAUxl5taO46FHfV8ueD/wt5n5Q8DLaN9+I82WmY9Ut9VWYBvwNeC2UeeKi
E3ArwCtzDyf9sEmlzGsx1lmrpkT8GPA33Vcvha4dsSZJoADHZcfATZW5zcCjzTgdvsE8BNNygZ8N3Af8KO03513erf7eBXznE37P/yrgT1ANCFXte+DwJmLxkZ+XwLrgS9QHZjRpGwdWX4S+Jcm5AI2Af8JnEH7SMU9wE8N63G2pp658+yNs+DxaqxJxjPzcHX+v4DxUYaJiAngR4BP0YBs1dTH/cBR4C7gMeCpzDxZrTKq+/T3gd8AvlVd/p6G5AJI4O8jYq762A5owH0JvBj4MvBn1XTWn0bEuoZkW3AZsLs6P9JcmXkIeDfwJeAwcByYY0iPs7VW7mtKtn8Vj+xY04gYA24BfjUzT3QuG1W2zHwm238un037A+Z+aLUzLBYRPw0czcy5UWdZwisz8+W0pyOvjohXdS4c4ePsdODlwAcz80eAp1k01THK/wPV3PXrgL9avGwUuao5/kto/1L8X8A6vn1ad2DWWrmvhY81OBIRGwGqn0dHESIinke72D+Smbc2KRtAZj4F3E37z9ANEbHwhrpR3KevAF4XEQeBGdpTM+9vQC7g/z/jIzOP0p47voBm3JePA49n5qeqyx+jXfZNyAbtX4b3ZeaR6vKoc/048IXM/HJmfhO4lfZjbyiPs7VW7mvhYw1uB66szl9Je757VUVEANcDD2Xme5uSLSLOiogN1fkX0H4d4CHaJf+zo8qVmddm5tmZOUH7MfWPmXnFqHMBRMS6iHjhwnnac8gHaMDjLDP/C/jPiDi3GrqI9sd6jzxb5XKenZKB0ef6EnBhRHx39X904fYazuNsVC901HhR4mLgs7Tnan9rxFl20547+ybtZzFX0Z6r3Qt8DvgH4IwR5Hol7T85HwDur04Xjzob8MPAv1e5DgC/U42/BPg08CjtP6GfP8L7dBLY05RcVYb/qE4PLjzmR31fduTbCuyr7tOPAy9qQjbaUx5PAOs7xpqQ63eBh6vH/58Dzx/W48yPH5CkAq21aRlJ0gpY7pJUIMtdkgpkuUtSgSx3SSqQ5S5JBbLcJalA/w0h+sl++RADlQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "4SM_tYvyUtsw", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 282 + }, + "outputId": "128bea12-3654-4692-e7cc-6279f8ed2079" + }, + "source": [ + "dftrain.sex.value_counts().plot(kind='barh')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 21 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAD4CAYAAADo30HgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAMdUlEQVR4nO3cf4xld1nH8c8D225NS4rQhmxacChuJKRAW0tFRQKICF1DQTAhEigJoVEUNabRIpHUVLSCKJqgpCgWFQVBDAghiLTGBLF11/7Y1nah2jVSKw0SlpomVenXP+5ZmGec2XbbmXtmy+uVTPbcc+/e88x3cve959y7W2OMAMBhj5h7AAC2F2EAoBEGABphAKARBgCaHXMPsBlOOeWUsbKyMvcYAMeUffv2fWmMcera/Q+LMKysrGTv3r1zjwFwTKmqf11vv0tJADTCAEAjDAA0wgBAIwwANMIAQCMMADTCAEAjDAA0wgBAIwwANMIAQCMMADTCAEAjDAA0wgBAIwwANMIAQCMMADTCAEAjDAA0wgBAIwwANMIAQCMMADTCAECzY+4BNsP+Ow5l5ZKPzz0GrOvg5XvmHgGOijMGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAmvsNQ1X9VFXdUlXv24oBqurSqrp4K54bgKO34wE85vVJnj/G+MJWDwPA/I4Yhqp6V5Izknyiqt6f5ElJzkxyXJJLxxgfqarXJHlJkhOT7E7y60mOT/KqJPcmOX+M8eWqel2Si6b7bkvyqjHGPWuO96Qk70xyapJ7krxujHHrJn2vADwAR7yUNMb4sST/nuS5WfzBf9UY47zp9tuq6sTpoWcm+eEkz0jyliT3jDHOTvLZJK+eHvPhMcYzxhhPT3JLkteuc8grkrxhjPGdSS5O8jsbzVZVF1XV3qra+7V7Dj2w7xaA+/VALiUd9oIkL171fsAJSZ4wbV89xrg7yd1VdSjJX0779yd52rR9ZlX9cpJHJzkpySdXP3lVnZTke5J8sKoO79650TBjjCuyCEl27to9juL7AOAIjiYMleRlY4wDbWfVd2Vxyeiw+1bdvm/VMa5M8pIxxg3T5afnrHn+RyT5yhjjrKOYCYBNdjQfV/1kkjfU9Nf5qjr7KI/1qCR3VtVxSV659s4xxleT3F5VPzI9f1XV04/yGAA8REcThsuyeNP5xqq6ebp9NH4xyTVJPpNkozeUX5nktVV1Q5Kbk1xwlMcA4CGqMY79y/M7d+0euy58x9xjwLoOXr5n7hFgXVW1b4xx7tr9/uUzAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwDNjrkH2AxPPe3k7L18z9xjADwsOGMAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCgEQYAGmEAoBEGABphAKARBgAaYQCg2TH3AJth/x2HsnLJx+ceA2CpDl6+Z0ue1xkDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAzbYIQ1U9p6o+NvccAGyTMACwfWxaGKpqpapuraorq+pzVfW+qnp+VX2mqj5fVedNX5+tquuq6u+q6jvWeZ4Tq+o9VXXt9LgLNmtGAO7fZp8xfHuStyd58vT1o0meleTiJL+Q5NYk3zfGODvJm5P8yjrP8aYkV40xzkvy3CRvq6oT1z6oqi6qqr1Vtfdr9xza5G8D4JvXjk1+vtvHGPuTpKpuTvLpMcaoqv1JVpKcnOS9VbU7yUhy3DrP8YIkL66qi6fbJyR5QpJbVj9ojHFFkiuSZOeu3WOTvw+Ab1qbHYZ7V23ft+r2fdOxLkty9RjjpVW1kuRv1nmOSvKyMcaBTZ4NgAdg2W8+n5zkjmn7NRs85pNJ3lBVlSRVdfYS5gJgsuwwvDXJr1bVddn4bOWyLC4x3ThdjrpsWcMBkNQYx/7l+Z27do9dF75j7jEAlurg5Xse0u+vqn1jjHPX7vfvGABohAGARhgAaIQBgEYYAGiEAYBGGABohAGARhgAaIQBgEYYAGiEAYBGGABohAGARhgAaIQBgEYYAGiEAYBGGABohAGARhgAaIQBgGbH3ANshqeednL2Xr5n7jEAHhacMQDQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMAjTAA0AgDAI0wANAIAwCNMADQCAMATY0x5p7hIauqu5McmHuODZyS5EtzD7GO7TpXYrYHy2wPzjfzbN82xjh17c4dW3jAZTowxjh37iHWU1V7t+Ns23WuxGwPltkeHLP9fy4lAdAIAwDNwyUMV8w9wBFs19m261yJ2R4ssz04ZlvjYfHmMwCb5+FyxgDAJhEGAJpjOgxV9cKqOlBVt1XVJdtgnoNVtb+qrq+qvdO+x1TVp6rq89Ov37qkWd5TVXdV1U2r9q07Sy389rSON1bVOTPMdmlV3TGt3fVVdf6q+944zXagqn5wi2d7fFVdXVX/VFU3V9VPT/tnXbsjzDX7ulXVCVV1bV
XdMM32S9P+J1bVNdMMH6iq46f9O6fbt033r8ww25VVdfuqdTtr2r/U18J0zEdW1XVV9bHp9uzrljHGMfmV5JFJ/jnJGUmOT3JDkqfMPNPBJKes2ffWJJdM25ck+bUlzfLsJOckuen+ZklyfpJPJKkkz0xyzQyzXZrk4nUe+5TpZ7szyROnn/kjt3C2XUnOmbYfleRz0wyzrt0R5pp93abv/aRp+7gk10xr8WdJXjHtf1eSH5+2X5/kXdP2K5J8YAt/nhvNdmWSl6/z+KW+FqZj/mySP0nysen27Ot2LJ8xnJfktjHGv4wx/jvJ+5NcMPNM67kgyXun7fcmeckyDjrG+NskX36As1yQ5A/Hwt8neXRV7VrybBu5IMn7xxj3jjFuT3JbFj/7rZrtzjHGP07bdye5JclpmXntjjDXRpa2btP3/l/TzeOmr5HkeUk+NO1fu2aH1/JDSb6/qmrJs21kqa+Fqjo9yZ4kvzfdrmyDdTuWw3Bakn9bdfsLOfILZRlGkr+qqn1VddG073FjjDun7f9I8rh5RjviLNtlLX9yOn1/z6pLbrPNNp2qn53F3zK3zdqtmSvZBus2XQ65PsldST6VxRnKV8YY/7vO8b8+23T/oSSPXdZsY4zD6/aWad1+s6p2rp1tnbm3wjuS/FyS+6bbj802WLdjOQzb0bPGGOckeVGSn6iqZ6++cyzOAbfF54O30yyT303ypCRnJbkzydvnHKaqTkry50l+Zozx1dX3zbl268y1LdZtjPG1McZZSU7P4szkyXPMsZ61s1XVmUnemMWMz0jymCQ/v+y5quqHktw1xti37GPfn2M5DHckefyq26dP+2Yzxrhj+vWuJH+RxQvki4dPRadf75pvwg1nmX0txxhfnF7A9yV5d75x2WPps1XVcVn84fu+McaHp92zr916c22ndZvm+UqSq5N8dxaXYQ7/f2yrj//12ab7T07yn0uc7YXTpbkxxrg3yR9knnX73iQvrqqDWVwKf16S38o2WLdjOQz/kGT39A7+8Vm8GfPRuYapqhOr6lGHt5O8IMlN00wXTg+7MMlH5pkwOcIsH03y6ukTGc9McmjVZZOlWHMd96VZrN3h2V4xfSLjiUl2J7l2C+eoJL+f5JYxxm+sumvWtdtoru2wblV1alU9etr+liQ/kMV7IFcnefn0sLVrdngtX57kquksbFmz3boq8pXFNfzV67aU18IY441jjNPHGCtZ/Pl11RjjldkG67al77Zv9VcWnyD4XBbXM9808yxnZPEpkBuS3Hx4niyuAX46yeeT/HWSxyxpnj/N4tLC/2RxnfK1G82SxScw3jmt4/4k584w2x9Nx74xixfArlWPf9M024EkL9ri2Z6VxWWiG5NcP32dP/faHWGu2dctydOSXDfNcFOSN696TVybxRvfH0yyc9p/wnT7tun+M2aY7app3W5K8sf5xieXlvpaWDXnc/KNTyXNvm7+SwwAmmP5UhIAW0AYAGiEAYBGGABohAGARhgAaIQBgOb/AEYEJAUZdZ+FAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WCv3ek2LU1Lw", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 282 + }, + "outputId": "a71bfdd1-ebbf-44cf-ad1b-9ec992e2fcd7" + }, + "source": [ + "dftrain['class'].value_counts().plot(kind='barh')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 22 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAD4CAYAAADy46FuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAN6klEQVR4nO3de4yld13H8ffHbbultiyWbspSiENrYwOtLNtFAQGLgJaupFxqKH8oJppNEKONMVpC0lSFpBW8RIKSNiJoGyiiKKERKNKiCbF1t267LfSmu8YuhaZgl5ZLheXrH+fZchz2fPc2M+ec6fuVnMxzfs8zZz7zO2fms89l56SqkCRpkh+YdgBJ0myzKCRJLYtCktSyKCRJLYtCktQ6ZtoBltIpp5xSCwsL044hSXNl+/btD1XV+knrV1VRLCwssG3btmnHkKS5kuS/uvUeepIktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVJrVb1x0c49e1m49Pppx9Ay2H3FlmlHkJ6w3KOQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklS65CKIsnbktyZ5PYkO5L8xHIHW/T1z0vy8ZX8mpKkkYO+H0WSFwI/B2yqqseSnAIct+zJJEkz4VD2KDYAD1XVYwBV9VBVfTHJuUk+m2R7kk8m2QCQ5EeSfDrJbUluTXJGRt6Z5I4kO5O8Ydj2vCQ3JflIkruSXJskw7rzh7Fbgdct0/cvSTqIQymKTwHPTHJPkj9L8lNJjgXeDVxUVecC7wPeMWx/LfCeqnou8CLgAUa/6DcCzwVeAbxzf7EAzwMuAZ4NnA78ZJLjgauBVwPnAk87+m9VknQkDnroqaoeTXIu8BLgZcB1wNuBs4Ebhh2ANcADSU4CTquqjw6f+y2AJC8GPlhV+4AvJ/ks8Hzga8AtVXX/sN0OYAF4FNhVVfcO49cAWw+UL8nW/evWPHn9EUyBJKlzSO+ZPfyCvwm4KclO4C3AnVX1wvHthqI4XI+NLe871Exj2a4CrgJYu+HMOoKvL0lqHPTQU5IfTXLm2NBG4AvA+uFEN0mOTfKcqnoEuD/Ja4bxtUlOAP4FeEOSNUnWAy8Fbmm+7F3AQpIzhvtvPOzvTJK0JA7lHMWJwAeSfD7J7YzOJVwGXARcmeQ2YAej8xEAvwD8+rDt5xidX/gocDtwG/AZ4Ler6kuTvuBwyGorcP1wMvvBI/nmJElHL1Wr52jN2g1n1oY3/cm0Y2gZ7L5iy7QjSKtWku1VtXnSev9ntiSpZVFIkloWhSSpZVFIkloWhSSpZVFIkloWhSSpZVFIkloWhSSpZVFIkloWhSSpZVFIkloWhSSpdVhvEjTrzjltHdv8K6OStKTco5AktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktY6ZdoCltHPPXhYuvX7aMbSK7L5iy7QjSFPnHoUkqWVRSJJaFoUkqWVRSJJaFoUkqWVRSJJaFoUkqWVRSJJaFoUkqWVRSJJaFoUkqWVRSJJaFoUkqWVRSJJay14USfYl2TF2W0jyucN8jEuSnLBcGSVJk63E+1F8s6o2Lhp70eKNkhxTVd+Z8BiXANcA31jqcJKk3lTeuCjJo1V1YpLzgN8H/gc4K8nzgA8DzwDWDOtOBZ4O3Jjkoap62TQyS9IT1UoUxZOS7BiWd1XVaxet3wScXVW7krwe+GJVbQFIsq6q9ib5TeBlVfXQ4gdPshXYCrDmyeuX77uQpCeolTiZ/c2q2jjcFpcEwC1VtWtY3gm8MsmVSV5SVXsP9uBVdVVVba6qzWtOWLekwSVJs3HV09f3L1TVPYz2MHYCb09y2dRSSZKAKZ2jmCTJ04GvVtU1SR4GfmVY9QhwEvB9h54kSctrpooCOAd4Z5LvAt8G3jyMXwV8IskXPZktSStr2Yuiqk6cNFZVNwE3jY1/EvjkAbZ/N/DuZQspSZpoFs5RSJJmmEUhSWpZFJKklkUhSWpZFJKklkUhSWpZFJKklkUhSWpZFJKklkUhSWpZFJKklkUhSWrN2l+PPSrnnLaObVdsmXYMSVpV3KOQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLUsCklSy6KQJLWOmXaApbRzz14WLr1+2jEkaUXtvmLLsj6+exSSpJZFIUlqWRSSpJZFIUlqWRSSpJZFIUlqWRSSpJZFIUlqWRSSpJZFIUlqWRSSpJZFIUlqWRSSpJZFIUlqLVlRJHlqkh3D7UtJ9gzLDyf5/ITP+b0krziExz4vyceXKqsk6dAt2ftRVNVXgI0ASS4HHq2qdyVZAA74S76qLjvQeJI1VbVvqbJJko7cSh16WpPk6iR3JvlUkicBJHl/kouG5d1JrkxyK/DzSc5Pctdw/3UrlFOStMhKFcWZwHuq6jnAw8DrJ2z3laraBPw9cDXwauBc4GkrklKS9H1Wqih2VdWOYXk7sDBhu+uGj2cNn3NvVRVwz
aQHTrI1ybYk2/Z9Y++SBZYkjaxUUTw2tryPyedGvn64D1xVV1XV5qravOaEdUcUTpI02axeHnsXsJDkjOH+G6cZRpKeyGayKKrqW8BW4PrhZPaDU44kSU9YS3Z57LiqunxseTdw9tj9d40t/9LY8sKix/gEo3MVkqQpmsk9CknS7LAoJEkti0KS1LIoJEkti0KS1LIoJEkti0KS1LIoJEkti0KS1LIoJEkti0KS1LIoJEkti0KS1FqWvx47Leecto5tV2yZdgxJWlXco5AktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktSwKSVLLopAktVJV086wZJI8Atw97RxH6BTgoWmHOALzmhvMPi3zmn1ec8PBs/9wVa2ftHJVvRUqcHdVbZ52iCORZNs8Zp/X3GD2aZnX7POaG44+u4eeJEkti0KS1FptRXHVtAMchXnNPq+5wezTMq/Z5zU3HGX2VXUyW5K09FbbHoUkaYlZFJKk1qooiiTnJ7k7yX1JLp12noNJsjvJziQ7kmwbxk5OckOSe4ePPzTtnABJ3pfkwSR3jI0dMGtG/nR4Hm5Psml6ySdmvzzJnmHudyS5YGzdW4fsdyf52emkhiTPTHJjks8nuTPJbwzjMz/vTfZ5mPfjk9yS5LYh++8O489KcvOQ8bokxw3ja4f79w3rF2Ys9/uT7Bqb843D+OG/Xqpqrm/AGuA/gNOB44DbgGdPO9dBMu8GTlk09gfApcPypcCV0845ZHkpsAm442BZgQuAfwQCvAC4eQazXw781gG2ffbw2lkLPGt4Ta2ZUu4NwKZh+STgniHfzM97k30e5j3AicPyscDNw3x+GLh4GH8v8OZh+VeB9w7LFwPXzVju9wMXHWD7w369rIY9ih8H7quq/6yq/wU+BFw45UxH4kLgA8PyB4DXTDHL46rqn4GvLhqelPVC4K9q5F+BpyTZsDJJv9+E7JNcCHyoqh6rql3AfYxeWyuuqh6oqluH5UeALwCnMQfz3mSfZJbmvarq0eHuscOtgJ8GPjKML573/c/HR4CXJ8kKxX1ck3uSw369rIaiOA3477H799O/MGdBAZ9Ksj3J1mHs1Kp6YFj+EnDqdKIdkklZ5+W5+LVhl/t9Y4f4ZjL7cDjjeYz+lThX874oO8zBvCdZk2QH8CBwA6M9nIer6jsHyPd49mH9XuCpK5t4ZHHuqto/5+8Y5vyPk6wdxg57zldDUcyjF1fVJuBVwFuSvHR8ZY32D+fiuuV5yjr4c+AMYCPwAPCH040zWZITgb8FLqmqr42vm/V5P0D2uZj3qtpXVRuBZzDaszlrypEOyeLcSc4G3soo//OBk4HfOdLHXw1FsQd45tj9ZwxjM6uq9gwfHwQ+yugF+eX9u3/Dxwenl/CgJmWd+eeiqr48/FB9F7ia7x3mmKnsSY5l9Iv22qr6u2F4Lub9QNnnZd73q6qHgRuBFzI6NLP/7+KN53s8+7B+HfCVFY76/4zlPn84DFhV9RjwlxzFnK+Govg34MzhyoTjGJ1U+tiUM02U5AeTnLR/GfgZ4A5Gmd80bPYm4B+mk/CQTMr6MeAXh6sqXgDsHTtUMhMWHYt9LaO5h1H2i4crWZ4FnAncstL5YHRVCvAXwBeq6o/GVs38vE/KPifzvj7JU4blJwGvZHSO5UbgomGzxfO+//m4CPjMsKe3oibkvmvsHxVhdF5lfM4P7/UyjbP0S31jdBb/HkbHE9827TwHyXo6o6s8bgPu3J+X0bHNfwLuBT4NnDztrEOuDzI6VPBtRscyf3lSVkZXUbxneB52AptnMPtfD9luH35gNoxt/7Yh+93Aq6aY+8WMDivdDuwYbhfMw7w32edh3n8M+Pch4x3AZcP46YzK6z7gb4C1w/jxw/37hvWnz1juzwxzfgdwDd+7MuqwXy/+CQ9JUms1HHqSJC0ji0KS1LIoJEkti0KS1LIoJEkti0KS1LIoJEmt/wNVU+he/vp3xQAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "D4kPWqBYVDlj", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 296 + }, + "outputId": "c4f858de-73e4-41a3-8928-e97a0e0e08ad" + }, + "source": [ + "pd.concat([dftrain, y_train], axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survive')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "Text(0.5, 0, '% survive')" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 23 + }, + { + "output_type": "display_data", + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZUAAAEGCAYAAACtqQjWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAPpUlEQVR4nO3dfZBddX3H8fcHo0ER0RqcRlC30lhEFBGM1qkWRgctGQELIj7VTCnWh2Idi1NaK6WlSiq1tTNqFVsH62hB0KkgKLUKdUwBDQZII4IocQSZ1seIZrSSfPvHOamXdcPeJL/7sJv3a+bOnHPvb8/93LObfO45Z/d3U1VIktTCXpMOIElaPCwVSVIzlookqRlLRZLUjKUiSWpmyaQDTNKyZctqZmZm0jEkaUG5/vrrv1NV+8/12B5dKjMzM6xbt27SMSRpQUnyjR095ukvSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqZklkw4wSRvu3MzMmZdPOsbU27Rm1aQjSFogPFKRJDVjqUiSmrFUJEnNWCqSpGYsFUlSM5aKJKkZS0WS1IylIklqxlKRJDVjqUiSmrFUJEnNWCqSpGYsFUlSM5aKJKkZS0WS1IylIklqxlKRJDVjqUiSmrFUJEnNWCqSpGYsFUlSM5aKJKkZS0WS1MyCLpUkRyX5xKRzSJI6C7pUJEnTZeKlkmQmyVeSXJDk1iQfSvKcJGuTfDXJyv52TZL1Sf4zya/NsZ19krw/yRf6ccdP4vVI0p5s4qXS+1Xg7cDB/e0lwG8AZwB/CnwFeGZVHQ6cBbx1jm28CfhsVa0EjgbOS7LP7EFJXplkXZJ1W7dsHsmLkaQ91ZJJB+jdXlUbAJJsBD5TVZVkAzAD7Ad8IMkKoID7z7GNY4DjkpzRr+8NPBq4eXBQVZ0PnA+wdPmKGsFrkaQ91rSUyk8HlrcNrG+jy3gOcFVVvSDJDHD1HNsIcGJV3TK6mJKk+zItp7/msx9wZ7+8egdjrgROTxKAJIePIZckacBCKZW3AecmWc+Oj67OoTstdlN/Cu2ccYWTJHVStedeVli6fEUtf8U7Jh1j6m1as2rSESRNkSTXV9WRcz22UI5UJEkLgKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc0smXSASXriAfuxbs2qSceQpEXDIxVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJamaoUkly6qz1+yX589FEkiQtVMMeqTw7yRVJlid5AnAtsO8Ic0mSFqAlwwyqqpckeRGwAfgx8JKqWjvSZJKkBWfY018rgD8EPgp8A3h5kgeNMpgkaeEZ9vTXZcBZVfX7wG8CXwW+OLJUkqQFaajTX8DKqvohQFUV8PYkl40uliRpIRr2SOWBSf4pyacAkhwCPHN0sSRJC9GwpXIBcCWwvF+/FXj9KAJJkhauYUtlWVV9BNgGUFX3AFtHlkqStCANWyo/TvJwoACSPB3YPLJUkqQFadgL9W8ALgUOSrIW2B84aWSpJEkL0rBHKgcBvwU8g+7aylcZvpAkSXuIYUvlzf2vFD8MOBp4N/API0slSVqQhi2V7RflVwHvq6rLgQeMJpIkaaEatlTuTPJe4EXAFUmW7sTXSpL2EMMWw8l011KeW1U/AH4JeOPIUkmSFqRhZyneAnxsYP0u4K5RhZIkLUyewpIkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWpmqA/pWqw23LmZmTMvn3QMSRqrTWtWjWzbHqlIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJamZkpZLkdUluTvKhEW3/7CRnjGLbkqRds2SE234N8JyqumOEzyFJmiIjKZUk7wEeC3wyyYXAQcChwP2Bs6vq40lWAycA+wArgL8BHgC8HPgpcGxVfS/JacAr+8duA15eVVtmPd9BwLuA/YEtwGlV9ZVRvDZJ0o6N5PRXVb0K+BZwNF1pfLaqVvbr5yXZpx96KPDbwFOBtwBbqupw4Brgd/oxH6uqp1bVYcDNwKlzPOX5wOlVdQRwBvDuHWVL8sok65Ks27pl8+6+VEnSgFGe/truGOC4gesfewOP7pevqqq7gbuTbAYu6+/fADypXz40yV8BDwUeDFw5uPEkDwaeAVycZPvdS3cUpqrOpyshli5fUbvxuiRJs4yjVAKcWFW33OvO5Gl0p7m22zawvm0g2wXACVV1Y3/K7KhZ298L+EFVPbltbEnSzhrHrxRfCZye/jAiyeE7+fX7AncluT/w0tkPVtUPgduTvLDffpIctpuZJUm7YBylcg7dBfqbkmzs13fGm4HrgLXAji6+vxQ
4NcmNwEbg+F3MKknaDanacy8rLF2+opa/4h2TjiFJY7Vpzard+vok11fVkXM95l/US5KasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUjKUiSWrGUpEkNWOpSJKasVQkSc1YKpKkZiwVSVIzlookqRlLRZLUzJJJB5ikJx6wH+vWrJp0DElaNDxSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkpqxVCRJzVgqkqRmLBVJUjOWiiSpGUtFktSMpSJJasZSkSQ1Y6lIkppJVU06w8QkuRu4ZdI55rEM+M6kQ8zDjLtv2vOBGVtZDBkfU1X7z/XAktHkWTBuqaojJx3iviRZZ8bdN+0Zpz0fmLGVxZ7R01+SpGYsFUlSM3t6qZw/6QBDMGMb055x2vOBGVtZ1Bn36Av1kqS29vQjFUlSQ5aKJKmZRV8qSZ6X5JYktyU5c47Hlya5qH/8uiQzU5jxWUm+lOSeJCeNO9+QGd+Q5MtJbkrymSSPmcKMr0qyIckNST6f5JBpyzgw7sQklWTsv3o6xH5cneTb/X68IcnvTVvGfszJ/c/kxiQfnraMSf5uYB/emuQHU5jx0UmuSrK+/7d97LwbrapFewPuB3wNeCzwAOBG4JBZY14DvKdfPgW4aAozzgBPAv4ZOGlK9+PRwIP65VdP6X58yMDyccCnpi1jP25f4HPAtcCR05YRWA28c9w/hzuZcQWwHnhYv/6Iacs4a/zpwPunLSPdBftX98uHAJvm2+5iP1JZCdxWVV+vqv8FLgSOnzXmeOAD/fIlwLOTZJoyVtWmqroJ2DbGXIOGyXhVVW3pV68FDpzCjD8cWN0HGPdvqQzz8whwDvDXwE/GGa43bMZJGibjacC7qur7AFX1P1OYcdCLgX8ZS7KfGyZjAQ/pl/cDvjXfRhd7qRwAfHNg/Y7+vjnHVNU9wGbg4WNJN+v5e3NlnLSdzXgq8MmRJvpFQ2VM8tokXwPeBrxuTNm2mzdjkqcAj6qqy8cZbMCw3+sT+9MhlyR51Hii/b9hMj4OeFyStUmuTfK8saXrDP1vpj9V/CvAZ8eQa9AwGc8GXpbkDuAKuiOq+7TYS0VjluRlwJHAeZPOMpeqeldVHQT8MfBnk84zKMlewN8CfzTpLPO4DJipqicBn+bnR/rTZAndKbCj6I4C3pfkoRNNtGOnAJdU1dZJB5nDi4ELqupA4Fjgg/3P6Q4t9lK5Exh8F3Vgf9+cY5IsoTvE++5Y0s16/t5cGSdtqIxJngO8CTiuqn46pmzb7ex+vBA4YaSJftF8GfcFDgWuTrIJeDpw6Zgv1s+7H6vquwPf338EjhhTtu2G+V7fAVxaVT+rqtuBW+lKZlx25ufxFMZ/6guGy3gq8BGAqroG2JtusskdG+eFoXHf6N6tfJ3u0HL7hagnzBrzWu59of4j05ZxYOwFTOZC/TD78XC6i34rpvh7vWJg+fnAumnLOGv81Yz/Qv0w+3H5wPILgGunMOPzgA/0y8voTvM8fJoy9uMOBjbR/yH6FO7HTwKr++XH011Tuc+sY30Rk7jRHbLd2v+H96b+vr+kezcNXfNeDNwGfAF47BRmfCrdO68f0x1FbZzCjP8O/DdwQ3+7dAoz/j2wsc931X39hz6pjLPGjr1UhtyP5/b78cZ+Px48hRlDdyrxy8AG4JRpy9ivnw2sGXe2ndiPhwBr++/1DcAx823TaVokSc0s9msqkqQxslQkSc1YKpKkZiwVSVIzlookqRlLRdoFSfbvZzr+ryQnDNz/8SSPHHOWK6b4r8W1h7FUpF3zYuA9dJPyvR4gyfOB9VU176R7OyvJ/Xb0WFUdW1VjnzZdmoulIu2anwEPApYCW/spfl5PN1HlnJK8sD+yuTHJ5/r7Vid558CYTyQ5ql/+UZK3J7kR+JMkFw+MOyrJJ/rlTUmWJVmT5LUDY85Ocka//MYkX+wngfyLhvtBuhdLRdo1H6abJvzTwFvpPpfng/Xz6f/nchbw3Ko6jO7zXOazD3BdP34N8LQk+/SPvYhu/rJBFwEnD6yfDFyU5Bi6ea9WAk8GjkjyrCGeX9pploq0C6pqc1WtqqojgS/RzSV2SZL39dPB//ocX7YWuCDJaXQfkDSfrcBH++e7B/gU8Pz+qGgV8PFZmdYDj0jyyCSHAd+vqm8Cx/S39X3Wgxnv5IragyyZdABpEXgz8Ba66yyfp/uwt48Bzx0cVFWvSvI0ukK4PskRwD3c+83d3gPLP6l7T4d+IfAHwPfoJsO8e44sFwMnAb9Md+QC3TxY51bVe3ft5UnD80hF2g1JVgAHVtXVdNdYttF9Wt4D5xh7UFVdV1VnAd+mm3Z8E/DkJHv1H3a18j6e7j+Ap9B9quHsU1/bXUQ32/ZJdAUDcCXwu0ke3Oc4IMkjduZ1SsPySEXaPW+h+wwZ6D4T41+BM+mun8x2Xl9CAT5DN/MrwO10s+neTHd6ak5VtbW/OL8aeMUOxmxMsi9wZ1Xd1d/3b0keD1zTf1L2j4CXAeP+iF3tAZylWJLUjKe/JEnNWCqSpGYsFUlSM5aKJKkZS0WS1IylIklqxlKRJDXzf4O37IPoSniUAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l42qmk_bVHvD", + "colab_type": "text" + }, + "source": [ + "After analyzing this information, we should notice the following:\n", + "- Most passengers are in their 20's or 30's \n", + "- Most passengers are male\n", + "- Most passengers are in \"Third\" class\n", + "- Females have a much higher chance of survival\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sIBZww6kOIAp", + "colab_type": "text" + }, + "source": [ + "### Training vs Testing Data\n", + "You may have noticed that we loaded **two different datasets** above. This is because when we train models, we need two sets of data: **training and testing**. \n", + "\n", + "The **training** data is what we feed to the model so that it can develop and learn. It is usually a much larger size than the testing data.\n", + "\n", + "The **testing** data is what we use to evaulate the model and see how well it is performing. We must use a seperate set of data that the model has not been trained on to evaluate it. Can you think of why this is?\n", + "\n", + "Well, the point of our model is to be able to make predictions on NEW data, data that we have never seen before. If we simply test the model on the data that it has already seen we cannot measure its accuracy accuratly. We can't be sure that the model hasn't simply memorized our training data. This is why we need our testing and training data to be seperate.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ar_cXv2jV8A3", + "colab_type": "text" + }, + "source": [ + "###Feature Columns\n", + "In our dataset we have two different kinds of information: **Categorical and Numeric**\n", + "\n", + "Our **categorical data** is anything that is not numeric! For example, the sex column does not use numbers, it uses the words \"male\" and \"female\".\n", + "\n", + "Before we continue and create/train a model we must convet our categorical data into numeric data. We can do this by encoding each category with an integer (ex. male = 1, female = 2). \n", + "\n", + "Fortunately for us TensorFlow has some tools to help!" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-lcnwG0VXF5h", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 54 + }, + "outputId": "a4603a14-5903-4e75-a8cf-6a2ccca69946" + }, + "source": [ + "CATEGORICAL_COLUMNS = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck',\n", + " 'embark_town', 'alone']\n", + "NUMERIC_COLUMNS = ['age', 'fare']\n", + "\n", + "feature_columns = []\n", + "for feature_name in CATEGORICAL_COLUMNS:\n", + " vocabulary = dftrain[feature_name].unique() # gets a list of all unique values from given feature column\n", + " feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))\n", + "\n", + "for feature_name in NUMERIC_COLUMNS:\n", + " feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))\n", + "\n", + "print(feature_columns)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "[VocabularyListCategoricalColumn(key='sex', vocabulary_list=('male', 'female'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='n_siblings_spouses', vocabulary_list=(1, 0, 3, 4, 2, 5, 8), dtype=tf.int64, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='parch', vocabulary_list=(0, 1, 2, 5, 3, 4), dtype=tf.int64, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='class', vocabulary_list=('Third', 'First', 'Second'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='deck', vocabulary_list=('unknown', 'C', 'G', 'A', 'B', 'D', 'F', 'E'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='embark_town', vocabulary_list=('Southampton', 'Cherbourg', 'Queenstown', 'unknown'), dtype=tf.string, default_value=-1, num_oov_buckets=0), VocabularyListCategoricalColumn(key='alone', vocabulary_list=('n', 'y'), dtype=tf.string, default_value=-1, num_oov_buckets=0), NumericColumn(key='age', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='fare', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "l-nazdZpXgAr", + "colab_type": "text" + }, + "source": [ + "Let's break this code down a little bit...\n", + "\n", + "Essentially what we are doing here is creating a list of features that are used in our dataset. \n", + "\n", + "The cryptic lines of code inside the ```append()``` create an object that our model can use to map string values like \"male\" and \"female\" to integers. This allows us to avoid manually having to encode our dataframes.\n", + "\n", + "*And here is some relevant documentation*\n", + "\n", + "https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list?version=stable\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UQlXWErlbhsG", + "colab_type": "text" + }, + "source": [ + "###The Training Process\n", + "So, we are almost done preparing our dataset and I feel as though it's a good time to explain how our model is trained. Specifically, how input data is fed to our model. \n", + "\n", + "For this specific model data is going to be streamed into it in small batches of 32. This means we will not feed the entire dataset to our model at once, but simply small batches of entries. 
We will feed these batches to our model multiple times according to the number of **epochs**. \n", + "\n", + "An **epoch** is simply one stream of our entire dataset. The number of epochs we define is the number of times our model will see the entire dataset. We use multiple epochs in the hope that after seeing the same data multiple times the model will better determine how to estimate it.\n", + "\n", + "Ex. if we have 10 epochs, our model will see the same dataset 10 times. \n", + "\n", + "Since we need to feed our data in batches and multiple times, we need to create something called an **input function**. The input function simply defines how our dataset will be converted into batches at each epoch.\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OO0mBu_WaVXp", + "colab_type": "text" + }, + "source": [ + "###Input Function\n", + "The TensorFlow model we are going to use requires that the data we pass it comes in as a ```tf.data.Dataset``` object. This means we must create an *input function* that can convert our current pandas dataframe into that object. \n", + "\n", + "Below you'll see a seemingly complicated input function; it comes straight from the TensorFlow documentation (https://www.tensorflow.org/tutorials/estimator/linear). I've commented it as much as I can to make it understandable, but you may want to refer to the documentation for a detailed explanation of each method." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "I3qcbvOYbIwa", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):\n", + " def input_function(): # inner function, this will be returned\n", + " ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) # create tf.data.Dataset object with data and its label\n", + " if shuffle:\n", + " ds = ds.shuffle(1000) # randomize order of data\n", + " ds = ds.batch(batch_size).repeat(num_epochs) # split dataset into batches of 32 and repeat process for number of epochs\n", + " return ds # return a batch of the dataset\n", + " return input_function # return a function object for use\n", + "\n", + "train_input_fn = make_input_fn(dftrain, y_train) # make_input_fn returns a function object; the estimator will call it later to get the dataset\n", + "eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FXqlPst9fpx4", + "colab_type": "text" + }, + "source": [ + "###Creating the Model\n", + "In this tutorial we are going to use a linear estimator to utilize the linear regression algorithm. \n", + "\n", + "Creating one is pretty easy!
Have a look below.\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "q1Wo8brFf663", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 192 + }, + "outputId": "73b78e55-1fc4-44b0-9fc6-33506a199db5" + }, + "source": [ + "linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)\n", + "# We create a linear estimator by passing the feature columns we created earlier" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "INFO:tensorflow:Using default config.\n", + "WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpupqmt1yp\n", + "INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpupqmt1yp', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n", + "graph_options {\n", + " rewrite_options {\n", + " meta_optimizer_iterations: ONE\n", + " }\n", + "}\n", + ", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "K5GPkW_CgDFy", + "colab_type": "text" + }, + "source": [ + "###Training the Model\n", + "Training the model is as easy as passing the input functions that we created earlier." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "J11OJrlZgPhb", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "cd3c7099-fdde-45aa-b57b-f1c8d1d02024" + }, + "source": [ + "linear_est.train(train_input_fn) # train\n", + "result = linear_est.evaluate(eval_input_fn) # get model metrics/stats by testing on the testing data\n", + "\n", + "clear_output() # clears console output\n", + "print(result['accuracy']) # the result variable is simply a dict of stats about our model" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "0.7689394\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LisxO81tgi1n", + "colab_type": "text" + }, + "source": [ + "And now we have a model with about 77% accuracy on this run (the exact number will change each time)! Not crazy impressive, but decent for our first try.\n", + "\n", + "Now let's see how we can actually use this model to make predictions.\n", + "\n", + "We can use the ```.predict()``` method to get survival probabilities from the model. This method will return a list of dicts that store a prediction for each of the entries in our testing data set.
Below we've used some pandas magic to plot a nice graph of the predictions.\n", + "\n", + "As you can see the survival rate is not very high :/" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "JQz0Lj60hjLI", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 525 + }, + "outputId": "eec262fd-e206-46b6-d8d7-7c9067546f16" + }, + "source": [ + "pred_dicts = list(linear_est.predict(eval_input_fn))\n", + "probs = pd.Series([pred['probabilities'][1] for pred in pred_dicts])\n", + "\n", + "probs.plot(kind='hist', bins=20, title='predicted probabilities')" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "INFO:tensorflow:Calling model_fn.\n", + "WARNING:tensorflow:Layer linear/linear_model is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.\n", + "\n", + "If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.\n", + "\n", + "To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.\n", + "\n", + "INFO:tensorflow:Done calling model_fn.\n", + "INFO:tensorflow:Graph was finalized.\n", + "INFO:tensorflow:Restoring parameters from /tmp/tmpupqmt1yp/model.ckpt-200\n", + "INFO:tensorflow:Running local_init_op.\n", + "INFO:tensorflow:Done running local_init_op.\n" + ], + "name": "stdout" + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 28 + }, + { + "output_type": "display_data", + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAX4AAAEICAYAAABYoZ8gAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAXc0lEQVR4nO3de5hddX3v8fcHAnKRe8aI4RIviFIVxEHrY62XQAVUQlukUPAES422Hi+PPtaAeqRWPHCsIl5ajaDEGxBQJIo3jCjHtoAJYBUCBwiJCbeMIZiISgx+zh/rN7qZTGZWkll7z8z6vJ5nnln39d2/TD7zm99ae23ZJiIi2mO7XhcQERHdleCPiGiZBH9ERMsk+CMiWibBHxHRMgn+iIiWSfDHuCRpuaQjy/SZki7owjlfKmlV0+cp5zpN0o+2ct8R65T0KUnvHW5bSbdIeukI+35L0uytqSsmjim9LiBiNLY/WGc7SRcBq2y/p9mKxjfbbxxh3Z8MTks6C3ia7VM71h/TbHUxHqTHH42T1LoORhtfc0wcCf7YKmUo5gxJt0paK+lzknYq614qaZWkd0m6H/icpO0kzZV0l6Q1khZI2rvjeK+VtKKse/eQc50l6Ysd838m6T8lPSRpZRk2mQOcAvyTpF9J+nrZ9kmSviJpQNLdkt7ScZydJV1U6r8VOGKU12xJb5G0TNIvJH1I0nZl3WmS/kPSeZLWAGdJ2kPS58u5V0h6z+D2fzykPiHpl5JukzSzY8XrJC2VtL6c7w3D1HNmqWO5pFM6ll8k6QMj/LsdKelo4Ezgb0p7/aSs/4Gkv+/Y/u9KHWslfUfSgYOFl9e6WtI6ST+V9KyR2i/GjwR/bItTgFcATwWeDnQOsTwR2Bs4EJgDvBk4HngJ8CRgLfBJAEmHAP8OvLas2wfYb7gTluD5FvBxoA84DLjZ9jzgS8D/sf14268uIft14CfAdGAm8DZJryiHe1+p/anlddQZ2/5LoB84HJgF/F3HuhcAy4BpwNmlxj2Ap5TX/T+A1w3Z/i5gaqnlqx2/DFcDrwJ2L/ucJ+nwjn2fWPabXuqeJ+ngGvUDYPvbwAeBS0t7HTp0G0mzqH45/BVVW/9f4OKy+i+AP6f6d98DOBFYU/f80VsJ/tgWn7C90vaDVEF3cse63wPvs/2I7d8AbwTebXuV7UeAs4ATypDICcA3bF9b1r237D+cvwW+Z/ti27+zvcb2zZvZ9gigz/b7bW+wvQz4DHBSWX8icLbtB22vBD5W4zWfW7b/OfDRIa/5Xtsft70R2FDOc4bt9baXAx+m+uU2aDXw0fI6LgVuB14JYPsq23e58kPgu8CLh9Ty3tK+PwSuKq9nLL0R+N+2l5bX9EHgsPLL93fAbsAzAJVt7hvj80dDEvyxLVZ2TK+g6q0PGrD92475A4EryvDMQ8BS4FGq3vGTOo9l+2E233vcn6qXXMeBwJMGz1nOe2Y5J0PPW17DaEZ6zZ3rpgI7DDnmCqoe+qB7/NinJP7heJKOkXSdpAdL3ceWYw5aW9ppc7WMhQOB8zva7kFAwHTb3wc+QfVX22pJ8yTtPsbnj4Yk+GNb7N8xfQBwb8f80Me+rgSOsb1nx9dOtu8B7us8lqRdqIZ7hrOSamhmOMOd8+4h59zN9rFl/WPOW17DaOq+5l9Q9YoPHLL9PR3z0yVp6PEkPQ74CvCvwDTbewLfpArdQXtJ2nWEWuoY7dG8K4E3DGm/nW3/J4Dtj9l+HnAI1ZDPO7fw/NEjCf7YFm+StF8Zl343cOkI234KOLvj4mBfGUMGuBx4VblouyPwfjb/s/kl4EhJJ0qaImkfSYeVdQ9QjacPugFYXy4y7yxpe0nPkjR4EXcBcIakvSTtR3UdYjTvLNvvD7x1c6/Z9qPl+GdL2q287rcDX+zY7AnAWyTtIOk1wDOpAn5H4HHAALBR0jFUY+pD/bOkHSW9mOp6wGU16u/0ADBjyAXnTp+iap8/ASgXq19Tpo+Q9AJJOwAPA79l88NzMc4k+GNbfJlq7HkZ1fDLsHeSFOcDC4HvSloPXEd1cRPbtwBvKse7j+rC77BvUCpj68cC76AaergZGLwweSFwSBma+FoJ31dRXQC+m6oXfgHVxUiAf6YaIrm7vI4v1HjNVwJLynmvKufcnDdTheIy4Efl9X22Y/31wEGlrrOBE8o1i/XAW6h+cayluq6xcMix7y/r7qX6ZfhG27fVqL/T4C+KNZJuHLrS9hXAucAlktYBPwMG7/Pfnep6yVqqNlwDfGgLzx89onwQS2wNScuBv7f9vV7X0i2SDBxk+85e1xKxLdLjj4homQR/RETLZKgnIqJl0uOPiGiZCfEgqalTp3rGjBm9LiMiYkJZsmTJL2z3DV0+IYJ/xowZLF68uNdlRERMKJKGfTd6hnoiIlomwR8R0TIJ/oiIlknwR0S0TII/IqJlEvwRES2T4I+IaJkEf0REyyT4IyJaZkK8c7dXZsy9aqv3XX7OK8ewkoiIsZMef0REyyT4IyJaJsEfEdEyCf6IiJZJ8EdEtEyCPyKiZRL8EREtk+CPiGiZBH9ERMs0FvySDpZ0c8fXOklvk7S3pKsl3VG+79VUDRERsanGgt/27bYPs30Y8Dzg18AVwFxgke2DgEVlPiIiuqRbQz0zgbtsrwBmAfPL8vnA8V2qISIi6F7wnwRcXKan2b6vTN8PTOtSDRERQReCX9KOwHHAZUPX2Tbgzew3R9JiSYsHBgYarjIioj260eM/BrjR9gNl/gFJ+wKU76uH28n2PNv9tvv7+vq6UGZERDt0I/hP5o/DPAALgdllejZwZRdqiIiIotHgl7QrcBTw1Y7F5wBHSboDOLLMR0RElzT6CVy2Hwb2GbJsDdVdPhER0QN5525ERMsk+CMiWibBHxHRMgn+iIiWSfBHRLRMgj8iomUS/BERLZPgj4homQR/RETLJPgjIlomwR8R0TIJ/oiIlknwR0S0TII/IqJlEvwRES2T4I+IaJkEf0REyyT4IyJaJsEfEdEyTX/Y+p6SLpd0m6Slkl4oaW9JV0u6o3zfq8kaIiLisZru8Z8PfNv2M4BDgaXAXGCR7YOARWU+IiK6pLHgl7QH8OfAhQC2N9h+CJgFzC+bzQeOb6qGiIjYVJM9/icDA8DnJN0k6QJJuwLTbN9XtrkfmDbczpLmSFosafHAwECDZUZEtEuTwT8FOBz4d9vPBR5myLCObQMebmfb82z32+7v6+trsMyIiHZpMvhXAatsX1/mL6f6RfCApH0ByvfVDdYQERFDNBb8tu8HVko6uCyaCdwKLARml2WzgSubqiEiIjY1peHjvxn4kqQdgWXA66h+2SyQdDqwAjix4RoiIqJDo8Fv+2agf5hVM5s8b0REbF7euRsR0TIJ/oiIlknwR0S0TII/IqJlEvwRES2T4I+IaJkEf0REyyT4IyJaJsEfEdEyCf6IiJZJ8EdEtEyCPyKiZRL8EREtk+CPiGiZBH9ERMsk+CMiWibBHxHRMgn+iIiWSfBHRLRMo5+5K2k5sB54FN
hou1/S3sClwAxgOXCi7bVN1hEREX/UjR7/y2wfZnvwQ9fnAotsHwQsKvMREdElvRjqmQXML9PzgeN7UENERGs1HfwGvitpiaQ5Zdk02/eV6fuBacPtKGmOpMWSFg8MDDRcZkREezQ6xg/8me17JD0BuFrSbZ0rbVuSh9vR9jxgHkB/f/+w20RExJZrtMdv+57yfTVwBfB84AFJ+wKU76ubrCEiIh6rseCXtKuk3Qangb8AfgYsBGaXzWYDVzZVQ0REbKrJoZ5pwBWSBs/zZdvflvRjYIGk04EVwIkN1hAREUM0Fvy2lwGHDrN8DTCzqfNGRMTIag31SHp204VERER31B3j/zdJN0j6R0l7NFpRREQ0qlbw234xcAqwP7BE0pclHdVoZRER0Yjad/XYvgN4D/Au4CXAxyTdJumvmiouIiLGXt0x/udIOg9YCrwceLXtZ5bp8xqsLyIixljdu3o+DlwAnGn7N4MLbd8r6T2NVBYREY2oG/yvBH5j+1EASdsBO9n+te0vNFZdRESMubpj/N8Ddu6Y36Usi4iICaZu8O9k+1eDM2V6l2ZKioiIJtUN/oclHT44I+l5wG9G2D4iIsapumP8bwMuk3QvIOCJwN80VlVERDSmVvDb/rGkZwAHl0W32/5dc2VFRERTtuQhbUdQfUD6FOBwSdj+fCNVRUREY2oFv6QvAE8FbgYeLYsNJPgjIiaYuj3+fuAQ2/kIxIiICa7uXT0/o7qgGxERE1zdHv9U4FZJNwCPDC60fVwjVUVERGPqBv9ZTRYRERHdU/d5/D8ElgM7lOkfAzfW2VfS9pJukvSNMv9kSddLulPSpZJ23MraIyJiK9R9LPPrgcuBT5dF04Gv1TzHW6ke5zzoXOA8208D1gKn1zxORESMgboXd98EvAhYB3/4UJYnjLaTpP2onux5QZkX1TP8Ly+bzAeO37KSIyJiW9QN/kdsbxickTSF6j7+0XwU+Cfg92V+H+Ah2xvL/Cqqvx42IWmOpMWSFg8MDNQsMyIiRlM3+H8o6Uxg5/JZu5cBXx9pB0mvAlbbXrI1hdmeZ7vfdn9fX9/WHCIiIoZR966euVRj8T8F3gB8kzJ8M4IXAcdJOhbYCdgdOB/YU9KU0uvfD7hnawqPiIitU/eunt/b/ozt19g+oUyPONRj+wzb+9meAZwEfN/2KcA1wAlls9nAldtQf0REbKG6z+q5m2HG9G0/ZSvO+S7gEkkfAG4CLtyKY0RExFbakmf1DNoJeA2wd92T2P4B8IMyvQx4ft19IyJibNUd6lnT8XWP7Y9S3aYZERETTN2hnsM7Zrej+gtgS57lHxER40Td8P5wx/RGqsc3nDjm1UREROPqfvTiy5ouJCIiuqPuUM/bR1pv+yNjU05ERDRtS+7qOQJYWOZfDdwA3NFEURER0Zy6wb8fcLjt9QCSzgKusn1qU4VFREQz6j6rZxqwoWN+Q1kWERETTN0e/+eBGyRdUeaPp3qkckRETDB17+o5W9K3gBeXRa+zfVNzZUVERFPqDvUA7AKss30+sErSkxuqKSIiGlT3oxffR/VwtTPKoh2ALzZVVERENKduj/8vgeOAhwFs3wvs1lRRERHRnLrBv6E8f98AknZtrqSIiGhS3eBfIOnTVJ+e9Xrge8BnmisrIiKaMupdPZIEXAo8A1gHHAz8L9tXN1xbREQ0YNTgt21J37T9bCBhHxExwdUd6rlR0hGNVhIREV1R9527LwBOlbSc6s4eUf0x8JymCouIiGaMGPySDrD9c+AVW3pgSTsB1wKPK+e53Pb7yhu/LgH2AZYAr7W9YfNHioiIsTTaUM/XAGyvAD5ie0Xn1yj7PgK83PahwGHA0ZL+FDgXOM/204C1wOnb9hIiImJLjBb86ph+ypYc2JVfldkdypeBlwOXl+XzqR74FhERXTJa8Hsz07VI2l7SzcBqqjuC7gIesr2xbLIKmL6ZfedIWixp8cDAwJaeOiIiNmO04D9U0jpJ64HnlOl1ktZLWjfawW0/avswqg9yeT7VewFqsT3Pdr/t/r6+vrq7RUTEKEa8uGt7+7E4ie2HJF0DvJDq3b9TSq9/P+CesThHRETUsyWPZd4ikvok7VmmdwaOApYC1wAnlM1mA1c2VUNERGyq7n38W2NfYL6k7al+wSyw/Q1JtwKXSPoAcBNwYYM1RETEEI0Fv+3/Bp47zPJlVOP9ERHRA40N9URExPiU4I+IaJkEf0REyyT4IyJaJsEfEdEyCf6IiJZJ8EdEtEyCPyKiZRL8EREtk+CPiGiZBH9ERMsk+CMiWibBHxHRMgn+iIiWSfBHRLRMgj8iomUS/BERLZPgj4homSY/bH1/SddIulXSLZLeWpbvLelqSXeU73s1VUNERGyqyQ9b3wi8w/aNknYDlki6GjgNWGT7HElzgbnAuxqsoydmzL1qm/Zffs4rx6iSiIjHaqzHb/s+2zeW6fXAUmA6MAuYXzabDxzfVA0REbGprozxS5oBPBe4Hphm+76y6n5gWjdqiIiISpNDPQBIejzwFeBtttdJ+sM625bkzew3B5gDcMABBzRdZkSrbMtQZIYhJ75Ge/ySdqAK/S/Z/mpZ/ICkfcv6fYHVw+1re57tftv9fX19TZYZEdEqTd7VI+BCYKntj3SsWgjMLtOzgSubqiEiIjbV5FDPi4DXAj+VdHNZdiZwDrBA0unACuDEBmuIiIghGgt+2z8CtJnVM5s6b0REjKzxi7uxdXLxLSKakkc2RES0TII/IqJlEvwRES2T4I+IaJkEf0REyyT4IyJaJsEfEdEyk/4+/m19Ln5ExGSTHn9ERMsk+CMiWibBHxHRMgn+iIiWSfBHRLTMpL+rp43yZM+YrPKzPTbS44+IaJkEf0REy2SoJyJaIcNEf5Qef0REyzQW/JI+K2m1pJ91LNtb0tWS7ijf92rq/BERMbwme/wXAUcPWTYXWGT7IGBRmY+IiC5qLPhtXws8OGTxLGB+mZ4PHN/U+SMiYnjdHuOfZvu+Mn0/MG1zG0qaI2mxpMUDAwPdqS4iogV6dnHXtgGPsH6e7X7b/X19fV2sLCJicut28D8gaV+A8n11l88fEdF63b6PfyEwGzinfL+yy+ePBvXyQ28m233WdeRDhrpnsr0HoMnbOS8G/gs4WNIqSadTBf5Rku4AjizzERHRRY31+G2fvJlVM5s6Z0REjC6PbIiILTLZhj2aNh7bK49siIhomQR/RETLJPgjIlomwR8R0TK5uBuPkXvDt8x4vHAXMZr0+CMiWibBHxHRMhnqiYiuyVDi+JAef0REyyT4IyJaJsEfEdEyCf6IiJZJ8EdEtEzu6olJYSK+kSp3uESvpMcfEdEy6fFH66XnHW2THn9ERMsk+CMiWqYnwS/paEm3S7pT0txe1BAR0VZdD35J2wOfBI4BDgFOlnRIt+uIiGirXvT4nw/caXuZ7Q3AJcCsHtQREdFKvbirZzqwsmN+FfCCoRtJmgPMKbO/knT7MMeaCvxizCuceNIOlbRDJe0wSdpA527zIQ4cbuG4vZ3T9jxg3kjbSFpsu79LJY1ba
YdK2qGSdkgbjKYXQz33APt3zO9XlkVERBf0Ivh/DBwk6cmSdgROAhb2oI6IiFbq+lCP7Y2S/ifwHWB74LO2b9nKw404FNQiaYdK2qGSdkgbjEi2e11DRER0Ud65GxHRMgn+iIiWGffBP9rjHSQ9TtKlZf31kmZ0v8rm1WiHt0u6VdJ/S1okadj7dye6uo/7kPTXkixpUt7SV6cdJJ1YfiZukfTlbtfYDTX+Xxwg6RpJN5X/G8f2os5xx/a4/aK6+HsX8BRgR+AnwCFDtvlH4FNl+iTg0l7X3aN2eBmwS5n+h7a2Q9luN+Ba4Dqgv9d19+jn4SDgJmCvMv+EXtfdo3aYB/xDmT4EWN7rusfD13jv8dd5vMMsYH6ZvhyYKUldrLEbRm0H29fY/nWZvY7q/RGTTd3HffwLcC7w224W10V12uH1wCdtrwWwvbrLNXZDnXYwsHuZ3gO4t4v1jVvjPfiHe7zD9M1tY3sj8Etgn65U1z112qHT6cC3Gq2oN0ZtB0mHA/vbnsyfrlLn5+HpwNMl/Yek6yQd3bXquqdOO5wFnCppFfBN4M3dKW18G7ePbIitI+lUoB94Sa9r6TZJ2wEfAU7rcSnjwRSq4Z6XUv31d62kZ9t+qKdVdd/JwEW2PyzphcAXJD3L9u97XVgvjfcef53HO/xhG0lTqP6cW9OV6rqn1mMuJB0JvBs4zvYjXaqtm0Zrh92AZwE/kLQc+FNg4SS8wFvn52EVsND272zfDfw/ql8Ek0mddjgdWABg+7+Anage4NZq4z346zzeYSEwu0yfAHzf5UrOJDJqO0h6LvBpqtCfjOO5MEo72P6l7am2Z9ieQXWt4zjbi3tTbmPq/L/4GlVvH0lTqYZ+lnWzyC6o0w4/B2YCSHomVfAPdLXKcWhcB38Zsx98vMNSYIHtWyS9X9JxZbMLgX0k3Qm8HZh0n+hVsx0+BDweuEzSzZIm3fOParbDpFezHb4DrJF0K3AN8E7bk+ov4Zrt8A7g9ZJ+AlwMnDYJO4ZbLI9siIhomXHd44+IiLGX4I+IaJkEf0REyyT4IyJaJsEfEdEyCf6IiJZJ8EdEtMz/B8NHHYQmhMhXAAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "tags": [], + "needs_background": "light" + } + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CN_r0Vn8VOf5", + "colab_type": "text" + }, + "source": [ + "That's it for linear regression! Now onto classification." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hG9gxhAqVTBT", + "colab_type": "text" + }, + "source": [ + "##Classification\n", + "Now that we've covered linear regression it is time to talk about classification. Where regression was used to predict a numeric value, classification is used to seperate data points into classes of different labels. In this example we will use a TensorFlow estimator to classify flowers.\n", + "\n", + "Since we've touched on how estimators work earlier, I'll go a bit quicker through this example. \n", + "\n", + "This section is based on the following guide from the TensorFlow website.\n", + "https://www.tensorflow.org/tutorials/estimator/premade\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iWk2Kb7Sdk-T", + "colab_type": "text" + }, + "source": [ + "###Imports and Setup" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "eH4_xJaD605_", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 103 + }, + "outputId": "cbe50069-f9c1-489e-ebdb-e08af4d7cea5" + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "`%tensorflow_version` only switches the major version: 1.x or 2.x.\n", + "You set: `2.x # this line is not required unless you are in a notebook`. This will be interpreted as: `2.x`.\n", + "\n", + "\n", + "TensorFlow is already loaded. 
Please restart the runtime to change versions.\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "TMiLc6LPdoPm", + "colab_type": "code", + "colab": {} + }, + "source": [ + "from __future__ import absolute_import, division, print_function, unicode_literals\n", + "\n", + "\n", + "import tensorflow as tf\n", + "\n", + "import pandas as pd" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9zLNzmkGds1U", + "colab_type": "text" + }, + "source": [ + "###Dataset\n", + "This specific dataset seperates flowers into 3 different classes of species.\n", + "- Setosa\n", + "- Versicolor\n", + "- Virginica\n", + "\n", + "The information about each flower is the following.\n", + "- sepal length\n", + "- sepal width\n", + "- petal length\n", + "- petal width" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "puOQDTNKeCRC", + "colab_type": "code", + "colab": {} + }, + "source": [ + "CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']\n", + "SPECIES = ['Setosa', 'Versicolor', 'Virginica']\n", + "# Lets define some constants to help us later on" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "oMW41Wd9eLIo", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 106 + }, + "outputId": "4950908e-de50-4839-a569-a1f5acf015f8" + }, + "source": [ + "train_path = tf.keras.utils.get_file(\n", + " \"iris_training.csv\", \"https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv\")\n", + "test_path = tf.keras.utils.get_file(\n", + " \"iris_test.csv\", \"https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv\")\n", + "\n", + "train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)\n", + "test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)\n", + "# Here we use keras (a module inside of TensorFlow) to grab our datasets and read them into a pandas dataframe" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv\n", + "8192/2194 [================================================================================================================] - 0s 0us/step\n", + "Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv\n", + "8192/573 [============================================================================================================================================================================================================================================================================================================================================================================================================================================] - 0s 0us/step\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4aHRWY47ecdr", + "colab_type": "text" + }, + "source": [ + "Let's have a look at our data." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "BQ9uo6KkegBH", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 201 + }, + "outputId": "ed4b9d06-06da-4014-f04d-95bc35dc64ba" + }, + "source": [ + "train.head()" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SepalLengthSepalWidthPetalLengthPetalWidthSpecies
06.42.85.62.22
15.02.33.31.01
24.92.54.51.72
34.93.11.50.10
45.73.81.70.30
\n", + "
" + ], + "text/plain": [ + " SepalLength SepalWidth PetalLength PetalWidth Species\n", + "0 6.4 2.8 5.6 2.2 2\n", + "1 5.0 2.3 3.3 1.0 1\n", + "2 4.9 2.5 4.5 1.7 2\n", + "3 4.9 3.1 1.5 0.1 0\n", + "4 5.7 3.8 1.7 0.3 0" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 33 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7PzWyoE9eu8H", + "colab_type": "text" + }, + "source": [ + "Now we can pop the species column off and use that as our label." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "fP_nlslke4U8", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 201 + }, + "outputId": "560a0559-ebf8-43a8-bca7-4fa25d9c06ce" + }, + "source": [ + "train_y = train.pop('Species')\n", + "test_y = test.pop('Species')\n", + "train.head() # the species column is now gone" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
SepalLengthSepalWidthPetalLengthPetalWidth
06.42.85.62.2
15.02.33.31.0
24.92.54.51.7
34.93.11.50.1
45.73.81.70.3
\n", + "
" + ], + "text/plain": [ + " SepalLength SepalWidth PetalLength PetalWidth\n", + "0 6.4 2.8 5.6 2.2\n", + "1 5.0 2.3 3.3 1.0\n", + "2 4.9 2.5 4.5 1.7\n", + "3 4.9 3.1 1.5 0.1\n", + "4 5.7 3.8 1.7 0.3" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 34 + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "3oVw2zRkfTXq", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "a25c814d-63e4-40da-9ac9-6efc326164b2" + }, + "source": [ + "train.shape # we have 120 entires with 4 features" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(120, 4)" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 35 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "V6ZWVZs8fg-H", + "colab_type": "text" + }, + "source": [ + "###Input Function\n", + "Remember that nasty input function we created earlier. Well we need to make another one here! Fortunatly for us this one is a little easier to digest." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "K-NQRLv_fyhg", + "colab_type": "code", + "colab": {} + }, + "source": [ + "def input_fn(features, labels, training=True, batch_size=256):\n", + " # Convert the inputs to a Dataset.\n", + " dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))\n", + "\n", + " # Shuffle and repeat if you are in training mode.\n", + " if training:\n", + " dataset = dataset.shuffle(1000).repeat()\n", + " \n", + " return dataset.batch(batch_size)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FL--3OAnf4mO", + "colab_type": "text" + }, + "source": [ + "###Feature Columns\n", + "And you didn't think we forgot about the feature columns, did you?\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "nErIJJbggQ5w", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 54 + }, + "outputId": "837852b1-97a0-4965-88fc-50883dfb558c" + }, + "source": [ + "# Feature columns describe how to use the input.\n", + "my_feature_columns = []\n", + "for key in train.keys():\n", + " my_feature_columns.append(tf.feature_column.numeric_column(key=key))\n", + "print(my_feature_columns)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5kl1Wr_Xgpmv", + "colab_type": "text" + }, + "source": [ + "###Building the Model\n", + "And now we are ready to choose a model. For classification tasks there are variety of different estimators/models that we can pick from. Some options are listed below.\n", + "- ```DNNClassifier``` (Deep Neural Network)\n", + "- ```LinearClassifier```\n", + "\n", + "We can choose either model but the DNN seems to be the best choice. This is because we may not be able to find a linear coorespondence in our data. \n", + "\n", + "So let's build a model!" 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "n7YVQowgiDak", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 192 + }, + "outputId": "bd9457f3-7d30-456c-aa8b-3d160c0f751f" + }, + "source": [ + "# Build a DNN with 2 hidden layers with 30 and 10 hidden nodes each.\n", + "classifier = tf.estimator.DNNClassifier(\n", + " feature_columns=my_feature_columns,\n", + " # Two hidden layers of 30 and 10 nodes respectively.\n", + " hidden_units=[30, 10],\n", + " # The model must choose between 3 classes.\n", + " n_classes=3)" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "INFO:tensorflow:Using default config.\n", + "WARNING:tensorflow:Using temporary folder as model directory: /tmp/tmpqaqtrlgy\n", + "INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpqaqtrlgy', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n", + "graph_options {\n", + " rewrite_options {\n", + " meta_optimizer_iterations: ONE\n", + " }\n", + "}\n", + ", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jQ_SJAMuiF6p", + "colab_type": "text" + }, + "source": [ + "What we've just done is created a deep neural network that has two hidden layers. These layers have 30 and 10 neurons respectively. This is the number of neurons the TensorFlow official tutorial uses so we'll stick with it. However, it is worth mentioning that the number of hidden neurons is an arbitrary number and many experiments and tests are usually done to determine the best choice for these values. Try playing around with the number of hidden neurons and see if your results change." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NBHnWYKTjV5D", + "colab_type": "text" + }, + "source": [ + "###Training\n", + "Now it's time to train the model!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "INug63pCjaOw", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "outputId": "3eb6211c-9fd1-4748-8e2e-31e64b4ec698" + }, + "source": [ + "classifier.train(\n", + " input_fn=lambda: input_fn(train, train_y, training=True),\n", + " steps=5000)\n", + "# We include a lambda to avoid creating an inner function previously" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "INFO:tensorflow:Calling model_fn.\n", + "WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.\n", + "\n", + "If you intended to run this layer in float32, you can safely ignore this warning. 
If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.\n", + "\n", + "To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.\n", + "\n", + "INFO:tensorflow:Done calling model_fn.\n", + "INFO:tensorflow:Create CheckpointSaverHook.\n", + "INFO:tensorflow:Graph was finalized.\n", + "INFO:tensorflow:Running local_init_op.\n", + "INFO:tensorflow:Done running local_init_op.\n", + "INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...\n", + "INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpqaqtrlgy/model.ckpt.\n", + "INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...\n", + "INFO:tensorflow:loss = 1.1511875, step = 0\n", + "INFO:tensorflow:global_step/sec: 497.034\n", + "INFO:tensorflow:loss = 0.9358924, step = 100 (0.208 sec)\n", + "INFO:tensorflow:global_step/sec: 636.632\n", + "INFO:tensorflow:loss = 0.90521353, step = 200 (0.154 sec)\n", + "INFO:tensorflow:global_step/sec: 654.875\n", + "INFO:tensorflow:loss = 0.8495918, step = 300 (0.153 sec)\n", + "INFO:tensorflow:global_step/sec: 651.748\n", + "INFO:tensorflow:loss = 0.82983136, step = 400 (0.153 sec)\n", + "INFO:tensorflow:global_step/sec: 655.704\n", + "INFO:tensorflow:loss = 0.7790096, step = 500 (0.154 sec)\n", + "INFO:tensorflow:global_step/sec: 667.452\n", + "INFO:tensorflow:loss = 0.7574419, step = 600 (0.149 sec)\n", + "INFO:tensorflow:global_step/sec: 675.438\n", + "INFO:tensorflow:loss = 0.7266582, step = 700 (0.148 sec)\n", + "INFO:tensorflow:global_step/sec: 666.098\n", + "INFO:tensorflow:loss = 0.7117815, step = 800 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 659.401\n", + "INFO:tensorflow:loss = 0.6849373, step = 900 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 664.227\n", + "INFO:tensorflow:loss = 0.6696919, step = 1000 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 674.756\n", + "INFO:tensorflow:loss = 0.6453049, step = 1100 (0.151 sec)\n", + "INFO:tensorflow:global_step/sec: 664.265\n", + "INFO:tensorflow:loss = 0.63444245, step = 1200 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 646.941\n", + "INFO:tensorflow:loss = 0.61837286, step = 1300 (0.155 sec)\n", + "INFO:tensorflow:global_step/sec: 675.566\n", + "INFO:tensorflow:loss = 0.60927534, step = 1400 (0.149 sec)\n", + "INFO:tensorflow:global_step/sec: 693.653\n", + "INFO:tensorflow:loss = 0.59321785, step = 1500 (0.144 sec)\n", + "INFO:tensorflow:global_step/sec: 643.728\n", + "INFO:tensorflow:loss = 0.5786046, step = 1600 (0.155 sec)\n", + "INFO:tensorflow:global_step/sec: 640.974\n", + "INFO:tensorflow:loss = 0.5679003, step = 1700 (0.156 sec)\n", + "INFO:tensorflow:global_step/sec: 668.473\n", + "INFO:tensorflow:loss = 0.55420506, step = 1800 (0.147 sec)\n", + "INFO:tensorflow:global_step/sec: 683.02\n", + "INFO:tensorflow:loss = 0.51620936, step = 1900 (0.149 sec)\n", + "INFO:tensorflow:global_step/sec: 668.192\n", + "INFO:tensorflow:loss = 0.5385388, step = 2000 (0.149 sec)\n", + "INFO:tensorflow:global_step/sec: 689.724\n", + "INFO:tensorflow:loss = 0.523809, step = 2100 (0.143 sec)\n", + "INFO:tensorflow:global_step/sec: 674.861\n", + "INFO:tensorflow:loss = 0.50772285, step = 2200 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 645.258\n", + "INFO:tensorflow:loss = 0.49664536, step = 2300 
(0.155 sec)\n", + "INFO:tensorflow:global_step/sec: 664.988\n", + "INFO:tensorflow:loss = 0.49152842, step = 2400 (0.148 sec)\n", + "INFO:tensorflow:global_step/sec: 676.346\n", + "INFO:tensorflow:loss = 0.4791621, step = 2500 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 682.166\n", + "INFO:tensorflow:loss = 0.46931824, step = 2600 (0.146 sec)\n", + "INFO:tensorflow:global_step/sec: 632.973\n", + "INFO:tensorflow:loss = 0.46762398, step = 2700 (0.159 sec)\n", + "INFO:tensorflow:global_step/sec: 666.066\n", + "INFO:tensorflow:loss = 0.44671822, step = 2800 (0.148 sec)\n", + "INFO:tensorflow:global_step/sec: 663.873\n", + "INFO:tensorflow:loss = 0.44892073, step = 2900 (0.151 sec)\n", + "INFO:tensorflow:global_step/sec: 639.968\n", + "INFO:tensorflow:loss = 0.44627833, step = 3000 (0.159 sec)\n", + "INFO:tensorflow:global_step/sec: 676.9\n", + "INFO:tensorflow:loss = 0.4288814, step = 3100 (0.147 sec)\n", + "INFO:tensorflow:global_step/sec: 658.251\n", + "INFO:tensorflow:loss = 0.43346274, step = 3200 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 664.476\n", + "INFO:tensorflow:loss = 0.42365474, step = 3300 (0.153 sec)\n", + "INFO:tensorflow:global_step/sec: 680.291\n", + "INFO:tensorflow:loss = 0.41678685, step = 3400 (0.145 sec)\n", + "INFO:tensorflow:global_step/sec: 682.509\n", + "INFO:tensorflow:loss = 0.41475928, step = 3500 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 564.669\n", + "INFO:tensorflow:loss = 0.40962964, step = 3600 (0.178 sec)\n", + "INFO:tensorflow:global_step/sec: 643.631\n", + "INFO:tensorflow:loss = 0.40175164, step = 3700 (0.154 sec)\n", + "INFO:tensorflow:global_step/sec: 675.225\n", + "INFO:tensorflow:loss = 0.39052343, step = 3800 (0.148 sec)\n", + "INFO:tensorflow:global_step/sec: 666.891\n", + "INFO:tensorflow:loss = 0.39769873, step = 3900 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 666.438\n", + "INFO:tensorflow:loss = 0.386145, step = 4000 (0.148 sec)\n", + "INFO:tensorflow:global_step/sec: 685.811\n", + "INFO:tensorflow:loss = 0.39405277, step = 4100 (0.146 sec)\n", + "INFO:tensorflow:global_step/sec: 682.425\n", + "INFO:tensorflow:loss = 0.37922394, step = 4200 (0.148 sec)\n", + "INFO:tensorflow:global_step/sec: 661.11\n", + "INFO:tensorflow:loss = 0.37118322, step = 4300 (0.150 sec)\n", + "INFO:tensorflow:global_step/sec: 670.221\n", + "INFO:tensorflow:loss = 0.36706787, step = 4400 (0.149 sec)\n", + "INFO:tensorflow:global_step/sec: 681.885\n", + "INFO:tensorflow:loss = 0.3653447, step = 4500 (0.146 sec)\n", + "INFO:tensorflow:global_step/sec: 682.826\n", + "INFO:tensorflow:loss = 0.3557425, step = 4600 (0.147 sec)\n", + "INFO:tensorflow:global_step/sec: 665.897\n", + "INFO:tensorflow:loss = 0.36362734, step = 4700 (0.152 sec)\n", + "INFO:tensorflow:global_step/sec: 671.864\n", + "INFO:tensorflow:loss = 0.3526679, step = 4800 (0.149 sec)\n", + "INFO:tensorflow:global_step/sec: 662.965\n", + "INFO:tensorflow:loss = 0.35308143, step = 4900 (0.151 sec)\n", + "INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5000...\n", + "INFO:tensorflow:Saving checkpoints for 5000 into /tmp/tmpqaqtrlgy/model.ckpt.\n", + "INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5000...\n", + "INFO:tensorflow:Loss for final step: 0.34763944.\n" + ], + "name": "stdout" + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 39 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "57oNBLV1j0wc", + "colab_type": 
"text" + }, + "source": [ + "The only thing to explain here is the **steps** argument. This simply tells the classifier to run for 5000 steps. Try modifiying this and seeing if your results change. Keep in mind that more is not always better." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5suI1lmskE7p", + "colab_type": "text" + }, + "source": [ + "###Evaluation\n", + "Now let's see how this trained model does!" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "23rIrgbxkJUO", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 381 + }, + "outputId": "24767659-81cb-436f-a3c5-3fb19fcf53b2" + }, + "source": [ + "eval_result = classifier.evaluate(\n", + " input_fn=lambda: input_fn(test, test_y, training=False))\n", + "\n", + "print('\\nTest set accuracy: {accuracy:0.3f}\\n'.format(**eval_result))" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "INFO:tensorflow:Calling model_fn.\n", + "WARNING:tensorflow:Layer dnn is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx.\n", + "\n", + "If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.\n", + "\n", + "To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.\n", + "\n", + "INFO:tensorflow:Done calling model_fn.\n", + "INFO:tensorflow:Starting evaluation at 2020-06-19T18:22:07Z\n", + "INFO:tensorflow:Graph was finalized.\n", + "INFO:tensorflow:Restoring parameters from /tmp/tmpqaqtrlgy/model.ckpt-5000\n", + "INFO:tensorflow:Running local_init_op.\n", + "INFO:tensorflow:Done running local_init_op.\n", + "INFO:tensorflow:Inference Time : 0.20221s\n", + "INFO:tensorflow:Finished evaluation at 2020-06-19-18:22:08\n", + "INFO:tensorflow:Saving dict for global step 5000: accuracy = 0.93333334, average_loss = 0.41360682, global_step = 5000, loss = 0.41360682\n", + "INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5000: /tmp/tmpqaqtrlgy/model.ckpt-5000\n", + "\n", + "Test set accuracy: 0.933\n", + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4v1ZMe7jkXdp", + "colab_type": "text" + }, + "source": [ + "Notice this time we didn't specify the number of steps. This is because during evaluation the model will only look at the testing data one time." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "464HkZ6lknua", + "colab_type": "text" + }, + "source": [ + "### Predictions\n", + "Now that we have a trained model it's time to use it to make predictions. I've written a little script below that allows you to type the features of a flower and see a prediction for its class." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "bQRLq4M1k1jm", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 344 + }, + "outputId": "f720852e-66e9-448e-bddf-ce33ce98cc61" + }, + "source": [ + "def input_fn(features, batch_size=256):\n", + " # Convert the inputs to a Dataset without labels.\n", + " return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)\n", + "\n", + "features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']\n", + "predict = {}\n", + "\n", + "print(\"Please type numeric values as prompted.\")\n", + "for feature in features:\n", + " valid = True\n", + " while valid: \n", + " val = input(feature + \": \")\n", + " if not val.isdigit(): valid = False\n", + "\n", + " predict[feature] = [float(val)]\n", + "\n", + "predictions = classifier.predict(input_fn=lambda: input_fn(predict))\n", + "for pred_dict in predictions:\n", + " class_id = pred_dict['class_ids'][0]\n", + " probability = pred_dict['probabilities'][class_id]\n", + "\n", + " print('Prediction is \"{}\" ({:.1f}%)'.format(\n", + " SPECIES[class_id], 100 * probability))\n" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Please type numeric values as prompted.\n", + "SepalLength: 23\n", + "SepalLength: 12\n", + "SepalLength: 12\n", + "SepalLength: 3\n", + "SepalLength: 4\n", + "SepalLength: 2\n", + "SepalLength: 0.5\n", + "SepalWidth: 2\n", + "SepalWidth: 0.4\n", + "PetalLength: 0.5\n", + "PetalWidth: 0.3\n", + "INFO:tensorflow:Calling model_fn.\n", + "INFO:tensorflow:Done calling model_fn.\n", + "INFO:tensorflow:Graph was finalized.\n", + "INFO:tensorflow:Restoring parameters from /tmp/tmpqaqtrlgy/model.ckpt-5000\n", + "INFO:tensorflow:Running local_init_op.\n", + "INFO:tensorflow:Done running local_init_op.\n", + "Prediction is \"Setosa\" (38.2%)\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "-tRxhpmSr1FH", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Here is some example input and expected classes you can try above\n", + "expected = ['Setosa', 'Versicolor', 'Virginica']\n", + "predict_x = {\n", + " 'SepalLength': [5.1, 5.9, 6.9],\n", + " 'SepalWidth': [3.3, 3.0, 3.1],\n", + " 'PetalLength': [1.7, 4.2, 5.4],\n", + " 'PetalWidth': [0.5, 1.5, 2.1],\n", + "}" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ujwvc6ASsHID", + "colab_type": "text" + }, + "source": [ + "And that's pretty much it for classification! " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d0dfaT4esRh3", + "colab_type": "text" + }, + "source": [ + "##Clustering\n", + "Now that we've covered regression and classification it's time to talk about clustering data! \n", + "\n", + "Clustering is a Machine Learning technique that involves the grouping of data points. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. (https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68)\n", + "\n", + "Unfortunalty there are issues with the current version of TensorFlow and the implementation for KMeans. This means we cannot use KMeans without writing the algorithm from scratch. 
We aren't quite at that level yet, so we'll just explain the basics of clustering for now.\n", + "\n", + "####Basic Algorithm for K-Means\n", + "- Step 1: Randomly pick K points to place K centroids\n", + "- Step 2: Assign all the data points to the centroids by distance. The closest centroid to a point is the one it is assigned to.\n", + "- Step 3: Average all the points belonging to each centroid to find the middle of those clusters (center of mass). Place the corresponding centroids into that position.\n", + "- Step 4: Reassign every point once again to the closest centroid.\n", + "- Step 5: Repeat steps 3-4 until no point changes which centroid it belongs to.\n", + "\n", + "*Please refer to the video for an explanation of KMeans clustering.* A minimal from-scratch sketch of these steps is included after the Hidden Markov Models data discussion below." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sQ9iJrSbBTZB", + "colab_type": "text" + }, + "source": [ + "##Hidden Markov Models\n", + "\n", + "\"The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities.\" (http://jedlik.phy.bme.hu/~gerjanos/HMM/node4.html)\n", + "\n", + "A hidden Markov model works with probabilities to predict future events or states. In this section we will learn how to create a hidden Markov model that can predict the weather.\n", + "\n", + "*This section is based on the following TensorFlow tutorial.* https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/HiddenMarkovModel" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RKJSFk4NP0eq", + "colab_type": "text" + }, + "source": [ + "###Data\n", + "Let's start by discussing the type of data we use when we work with a hidden Markov model. \n", + "\n", + "In the previous sections we worked with large datasets of hundreds of different entries. For a Markov model we are only interested in probability distributions that have to do with states. \n", + "\n", + "We can find these probabilities from large datasets or may already have these values. We'll run through an example in a second that should clear some things up, but first let's discuss the components of a Markov model.\n", + "\n", + "**States:** In each Markov model we have a finite set of states. These states could be something like \"warm\" and \"cold\" or \"high\" and \"low\" or even \"red\", \"green\" and \"blue\". These states are \"hidden\" within the model, which means we do not directly observe them.\n", + "\n", + "**Observations:** Each state has a particular outcome or observation associated with it based on a probability distribution. An example of this is the following: *On a hot day Tim has an 80% chance of being happy and a 20% chance of being sad.*\n", + "\n", + "**Transitions:** Each state will have a probability defining the likelihood of transitioning to a different state. An example is the following: *a cold day has a 30% chance of being followed by a hot day and a 70% chance of being followed by another cold day.*\n", + "\n", + "To create a hidden Markov model we need:\n", + "- States\n", + "- Observation Distribution\n", + "- Transition Distribution\n", + "\n", + "For our purposes we will assume we already have this information available as we attempt to predict the weather on a given day."
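As referenced above, here is a minimal from-scratch sketch of the K-Means steps listed in the Clustering section, since the TensorFlow estimator version is unavailable. It is an illustration only and not part of the original notebook: the `k_means` helper, the toy `points` array, and the choice of `k` are all made up for the example, and it assumes only NumPy is installed.

```python
import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (shape [n, d]) into `k` groups using the steps described above."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick K of the data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iters):
        # Step 2: assign every point to its closest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean (center of mass) of its assigned points
        new_centroids = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Steps 4-5: stop once no centroid (and therefore no assignment) changes
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Made-up 2D points purely for demonstration
points = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0],
                   [5.2, 4.8], [9.0, 1.0], [8.8, 1.2]])
centroids, labels = k_means(points, k=3)
print(centroids)
print(labels)
```

In practice you would normally reach for a library implementation (for example scikit-learn's `KMeans`), but the short loop above is the entire idea behind the algorithm.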
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iK2QbOzr6jNJ", + "colab_type": "text" + }, + "source": [ + "###Imports and Setup" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Suf1v8kJ6niA", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 103 + }, + "outputId": "f12c7178-cc43-42cd-92c3-8084d5d61637" + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "`%tensorflow_version` only switches the major version: 1.x or 2.x.\n", + "You set: `2.x # this line is not required unless you are in a notebook`. This will be interpreted as: `2.x`.\n", + "\n", + "\n", + "TensorFlow is already loaded. Please restart the runtime to change versions.\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GN_Fkrx30xbb", + "colab_type": "text" + }, + "source": [ + "Due to a version mismatch with tensorflow v2 and tensorflow_probability we need to install the most recent version of tensorflow_probability (see below)." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "kawrMHKGBWyS", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 293 + }, + "outputId": "ad6d30ff-e53a-4776-825d-c18e2b0728cc" + }, + "source": [ + "!pip install tensorflow_probability==0.8.0rc0 --user --upgrade" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Collecting tensorflow_probability==0.8.0rc0\n", + "\u001b[?25l Downloading https://files.pythonhosted.org/packages/b2/63/f54ce32063abaa682d779e44b49eb63fcf63c2422f978842fdeda794337d/tensorflow_probability-0.8.0rc0-py2.py3-none-any.whl (2.5MB)\n", + "\u001b[K |████████████████████████████████| 2.5MB 2.7MB/s \n", + "\u001b[?25hRequirement already satisfied, skipping upgrade: decorator in /usr/local/lib/python3.6/dist-packages (from tensorflow_probability==0.8.0rc0) (4.4.2)\n", + "Collecting cloudpickle==1.1.1\n", + " Downloading https://files.pythonhosted.org/packages/24/fb/4f92f8c0f40a0d728b4f3d5ec5ff84353e705d8ff5e3e447620ea98b06bd/cloudpickle-1.1.1-py2.py3-none-any.whl\n", + "Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow_probability==0.8.0rc0) (1.12.0)\n", + "Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /usr/local/lib/python3.6/dist-packages (from tensorflow_probability==0.8.0rc0) (1.18.5)\n", + "\u001b[31mERROR: gym 0.17.2 has requirement cloudpickle<1.4.0,>=1.2.0, but you'll have cloudpickle 1.1.1 which is incompatible.\u001b[0m\n", + "Installing collected packages: cloudpickle, tensorflow-probability\n", + "Successfully installed cloudpickle-1.1.1 tensorflow-probability-0.8.0rc0\n" + ], + "name": "stdout" + }, + { + "output_type": "display_data", + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "cloudpickle" + ] + } + } + }, + "metadata": { + "tags": [] + } + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mEIk7FYD6lcF", + "colab_type": "code", + "colab": {} + }, + "source": [ + "import tensorflow_probability as tfp # We are using a different module from tensorflow this time\n", + "import tensorflow as tf" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ssOcn-nIOCcV", + "colab_type": "text" + }, + "source": [ + "###Weather 
Model\n", + "Taken direclty from the TensorFlow documentation (https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/HiddenMarkovModel). \n", + "\n", + "We will model a simple weather system and try to predict the temperature on each day given the following information.\n", + "1. Cold days are encoded by a 0 and hot days are encoded by a 1.\n", + "2. The first day in our sequence has an 80% chance of being cold.\n", + "3. A cold day has a 30% chance of being followed by a hot day.\n", + "4. A hot day has a 20% chance of being followed by a cold day.\n", + "5. On each day the temperature is\n", + " normally distributed with mean and standard deviation 0 and 5 on\n", + " a cold day and mean and standard deviation 15 and 10 on a hot day.\n", + "\n", + "If you're unfamiliar with **standard deviation** it can be put simply as the range of expected values. \n", + "\n", + "In this example, on a hot day the average temperature is 15 and ranges from 5 to 25.\n", + "\n", + "To model this in TensorFlow we will do the following.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "4LBLEJp4YlIf", + "colab_type": "code", + "colab": {} + }, + "source": [ + "tfd = tfp.distributions # making a shortcut for later on\n", + "initial_distribution = tfd.Categorical(probs=[0.2, 0.8]) # Refer to point 2 above\n", + "transition_distribution = tfd.Categorical(probs=[[0.5, 0.5],\n", + " [0.2, 0.8]]) # refer to points 3 and 4 above\n", + "observation_distribution = tfd.Normal(loc=[0., 15.], scale=[5., 10.]) # refer to point 5 above\n", + "\n", + "# the loc argument represents the mean and the scale is the standard devitation" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-XtTg0l04mqc", + "colab_type": "text" + }, + "source": [ + "We've now created distribution variables to model our system and it's time to create the hidden markov model." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "P4M6cZww4mZk", + "colab_type": "code", + "colab": {} + }, + "source": [ + "model = tfd.HiddenMarkovModel(\n", + " initial_distribution=initial_distribution,\n", + " transition_distribution=transition_distribution,\n", + " observation_distribution=observation_distribution,\n", + " num_steps=7)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DJ0XIA2M5gqD", + "colab_type": "text" + }, + "source": [ + "The number of steps represents the number of days that we would like to predict information for. In this case we've chosen 7, an entire week.\n", + "\n", + "To get the **expected temperatures** on each day we can do the following." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "plVVG4fi55Jv", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "cef2d0a3-0edf-4226-ad65-43245933109f" + }, + "source": [ + "mean = model.mean()\n", + "\n", + "# due to the way TensorFlow works on a lower level we need to evaluate part of the graph\n", + "# from within a session to see the value of this tensor\n", + "\n", + "# in the new version of tensorflow we need to use tf.compat.v1.Session() rather than just tf.Session()\n", + "with tf.compat.v1.Session() as sess: \n", + " print(mean.numpy())" + ], + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "text": [ + "[12. 
11.1 10.83 10.748999 10.724699 10.71741 10.715222]\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RzzUGR12AkiF", + "colab_type": "text" + }, + "source": [ + "##Conclusion\n", + "So that's it for the core learning algorithms in TensorFlow. Hopefully you've learned about a few interesting tools that are easy to use! To practice I'd encourage you to try out some of these algorithms on different datasets." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IEeIRxlbx0wY", + "colab_type": "text" + }, + "source": [ + "##Sources\n", + "\n", + "1. Chen, James. “Line Of Best Fit.” Investopedia, Investopedia, 29 Jan. 2020, www.investopedia.com/terms/l/line-of-best-fit.asp.\n", + "2. “Tf.feature_column.categorical_column_with_vocabulary_list.” TensorFlow, www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list?version=stable.\n", + "3. “Build a Linear Model with Estimators  :   TensorFlow Core.” TensorFlow, www.tensorflow.org/tutorials/estimator/linear.\n", + "4. Staff, EasyBib. “The Free Automatic Bibliography Composer.” EasyBib, Chegg, 1 Jan. 2020, www.easybib.com/project/style/mla8?id=1582473656_5e52a1b8c84d52.80301186.\n", + "5. Seif, George. “The 5 Clustering Algorithms Data Scientists Need to Know.” Medium, Towards Data Science, 14 Sept. 2019, https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68.\n", + "6. Definition of Hidden Markov Model, http://jedlik.phy.bme.hu/~gerjanos/HMM/node4.html.\n", + "7. “Tfp.distributions.HiddenMarkovModel  :   TensorFlow Probability.” TensorFlow, www.tensorflow.org/probability/api_docs/python/tfp/distributions/HiddenMarkovModel." + ] + } + ] +} \ No newline at end of file From ffb86ef48262b79146d07896780f180ea9d5d67e Mon Sep 17 00:00:00 2001 From: Sounak Pal <44639229+SounakPal212@users.noreply.github.com> Date: Fri, 14 Aug 2020 20:39:12 +0530 Subject: [PATCH 7/7] Created using Colaboratory --- TensorFlow_Introduction.ipynb | 481 ++++++++++++++++++++++++++++++++++ 1 file changed, 481 insertions(+) create mode 100644 TensorFlow_Introduction.ipynb diff --git a/TensorFlow_Introduction.ipynb b/TensorFlow_Introduction.ipynb new file mode 100644 index 0000000..deb39b4 --- /dev/null +++ b/TensorFlow_Introduction.ipynb @@ -0,0 +1,481 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "TensorFlow-Introduction.ipynb", + "provenance": [], + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-5u3a4csUPyn", + "colab_type": "text" + }, + "source": [ + "#TensorFlow 2.0 Introduction\n", + "In this notebook you will be given an interactive introduction to TensorFlow 2.0. We will walk through the following topics within the TensorFlow module:\n", + "\n", + "- TensorFlow Install and Setup\n", + "- Representing Tensors\n", + "- Tensor Shape and Rank\n", + "- Types of Tensors\n", + "\n", + "\n", + "If you'd like to follow along without installing TensorFlow on your machine you can use **Google Collaboratory**. Collaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F7ThfbiQl96l", + "colab_type": "text" + }, + "source": [ + "##Installing TensorFlow\n", + "To install TensorFlow on your local machine you can use pip.\n", + "```console\n", + "pip install tensorflow\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JYQWyAJ2mez6", + "colab_type": "text" + }, + "source": [ + "If you have a CUDA-enabled GPU you can install the GPU version of TensorFlow. You will also need to install some other software which can be found here: https://www.tensorflow.org/install/gpu \n", + "```console\n", + "pip install tensorflow-gpu\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JJjNMaSClWhg", + "colab_type": "text" + }, + "source": [ + "## Importing TensorFlow\n", + "The first step here is going to be to select the correct version of TensorFlow from within Colaboratory!\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vGcE8x2Gkw9K", + "colab_type": "code", + "colab": {} + }, + "source": [ + "%tensorflow_version 2.x # this line is not required unless you are in a notebook" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "4N7XbNDVY8P3", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "25d429c1-1f21-47c3-df67-74052e05f827" + }, + "source": [ + "import tensorflow as tf # now import the tensorflow module\n", + "print(tf.version) # make sure the version is 2.x" + ], + "execution_count": 2, + "outputs": [ + { + "output_type": "stream", + "text": [ + "\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "duDj86TfWFof", + "colab_type": "text" + }, + "source": [ + "##Tensors \n", + "\"A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.\" (https://www.tensorflow.org/guide/tensor)\n", + "\n", + "It shouldn't surprise you that tensors are a fundamental aspect of TensorFlow. They are the main objects that are passed around and manipulated throughout the program. Each tensor represents a partially defined computation that will eventually produce a value. TensorFlow programs work by building a graph of Tensor objects that details how tensors are related. Running different parts of the graph allows results to be generated.\n", + "\n", + "Each tensor has a data type and a shape. \n", + "\n", + "**Data Types Include**: float32, int32, string and others.\n", + "\n", + "**Shape**: Represents the dimension of data.\n", + "\n", + "Just like vectors and matrices, tensors can have operations applied to them, like addition, subtraction, dot product, cross product, etc.\n", + "\n", + "In the next sections we will discuss some different properties of tensors. This is to make you more familiar with how TensorFlow represents data and how you can manipulate this data.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TAk6QhGUwQRt", + "colab_type": "text" + }, + "source": [ + "###Creating Tensors\n", + "Below is an example of how to create some different tensors.\n", + "\n", + "You simply define the value of the tensor and the datatype and you are good to go! 
It's worth mentioning that usually we deal with tensors of numeric data, it is quite rare to see string tensors.\n", + "\n", + "For a full list of datatypes please refer to the following guide.\n", + "\n", + "https://www.tensorflow.org/api_docs/python/tf/dtypes/DType?version=stable" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "epGskXdjZHzu", + "colab_type": "code", + "colab": {} + }, + "source": [ + "string = tf.Variable(\"this is a string\", tf.string) \n", + "number = tf.Variable(324, tf.int16)\n", + "floating = tf.Variable(3.567, tf.float64)" + ], + "execution_count": 3, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D0_H71HMaE-5", + "colab_type": "text" + }, + "source": [ + "###Rank/Degree of Tensors\n", + "Another word for rank is degree, these terms simply mean the number of dimensions involved in the tensor. What we created above is a *tensor of rank 0*, also known as a scalar. \n", + "\n", + "Now we'll create some tensors of higher degrees/ranks." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "hX_Cc5IfjQ6-", + "colab_type": "code", + "colab": {} + }, + "source": [ + "rank1_tensor = tf.Variable([\"Test\"], tf.string) \n", + "rank2_tensor = tf.Variable([[\"test\", \"ok\"], [\"test\", \"yes\"]], tf.string)" + ], + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "55zuGMc7nHjC", + "colab_type": "text" + }, + "source": [ + "**To determine the rank** of a tensor we can call the following method." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "Zrj0rAWLnMNv", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "6c51bc2d-5c13-4e4c-ed57-6e3bb755efc9" + }, + "source": [ + "tf.rank(rank2_tensor)" + ], + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 5 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hTv4Gz67pQbx", + "colab_type": "text" + }, + "source": [ + "The rank of a tensor is direclty related to the deepest level of nested lists. You can see in the first example ```[\"Test\"]``` is a rank 1 tensor as the deepest level of nesting is 1. \n", + "Where in the second example ```[[\"test\", \"ok\"], [\"test\", \"yes\"]]``` is a rank 2 tensor as the deepest level of nesting is 2." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "RaVrANK8q21q", + "colab_type": "text" + }, + "source": [ + "###Shape of Tensors\n", + "Now that we've talked about the rank of tensors it's time to talk about the shape. The shape of a tensor is simply the number of elements that exist in each dimension. 
TensorFlow will try to determine the shape of a tensor but sometimes it may be unknown.\n", + "\n", + "To **get the shape** of a tensor we use the shape attribute.\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "L_NRXsFOraYa", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "96bbc16c-6560-43a0-d32a-576d9ae0b824" + }, + "source": [ + "rank2_tensor.shape" + ], + "execution_count": 6, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "TensorShape([2, 2])" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 6 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wVDmLJeFs086", + "colab_type": "text" + }, + "source": [ + "###Changing Shape\n", + "The number of elements of a tensor is the product of the sizes of all its shapes. There are often many shapes that have the same number of elements, making it convient to be able to change the shape of a tensor.\n", + "\n", + "The example below shows how to change the shape of a tensor." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "dZ8Rbs2xtNqj", + "colab_type": "code", + "colab": {} + }, + "source": [ + "tensor1 = tf.ones([1,2,3]) # tf.ones() creates a shape [1,2,3] tensor full of ones\n", + "tensor2 = tf.reshape(tensor1, [2,3,1]) # reshape existing data to shape [2,3,1]\n", + "tensor3 = tf.reshape(tensor2, [3, -1]) # -1 tells the tensor to calculate the size of the dimension in that place\n", + " # this will reshape the tensor to [3,3]\n", + " \n", + "# The numer of elements in the reshaped tensor MUST match the number in the original" + ], + "execution_count": 7, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "M631k7UDv1Wh", + "colab_type": "text" + }, + "source": [ + "Now let's have a look at our different tensors." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "IFNmUxaEv6s3", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 269 + }, + "outputId": "a0adf3e4-0196-4e04-8762-be28264abdb5" + }, + "source": [ + "print(tensor1)\n", + "print(tensor2)\n", + "print(tensor3)\n", + "# Notice the changes in shape" + ], + "execution_count": 8, + "outputs": [ + { + "output_type": "stream", + "text": [ + "tf.Tensor(\n", + "[[[1. 1. 1.]\n", + " [1. 1. 1.]]], shape=(1, 2, 3), dtype=float32)\n", + "tf.Tensor(\n", + "[[[1.]\n", + " [1.]\n", + " [1.]]\n", + "\n", + " [[1.]\n", + " [1.]\n", + " [1.]]], shape=(2, 3, 1), dtype=float32)\n", + "tf.Tensor(\n", + "[[1. 1.]\n", + " [1. 1.]\n", + " [1. 1.]], shape=(3, 2), dtype=float32)\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "q88pJucBolsp", + "colab_type": "text" + }, + "source": [ + "###Slicing Tensors\n", + "You may be familiar with the term \"slice\" in python and its use on lists, tuples etc. Well the slice operator can be used on tensors to select specific axes or elements.\n", + "\n", + "When we slice or select elements from a tensor, we can use comma seperated values inside the set of square brackets. Each subsequent value refrences a different dimension of the tensor.\n", + "\n", + "Ex: ```tensor[dim1, dim2, dim3]```\n", + "\n", + "I've included a few examples that will hopefully help illustrate how we can manipulate tensors with the slice operator." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "b0YrD-hRqD-W", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Creating a 2D tensor\n", + "matrix = [[1,2,3,4,5],\n", + " [6,7,8,9,10],\n", + " [11,12,13,14,15],\n", + " [16,17,18,19,20]]\n", + "\n", + "tensor = tf.Variable(matrix, dtype=tf.int32) \n", + "print(tf.rank(tensor))\n", + "print(tensor.shape)" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Wd85uGI7qyfC", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Now let's select some different rows and columns from our tensor\n", + "\n", + "three = tensor[0,2] # selects the 3rd element from the 1st row\n", + "print(three) # -> 3\n", + "\n", + "row1 = tensor[0] # selects the first row\n", + "print(row1)\n", + "\n", + "column1 = tensor[:, 0] # selects the first column\n", + "print(column1)\n", + "\n", + "row_2_and_4 = tensor[1::2] # selects the second and fourth rows\n", + "print(row_2_and_4)\n", + "\n", + "column_1_in_row_2_and_3 = tensor[1:3, 0] # selects the first column of the second and third rows\n", + "print(column_1_in_row_2_and_3)\n" + ], + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UU4MMhB_rxvz", + "colab_type": "text" + }, + "source": [ + "###Types of Tensors\n", + "Before we go too far, I will mention that there are different types of tensors. These are the most used, and we will talk more in depth about each as they are used.\n", + "- Variable\n", + "- Constant\n", + "- Placeholder\n", + "- SparseTensor\n", + "\n", + "With the exception of ```Variable``` all of these tensors are immutable, meaning their value may not change during execution.\n", + "\n", + "For now, it is enough to understand that we use the Variable tensor when we want to potentially change the value of our tensor. (A brief illustrative comparison is sketched after the Sources section below.)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "F2OoXbe7aSVl", + "colab_type": "text" + }, + "source": [ + "#Sources\n", + "Most of the information is taken directly from the TensorFlow website, which can be found below.\n", + "\n", + "https://www.tensorflow.org/guide/tensor" + ] + } + ] +} \ No newline at end of file
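To make the Variable/Constant distinction from the Types of Tensors section concrete, here is a short, hedged sketch. It is not part of the original notebook and simply assumes a TensorFlow 2.x eager environment like the one used throughout these cells.

```python
import tensorflow as tf

# A Variable can be updated in place after it is created.
mutable = tf.Variable([1, 2, 3])
mutable.assign([4, 5, 6])
print(mutable.numpy())  # -> [4 5 6]

# A constant tensor has no assign method; attempting to mutate it fails.
frozen = tf.constant([1, 2, 3])
try:
    frozen.assign([4, 5, 6])
except AttributeError as err:
    print("Constants are immutable:", err)
```

This is the practical reason the notebook reaches for `tf.Variable` whenever a value may need to change during execution.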