Data Types Warm-up.ipynb #647

Open: wants to merge 1 commit into base: master
module1-exploratory-data-analysis/Data Types Warm-up.ipynb: 312 changes (311 additions, 1 deletion)
@@ -1 +1,311 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Data Types Warm-up.ipynb","provenance":[],"authorship_tag":"ABX9TyNIvbKj+LoSj5Y1+CWCXEdv"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"mUdn5YhXqe3m"},"source":["## Overview\n","\n","One of the cornerstones of Exploratory Data Analysis (EDA) is being able to identify variable types such as categorical, quantitative, continuous, discrete, ordinal, nominal and identifier. We will need different statistical methods to display and describe each of these different types of data."]},{"cell_type":"markdown","metadata":{"id":"CE1OpSBpljcU"},"source":["## Follow Along\n","\n","First, data can most easily be classified as categorical or quantitative.\n","\n","- Categorical data places each observation into one and only one category: hair color, eye color, favorite flavor of ice cream, letter grade in a class, zip code\n","\n","- Quantitative data measures something: height, weight, income, number of children"]},{"cell_type":"markdown","metadata":{"id":"pLIcXUwgGZ4m"},"source":["Categorical data can further be classified as ordinal, nominal or an identifier variable.\n","- Nominal data has no natural ordering: hair color, eye color\n","- Oridnal data has a natural ordering: letter grades - A, B, C, D, F\n","- Identifier variables identify each record uniquely and are not analyzed\n","\n","Quantitative data can further be classified as discrete or continuous.\n","- Discrete data can be counted in a finite amount of time: Number of individuals riding on a bus\n","- Continuous data can be measured ever more precisely: My age is 38.334283948577 years old."]},{"cell_type":"markdown","metadata":{"id":"-cPyM8eLlIQB"},"source":["#### Let's import the Titanic.csv dataset and identify the different variable types:"]},{"cell_type":"markdown","metadata":{"id":"PYgryGq0Ye3X"},"source":["Run the code block below to import and print 
the top 5 observations from the Titanic dataset. We'll cover exactly how this works in today's Guided Project.\n","\n","Then take a look at the Titanic Data Dictionary linked below."]},{"cell_type":"code","metadata":{"id":"vGXfTAyJlU4J","colab":{"base_uri":"https://localhost:8080/","height":215},"executionInfo":{"status":"ok","timestamp":1617111990713,"user_tz":240,"elapsed":830,"user":{"displayName":"Chelsea Myers","photoUrl":"","userId":"05871651112741478957"}},"outputId":"84055e96-cd7c-4eeb-f9f0-b76da050daf2"},"source":["import pandas as pd\n","\n","data_url = 'https://raw.githubusercontent.com/LambdaSchool/data-science-practice-datasets/main/unit_1/Titanic/Titanic.csv'\n","\n","df = pd.read_csv(data_url, skipinitialspace=True, header=0)\n","\n","print(df.shape)\n","df.head()"],"execution_count":1,"outputs":[{"output_type":"stream","text":["(887, 8)\n"],"name":"stdout"},{"output_type":"execute_result","data":{"text/html":["<div>\n","<style scoped>\n"," .dataframe tbody tr th:only-of-type {\n"," vertical-align: middle;\n"," }\n","\n"," .dataframe tbody tr th {\n"," vertical-align: top;\n"," }\n","\n"," .dataframe thead th {\n"," text-align: right;\n"," }\n","</style>\n","<table border=\"1\" class=\"dataframe\">\n"," <thead>\n"," <tr style=\"text-align: right;\">\n"," <th></th>\n"," <th>Survived</th>\n"," <th>Pclass</th>\n"," <th>Name</th>\n"," <th>Sex</th>\n"," <th>Age</th>\n"," <th>Siblings/Spouses_Aboard</th>\n"," <th>Parents/Children_Aboard</th>\n"," <th>Fare</th>\n"," </tr>\n"," </thead>\n"," <tbody>\n"," <tr>\n"," <th>0</th>\n"," <td>0</td>\n"," <td>3</td>\n"," <td>Mr. Owen Harris Braund</td>\n"," <td>male</td>\n"," <td>22.0</td>\n"," <td>1</td>\n"," <td>0</td>\n"," <td>7.2500</td>\n"," </tr>\n"," <tr>\n"," <th>1</th>\n"," <td>1</td>\n"," <td>1</td>\n"," <td>Mrs. 
John Bradley (Florence Briggs Thayer) Cum...</td>\n"," <td>female</td>\n"," <td>38.0</td>\n"," <td>1</td>\n"," <td>0</td>\n"," <td>71.2833</td>\n"," </tr>\n"," <tr>\n"," <th>2</th>\n"," <td>1</td>\n"," <td>3</td>\n"," <td>Miss. Laina Heikkinen</td>\n"," <td>female</td>\n"," <td>26.0</td>\n"," <td>0</td>\n"," <td>0</td>\n"," <td>7.9250</td>\n"," </tr>\n"," <tr>\n"," <th>3</th>\n"," <td>1</td>\n"," <td>1</td>\n"," <td>Mrs. Jacques Heath (Lily May Peel) Futrelle</td>\n"," <td>female</td>\n"," <td>35.0</td>\n"," <td>1</td>\n"," <td>0</td>\n"," <td>53.1000</td>\n"," </tr>\n"," <tr>\n"," <th>4</th>\n"," <td>0</td>\n"," <td>3</td>\n"," <td>Mr. William Henry Allen</td>\n"," <td>male</td>\n"," <td>35.0</td>\n"," <td>0</td>\n"," <td>0</td>\n"," <td>8.0500</td>\n"," </tr>\n"," </tbody>\n","</table>\n","</div>"],"text/plain":[" Survived Pclass ... Parents/Children_Aboard Fare\n","0 0 3 ... 0 7.2500\n","1 1 1 ... 0 71.2833\n","2 1 3 ... 0 7.9250\n","3 1 1 ... 0 53.1000\n","4 0 3 ... 0 8.0500\n","\n","[5 rows x 8 columns]"]},"metadata":{"tags":[]},"execution_count":1}]},{"cell_type":"markdown","metadata":{"id":"Au0x261CIlWc"},"source":["[Titanic Data Dictionary](https://github.com/LambdaSchool/data-science-practice-datasets/tree/main/unit_1/Titanic)\n"]},{"cell_type":"markdown","metadata":{"id":"mNpAAsxkY5PY"},"source":["###Use the resources above to answer the following questions about the Titanic dataset:"]},{"cell_type":"markdown","metadata":{"id":"SMdasMbEJq8r"},"source":["\n","\n","1. Which variable is the identifier variable?\n","\n"]},{"cell_type":"markdown","metadata":{"id":"g_uCjN2fZLvn"},"source":["Answer: "]},{"cell_type":"markdown","metadata":{"id":"B3bA1ntVZHmj"},"source":["\n","2. Which variables are categorical? Are they ordinal or nominal?\n"]},{"cell_type":"markdown","metadata":{"id":"OxB_tfWJZOhr"},"source":["Answer: "]},{"cell_type":"markdown","metadata":{"id":"P5J6Ck6xZIL7"},"source":["\n","3. Which variables are quantiative? 
Are they quantitative or discrete?\n"]},{"cell_type":"markdown","metadata":{"id":"vOJHdoCUZPI2"},"source":["Answer: "]}]}
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Data Types Warm-up.ipynb",
"provenance": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/iamjasonlevin/DS-Unit-1-Sprint-1-Data-Wrangling-and-Storytelling/blob/master/module1-exploratory-data-analysis/Data%20Types%20Warm-up.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mUdn5YhXqe3m"
},
"source": [
"## Overview\n",
"\n",
"One of the cornerstones of Exploratory Data Analysis (EDA) is identifying variable types: categorical, quantitative, continuous, discrete, ordinal, nominal, and identifier. Each type calls for different statistical methods to display and describe it."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CE1OpSBpljcU"
},
"source": [
"## Follow Along\n",
"\n",
"At the broadest level, data can be classified as either categorical or quantitative.\n",
"\n",
"- Categorical data places each observation into one and only one category: hair color, eye color, favorite flavor of ice cream, letter grade in a class, zip code\n",
"\n",
"- Quantitative data records a numeric measurement or count: height, weight, income, number of children"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pLIcXUwgGZ4m"
},
"source": [
"Categorical data can be further classified as nominal, ordinal, or identifier variables.\n",
"- Nominal data has no natural ordering: hair color, eye color\n",
"- Ordinal data has a natural ordering: letter grades A, B, C, D, F\n",
"- Identifier variables uniquely identify each record and are not analyzed\n",
"\n",
"Quantitative data can be further classified as discrete or continuous.\n",
"- Discrete data can be counted in a finite amount of time: the number of individuals riding on a bus\n",
"- Continuous data can be measured ever more precisely: an age of 38.334283948577 years"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-cPyM8eLlIQB"
},
"source": [
"#### Let's import the Titanic.csv dataset and identify the different variable types:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PYgryGq0Ye3X"
},
"source": [
"Run the code block below to import the Titanic dataset, print its shape (rows, columns), and display its first 5 observations. We'll cover exactly how this works in today's Guided Project.\n",
"\n",
"Then take a look at the Titanic Data Dictionary linked below."
]
},
{
"cell_type": "code",
"metadata": {
"id": "vGXfTAyJlU4J",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 221
},
"outputId": "181e25a2-d89b-418d-ca1d-bd9d2fc6645c"
},
"source": [
"import pandas as pd\n",
"\n",
"data_url = 'https://raw.githubusercontent.com/LambdaSchool/data-science-practice-datasets/main/unit_1/Titanic/Titanic.csv'\n",
"\n",
"df = pd.read_csv(data_url, skipinitialspace=True, header=0)\n",
"\n",
"print(df.shape)\n",
"df.head()"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"(887, 8)\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Survived</th>\n",
" <th>Pclass</th>\n",
" <th>Name</th>\n",
" <th>Sex</th>\n",
" <th>Age</th>\n",
" <th>Siblings/Spouses_Aboard</th>\n",
" <th>Parents/Children_Aboard</th>\n",
" <th>Fare</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Mr. Owen Harris Braund</td>\n",
" <td>male</td>\n",
" <td>22.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>7.2500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Mrs. John Bradley (Florence Briggs Thayer) Cum...</td>\n",
" <td>female</td>\n",
" <td>38.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>71.2833</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>Miss. Laina Heikkinen</td>\n",
" <td>female</td>\n",
" <td>26.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>7.9250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Mrs. Jacques Heath (Lily May Peel) Futrelle</td>\n",
" <td>female</td>\n",
" <td>35.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>53.1000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>Mr. William Henry Allen</td>\n",
" <td>male</td>\n",
" <td>35.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>8.0500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Survived Pclass ... Parents/Children_Aboard Fare\n",
"0 0 3 ... 0 7.2500\n",
"1 1 1 ... 0 71.2833\n",
"2 1 3 ... 0 7.9250\n",
"3 1 1 ... 0 53.1000\n",
"4 0 3 ... 0 8.0500\n",
"\n",
"[5 rows x 8 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 1
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Au0x261CIlWc"
},
"source": [
"[Titanic Data Dictionary](https://github.com/LambdaSchool/data-science-practice-datasets/tree/main/unit_1/Titanic)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mNpAAsxkY5PY"
},
"source": [
"### Use the resources above to answer the following questions about the Titanic dataset:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SMdasMbEJq8r"
},
"source": [
"\n",
"\n",
"1. Which variable is the identifier variable?"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "g_uCjN2fZLvn"
},
"source": [
"Answer: Name"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "B3bA1ntVZHmj"
},
"source": [
"\n",
"2. Which variables are categorical? Are they ordinal or nominal?\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OxB_tfWJZOhr"
},
"source": [
"Answer: Survived, Pclass, Sex. Survived and Sex are nominal. Pclass is ordinal."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "P5J6Ck6xZIL7"
},
"source": [
"\n",
"3. Which variables are quantitative? Are they continuous or discrete?\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vOJHdoCUZPI2"
},
"source": [
"Answer: Age, Siblings/Spouses_Aboard, Parents/Children_Aboard, and Fare are all quantitative. Siblings/Spouses_Aboard and Parents/Children_Aboard are discrete. Age and Fare are continuous."
]
}
]
}
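The classifications in the answers above can be made explicit in pandas. The following is a minimal sketch for illustration, not part of the original notebook. It re-types the five rows that `df.head()` printed, so it runs without downloading Titanic.csv. Row 1's `Name` is kept truncated exactly as pandas displayed it. The sketch then checks that `Name` is unique, encodes `Sex` and `Survived` as unordered (nominal) categoricals and `Pclass` as an ordered (ordinal) one, and groups the quantitative columns into discrete and continuous lists.

Some parts are assumptions. The ranking of 1st class above 2nd above 3rd for `Pclass` is assumed. The names `sample`, `discrete`, and `continuous` are hypothetical. A uniqueness check on five rows does not prove that `Name` is unique across all 887 rows.

```python
import pandas as pd

# The five rows printed by df.head() in the notebook, re-typed so this sketch
# runs offline. Row 1's Name is truncated exactly as pandas displayed it.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Name": [
        "Mr. Owen Harris Braund",
        "Mrs. John Bradley (Florence Briggs Thayer) Cum...",
        "Miss. Laina Heikkinen",
        "Mrs. Jacques Heath (Lily May Peel) Futrelle",
        "Mr. William Henry Allen",
    ],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Siblings/Spouses_Aboard": [1, 1, 0, 1, 0],
    "Parents/Children_Aboard": [0, 0, 0, 0, 0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Identifier: every row should have its own distinct value
# (checked on these 5 rows only, not the full dataset).
name_is_unique = sample["Name"].is_unique

# Nominal: categories with no natural order.
sample["Sex"] = pd.Categorical(sample["Sex"])
sample["Survived"] = pd.Categorical(sample["Survived"])

# Ordinal: categories with a natural order. Assumption: 3rd < 2nd < 1st class,
# so the categories are listed from lowest to highest rank.
sample["Pclass"] = pd.Categorical(sample["Pclass"], categories=[3, 2, 1], ordered=True)

# Quantitative columns grouped by the notebook's definitions.
discrete = ["Siblings/Spouses_Aboard", "Parents/Children_Aboard"]  # counts of people
continuous = ["Age", "Fare"]  # measurable ever more precisely

print(name_is_unique)
print(sample["Pclass"].max())  # highest-ranked class present, not the largest number
print(sample.dtypes)
```

Because `Pclass` is ordered, `.max()` returns the highest-ranked class (1) rather than the numerically largest value (3). A column's dtype is only a hint. For example, `Survived` is stored as integers in the CSV, but it is categorical.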