For people who struggle to get started with data analysis in Python
This hands-on, in-person workshop is based on the Data Analysis with Python course by IBM Cognitive Class.
Learn how to prepare data for analysis, perform simple statistical analyses, create meaningful data visualizations, and predict future trends from data, all in a Jupyter-based environment.
The workshop will cover core topics:
Figures: Problem, Attributes, Types
- Understanding the Domain
- Understanding the Dataset
- Python package for data science
- Importing and Exporting Data in Python
- Basic Insights from Datasets
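As a rough illustration of the importing, exporting, and basic-insight topics above, here is a minimal pandas sketch. The column names and values are invented for the example and are not the workshop's dataset.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real data file.
csv_text = """make,body_style,horsepower,price
audi,sedan,102,13950
bmw,sedan,101,16430
honda,hatchback,76,6529
toyota,wagon,62,6918
"""

# Import: read_csv accepts a path, URL, or file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Basic insights: column types and summary statistics.
print(df.dtypes)
print(df.describe())

# Export: with no path argument, to_csv returns the CSV as a string.
exported = df.to_csv(index=False)
```

Passing a file path to `to_csv` instead would write the same content to disk.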
Figures: Distribution, Bins, Histogram
- Identify and Handle Missing Values
- Data Formatting
- Data Normalization Sets
- Binning
- Indicator variables
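The data preparation topics above could be sketched with pandas roughly as follows. The data is made up, and mean imputation and max scaling are just one common choice for each step, not necessarily what the workshop notebooks use.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing horsepower value.
df = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas", "gas"],
    "horsepower": [100.0, np.nan, 60.0, 140.0],
    "length": [150.0, 200.0, 175.0, 160.0],
})

# Missing values: replace NaN with the column mean (one common choice).
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Data formatting: cast to an integer type once no NaN remains.
df["horsepower"] = df["horsepower"].astype(int)

# Normalization: simple feature scaling, dividing by the column maximum.
df["length_scaled"] = df["length"] / df["length"].max()

# Binning: three equal-width bins with descriptive labels.
df["hp_level"] = pd.cut(df["horsepower"], bins=3,
                        labels=["low", "medium", "high"])

# Indicator variables: one 0/1 column per fuel category.
df = pd.concat([df, pd.get_dummies(df["fuel"], dtype=int)], axis=1)
```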
Figures: Heatmap, Scatterplot, Boxplot
- Descriptive Statistics
- Basics of Grouping
- ANOVA
- Correlation
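A small sketch of the exploratory topics above, assuming pandas and SciPy are available. The data frame is invented for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical data: price by drive type and engine size.
df = pd.DataFrame({
    "drive": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd"],
    "engine_size": [90, 100, 110, 150, 160, 170],
    "price": [7000, 8000, 9000, 15000, 16000, 17000],
})

# Descriptive statistics for the numeric columns.
summary = df.describe()

# Grouping: mean price per drive type.
mean_price = df.groupby("drive")["price"].mean()

# ANOVA: do the group means differ more than chance would suggest?
groups = [g["price"] for _, g in df.groupby("drive")]
f_stat, p_value = stats.f_oneway(*groups)

# Correlation: Pearson coefficient and its p-value.
r, r_p = stats.pearsonr(df["engine_size"], df["price"])
```

A large F statistic with a small p-value suggests the group means differ, and a Pearson coefficient near 1 indicates a strong positive linear relationship.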
Figures: 3rd Polynomial, Actual/Fitted, 11th Polynomial
- Simple and Multiple Linear Regression
- Model Evaluation Using Visualization
- Polynomial Regression and Pipelines
- R-squared and MSE for In-Sample Evaluation
- Prediction and Decision Making
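The model development topics above might look like this in scikit-learn. This is a sketch on synthetic, noise-free data chosen so that the results are easy to reason about; the degree and pipeline steps are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical noise-free data following y = 2x^2 + 1.
x = np.linspace(-3, 3, 30).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 1

# Simple linear regression: a straight line underfits the curve.
linear = LinearRegression().fit(x, y)
r2_linear = r2_score(y, linear.predict(x))

# Polynomial regression as a pipeline: expand, scale, then fit.
poly = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
poly.fit(x, y)
y_hat = poly.predict(x)

# In-sample evaluation with R-squared and mean squared error.
r2_poly = r2_score(y, y_hat)
mse_poly = mean_squared_error(y, y_hat)

# Prediction for a new, unseen input.
new_prediction = poly.predict(np.array([[4.0]]))[0]
```

Comparing `r2_linear` with `r2_poly` shows how much the quadratic term improves the in-sample fit on this data.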
Figures: 5th Polynomial, R^2, 4 Features
- Model Evaluation
- Over-fitting, Under-fitting and Model Selection
- Ridge Regression
- Grid Search
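As one possible illustration of the evaluation topics above, the sketch below holds out a test set and tunes a Ridge model with grid search. The synthetic data and the alpha grid are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical data: a linear signal in two features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out a test set so evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid search: cross-validate Ridge over candidate alpha values.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1, 10, 100]},
    cv=5,
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training set by default.
best_alpha = search.best_params_["alpha"]
test_r2 = search.score(X_test, y_test)
```

A large gap between training and test scores would point to over-fitting, while low scores on both would point to under-fitting.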
You will need a laptop that can access the internet.
Install Miniconda or the (larger) Anaconda distribution:
Install Python using Miniconda
OR Install Python using Anaconda
Clone the repository
git clone git@github.com:aymanibrahim/dapy.git
OR Download the repository as a .zip file
Change the current directory to the dapy directory
cd dapy
Create a conda environment named dapy with Python and the required packages, as specified in the environment.yml file.
conda env create -f environment.yml
When conda asks if you want to proceed, type "y" and press Enter.
Switch from the default environment (base) to the dapy environment.
conda activate dapy
Enable the ipywidgets Jupyter Notebook extension
jupyter contrib nbextension install --user
jupyter nbextension enable --py widgetsnbextension
jupyter nbextension enable python-markdown/main
# Notebooks w/ extensions that auto-run code must be "trusted" to work the first time
jupyter trust ./notebooks/05_Model_Evaluation.ipynb
Install the ipywidgets JupyterLab extension
jupyter labextension install @jupyter-widgets/jupyterlab-manager
Enable widgetsnbextension
jupyter nbextension enable --py widgetsnbextension --sys-prefix
Use the check_environment.py script to make sure everything was installed correctly. Open a terminal and change its directory (cd) to the workshop directory dapy that you cloned or downloaded. Then enter the following:
python check_environment.py
If everything is OK, you will get the following message:
Your workshop environment is set up
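For readers curious about what such a check involves, here is a hypothetical sketch, not the repository's actual check_environment.py. The package list and the helper name `missing_packages` are assumptions.

```python
import importlib.util

# Assumed package list; the real script may check different names.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Your workshop environment is set up")
```

Using `find_spec` looks up each package without importing it, so the check stays fast even for large libraries.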
Start JupyterLab using:
jupyter lab
JupyterLab will open automatically in your browser.
You can also access JupyterLab by entering the notebook server’s URL into your browser.
Press CTRL + C in the terminal to stop JupyterLab.
Switch from the current environment (dapy) back to the previous environment.
conda deactivate
- Python: Programming language
- Conda: Package and environment manager
- Anaconda: Python distribution
- Miniconda: Minimal installer for conda
- NumPy: Fundamental package for scientific computing with Python
- Matplotlib: Python 2D plotting library
- seaborn: Statistical Data Visualization
- pandas: Python data analysis library
- scikit-learn: Machine Learning in Python
- Jupyter Notebook: Web application to create documents with code, equations, visualizations and text
- JupyterLab: Web-based development environment for Jupyter Notebooks
- Python for Data Science: Course by IBM Cognitive Class
- Data Analysis with Python: Course by IBM Cognitive Class
Thanks for your interest in contributing! There are many ways to contribute to this project. Get started here.
Data Analysis with Python Workshop by Ayman Ibrahim is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at IBM Cognitive Class Data Analysis with Python by Joseph Santarcangelo, PhD. and Mahdi Noorian, PhD.