This repository contains my work from the Himmah Data Science Bootcamp, presented by SDA and Coding Dojo.
The bootcamp is divided into 4 main stacks:
- Week-1
  - Data analysis using Excel.
- Week-2
  - Understand SQL and NoSQL.
- Week-3
  - Business Intelligence overview.
  - Tableau.
- Week-1
  - Introduction to R.
  - Load data and packages.
  - Conditional statements and data vectorization.
- Week-2
  - Data visualization using ggplot.
  - Understand Exploratory Data Analysis (EDA).
  - Understand Tidy Data.
  - Data types.
- Week-3
  - Probability and decision analysis.
  - Understand training, validation, and testing sets.
  - Build a Shiny app (dashboard).
- Week-1
  - Intro to Anaconda and Jupyter Notebook.
  - Intro to NumPy and linear equations.
  - Loops, conditional statements, and functions.
- Week-2
  - Extract data from an API.
  - Data manipulation and statistical analysis.
  - Data visualization using Matplotlib and Seaborn.
  - EDA with insightful visualizations.
- Week-3
  - Deal with missing values and imputation techniques.
  - Build a Dash application.
- Week-1
  - Intro to Machine Learning.
  - Build a heuristic model.
  - Understand cost functions.
  - Build a linear regression model.
- Week-2
  - Understand the data pipeline.
  - Understand the Scikit-learn API.
  - Build a logistic regression model.
  - Apply feature engineering techniques.
  - Improve a model using GridSearch.
- Week-3
  - Overview of ensemble modeling, including boosting and bagging techniques.
  - Understand Principal Component Analysis (PCA).
  - Understand decision trees and random forests.
  - Clustering techniques.
The aim of the capstone project is to build customer segmentation models for a delivery app using K-means and DBSCAN. For more details, see the project repository.
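
The actual models live in the project repository. To give a rough feel for the segmentation idea, here is a minimal, dependency-free sketch of k-means only (DBSCAN is not shown). Everything in it is an assumption made for illustration: the two features (orders per month, average basket value), the six made-up customers, and the `kmeans` helper itself, which is not the code used in the capstone.

```python
import random
from math import dist


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd-style k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical customers: (orders per month, average basket value).
customers = [
    (2, 15.0), (3, 18.0), (1, 12.0),     # occasional, small orders
    (20, 60.0), (22, 65.0), (19, 58.0),  # frequent, large orders
]

centroids, labels = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

On this toy data the two groups are far apart, so the occasional and the frequent customers end up in separate segments. Real features would normally need scaling first, and a library implementation such as the one in Scikit-learn would be the practical choice.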