A data project lifecycle has many phases and is rarely a single isolated analysis in one tool. In this project you will carry out an analysis using both Python and SQL, exploring how each tool behaves on the way to the final result.
Pick a dataset from our common datasets repos and break your work into big steps:
- Pick a topic and choose a dataset on that topic. Build around 10 Business questions to answer about this topic.
  - Try to build the questions before knowing everything about the data
  - If that is not possible, do step 2 (Data Analysis) first
- Data Analysis: Understand your dataset and create a report (Word document) about it
- Data Exploration and Business Understanding:
  - Import your dataset into SQL
  - Answer your Business questions with SQL queries
- Bonus points if you augment your data with data you obtain through web scraping
- Bonus points if you include visualizations from Python and/or Tableau in the final presentation
- Python Code: Provide well-documented Python code that performs the analysis and the SQL upload.
- SQL text file (.sql): A well-commented document with all the queries answering the Business questions
- Short Presentation: Structure the presentation in the following way:
  - Intro Slides: introduce the problem and the datasets
  - Data cleaning and assumptions
  - Business questions and SQL queries (1 slide per question with a screenshot of the query and the answer is enough)
- PDF Document with notes you might want to share
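
As an illustration of the "import your dataset into SQL" and "answer your Business questions with SQL queries" steps, here is a minimal sketch. It makes several assumptions that are not part of this brief: it uses SQLite (via Python's built-in `sqlite3` module) as the database, a tiny made-up `orders` dataset, and a hypothetical helper named `load_csv_into_sql`. Your own dataset, database engine, and questions will differ.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a real dataset file.
SAMPLE_CSV = """order_id,region,amount
1,North,120.50
2,South,80.00
3,North,45.25
4,West,200.00
"""


def load_csv_into_sql(conn, csv_file, table):
    """Light cleaning (type casts, whitespace) followed by the SQL upload."""
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["order_id"]), r["region"].strip(), float(r["amount"]))
        for r in reader
    ]
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "order_id INTEGER PRIMARY KEY, "
        "region TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    # Parameterized insert: values are passed separately from the SQL text.
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained (assumption: SQLite).
conn = sqlite3.connect(":memory:")
n_loaded = load_csv_into_sql(conn, io.StringIO(SAMPLE_CSV), "orders")

# Example Business question: which region generated the most revenue?
QUERY = """
SELECT region, ROUND(SUM(amount), 2) AS total_revenue
FROM orders
GROUP BY region
ORDER BY total_revenue DESC;
"""
results = conn.execute(QUERY).fetchall()
for region, total in results:
    print(region, total)
```

With a real dataset you would open the CSV file instead of the inline string, and each query like `QUERY` could be saved, with a comment stating the question it answers, in the `.sql` deliverable.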