If you like this workshop, you'd love my Practical Hands on Data Engineering Workshop.
The workshop will be streamed live on YouTube: Advanced Data Processing in SQL YouTube Live. After the stream, it will be available to watch and follow at your own pace.
How to use nested data types in SQL, YouTube Link
- Sign up for a GitHub account.
- Go through the Setup process and complete the 0-basics notebook exercises.
You have two options to run the exercises in this repo: GitHub Codespaces or a local clone.
Steps:
- Create GitHub Codespaces with this link.
- Wait for GitHub to install the requirements.txt. This step can take about 5 minutes.
- In the terminal, run `python setup.py` to create the tables and data necessary for the exercises.
- Now open `0-basics.ipynb` (or any ipynb) and it will open in a Jupyter notebook interface. You will be asked for your kernel choice; choose `Python Environments` and then `python3.10.13 Global`.
- Complete the 0-basics notebook as a prerequisite.
Steps:
- Clone this repo and cd into the cloned repo.
- Start a virtual env and install the requirements.
- In the terminal, run `python setup.py` to create the tables and data necessary for the exercises.
- Start Jupyter lab and run the `ipynb` notebooks.
- Complete the 0-basics notebook as a prerequisite.
git clone https://github.com/josephmachado/adv_data_transformation_in_sql.git
cd adv_data_transformation_in_sql
python -m venv ./env # create a virtual env
source env/bin/activate # use virtual environment
pip install -r requirements.txt
python setup.py
jupyter lab
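If you want to confirm that the virtual environment is active before installing the requirements, a minimal check like the following can help. It is only a sketch and not part of this repo; the helper name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv.

    Hypothetical helper, not part of this repo. Inside a venv,
    sys.prefix points at the env directory while sys.base_prefix
    points at the Python installation the env was created from.
    """
    return sys.prefix != sys.base_prefix


print("virtual env active:", in_virtualenv())
print("python version:", sys.version.split()[0])
```

Run it with `python` after `source env/bin/activate`; if it reports that no virtual env is active, the activate step most likely did not take effect in the current shell.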
The TPC-H data represents a car parts seller’s data warehouse, where we record orders, items that make up that order (lineitem), supplier, customer, part (parts sold), region, nation, and partsupp (parts supplier).
Note: Have a copy of the data model as you follow along; this will help in understanding the examples provided and in answering exercise questions.
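To make the orders-to-lineitem relationship concrete, here is a small sketch that builds toy versions of the two tables in an in-memory SQLite database and joins them. The column names follow the standard TPC-H schema (`o_orderkey`, `l_orderkey`, and so on), but the rows are made up for illustration, only a few columns are included, and SQLite is an assumption made to keep the example self-contained; it is not necessarily the database that `setup.py` creates.

```python
import sqlite3

# Toy in-memory database; NOT the database created by setup.py.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        o_orderkey  INTEGER PRIMARY KEY,
        o_custkey   INTEGER,
        o_orderdate TEXT
    );
    -- Each lineitem row is one item within an order.
    CREATE TABLE lineitem (
        l_orderkey   INTEGER REFERENCES orders (o_orderkey),
        l_linenumber INTEGER,
        l_partkey    INTEGER,
        l_quantity   INTEGER,
        PRIMARY KEY (l_orderkey, l_linenumber)
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO orders VALUES (1, 100, '1996-01-02'), (2, 200, '1996-12-01');
    INSERT INTO lineitem VALUES (1, 1, 11, 17), (1, 2, 12, 36), (2, 1, 11, 38);
    """
)

# One order has many lineitems: count the items and sum quantity per order.
rows = conn.execute(
    """
    SELECT o.o_orderkey,
           COUNT(*)          AS num_items,
           SUM(l.l_quantity) AS total_quantity
    FROM orders o
    JOIN lineitem l ON l.l_orderkey = o.o_orderkey
    GROUP BY o.o_orderkey
    ORDER BY o.o_orderkey
    """
).fetchall()

print(rows)  # [(1, 2, 53), (2, 1, 38)]
```

The same pattern applies to the other tables in the data model: each one links to its neighbors through a key column, so most exercise queries start by picking the right join key.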
I'd love to hear any feedback; please send it by clicking here.