NOTE: If you are using the Jupyter instance available on the cluster, you can skip this setup. It is only needed if you want to run the workshop exercises locally.
- Install Anaconda
- Create a conda environment with the packages from the requirements file
> conda create -y --name pyspark_env --file environment/requirements.txt
- Activate the newly created conda environment
> source activate pyspark_env
  On conda 4.4 and later, the recommended equivalent is:
> conda activate pyspark_env
- Start Jupyter Notebook
> jupyter notebook
- Open the notebook with the exercises
pySpark SQL exercises.ipynb
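If one of the steps above fails, it can help to first confirm that the required commands (for example conda and jupyter) are on your PATH. The following is a minimal sketch, not part of the workshop materials. check_tools is a hypothetical helper introduced here for illustration, and the choice of which commands to check is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workshop repository):
# reports, for each command name given, whether it is found on PATH.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  # Exit status is 0 only if every command was found.
  return "$missing"
}
```

For example, after activating pyspark_env, running check_tools conda jupyter should print a "found:" line for both commands. A "missing:" line suggests that the corresponding step (installing Anaconda or creating the environment) did not complete.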
Mikołaj Kromka, Grzegorz Gawron