NOTE: If you are using the Jupyter installation available on the cluster, you can skip this setup. It is only needed if you want to run the workshop exercises locally.
- Install Anaconda: https://conda.io/projects/conda/en/latest/user-guide/install/index.html#id2
- Create a conda environment with the packages from the requirements file
> conda create -y --name pyspark_env --file environment/requirements.txt
- Activate the newly created conda environment (on newer conda versions, conda activate pyspark_env may be needed instead of source activate)
> source activate pyspark_env
- Run jupyter notebook
> jupyter notebook
- Open the notebook with the exercises (a quick sanity check for the local setup is sketched below)
pySpark SQL exercises.ipynb
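
To confirm the local environment works before starting the exercises, a minimal PySpark SQL snippet like the one below should run in a fresh notebook cell. This is only a sanity-check sketch, not part of the workshop material; it assumes pyspark is among the packages listed in environment/requirements.txt and that a local Java runtime is available.

```python
# Sanity check for a local PySpark setup (not part of the workshop exercises).
# Assumes the pyspark package is installed in the pyspark_env environment.
from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session using all available cores
spark = (SparkSession.builder
         .master("local[*]")
         .appName("setup-check")
         .getOrCreate())

# Create a tiny DataFrame, register it as a temporary view and query it with SQL
df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "name"])
df.createOrReplaceTempView("demo")
spark.sql("SELECT id, name FROM demo WHERE id > 1").show()

spark.stop()
```

If the query prints a one-row table, the environment is ready for the exercise notebook.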
Mikołaj Kromka, Grzegorz Gawron