Hi, thank you for taking the time to improve Snowflake's Snowpark Python or Snowpark pandas APIs!
Many questions can be answered by checking our docs or looking for already existing bug reports and enhancement requests on our issue tracker.
Please start by checking these first!
If neither the docs nor the issue tracker answers your question, we'd love to hear from you! Please open a new issue to get in touch with us.
We encourage everyone to first open a new issue to discuss any feature work or bug fixes with one of the maintainers. The following should help guide contributors through potential pitfalls.
We require our contributors to sign a CLA, available at https://github.com/snowflakedb/CLA/blob/main/README.md. A GitHub Actions bot will assist you when you open a pull request.
git clone <YOUR_FORKED_REPO>
cd snowpark-python
- Create a new Python virtual environment with any Python version that we support.
  - The Snowpark Python API supports Python 3.8, Python 3.9, Python 3.10, and Python 3.11.
  - The Snowpark pandas API supports Python 3.9, Python 3.10, and Python 3.11. Additionally, Snowpark pandas requires Modin 0.28.1 and pandas 2.2.1.

  conda create --name snowpark-dev python=3.9

- Activate the new Python virtual environment. For example,

  conda activate snowpark-dev

- Go to the cloned repository root folder.
  - To install the Snowpark Python API in edit/development mode, use:

    python -m pip install -e ".[development, pandas]"

  - To install the Snowpark pandas API in edit/development mode, use:

    python -m pip install -e ".[modin-development]"

  The -e flag tells pip to install the library in edit, or development, mode.
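As a quick sanity check of the environment you just created, the version requirements listed above can be encoded in a few lines of Python. This is only a sketch: the helper names `supported_apis` and `pin_mismatches` are made up for illustration and are not part of the Snowpark code base, and the version lists are copied from this document rather than read from the package metadata.

```python
import sys
from importlib import metadata

# Supported interpreter versions, copied from the list above (not read from setup.py).
SNOWPARK_PYTHON_VERSIONS = {(3, 8), (3, 9), (3, 10), (3, 11)}
SNOWPARK_PANDAS_VERSIONS = {(3, 9), (3, 10), (3, 11)}
# Dependency versions that this document states Snowpark pandas requires.
SNOWPARK_PANDAS_PINS = {"modin": "0.28.1", "pandas": "2.2.1"}


def supported_apis(version=None):
    """Return the APIs that support the given (major, minor) Python version."""
    major_minor = tuple((version or sys.version_info)[:2])
    apis = []
    if major_minor in SNOWPARK_PYTHON_VERSIONS:
        apis.append("Snowpark Python")
    if major_minor in SNOWPARK_PANDAS_VERSIONS:
        apis.append("Snowpark pandas")
    return apis


def pin_mismatches(pins=SNOWPARK_PANDAS_PINS):
    """Return {package: installed version or None} for packages that differ from the pins."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supports:", supported_apis() or "nothing listed")
    print("Pin mismatches for Snowpark pandas:", pin_mismatches() or "none")
```

Running the script inside the activated environment prints which of the two APIs the interpreter can be used for, and which of the pinned packages (if any) are missing or at a different version.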
You can use PyCharm, VS Code, or any other IDE. The following steps assume you use PyCharm, VS Code or any other similar IDE.
Download the newest community version of PyCharm and follow the installation instructions.
Download and install the latest version of VS Code.
Open the project and browse to the cloned git directory. Then, in PyCharm, right-click the directory src and select "Mark Directory as" -> "Source Root". NOTE: VS Code doesn't have "Source Root", so you can skip this step if you use VS Code.
Configure the PyCharm interpreter or the VS Code interpreter to use the previously created Python virtual environment.
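To confirm that the IDE really picked up the environment created earlier, you can run a short script from inside the IDE and compare its output with what you expect. The sketch below uses only the standard library; `module_location` is a hypothetical helper written for this example, and the expectation that an editable install resolves to the cloned src folder is an assumption about how pip's edit mode is set up on your machine.

```python
import importlib.util
import sys


def module_location(name):
    """Return where a module would be imported from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return None
    if spec is None:
        return None
    if spec.origin not in (None, "namespace"):
        return spec.origin
    # Namespace packages have no single file; report their first search path instead.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


if __name__ == "__main__":
    # The interpreter path should point into the snowpark-dev environment.
    print("Interpreter:", sys.executable)
    print("Environment prefix:", sys.prefix)
    # With an editable install this is expected (not guaranteed) to point into the cloned repository.
    print("snowflake.snowpark resolves to:", module_location("snowflake.snowpark"))
```

If the interpreter path does not point into the snowpark-dev environment, or the last line prints None, the IDE is most likely still using a different interpreter.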
The README under the tests folder explains how to set up your environment and run the tests.
If some tests fail on your PR, do not panic! Any PR originating from a fork will fail some automated tests, because forks do not have access to our repository's secrets. A maintainer will manually review your changes and then kick off the rest of our testing suite. Feel free to tag @snowflakedb/snowpark-python-api or @snowflakedb/snowpark-pandas-api if you feel like we are taking too long to get to your PR.
The following tree diagram shows the high-level structure of Snowpark pandas.
snowflake
└── snowpark
    └── modin
        ├── pandas                ← pandas API frontend layer
        ├── core
        │   ├── dataframe         ← folder containing abstraction
        │   │                       for Modin frontend to DF-algebra
        │   └── execution         ← additional patching for I/O
        └── plugin
            ├── _internal         ← Snowflake specific internals
            ├── io                ← Snowpark pandas IO functions
            ├── compiler          ← query compiler, Modin -> Snowpark pandas DF
            └── utils             ← util classes from Modin, logging, …
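If you want to compare the diagram with your own checkout, a small script can print the sub-directories of the package in the same style. This is a sketch, not project tooling: `render_tree` is a hypothetical helper, and the path `src/snowflake/snowpark/modin` is an assumption based on the src directory and the package names mentioned in this document.

```python
from pathlib import Path


def render_tree(root, max_depth=2):
    """Return lines describing the sub-directories of root, down to max_depth levels."""
    lines = [Path(root).name]

    def walk(directory, prefix, depth):
        if depth > max_depth:
            return
        # Only directories are listed; bytecode caches are skipped.
        children = sorted(
            p for p in directory.iterdir() if p.is_dir() and p.name != "__pycache__"
        )
        for index, child in enumerate(children):
            last = index == len(children) - 1
            lines.append(f"{prefix}{'└── ' if last else '├── '}{child.name}")
            walk(child, prefix + ("    " if last else "│   "), depth + 1)

    walk(Path(root), "", 1)
    return lines


if __name__ == "__main__":
    # Assumed location of the Snowpark pandas package, relative to the repository root.
    package_root = Path("src/snowflake/snowpark/modin")
    if package_root.is_dir():
        print("\n".join(render_tree(package_root)))
    else:
        print(f"{package_root} not found; run this from the repository root.")
```

The output lists directories alphabetically, so the order will differ from the diagram above, which groups them by role.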