SNOW-1300434: Merge Snowpark pandas back to Snowpark Python (#1389)
* SNOW-1300434: Merge Snowpark pandas back to Snowpark Python

Signed-off-by: Naren Krishna <[email protected]>

* forgot to add precommit changes

Signed-off-by: Naren Krishna <[email protected]>

* Get back to only Snowpark Python tests

Signed-off-by: Naren Krishna <[email protected]>

* fix tox env and add path to pandas changelog

Signed-off-by: Naren Krishna <[email protected]>

* remove unnecessary changes

Signed-off-by: Naren Krishna <[email protected]>

* remove checkprs

Signed-off-by: Naren Krishna <[email protected]>

* add back file to pre-commit-config

Signed-off-by: Naren Krishna <[email protected]>

* fix rebase issue

Signed-off-by: Naren Krishna <[email protected]>

* Add Snowpark pandas tests

Signed-off-by: Naren Krishna <[email protected]>

* add iris.csv to tests/resources

Signed-off-by: Naren Krishna <[email protected]>

* revert daily precommit changes

Signed-off-by: Naren Krishna <[email protected]>

* fix test_utils_suite to add iris.csv

Signed-off-by: Naren Krishna <[email protected]>

* fix Snowpark pandas test failures

Signed-off-by: Naren Krishna <[email protected]>

* fix lint

Signed-off-by: Naren Krishna <[email protected]>

* Update to pandas 2.2.1 and fix some more tests

Signed-off-by: Naren Krishna <[email protected]>

* fix more tests

Signed-off-by: Naren Krishna <[email protected]>

* update conftest

Signed-off-by: Naren Krishna <[email protected]>

* fix upload name and remove pandas 2.2.1 from requirements

Signed-off-by: Naren Krishna <[email protected]>

* fix doctest and read_json_empty error

Signed-off-by: Naren Krishna <[email protected]>

* address some comments

Signed-off-by: Naren Krishna <[email protected]>

* remove excluded files from precommit

Signed-off-by: Naren Krishna <[email protected]>

* xfail test_read_json_empty_file

Signed-off-by: Naren Krishna <[email protected]>

* restore dataframe reader

Signed-off-by: Naren Krishna <[email protected]>

* update tox.ini

Signed-off-by: Naren Krishna <[email protected]>

* fix test

Signed-off-by: Naren Krishna <[email protected]>

* snowpark_package_to_sproc_packages change

Signed-off-by: Naren Krishna <[email protected]>

* Snowpandas changes including commit 1008

Signed-off-by: Naren Krishna <[email protected]>

* update query count

Signed-off-by: Naren Krishna <[email protected]>

* Pull in Modin dependency PR

Signed-off-by: Naren Krishna <[email protected]>

* add Varnika's binary op PR

Signed-off-by: Naren Krishna <[email protected]>

---------

Signed-off-by: Naren Krishna <[email protected]>
sfc-gh-nkrishna authored Apr 24, 2024
1 parent ac18513 commit 8832a62
Showing 375 changed files with 154,797 additions and 3 deletions.
3 changes: 3 additions & 0 deletions .github/CODEOWNERS
@@ -1 +1,4 @@
* @snowflakedb/snowpark-python-api-reviewers
/src/snowflake/snowpark/modin/ @snowflakedb/snowpandas
/tests/integ/modin/ @snowflakedb/snowpandas
/tests/unit/modin/ @snowflakedb/snowpandas
22 changes: 22 additions & 0 deletions .github/workflows/changedoc_snowpark_pandas.yml
@@ -0,0 +1,22 @@
name: Snowpark pandas Changedoc Check

on:
  pull_request:
    types: [opened, synchronize, labeled, unlabeled]
    branches:
      - pandas-main
    paths:
      - 'src/snowflake/snowpark/modin/**'

jobs:
  check_pandas_change_doc:
    runs-on: ubuntu-latest
    if: ${{!contains(github.event.pull_request.labels.*.name, 'NO-PANDAS-CHANGEDOC-UPDATES')}}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Ensure Snowpark pandas docs are updated
        run: git diff --name-only --diff-filter=ACMRT ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | grep -q "docs/source/modin"
22 changes: 22 additions & 0 deletions .github/workflows/changelog_snowpark_pandas.yml
@@ -0,0 +1,22 @@
name: Snowpark pandas Changelog Check

on:
  pull_request:
    types: [opened, synchronize, labeled, unlabeled]
    branches:
      - pandas-main
    paths:
      - 'src/snowflake/snowpark/modin/**'

jobs:
  check_pandas_change_log:
    runs-on: ubuntu-latest
    if: ${{!contains(github.event.pull_request.labels.*.name, 'NO-PANDAS-CHANGELOG-UPDATES')}}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Ensure PANDAS_CHANGELOG.md is updated
        run: git diff --name-only --diff-filter=ACMRT ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | grep -wq "src/snowflake/snowpark/modin/plugin/PANDAS_CHANGELOG.md"
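Both workflows reduce to the same gate: diff the PR head against its base and require that a particular path shows up in the list of changed files. A rough local equivalent is sketched below so a contributor can run the same check before pushing; the base ref origin/pandas-main and the standalone-script form are assumptions for illustration, not part of either workflow.

#!/usr/bin/env python3
# Rough local approximation of the changedoc/changelog gates above (illustrative only).
# Assumes git is on PATH and that origin/pandas-main is the base of the pull request.
import subprocess
import sys

changed_files = subprocess.run(
    ["git", "diff", "--name-only", "--diff-filter=ACMRT", "origin/pandas-main...HEAD"],
    capture_output=True,
    text=True,
    check=True,
).stdout.splitlines()

if not any(f.startswith("docs/source/modin") for f in changed_files):
    sys.exit("Snowpark pandas docs were not updated (or add the NO-PANDAS-CHANGEDOC-UPDATES label).")
if "src/snowflake/snowpark/modin/plugin/PANDAS_CHANGELOG.md" not in changed_files:
    sys.exit("PANDAS_CHANGELOG.md was not updated (or add the NO-PANDAS-CHANGELOG-UPDATES label).")
print("changedoc/changelog checks passed")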
111 changes: 110 additions & 1 deletion .github/workflows/precommit.yml
@@ -302,6 +302,115 @@ jobs:
          .tox/.coverage
          .tox/coverage.xml
  test-snowpark-pandas:
    name: Test modin-${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.cloud-provider }}
    needs: build
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [macos-latest, windows-latest-64-cores, ubuntu-latest-64-cores]
        python-version: [ "3.9", "3.10", "3.11" ]
        cloud-provider: [aws, azure, gcp]
        exclude:
          # only run macos with aws py3.9 for doctest
          - os: macos-latest
            python-version: "3.10"
          - os: macos-latest
            python-version: "3.11"
          - os: macos-latest
            python-version: "3.9"
            cloud-provider: azure
          - os: macos-latest
            python-version: "3.9"
            cloud-provider: gcp
          # only run ubuntu with py3.9 on aws and py3.10 on azure
          - os: ubuntu-latest-64-cores
            python-version: "3.11"
          - os: ubuntu-latest-64-cores
            python-version: "3.9"
            cloud-provider: azure
          - os: ubuntu-latest-64-cores
            python-version: "3.9"
            cloud-provider: gcp
          - os: ubuntu-latest-64-cores
            python-version: "3.10"
            cloud-provider: aws
          - os: ubuntu-latest-64-cores
            python-version: "3.10"
            cloud-provider: gcp
          # only run windows with py3.10 on gcp
          - os: windows-latest-64-cores
            python-version: "3.9"
          - os: windows-latest-64-cores
            python-version: "3.10"
          - os: windows-latest-64-cores
            python-version: "3.11"
            cloud-provider: aws
          - os: windows-latest-64-cores
            python-version: "3.11"
            cloud-provider: azure
    steps:
      - name: Checkout Code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Display Python version
        run: python -c "import sys; print(sys.version)"
      - name: Decrypt parameters.py
        shell: bash
        run: .github/scripts/decrypt_parameters.sh
        env:
          PARAMETER_PASSWORD: ${{ secrets.PARAMETER_PASSWORD }}
          CLOUD_PROVIDER: ${{ matrix.cloud-provider }}
      - name: Download wheel(s)
        uses: actions/download-artifact@v4
        with:
          name: wheel
          path: dist
      - name: Show wheels downloaded
        run: ls -lh dist
        shell: bash
      - name: Upgrade setuptools, pip and wheel
        run: python -m pip install -U setuptools pip wheel
      - name: Install tox
        run: python -m pip install tox
      # only run doctest for macos on aws
      - if: ${{ matrix.os == 'macos-latest' && matrix.cloud-provider == 'aws' }}
        name: Run Snowpark pandas API doctests
        run: python -m tox -e "py${PYTHON_VERSION/\./}-doctest-snowparkpandasdoctest-modin-ci"
        env:
          PYTHON_VERSION: ${{ matrix.python-version }}
          cloud_provider: ${{ matrix.cloud-provider }}
          PYTEST_ADDOPTS: --color=yes --tb=short
          TOX_PARALLEL_NO_SPINNER: 1
        # Specify SNOWFLAKE_IS_PYTHON_RUNTIME_TEST: 1 when adding >= python3.11 with no server-side support
        # For example, see https://github.com/snowflakedb/snowpark-python/pull/681
        shell: bash
      # do not run other tests for macos on aws
      - if: ${{ !(matrix.os == 'macos-latest' && matrix.cloud-provider == 'aws') }}
        name: Run Snowpark pandas API tests (excluding doctests)
        run: python -m tox -e "py${PYTHON_VERSION/\./}-snowparkpandasnotdoctest-modin-ci"
        env:
          PYTHON_VERSION: ${{ matrix.python-version }}
          cloud_provider: ${{ matrix.cloud-provider }}
          PYTEST_ADDOPTS: --color=yes --tb=short
          TOX_PARALLEL_NO_SPINNER: 1
        shell: bash
      - name: Combine coverages
        run: python -m tox -e coverage --skip-missing-interpreters false
        shell: bash
        env:
          SNOWFLAKE_IS_PYTHON_RUNTIME_TEST: 1
      - uses: actions/upload-artifact@v4
        with:
          name: coverage_${{ matrix.os }}-${{ matrix.python-version }}-${{ matrix.cloud-provider }}-snowpark-pandas-testing
          path: |
            .tox/.coverage
            .tox/coverage.xml
  combine-coverage:
    if: ${{ success() || failure() }}
    name: Combine coverage
@@ -365,7 +474,7 @@ jobs:
      - name: Upgrade setuptools and pip
        run: python -m pip install -U setuptools pip
      - name: Install Snowpark
        run: python -m pip install ".[development, pandas]"
        run: python -m pip install ".[modin-development, development, pandas]"
      - name: Install Sphinx
        run: python -m pip install sphinx
      - name: Build document
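The run steps in the new job assemble their tox environment names from the matrix Python version, with bash's ${PYTHON_VERSION/\./} substitution stripping the dot. A small illustrative sketch of the names that result (matrix values taken from the job above; the snippet itself is not part of the workflow):

# Illustrative only: shows the tox env names the run steps above resolve to.
for python_version in ["3.9", "3.10", "3.11"]:                     # from the job matrix
    tag = python_version.replace(".", "")                          # bash: ${PYTHON_VERSION/\./}
    print(f"py{tag}-doctest-snowparkpandasdoctest-modin-ci")       # macOS + AWS cell only
    print(f"py{tag}-snowparkpandasnotdoctest-modin-ci")            # all other matrix cells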
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -187,7 +187,7 @@
same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright (c) 2012-2023 Snowflake Computing, Inc.
Copyright (c) 2012-2024 Snowflake Computing, Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
@@ -0,0 +1,77 @@
#!/usr/bin/env python3
#
# Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved.
#
import ast


class DecoratorError(Exception):
    pass


def check_standalone_function_snowpark_pandas_telemetry_decorator(
    target_file: str,
    telemetry_decorator_name: str,
) -> None:
    """
    Check that every public standalone function in the target file is decorated
    with the decorator named telemetry_decorator_name.

    Raises a DecoratorError if the decorator is missing.

    Args:
        target_file (str): Path to the target file.
        telemetry_decorator_name: Name of the telemetry decorator to check for.
    """
    # Get the source code of the target file
    with open(target_file) as file:
        source_code = file.read()
    assert source_code.strip(), f"Source code in '{target_file}' is empty."
    # Parse the abstract syntax tree
    tree = ast.parse(source_code)

    # List of str: function names that are missing the decorator.
    failed_funcs = []

    # Check only top-level functions: iter_child_nodes yields direct children of the
    # module, so nested functions and class methods are not inspected.
    for node in ast.iter_child_nodes(tree):
        if (
            isinstance(node, ast.FunctionDef)  # the node is a function definition
            and not node.name.startswith(
                "_"
            )  # the function is not private (does not start with an underscore)
            and node.name
        ):
            has_telemetry_decorator = False
            for decorator in node.decorator_list:
                if (
                    hasattr(decorator, "id")
                    and decorator.id == telemetry_decorator_name
                ):
                    has_telemetry_decorator = True
                    break
            if not has_telemetry_decorator:
                failed_funcs.append(node.name)
    if len(failed_funcs) > 0:
        raise DecoratorError(
            f"functions {failed_funcs} should be decorated with {telemetry_decorator_name}"
        )


if __name__ == "__main__":
    check_standalone_function_snowpark_pandas_telemetry_decorator(
        target_file="src/snowflake/snowpark/modin/pandas/io.py",
        telemetry_decorator_name="snowpark_pandas_telemetry_standalone_function_decorator",
    )
    check_standalone_function_snowpark_pandas_telemetry_decorator(
        target_file="src/snowflake/snowpark/modin/pandas/general.py",
        telemetry_decorator_name="snowpark_pandas_telemetry_standalone_function_decorator",
    )
    check_standalone_function_snowpark_pandas_telemetry_decorator(
        target_file="src/snowflake/snowpark/modin/plugin/extensions/pd_extensions.py",
        telemetry_decorator_name="snowpark_pandas_telemetry_standalone_function_decorator",
    )
    check_standalone_function_snowpark_pandas_telemetry_decorator(
        target_file="src/snowflake/snowpark/modin/plugin/extensions/pd_overrides.py",
        telemetry_decorator_name="snowpark_pandas_telemetry_standalone_function_decorator",
    )
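For context, here is a small, self-contained exercise of the checker above, assuming the function and DecoratorError from that script are in scope; the sample module and the _my_decorator name are made up for illustration and are not part of the commit.

# Illustrative use of the AST check above against a throwaway module on disk.
import tempfile
import textwrap

sample = textwrap.dedent(
    """
    def _my_decorator(func):         # stand-in decorator; private, so not checked itself
        return func

    @_my_decorator
    def public_decorated():          # passes the check
        pass

    def public_undecorated():        # triggers DecoratorError
        pass

    def _private_helper():           # ignored: starts with an underscore
        pass
    """
)

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write(sample)

try:
    check_standalone_function_snowpark_pandas_telemetry_decorator(
        target_file=tmp.name, telemetry_decorator_name="_my_decorator"
    )
except DecoratorError as err:
    print(err)  # functions ['public_undecorated'] should be decorated with _my_decorator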
29 changes: 29 additions & 0 deletions setup.py
@@ -10,6 +10,9 @@
THIS_DIR = os.path.dirname(os.path.realpath(__file__))
SRC_DIR = os.path.join(THIS_DIR, "src")
SNOWPARK_SRC_DIR = os.path.join(SRC_DIR, "snowflake", "snowpark")
MODIN_DEPENDENCY_VERSION = (
    "==0.28.1"  # Snowpark pandas requires modin 0.28.1, which depends on pandas 2.2.1
)
CONNECTOR_DEPENDENCY_VERSION = ">=3.6.0, <4.0.0"
INSTALL_REQ_LIST = [
    "setuptools>=40.6.0",
@@ -65,6 +68,22 @@
"snowflake.snowpark._internal",
"snowflake.snowpark._internal.analyzer",
"snowflake.snowpark.mock",
"snowflake.snowpark.modin",
"snowflake.snowpark.modin.config",
"snowflake.snowpark.modin.core.dataframe.algebra.default2pandas",
"snowflake.snowpark.modin.core.execution.dispatching",
"snowflake.snowpark.modin.core.execution.dispatching.factories",
"snowflake.snowpark.modin.pandas",
"snowflake.snowpark.modin.pandas.api.extensions",
"snowflake.snowpark.modin.plugin",
"snowflake.snowpark.modin.plugin._internal",
"snowflake.snowpark.modin.plugin.compiler",
"snowflake.snowpark.modin.plugin.docstrings",
"snowflake.snowpark.modin.plugin.default2pandas",
"snowflake.snowpark.modin.plugin.docstrings",
"snowflake.snowpark.modin.plugin.extensions",
"snowflake.snowpark.modin.plugin.io",
"snowflake.snowpark.modin.plugin.utils",
],
package_dir={
"": "src",
@@ -76,6 +95,9 @@
"pandas": [
f"snowflake-connector-python[pandas]{CONNECTOR_DEPENDENCY_VERSION}",
],
"modin": [
f"modin{MODIN_DEPENDENCY_VERSION}",
],
"secure-local-storage": [
f"snowflake-connector-python[secure-local-storage]{CONNECTOR_DEPENDENCY_VERSION}",
],
@@ -88,6 +110,13 @@
"pytest-timeout",
"pre-commit",
],
"modin-development": [
"pytest-assume", # Snowpark pandas
"decorator", # Snowpark pandas
"scipy", # Snowpark pandas 3rd party library testing
"statsmodels", # Snowpark pandas 3rd party library testing
f"modin{MODIN_DEPENDENCY_VERSION}",
],
"localtest": [
"pandas",
"pyarrow",
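The new "modin" extra pins modin 0.28.1, which in turn pins pandas 2.2.1, and the "modin-development" extra adds the third-party libraries used by the Snowpark pandas tests. A hedged usage sketch follows; the pip command, connection parameters, and import pattern are assumptions drawn from the extras above and the packaged snowflake.snowpark.modin.plugin module, not something this diff itself demonstrates.

# Assumed usage after installing the new extra, e.g. pip install "snowflake-snowpark-python[modin]".
import modin.pandas as pd                    # modin==0.28.1 via MODIN_DEPENDENCY_VERSION
import snowflake.snowpark.modin.plugin       # noqa: F401  registers the Snowflake backend for modin
from snowflake.snowpark import Session

# Placeholder credentials; fill in your own account, user, password, warehouse, etc.
connection_parameters = {"account": "<account>", "user": "<user>", "password": "<password>"}
session = Session.builder.configs(connection_parameters).create()

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(df["a"].sum())                         # evaluated by Snowflake, not by local pandas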
2 changes: 1 addition & 1 deletion src/snowflake/snowpark/_internal/open_telemetry.py
@@ -1,5 +1,5 @@
#
# Copyright (c) 2012-2023 Snowflake Computing Inc. All rights reserved.
# Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved.
#

#