
[DO NOT MERGE] Server side snowpark #1538

Closed
wants to merge 78 commits

Conversation

sfc-gh-lspiegelberg
Contributor

DO NOT MERGE.

This branch lets us compare the Snowpark server-side changes against the current main/HEAD.

sfc-gh-azwiegincew and others added 4 commits May 3, 2024 15:17
…: table(), filter() (#1468)

A very basic initial attempt at serializing the AST.

I'm trying to maintain a parallel codebase for phases 0 and 1 for now,
since it would be a shame to do this work twice. Once we complete and
ship phase 0, we'll be able to drastically simplify the phase 1 client.

Unlike what I mentioned before, this implementation doesn't flush
dependencies of eagerly evaluated expressions. Instead, any client-side
value is appended to the pending batch. This is simpler to implement and
will likely work well, although we may need to do some dependency
analysis on the server to ensure we don't issue unnecessary queries.
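The pending-batch approach described above (append every client-side value to a batch, flush on eager evaluation) can be sketched as a minimal structure. All names and shapes here are illustrative assumptions, not the actual Snowpark implementation:

```python
class AstBatch:
    """Minimal sketch of a pending batch of serialized AST statements."""

    def __init__(self):
        self._pending = []

    def append(self, stmt):
        # Any client-side value is simply appended to the pending batch.
        self._pending.append(stmt)

    def flush(self):
        # On eager evaluation, ship the whole batch and start a new one.
        batch, self._pending = self._pending, []
        return batch
```

The server would then receive whole batches and could perform the dependency analysis mentioned above to skip unnecessary queries.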
Updates our server branch with recent snowpark changes.
sfc-gh-oplaton and others added 16 commits May 14, 2024 08:38
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400,
   you should add "SNOW-1335071" here.
   --->

   Fixes SNOW-0

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [x] I am adding a new dependency

3. Please describe how your code solves the related issue.

Update `ast_pb2.py` (already present in the repository).
Add the `setuptools` dependencies required for development.
Include the module path for `ast_pb2.py` in the manifest, so that the
file makes it into the Snowpark wheel.
Run `update-from-devvm.sh` from within `src/snowflake/snowpark/_internal/proto/` (with a local devvm running) to update the proto file on the thin client.
@github-actions github-actions bot added the local testing Local Testing issues/PRs label Jun 5, 2024
sfc-gh-azwiegincew and others added 8 commits June 10, 2024 11:33
….py (#1766)

Modifies `setup.py` to use the latest HEAD of https://github.com/snowflakedb/snowflake-connector-python/tree/server-side-snowpark, which includes the connector changes (most notably the addition of the `_dataframe_ast` field for phase 0).

To update your local dev environment run
```
pip uninstall snowflake-connector-python -y
python -m pip install --no-cache -e ".[development,pandas]"
```
Running the pip command should show `git clone` in the logs.
…frameAST field. (#1794)

Vendors snowflake-vcrpy from https://github.com/Snowflake-Labs/snowflake-vcrpy (could not get the install working, hence vendoring it) with custom Snowflake changes to track requests in the vendored urllib3 within the Snowflake Python connector.

Adds the decorator `check_ast_encode_invoked` (applied with `autouse=True` to all tests), which checks that every query sent contains the `dataframeAst` property for phase 0, and errors out with traceback information whenever tests need to be fixed or APIs are missing that need to be encoded within the AST.
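A simplified sketch of such a check: a helper that flags query payloads lacking the `dataframeAst` property, plus (in comments) how it could be wired into a pytest autouse fixture. The payload shape and all names besides `dataframeAst` are assumptions, not the actual Snowpark test infrastructure:

```python
def queries_missing_ast(queries):
    """Return the query payloads (assumed to be dicts) that lack the
    'dataframeAst' property required for phase 0."""
    return [q for q in queries if "dataframeAst" not in q]

# Hypothetical pytest wiring (illustrative only):
#
# import pytest
#
# @pytest.fixture(autouse=True)
# def check_ast_encode_invoked(captured_queries):
#     yield  # run the test, letting it issue queries
#     missing = queries_missing_ast(captured_queries)
#     assert not missing, f"queries missing dataframeAst: {missing}"
```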
… with Python 3.8 (#1796)

Temporarily removes the Modin tests, as Modin is incompatible with Python 3.8.
sfc-gh-oplaton and others added 26 commits July 24, 2024 10:03
… to pandas.DataFrame (#1973)

Support all data/schema cases for `session.create_dataframe`, except for the data being a pandas.DataFrame.
…1994)

Adds support for `pandas.DataFrame` in `session.create_dataframe`. Effectively encodes in the IR information about the temporary table that stores the data of the `pandas.DataFrame` server-side.
Support `DataFrame.agg` in Snowpark IR.
…2009)

Support `DataFrame.{collect,collect_nowait,count}` for the Snowpark IR. Also modifies the test infrastructure to pass multiple ASTs to the unparser in order, so that multiple evals can be unparsed. Adds a new AstListener class for both the server connection and the mock server connection to capture ASTs easily. Modifies the steel-thread example to work with AstListener and print out the AST for interested audiences.
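The listener idea can be sketched as a small standalone class. The real AstListener hooks into the (mock) server connection; this illustrative version, with assumed names, just records the AST payloads it is handed:

```python
class AstListener:
    """Sketch of a listener that captures AST payloads as queries are issued."""

    def __init__(self):
        self.asts = []

    def on_query(self, query, dataframe_ast=None):
        # Only record queries that actually carry an encoded AST.
        if dataframe_ast is not None:
            self.asts.append(dataframe_ast)
```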
Support `DataFrame.describe` for the Snowpark IR. Adds a preliminary `stddev` implementation to the local testing API, since this feature is currently missing there.
Client-side support for DataFrame write APIs (as Eval), `DataFrame.write.{copy_into_location, csv, json, parquet, save_as_table}`.
Support `session.write_pandas` for pandas DataFrames. Snowpark pandas DataFrames are out of scope for phase 0 and therefore raise an error. Fixes the local testing REST mock object by returning, whenever it is called, a mock request carrying an error message that gets propagated to the user. This makes it easier to trace missing features within the mock/local testing feature.
…unctions. (#2096)

Adds AST generation for `DataFrame.stat.{approx_quantile,corr,cov,crosstab,sample_by}`.

Other changes:
Existing functions `DataFrame.{col,sample,union_all,pivot}` get `_emit_ast` as a new parameter to allow AST generation to be disabled on demand. The Snowpark `Column` class now also allows disabling AST generation via `_emit_ast`. Adds the local testing mock functions `approx_percentile_accumulate`, `approx_percentile_estimate`, `covar_samp`, and `corr_samp`, returning dummy values until properly implemented, so that the mock server connection can successfully run the stats tests.
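The `_emit_ast` pattern can be illustrated with a toy function. This is a sketch under assumed names and shapes, not the real Snowpark signature: the point is only that internal callers can pass `_emit_ast=False` to suppress AST generation:

```python
def col(name, _emit_ast=True):
    """Build a toy column expression and, unless disabled, attach an AST node."""
    expr = {"name": name}
    if _emit_ast:
        # Internal callers disable this to avoid encoding the same
        # operation into the AST twice.
        expr["ast"] = {"col": name}
    return expr
```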
…r better performance (#2074)

[SNOW-1491175](https://snowflakecomputing.atlassian.net/browse/SNOW-1491175)

Removes all uses of `set_src_position` and `get_first_non_snowpark_stack_frame`, updates the necessary test cases, and uses the `inspect` module with better practices, including:
- Deleting the retrieved frame to prevent reference cycles and memory leaks
- Avoiding capturing code context for every file along the stack
- Skipping two frames out of the Snowpark library code to retrieve the relevant user code immediately when possible
- Adding comments discussing the functionality in detail for future improvements
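The frame-handling practices listed above can be illustrated with a small helper. This is a sketch, not the actual implementation; the function name and `skip` default are assumptions:

```python
import inspect

def first_user_frame(skip=2):
    """Walk `skip` frames up from the current one (e.g. out of library code)
    and return (filename, lineno) of the caller's code."""
    # inspect.currentframe() avoids inspect.stack(), which would capture
    # source context for every file along the stack.
    frame = inspect.currentframe()
    try:
        for _ in range(skip):
            if frame is None or frame.f_back is None:
                break
            frame = frame.f_back
        return frame.f_code.co_filename, frame.f_lineno
    finally:
        # Delete the local frame reference to prevent reference cycles
        # between the frame and its locals, which can leak memory.
        del frame
```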

[SNOW-1491175]: https://snowflakecomputing.atlassian.net/browse/SNOW-1491175?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Co-authored-by: Leonhard Spiegelberg <[email protected]>
…elease (#2122)

Updates the branch with the recent release to minimize merge conflicts. The Snowpark IR supports new arguments: `Table` got a new optional boolean parameter `is_temp_table_for_cleanup`, and `regexp` got a new optional parameter `parameters`.

Other:
- Fixes CI for linting, deactivates Modin for Python 3.8, fixes vcrpy deps.
- Removes steel-thread.py to avoid merge conflicts.
- Fixes the to_date tests to be closer to the original calls.
- Changes the SQLCounter interface to match the desired QueryListener interface.
Refreshes using the git commits since the 1.21 release; reduces further merge conflicts with main.
@sfc-gh-lspiegelberg
Contributor Author

sfc-gh-lspiegelberg commented Aug 21, 2024

Due to force pushes the git history here is compromised; closing this branch for now.

Use https://github.com/snowflakedb/snowpark-python/tree/ls-SNOW-1491199-merge-phase0-server-side instead, and DO NOT FORCE PUSH.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 21, 2024
@sfc-gh-azwiegincew sfc-gh-azwiegincew deleted the server-side-snowpark branch December 5, 2024 18:56
Labels
DO-NOT-MERGE local testing Local Testing issues/PRs
5 participants