Allow Spark SQL as a dialect #968

Merged: 30 commits, Dec 24, 2023

Conversation

gilandose

@gilandose gilandose commented Dec 18, 2023

Describe your changes

Makes it possible to use the library's basic functionality with Spark. This allows %%sql usage and returns a Spark DataFrame for further processing, and it integrates with most of the library's existing functionality. It also introduces lazy_execution, which lets frameworks that support it bypass this library's ResultSet and stay in native formats. This is useful when moving back and forth between SQL and Python, and also for validating queries without full execution when composing CTEs.
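The lazy_execution behaviour described above can be illustrated with a minimal, hypothetical sketch (all names below are invented for illustration; this is not JupySQL's actual implementation): when the flag is on, the native Spark DataFrame is handed back untouched instead of being materialized into a ResultSet.

```python
# Illustrative-only sketch of the lazy_execution idea. FakeSparkDataFrame
# stands in for a real pyspark DataFrame; materialization into the library's
# ResultSet is simulated by collecting rows into a plain list.
class FakeSparkDataFrame:
    """Stand-in for a native Spark DataFrame (lazy by nature)."""

    def __init__(self, rows):
        self._rows = rows

    def collect(self):
        # In real Spark, this is what triggers execution of the query plan.
        return list(self._rows)


def run_statement(native_df, lazy_execution=False):
    """Return the native object when lazy, else materialize the rows."""
    if lazy_execution:
        return native_df  # stay in Spark's native format for further chaining
    return native_df.collect()  # eager: behaves like a materialized ResultSet


df = FakeSparkDataFrame([(1, "a"), (2, "b")])
print(type(run_statement(df, lazy_execution=True)).__name__)  # FakeSparkDataFrame
print(run_statement(df, lazy_execution=False))  # [(1, 'a'), (2, 'b')]
```

This also shows why lazy mode helps with query validation: the plan can be composed and inspected without ever calling collect().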

Issue number

Related #536
Closes #965

Checklist before requesting a review


📚 Documentation preview 📚: https://jupysql--968.org.readthedocs.build/en/968/

@neelasha23

I ran the notebook and faced some missing-dependency issues even after installing pyspark==3.4.1.

(Three screenshots of the error tracebacks attached.)

I had to install the following packages: grpcio, grpcio-status, google, protobuf @gilandose
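For reference, the workaround described above amounts to an install command along these lines (the package list is taken from the comment; the pyspark pin is the version mentioned earlier in the thread):

```shell
# Packages reported as missing when running the Spark notebook; installing
# them alongside pyspark resolved the import errors described above.
pip install "pyspark==3.4.1" grpcio grpcio-status protobuf google
```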

Review thread: src/tests/test_magic.py (outdated, resolved)
@gilandose
Author

@neelasha23 I've added grpcio-status; it provides all the other missing dependencies.
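A hypothetical sketch of what the resulting dependency declaration might look like in setup.py (the extra's name and the version pin are assumptions; the point from the comment is that grpcio-status transitively pulls in the other missing packages):

```python
# Hypothetical setup.py extras sketch: the "spark" extra name and the pyspark
# pin are assumptions. grpcio-status depends on grpcio and protobuf, so
# listing it alone covers the previously missing packages.
extras_require = {
    "spark": [
        "pyspark>=3.4",
        "grpcio-status",  # pulls in grpcio and protobuf transitively
    ],
}

print(sorted(extras_require["spark"]))  # ['grpcio-status', 'pyspark>=3.4']
```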

@neelasha23

Please add a Changelog entry and ensure that the CI is passing @gilandose

@gilandose gilandose changed the title Allow Spark Allow Spark SQL as a dialect Dec 19, 2023
@gilandose
Author

Integration tests added; should be good to go.

@gilandose
Author

Yes, ready for review @neelasha23. Fixed linting, and hopefully resolved the CI environment-variable lookup issue during the integration tests.


@edublancas edublancas left a comment


great work, mostly minor stuff.

give another stab to the integration tests; if you have difficulty getting them to work, let me know so someone on the team can help you!

Review threads:
- doc/api/configuration.md (resolved)
- setup.py (outdated, resolved)
- src/sql/error_handler.py (outdated, resolved)
- src/sql/magic.py (resolved)
- src/sql/plot.py (resolved)
- src/sql/run/resultset.py (outdated, resolved)
- src/sql/run/run.py (outdated, resolved)
- src/sql/run/sparkdataframe.py (outdated, resolved; two threads)
@gilandose
Author

gilandose commented Dec 21, 2023

> great work, mostly minor stuff.
>
> give another stab to the integration tests; if you have difficulty getting them to work, let me know so someone on the team can help you!

Struggling to get the PostgreSQL tests to run locally, which seem to be the ones failing; I'll have another attempt and see. @edublancas

@gilandose
Author

PostgreSQL tests passing locally; it was the print statements I'd left in error_handler! Let me know your thoughts on lazy_execution.


@edublancas edublancas left a comment


please check our contribution guidelines: https://ploomber-contributing.readthedocs.io/en/latest/contributing/responding-pr-review.html

pasting a link to the commit with the changes simplifies reviewing.

Review threads:
- doc/integrations/spark.ipynb (resolved)
- src/sql/error_handler.py (outdated, resolved)
- src/sql/run/resultset.py (outdated, resolved)
- src/sql/plot.py (resolved)
@edublancas

@gilandose please check the failed CI tests

@gilandose
Author

gilandose commented Dec 23, 2023

> @gilandose please check the failed CI tests

Didn't realise there was a separate list in noxfile.py; added. Everything should be addressed now.
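For context, the separate list mentioned here refers to noxfile.py keeping its own enumeration of integration-test targets; a hypothetical sketch of the kind of entry a new dialect needs (all names below are assumptions, not JupySQL's actual noxfile):

```python
# Hypothetical sketch: a per-database target list like the one in noxfile.py
# that the new Spark integration had to be added to (names are assumptions).
INTEGRATION_TEST_TARGETS = [
    "postgresql",
    "mysql",
    "duckdb",
    "spark",  # newly added alongside this PR
]

print("spark" in INTEGRATION_TEST_TARGETS)  # True
```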

@gilandose
Author

@edublancas any idea why the docker containers for integration tests don't start automatically for me when running the integration tests locally?


@edublancas edublancas left a comment


Just fix the connection.py file so the notebook passes, and fix the CI.

@edublancas

> @edublancas any idea why the docker containers for integration tests don't start automatically for me when running the integration tests locally?

No idea. Are you seeing any errors? We've had other contributors encounter issues when running integration tests locally, so we definitely need to improve the setup. I changed the repository settings so the CI runs whenever you push new code (previously, I had to approve every run); this should let you test changes in the CI quickly. The integration tests are passing; only the unit tests are failing. We're getting closer!

@edublancas edublancas merged commit f4088a3 into ploomber:master Dec 24, 2023
23 checks passed
@edublancas

🎉 thanks a lot for working on this, this is great!

@edublancas

I'll make a release now

Linked issue (may be closed by merging this pull request): Interested in Spark SQL support via spark connect?