add support for spark (#968)
* add basic spark support to library

* adding tests

* formatting

* add spark connection

* add spark connection

* fixed test and formatting

* added docs

* exclude execution

* documentation updates

* adjust doc string for close

* add generic

* integrated better with existing functionality

* finishing integration tests

* pass config and alias correctly

* fixed issue with backticks and also implemented fake cursor

* change configuration name

* fix env variable error in integration tests CI

* fixing lint errors

* changelog formatting

* metadata ipynb

* addressing comments

* update changelog

* changelog

* fix row count

* spelling

* spelling

* remove pyspark dev dependency

* review comments

* missed readStream in connection.py
gilandose authored Dec 24, 2023
1 parent 4fda165 commit f4088a3
Showing 25 changed files with 1,858 additions and 11 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,8 @@

## 0.10.7dev

* [Feature] Add Spark Connection as a dialect for Jupysql ([#965](https://github.com/ploomber/jupysql/issues/965)) (by [@gilandose](https://github.com/gilandose))

## 0.10.6 (2023-12-21)

* [Fix] Fix error when `%sql` includes a query with negative numbers ([#958](https://github.com/ploomber/jupysql/issues/958))
1 change: 1 addition & 0 deletions doc/_toc.yml
@@ -43,6 +43,7 @@ parts:
- file: integrations/duckdb-native
- file: integrations/compatibility
- file: integrations/chdb
- file: integrations/spark

- caption: API Reference
chapters:
20 changes: 20 additions & 0 deletions doc/api/configuration.md
@@ -234,6 +234,26 @@ value enables the ones from previous values plus new ones:
- `2`: All feedback
- Footer to distinguish pandas/polars data frames from JupySQL's result sets

## `lazy_execution`

```{versionadded} 0.10.7
This option only works when connecting to Spark
```

Default: `False`

Return a lazy relation to the dataset rather than executing the query through JupySQL.

```{code-cell} ipython3
%config SqlMagic.lazy_execution = True
df = %sql SELECT * FROM languages
```

```{code-cell} ipython3
%config SqlMagic.lazy_execution = False
res = %sql SELECT * FROM languages
```
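When the option is enabled on a Spark connection, the returned object is the lazy relation itself rather than a materialized JupySQL result set, so transformations can be chained before anything is computed. A minimal sketch, assuming an active Spark connection and that the returned object exposes the usual PySpark `DataFrame` API (the exact return type is an assumption, not stated in this diff):

```{code-cell} ipython3
%config SqlMagic.lazy_execution = True

# assumed to return a lazy Spark relation instead of a materialized result set
df = %sql SELECT * FROM languages

# chain standard PySpark transformations; execution is deferred until an action such as show()
df.select("*").limit(5).show()
```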

## `named_parameters`

```{versionadded} 0.9
1 change: 1 addition & 0 deletions doc/conf.py
@@ -27,6 +27,7 @@
"integrations/oracle.ipynb",
"integrations/snowflake.ipynb",
"integrations/redshift.ipynb",
"integrations/spark.ipynb",
]
nb_execution_in_temp = True
nb_execution_show_tb = True
18 changes: 17 additions & 1 deletion doc/integrations/compatibility.md
@@ -114,4 +114,20 @@ This table reflects the compatibility status of JupySQL `>=0.7`
- Listing tables with `%sqlcmd tables` βœ…
- Listing columns with `%sqlcmd columns` βœ…
- Parametrized SQL queries via `{{parameter}}` βœ…
- Interactive SQL queries via `--interact` βœ…
- Interactive SQL queries via `--interact` βœ…

## Spark

- Running queries with `%%sql` βœ…
- CTEs with `%%sql --save NAME` βœ…
- Plotting with `%%sqlplot boxplot` ❓
- Plotting with `%%sqlplot bar` βœ…
- Plotting with `%%sqlplot pie` βœ…
- Plotting with `%%sqlplot histogram` βœ…
- Plotting with `ggplot` βœ…
- Profiling tables with `%sqlcmd profile` βœ…
- Listing tables with `%sqlcmd tables` ❌
- Listing columns with `%sqlcmd columns` ❌
- Parametrized SQL queries via `{{parameter}}` βœ…
- Interactive SQL queries via `--interact` βœ…
- Persisting Dataframes via `--persist` βœ…
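
The checks above assume a live Spark connection. A minimal connection sketch, assuming JupySQL accepts an existing `SparkSession` object as a connection target (the registration syntax shown is an assumption based on this commit, not a documented guarantee):

```{code-cell} ipython3
from pyspark.sql import SparkSession

%load_ext sql

# start (or reuse) a local Spark session
spark = SparkSession.builder.appName("jupysql-spark-demo").getOrCreate()

# register the SparkSession as the active connection (assumed syntax)
%sql spark
```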