Pandas 2.0 conflicts with the SQL connectors #1085

Closed
gventuri opened this issue Apr 2, 2024 · 18 comments

Comments

@gventuri
Collaborator

gventuri commented Apr 2, 2024

System Info

OS version: any
Python version: any
pandasai: any

🐛 Describe the bug

Upgrading pandas to 2.0 or later currently appears to break our SQL connectors. We need to find the root cause so that we can move to a more recent version of pandas, as well as to the latest version of modin.

@ArslanSaleem
Collaborator

@gventuri
In pandas 2.0, the read_sql function now mandates SQLAlchemy version 2.0 or higher. However, upgrading SQLAlchemy may cause certain connectors to break, as they do not yet support SQLAlchemy 2.0 or above.

@YarShev
Contributor

YarShev commented Apr 11, 2024

@ArslanSaleem, do you have any insight on which connectors do not support sqlalchemy 2.0 or above?

@sfc-gh-mvashishtha

@ArslanSaleem it looks like snowflake-connector-python[pandas] does not require sqlalchemy at all, and the latest version of snowflake-sqlalchemy seems to allow 2.0.29. As @YarShev asked, please let us know which Snowflake connector is not working for you. Please also share your Python and operating system versions.

@asishm
Contributor

asishm commented Apr 14, 2024

@sfc-gh-mvashishtha xref- snowflakedb/snowflake-sqlalchemy#380 - it looks like the latest release on Apr 11 (EDIT: That release got yanked - no reason provided) might have addressed it.

pip install snowflake-sqlalchemy==1.5.1 'sqlalchemy>2.0' fails with

ERROR: Cannot install snowflake-sqlalchemy==1.5.1 and sqlalchemy>2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested sqlalchemy>2.0
    snowflake-sqlalchemy 1.5.1 depends on sqlalchemy<2.0.0 and >=1.4.0

not sure if this was the core issue, if not apologies!
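
To make the pip conflict above concrete, here is a minimal sketch that checks candidate versions against both constraints from the error message. The comparison logic is deliberately simplified and is not a full PEP 440 implementation, and the candidate version list is an assumption chosen only for illustration.

```python
import operator
import re

# Simplified operators only; pre-releases, "~=", and wildcards are NOT handled.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text, width=3):
    """Turn '2.0' into (2, 0, 0) so that '2.0' and '2.0.0' compare equal."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, constraint):
    """Check a version against a comma-separated constraint like '>=1.4.0,<2.0.0'."""
    for clause in constraint.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](parse_version(version), parse_version(bound)):
            return False
    return True


# Constraints copied from the pip error above.
requested = ">2.0"
pinned_by_connector = ">=1.4.0,<2.0.0"

# Hypothetical candidate releases, used only to illustrate the check.
candidates = ["1.4.52", "2.0.0", "2.0.29"]

compatible = [
    v for v in candidates
    if satisfies(v, requested) and satisfies(v, pinned_by_connector)
]
print(compatible)  # prints []
```

No candidate can satisfy both ranges because they do not overlap, which is why the resolver gives up instead of picking a version.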

@gventuri
Collaborator Author

@asishm @sfc-gh-mvashishtha from what I see, it might be a problem only from pandas >= 2.2. Can we assume we can safely migrate to pandas > 2 < 2.2 for the time being?

@ArslanSaleem would that work?

@YarShev
Contributor

YarShev commented Apr 15, 2024

@gventuri, pandas added support for SQLAlchemy 2.0 as of pandas 2.0.

@YarShev
Contributor

YarShev commented Apr 15, 2024

> @ArslanSaleem it looks like snowflake-connector-python[pandas] does not require sqlalchemy at all, and the latest version of snowflake-sqlalchemy seems to allow 2.0.29. As @YarShev asked, please let us know which Snowflake connector is not working for you. Please also share your Python and operating system versions.

@sfc-gh-mvashishtha, @ArslanSaleem, given that, can we assume there is no issue with the latest version of snowflake-sqlalchemy?

@rafaelclp

rafaelclp commented Apr 18, 2024

@YarShev There is. It's not possible to use pandas 2.2 and snowflake-sqlalchemy in the same project.

Someone tried to add support for 2.0 on their own almost 1 year ago, and sent this PR to the snowflake-sqlalchemy repo: snowflakedb/snowflake-sqlalchemy#414, but it was completely ignored and it's now closed.

There are many issues open in snowflake-sqlalchemy about this lack of support for >=2.0, with the main one being snowflakedb/snowflake-sqlalchemy#380

In that same issue on Mar 9, someone from the snowflake team said (regarding support for sqlalchemy>=2.0):

> I can confirm the implementation is currently in progress and we plan to release it by end of Q1 (April 2024). Please note this is not a committed-to date, just a rough estimation which is subject to change.

Hopefully we'll see support for sqlalchemy 2 by the end of April, but I wouldn't bet on it.

If snowflake-sqlalchemy is the only connector blocking the upgrade, I wouldn't wait much longer, unless the Snowflake team provides a committed date for supporting the new version.

@gventuri
Collaborator Author

@rafaelclp then what would you recommend? We could also consider installing Snowflake from that feature branch for the short term hoping they'll eventually fix it?

@gventuri
Collaborator Author

@ArslanSaleem is only Snowflake blocking the upgrade as far as you know?

@rafaelclp

rafaelclp commented Apr 22, 2024

> @rafaelclp then what would you recommend? We could also consider installing Snowflake from that feature branch for the short term hoping they'll eventually fix it?

This is just my opinion, so hardly a recommendation. My reasoning is that people using that connector are already stuck with older pandas versions and a 1 year old sqlalchemy version, so I don't see why they couldn't afford to be stuck with older pandas-ai versions as well. That's all. Edit: if someone absolutely must use the latest version of pandas-ai, I think your own suggestion is the best option indeed, just install snowflake-sqlalchemy from the PR that adds support for sqlalchemy 2.0 while waiting for it to be merged.

In any case, I was wrong in betting against support for sqlalchemy 2 in snowflake-sqlalchemy by the end of April: snowflakedb/snowflake-sqlalchemy#469. It's almost here!

@YarShev
Contributor

YarShev commented Jul 9, 2024

Guys, latest snowflake-sqlalchemy finally supports sqlalchemy>2 (link). Is everyone okay if I open a PR upgrading pandas?

@gventuri
Collaborator Author

gventuri commented Jul 9, 2024

@YarShev sure, would be great. Let's try to upgrade pandas and sqlalchemy, so we can merge it soon! Thanks a lot @YarShev!!

@YarShev
Contributor

YarShev commented Jul 9, 2024

@gventuri, it seems sqlalchemy-databricks depends on SQLAlchemy (>=1,<2). When making the following changes in pyproject.toml

diff --git a/pyproject.toml b/pyproject.toml
index c94e4c3..0c3dffc 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,18 +10,18 @@ packages = [{include = "pandasai"}]
 [tool.poetry.dependencies]
 python = ">=3.9,<3.9.7 || >3.9.7,<4.0"
 python-dotenv = "^1.0.0"
-pandas = "1.5.3"
+pandas = ">=2.0,<3.0"
 astor = "^0.8.1"
 openai = "<2"
 matplotlib = "^3.7.1"
 pydantic = ">=1,<3"
-sqlalchemy = ">=1.4,<3"
+sqlalchemy = ">=2.0,<3"
 duckdb = "<1"
 faker = "^19.12.0"
 pillow = "^10.1.0"
 requests = "^2.31.0"
 jinja2 = "^3.1.3"
-modin = {version = "0.18.1", optional = true, extras=["ray"]}
+modin = {version = ">=0.23.0", optional = true, extras=["ray"]}
 beautifulsoup4 = {version="^4.12.2", optional = true}
 google-generativeai = {version = "^0.3.2", optional = true}
 google-cloud-aiplatform = {version = "^1.26.1", optional = true}
@@ -42,7 +42,7 @@ pymysql = { version = "^1.1.0", optional = true }
 psycopg2-binary = { version = "^2.9.7", optional = true }
 yfinance = { version = "^0.2.28", optional = true }
 sqlalchemy-databricks = { version = "^0.2.0", optional = true }
-snowflake-sqlalchemy = { version = "^1.5.0", optional = true }
+snowflake-sqlalchemy = { version = "^1.6.1", optional = true }

I get the following error.

Because sqlalchemy-databricks (0.2.0) depends on SQLAlchemy (>=1,<2)
 and no versions of sqlalchemy-databricks match >0.2.0,<0.3.0, sqlalchemy-databricks (>=0.2.0,<0.3.0) requires SQLAlchemy (>=1,<2).
So, because pandasai depends on both sqlalchemy (>=2.0,<3) and sqlalchemy-databricks (^0.2.0), version solving failed.

What can we do in this case?

@gventuri
Collaborator Author

gventuri commented Jul 9, 2024

@YarShev I guess the package is deprecated, this one (https://github.com/databricks/databricks-sql-python) is the one officially maintained and supports sqlalchemy >2.0.

We should probably figure out how hard it would be to migrate to the new one. I had a quick look at it, and it seems like a fairly straightforward migration! Is it the only blocker?

@YarShev
Contributor

YarShev commented Jul 9, 2024

I tried to replace sqlalchemy-databricks with databricks-sql-python as follows

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,18 +10,18 @@ packages = [{include = "pandasai"}]
 [tool.poetry.dependencies]
 python = ">=3.9,<3.9.7 || >3.9.7,<4.0"
 python-dotenv = "^1.0.0"
-pandas = "1.5.3"
+pandas = ">=2.0,<3.0"
 astor = "^0.8.1"
 openai = "<2"
 matplotlib = "^3.7.1"
 pydantic = ">=1,<3"
-sqlalchemy = ">=1.4,<3"
+sqlalchemy = ">=2.0,<3"
 duckdb = "<1"
 faker = "^19.12.0"
 pillow = "^10.1.0"
 requests = "^2.31.0"
 jinja2 = "^3.1.3"
-modin = {version = "0.18.1", optional = true, extras=["ray"]}
+modin = {version = ">=0.23.0", optional = true, extras=["ray"]}
 beautifulsoup4 = {version="^4.12.2", optional = true}
 google-generativeai = {version = "^0.3.2", optional = true}
 google-cloud-aiplatform = {version = "^1.26.1", optional = true}
@@ -41,8 +41,8 @@ openpyxl = { version = "^3.0.7", optional = true }
 pymysql = { version = "^1.1.0", optional = true }
 psycopg2-binary = { version = "^2.9.7", optional = true }
 yfinance = { version = "^0.2.28", optional = true }
-sqlalchemy-databricks = { version = "^0.2.0", optional = true }
-snowflake-sqlalchemy = { version = "^1.5.0", optional = true }
+databricks-sql-python = { version = "3.2.0", optional = true }
+snowflake-sqlalchemy = { version = "^1.6.1", optional = true }
 flask = { version = "^3.0.2", optional = true }
 sqlalchemy-cockroachdb = { version = "^2.0.2", optional = true }
 sqlalchemy-bigquery = {version = "^1.8.0", optional = true, markers = "python_version >= '3.8' and python_version < '3.13'"}
@@ -71,7 +71,7 @@ sourcery = "^1.11.0"


 [tool.poetry.extras]
-connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "sqlalchemy-databricks", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]
+connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "databricks-sql-python", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]

but got this error.

Because pandasai depends on databricks-sql-python (^3.2.0) which doesn't match any versions, version solving failed.

Do you have any insights on this? databricks-sql-python 3.2.0 is available on PyPI though.

@YarShev
Contributor

YarShev commented Jul 10, 2024

My bad :) The package on PyPI is databricks-sql-connector, not databricks-sql-python. I was able to generate a new lock file, but pandas is still not the latest version. There are probably some other cross-dependencies having an effect. I opened #1272, let's proceed there.
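
Since the repository name and the PyPI project name differ here, a quick existence check can catch this kind of mix-up before running the resolver. This is a minimal sketch that queries PyPI's JSON endpoint; the helper name `exists_on_pypi` and its injectable `opener` parameter are assumptions for illustration, and the call needs network access.

```python
import urllib.error
import urllib.request


def exists_on_pypi(project, opener=urllib.request.urlopen):
    """Return True if PyPI's JSON endpoint knows `project`, False on HTTP 404.

    `opener` is injectable (hypothetical convenience) so the function can be
    tried without network access.
    """
    url = f"https://pypi.org/pypi/{project}/json"
    try:
        with opener(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are not evidence either way


# Example usage (requires network access):
#   exists_on_pypi("databricks-sql-connector")
#   exists_on_pypi("databricks-sql-python")
```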

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 9, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Oct 16, 2024
@YarShev
Contributor

YarShev commented Oct 20, 2024

@gventuri, is this issue planned to be fixed?

6 participants