Pandas 2.0 conflicts with the SQL connectors #1085
Comments
@gventuri |
@ArslanSaleem, do you have any insight on which connectors do not support sqlalchemy 2.0 or above? |
@ArslanSaleem it looks like snowflake-connector-python[pandas] does not require sqlalchemy at all, and the latest version of snowflake-sqlalchemy seems to allow sqlalchemy 2.0.29. As @YarShev asked, please let us know which Snowflake connector is not working for you. Please also share your Python and operating system versions. |
@sfc-gh-mvashishtha xref: snowflakedb/snowflake-sqlalchemy#380. It looks like the latest release on Apr 11 (EDIT: that release got yanked, no reason provided) might have addressed it.
Not sure if this was the core issue; if not, apologies! |
@asishm @sfc-gh-mvashishtha from what I see, it might be a problem only with pandas >= 2.2. Can we assume we can safely migrate to pandas >=2.0,<2.2 for the time being? @ArslanSaleem would that work? |
@gventuri, pandas added support for SQLAlchemy 2.0 as of pandas 2.0. |
@sfc-gh-mvashishtha, @ArslanSaleem, given that, can we assume there is no issue with the latest version of snowflake-sqlalchemy? |
@YarShev There is. It's not possible to use pandas 2.2 and snowflake-sqlalchemy in the same project.
Someone tried to add support for sqlalchemy 2.0 on their own almost a year ago and sent this PR to the snowflake-sqlalchemy repo: snowflakedb/snowflake-sqlalchemy#414, but it was completely ignored and is now closed. There are many open issues in snowflake-sqlalchemy about the lack of support for sqlalchemy>=2.0, the main one being snowflakedb/snowflake-sqlalchemy#380. In that same issue, on Mar 9, someone from the Snowflake team said (regarding support for sqlalchemy>=2.0):
"Hopefully we'll see support for sqlalchemy 2 by the end of April." But I wouldn't bet on it. If |
@rafaelclp then what would you recommend? For the short term, we could also consider installing snowflake-sqlalchemy from that feature branch, hoping they'll eventually fix it. |
@ArslanSaleem is only Snowflake blocking the upgrade as far as you know? |
This is just my opinion, so hardly a recommendation. My reasoning is that people using that connector are already stuck with older pandas versions and a year-old sqlalchemy version, so I don't see why they couldn't afford to be stuck with older pandas-ai versions as well. That's all. Edit: if someone absolutely must use the latest version of pandas-ai, I think your own suggestion is indeed the best option: just install snowflake-sqlalchemy from the PR that adds support for sqlalchemy 2.0 while waiting for it to be merged. In any case, I was wrong to bet against snowflake-sqlalchemy supporting sqlalchemy 2 by the end of April: snowflakedb/snowflake-sqlalchemy#469. It's almost here! |
Guys, the latest snowflake-sqlalchemy finally supports sqlalchemy>=2.0 (link). Is everyone okay if I open a PR upgrading pandas? |
@gventuri, it seems sqlalchemy-databricks depends on SQLAlchemy (>=1,<2). When making the following changes in pyproject.toml:

```diff
diff --git a/pyproject.toml b/pyproject.toml
index c94e4c3..0c3dffc 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,18 +10,18 @@ packages = [{include = "pandasai"}]
 [tool.poetry.dependencies]
 python = ">=3.9,<3.9.7 || >3.9.7,<4.0"
 python-dotenv = "^1.0.0"
-pandas = "1.5.3"
+pandas = ">=2.0,<3.0"
 astor = "^0.8.1"
 openai = "<2"
 matplotlib = "^3.7.1"
 pydantic = ">=1,<3"
-sqlalchemy = ">=1.4,<3"
+sqlalchemy = ">=2.0,<3"
 duckdb = "<1"
 faker = "^19.12.0"
 pillow = "^10.1.0"
 requests = "^2.31.0"
 jinja2 = "^3.1.3"
-modin = {version = "0.18.1", optional = true, extras=["ray"]}
+modin = {version = ">=0.23.0", optional = true, extras=["ray"]}
 beautifulsoup4 = {version="^4.12.2", optional = true}
 google-generativeai = {version = "^0.3.2", optional = true}
 google-cloud-aiplatform = {version = "^1.26.1", optional = true}
@@ -42,7 +42,7 @@ pymysql = { version = "^1.1.0", optional = true }
 psycopg2-binary = { version = "^2.9.7", optional = true }
 yfinance = { version = "^0.2.28", optional = true }
 sqlalchemy-databricks = { version = "^0.2.0", optional = true }
-snowflake-sqlalchemy = { version = "^1.5.0", optional = true }
+snowflake-sqlalchemy = { version = "^1.6.1", optional = true }
```

I get the following error:

```
Because sqlalchemy-databricks (0.2.0) depends on SQLAlchemy (>=1,<2)
and no versions of sqlalchemy-databricks match >0.2.0,<0.3.0, sqlalchemy-databricks (>=0.2.0,<0.3.0) requires SQLAlchemy (>=1,<2).
So, because pandasai depends on both sqlalchemy (>=2.0,<3) and sqlalchemy-databricks (^0.2.0), version solving failed.
```

What can we do in this case? |
@YarShev I guess the package is deprecated; this one (https://github.com/databricks/databricks-sql-python) is the officially maintained one and supports sqlalchemy >=2.0. We should probably figure out how hard it would be to migrate to the new one. I had a quick look at it, and it seems like quite a straightforward migration! Is it the only blocker? |
I tried to replace sqlalchemy-databricks with databricks-sql-python as follows:

```diff
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -10,18 +10,18 @@ packages = [{include = "pandasai"}]
 [tool.poetry.dependencies]
 python = ">=3.9,<3.9.7 || >3.9.7,<4.0"
 python-dotenv = "^1.0.0"
-pandas = "1.5.3"
+pandas = ">=2.0,<3.0"
 astor = "^0.8.1"
 openai = "<2"
 matplotlib = "^3.7.1"
 pydantic = ">=1,<3"
-sqlalchemy = ">=1.4,<3"
+sqlalchemy = ">=2.0,<3"
 duckdb = "<1"
 faker = "^19.12.0"
 pillow = "^10.1.0"
 requests = "^2.31.0"
 jinja2 = "^3.1.3"
-modin = {version = "0.18.1", optional = true, extras=["ray"]}
+modin = {version = ">=0.23.0", optional = true, extras=["ray"]}
 beautifulsoup4 = {version="^4.12.2", optional = true}
 google-generativeai = {version = "^0.3.2", optional = true}
 google-cloud-aiplatform = {version = "^1.26.1", optional = true}
@@ -41,8 +41,8 @@ openpyxl = { version = "^3.0.7", optional = true }
 pymysql = { version = "^1.1.0", optional = true }
 psycopg2-binary = { version = "^2.9.7", optional = true }
 yfinance = { version = "^0.2.28", optional = true }
-sqlalchemy-databricks = { version = "^0.2.0", optional = true }
-snowflake-sqlalchemy = { version = "^1.5.0", optional = true }
+databricks-sql-python = { version = "3.2.0", optional = true }
+snowflake-sqlalchemy = { version = "^1.6.1", optional = true }
 flask = { version = "^3.0.2", optional = true }
 sqlalchemy-cockroachdb = { version = "^2.0.2", optional = true }
 sqlalchemy-bigquery = {version = "^1.8.0", optional = true, markers = "python_version >= '3.8' and python_version < '3.13'"}
@@ -71,7 +71,7 @@ sourcery = "^1.11.0"
 [tool.poetry.extras]
-connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "sqlalchemy-databricks", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]
+connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "databricks-sql-python", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]
```

but got this error:

```
Because pandasai depends on databricks-sql-python (^3.2.0) which doesn't match any versions, version solving failed.
```

Do you have any insights on this? databricks-sql-python 3.2.0 is available on PyPI, though. |
My bad :) The package on PyPI is databricks-sql-connector, not databricks-sql-python. I was able to generate a new lock file, but pandas is still not at the latest version. There are probably some other cross-dependencies at play. I opened #1272, let's proceed there. |
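For anyone following along, here is a minimal sketch of how the changed entries from the previous diff might look with the PyPI name substituted. It assumes Poetry and the 3.2.0 release discussed in this thread, and uses `^3.2.0` only to match the caret style of the neighbouring entries. It has not been checked against the lock file generated for #1272, and depending on the connector release the SQLAlchemy dialect may need to be installed separately, so check the databricks-sql-connector documentation before relying on it.

```toml
# Sketch only: other dependencies are unchanged and omitted here.
[tool.poetry.dependencies]
# The PyPI distribution is databricks-sql-connector;
# databricks-sql-python is only the name of the GitHub repository.
databricks-sql-connector = { version = "^3.2.0", optional = true }
snowflake-sqlalchemy = { version = "^1.6.1", optional = true }

[tool.poetry.extras]
# Extras must reference the dependency name exactly as declared above.
connectors = [ "pymysql", "psycopg2-binary", "sqlalchemy-cockroachdb", "databricks-sql-connector", "sqlalchemy-bigquery", "snowflake-sqlalchemy", "cx-Oracle"]
```

With Poetry's caret semantics, `^3.2.0` allows any 3.x release from 3.2.0 up to, but not including, 4.0.0.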
@gventuri, is this issue planned to be fixed? |
System Info
OS version: any
Python version: any
pandasai: any
🐛 Describe the bug
Our current setup runs into problems when we try to upgrade to pandas 2.0 or later, because the upgrade appears to break our SQL connectors. We need to find the root cause of this conflict so that we can move to a recent version of pandas, as well as to the latest version of modin.