Greetings. For renewed consideration (see #2556): could we evaluate using Apache Arrow as the bridge between the JVM and the Python process in the CPython transform?
Perhaps changes limited to the boundary between the two processes would make it easy to take incremental steps towards Arrow. Being a pragmatist, I would prioritize enhancing the CPython transform to get rid of the "Server" process it spins up, which can die or hang if any data in the dataframe or in the variables passed between Hop and the Python process running outside the JVM is not escaped. This is a matter of stability, not just data transport. A company that will not be named used py4j, which came out before Arrow. It uses sockets and ports to transfer data between the JVM and Python processes, but I feel Arrow will be more portable, and it is not an outlier project in the way VFS etc. are. I sense greater longevity and progress with Apache Arrow.
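To make the stability point concrete, here is a minimal, stdlib-only Python sketch of why a length-prefixed binary frame does not need escaping while a delimiter-based text exchange does. This is not Arrow, and it is not the CPython transform's actual wire protocol; `encode_frame` and `decode_frame` are hypothetical names made up for illustration. As far as I understand it, Arrow's IPC format likewise carries explicit lengths rather than relying on delimiters, which is the property I am after.

```python
import struct


def encode_frame(values):
    """Pack strings as: uint32 count, then (uint32 length, UTF-8 bytes) per value."""
    parts = [struct.pack("<I", len(values))]
    for value in values:
        raw = value.encode("utf-8")
        parts.append(struct.pack("<I", len(raw)))
        parts.append(raw)
    return b"".join(parts)


def decode_frame(buf):
    """Inverse of encode_frame: read the count, then each length-prefixed value."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offset = 4
    values = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        values.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return values


# Values that contain the very characters a naive text protocol treats as structure.
tricky = ["plain", "has,comma", "has\nnewline", 'has "quotes"', ""]

# Naive exchange: join on a delimiter and split on the other side, no escaping.
naive_roundtrip = ",".join(tricky).split(",")

# Framed exchange: lengths travel with the data, so content is never interpreted.
framed_roundtrip = decode_frame(encode_frame(tricky))

print(naive_roundtrip == tricky)   # the embedded comma splits one value in two
print(framed_roundtrip == tricky)  # the framed payload survives unchanged
```

The naive version silently changes the number of fields as soon as one value contains the delimiter, which is the kind of failure that can leave the receiving side waiting or crashing. With explicit lengths the payload is opaque bytes, so nothing has to be escaped in the first place.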
More broadly, there are probably efficiency gains to be had when connecting to more modern data sources that support ADBC:
https://arrow.apache.org/docs/format/ADBC.html
As of Feb 2024, Snowflake, BigQuery, Postgres, SQLite, and Pandas can all transport data using Arrow (think 30x to 80x faster for CPython):
https://voltrondata.com/blog/go-inside-the-arrow-database-connectivity-roadmap-background-and-community?utm_source=chatgpt.com