[stdlib] [proposal] First class support for ODBC and (maybe) Apache Arrow #2681
martinvuyk
started this conversation in
Ideas
Replies: 1 comment
-
There's an effort to implement Arrow in Mojo underway over here (source: I'm the author).
-
IMO an Open Database Connectivity (ODBC) driver should be part of the stdlib, for out-of-the-box OLTP support.
The other aspect, as mentioned in the title, would be adding Apache Arrow support. The spec is becoming pretty much ubiquitous in OLAP. The Arrow project also has an experimental database connectivity spec (ADBC) for analytical workloads.
Both connectivity pieces would also benefit Modular as a company: a fast, native, and easy way to integrate in the future with Apache Spark and with cloud providers' client libraries (all serious OLAP DBMSs use Arrow) would make migration easier. Arrow also allows very fast data streaming and over-the-wire transfer, and it is being adopted in distributed systems all the time. If the MAX platform had such an easy integration, it would also benefit from the perceived performance gains that come from using such an efficient data transfer protocol.
A secondary benefit of implementing the FlatBuffers IDL spec, which Arrow needs, is that JSON serialization would be much faster than in most other implementations.
A third benefit is that this would be the first of many steps toward Mojo becoming the backend for performance-critical Python code. Pandas would benefit greatly if it could interoperate with Mojo through Arrow APIs.
A fourth benefit is that a clearly defined IDL spec would make it much easier to create tools for things like Python API docs, OpenAPI docs, etc.