support dataframe protocol (tested with Vaex) #3387
Cool, thanks! I'll be happy to merge this once it's a bit more ready :)
See also #3901 for an alternative approach.
This allows plotly express to take in any dataframe that supports the dataframe protocol, see: https://data-apis.org/blog/dataframe_protocol_rfc/ https://data-apis.org/dataframe-protocol/latest/index.html Test includes an example with vaex, which should work with vaexio/vaex#1509 (not yet released)
pandas has supported the protocol since 1.5.0: https://pandas.pydata.org/docs/whatsnew/v1.5.0.html#dataframe-interchange-protocol-implementation
Not sure if you'd like a vaex dependency for testing, but in case you're ok with it, where should that go?
Thanks Maarten! There should be a requirements_optional.txt somewhere that I can add a vaex dependency to :) Once we merge this, we can still offer #3901 as a fallback for cases where someone has an old pandas or an old vaex without the interchange stuff but, say, a vaex that does still export itself via to_pandas, right?
Absolutely, although we have a to_pandas_df() method.
Hi @nicolaskruchten, is there an update regarding support for the dataframe exchange protocol? It would be useful for interoperability between Plotly and Modin dataframes!
Is this still active? If so, I'd strongly suggest setting 2.0.2* as the minimum pandas version to try interchanging from, because there are some pretty basic mistakes in earlier versions:

    In [1]: df = pl.DataFrame({'a': [1,2,3]})
    In [2]: pd.api.interchange.from_dataframe(df[1:])
    Out[2]:
                     a
    0                2
    1                3
    2  125822987010162

(😱)

*not yet available, but should be out tomorrow
Hi, 2.0.2 is out. As an option, one can change the condition so that the protocol is used for users who have already switched to pandas 2.0.2, rather than waiting until plotly's minimum supported pandas version becomes 2.0.2:

    from packaging import version
    if hasattr(args["data_frame"], "__dataframe__") and version.parse(pd.__version__) >= version.parse("2.0.2"):
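The condition above could be isolated into a small helper so that it is easy to test. The following is only a minimal sketch: `release_tuple` and `can_use_interchange` are hypothetical names, not plotly functions, and the version parsing is a stdlib-only simplification of `packaging.version` that ignores pre-release suffixes (so `"2.0.2rc1"` would be treated like `"2.0.2"`).

```python
import re

# Assumption taken from the thread: 2.0.2 is the first pandas release
# considered safe for consuming the interchange protocol.
MIN_INTERCHANGE_PANDAS = (2, 0, 2)


def release_tuple(version_string):
    """Return the leading numeric release components as a 3-tuple.

    Simplified stand-in for packaging.version.parse: any suffix after the
    numeric part (rc, dev, local tags) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def can_use_interchange(obj, pandas_version):
    """True if obj exposes __dataframe__ and pandas is recent enough."""
    return (
        hasattr(obj, "__dataframe__")
        and release_tuple(pandas_version) >= MIN_INTERCHANGE_PANDAS
    )
```

In real code the second argument would be `pd.__version__`; passing it explicitly here keeps the sketch independent of an installed pandas.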
Thanks @anmyachev - yes, sorry, that's what I meant, rather than bumping the minimum pandas version for everything.
@nicolaskruchten using the interchange protocol and also having a fallback as in #3901 looks like the best option for now, since IIUC the interchange protocol doesn't work for Series.
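The "interchange first, then fallback" ordering discussed above can be sketched as a simple dispatch. This is an illustration only: `pick_conversion` is a hypothetical helper that is not part of plotly, and it only reports which route would be taken rather than performing the conversion. The method names `to_pandas` and `to_pandas_df` are the ones mentioned in this thread.

```python
def pick_conversion(obj, interchange_ok):
    """Name the conversion route that would be tried for obj.

    interchange_ok is assumed to be computed elsewhere, for example from
    the installed pandas version.
    """
    # Preferred route: the dataframe interchange protocol.
    if interchange_ok and hasattr(obj, "__dataframe__"):
        return "interchange"
    # Fallback: objects that can export themselves to pandas.
    if callable(getattr(obj, "to_pandas", None)):
        return "to_pandas"
    # Vaex spells its export method to_pandas_df().
    if callable(getattr(obj, "to_pandas_df", None)):
        return "to_pandas_df"
    return "unsupported"
```

Checking the protocol first means that libraries supporting both paths avoid a potentially more expensive full export, while older library versions still work through the fallback.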
@LiamConnors @nicolaskruchten do you have a plan to use interchange protocol in addition to In this case, plotly.py would also be able to accept Modin dataframes.
This is probably a good idea still yes, if someone wants to update this PR to implement this fallback :)
Good! In that case, I'll take care of it, if no one minds :)
@nicolaskruchten I made a separate pull request, with the continuation of this work. #4244
@nicolaskruchten I guess this PR can be closed?
Yes, thanks @anmyachev!
This allows plotly express to take in any dataframe that supports
the dataframe protocol, see:
https://data-apis.org/blog/dataframe_protocol_rfc/
https://data-apis.org/dataframe-protocol/latest/index.html
Test includes an example with vaex, which should work with vaexio/vaex#1509 (not yet released)
This is only a POC. I think this needs to wait until pandas has implemented from_dataframe, and if you'd like to keep this test, it would require a Vaex version with the above-mentioned PR merged and released.
Usage:
Note that this does not speed up any aggregation/processing, although reading from hdf5/arrow/parquet might be faster.