Visualizations In Daft #1169
What is the preferred way for visualizing data (e.g. numerical data) with something like a scatter plot from Matplotlib, when the data resides in a Daft DataFrame?
Replies: 2 comments
Hi @RagingTiger!
To interact with libraries such as Matplotlib, it's easiest to use Pandas as an interchange format. Because Daft uses Arrow as a backend, the conversion from a Daft dataframe into a Pandas dataframe with `.to_pandas()` is very cheap. You can think of the workflow as doing the heavy work in Daft, executing it, and then pulling the small result into Pandas for plotting.
Here is some sample code:
```python
import daft
import matplotlib.pyplot as plt

# Do work in Daft
df = daft.read_parquet(...)
df = df.with_column(...)
df = df.agg(...)

# Execute
df.collect()

# If the data is small enough (up to hundreds of megabytes), you can
# retrieve it into local driver memory as a Pandas dataframe for visualization
pd_df = df.to_pandas()
plt.scatter(..., data=pd_df)
plt.show()
```
Awesome @jaychia, that's exactly what I suspected. Thank you for making this so clear!!!!!