Expose PySpark's persist() method to the Evaluation class
#292
I am certainly in favor of exploring this. Not sure I totally understand where this would fit in, but let's discuss.
Some notes on persist in PySpark: the general idea is that persist() marks a DataFrame to be stored at a chosen storage level (memory, disk, or both); the data is materialized lazily on the first action, and later actions reuse the stored result instead of re-executing the full query plan.

My thought is that this could allow us to cache/persist the results of a query (on the metric and timeseries classes), which would let us make different types of plots (and run any other analyses) on the cached data more efficiently, without having to re-execute the query each time. This would be an alternative to the dataframe accessor class for visualization. So instead of returning a pandas dataframe from a query, we could return the class object (which we already do to support chaining) with the cached dataframe. I added a method along these lines to the class:
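A minimal sketch of what that could look like, assuming the class keeps its Spark result in a `self.df` attribute (the class name and attribute here are placeholders, not the actual API):

```python
from pyspark.sql import DataFrame
from pyspark.storagelevel import StorageLevel

class Metrics:
    """Placeholder for the real query class; assumes the Spark result lives in self.df."""

    def __init__(self, df: DataFrame):
        self.df = df

    def persist(self, storage_level: StorageLevel = StorageLevel.MEMORY_AND_DISK) -> "Metrics":
        # Lazily mark the DataFrame for caching; Spark materializes it on
        # the first action and reuses it for every action after that.
        self.df = self.df.persist(storage_level)
        return self  # return self so persist() can be chained
```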
Then we can do something like:
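Illustrative only, building on the sketch above (`spark_df` is a stand-in for the Spark DataFrame produced by the query):

```python
# Run the query once, persist the Spark result, then reuse it freely.
m = Metrics(spark_df).persist()

pdf1 = m.df.toPandas()  # first action: the query runs and the result is cached
pdf2 = m.df.toPandas()  # served from the cache, no re-execution
```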
Then we could create different types of plots with methods on the class itself; each method would collect the cached Spark df as a pandas df. For example:
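A rough sketch; the column names and the matplotlib usage are just assumptions for illustration:

```python
import matplotlib.pyplot as plt

class MetricsWithPlots(Metrics):
    def plot_timeseries(self, x: str = "value_time", y: str = "value"):
        # Collect the persisted Spark DataFrame to pandas. With persist()
        # in place, this reads from the cache instead of re-running the query.
        pdf = self.df.toPandas()
        fig, ax = plt.subplots()
        ax.plot(pdf[x], pdf[y])
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        return ax
```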
The results of some limited testing are a bit confusing to me, though: if I run the above query and then convert to a pandas df twice, the second call is much, much faster than the first, even without persist.
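Roughly like this (I'm not quoting timings since they depend on the data and cluster; this is just how to see the effect):

```python
import time

def timed_to_pandas(df, label: str):
    # Time a full collection of the Spark DataFrame to pandas.
    t0 = time.perf_counter()
    pdf = df.toPandas()
    print(f"{label}: {time.perf_counter() - t0:.2f}s, {len(pdf)} rows")
    return pdf

timed_to_pandas(m.df, "first toPandas()")   # runs the full query
timed_to_pandas(m.df, "second toPandas()")  # much faster, even without persist()
```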
That makes it seem like something is already being cached, even though persist() was never called. Possibly Spark is reusing shuffle files from the first job, or the OS file cache is just warm, but I haven't dug into it.
Wondering if we could make use of the persist or cache methods in PySpark to load the dataframe into memory, which could be attached to the Evaluation class object (similar to how the df is attached to the accessor class), and eliminate the need to recalculate things for visualizations and method chaining.
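For reference, cache() and persist() differ only in whether you pick the storage level; a quick sketch of the basic usage:

```python
from pyspark.sql import SparkSession
from pyspark.storagelevel import StorageLevel

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)  # stand-in for a real query result

# cache() is persist() at the default storage level; both are lazy and
# only take effect when the first action runs.
df.cache()        # default storage level (MEMORY_AND_DISK)
df.count()        # first action materializes the cache
df.unpersist()    # release the cache when finished

# persist() also accepts an explicit storage level:
df.persist(StorageLevel.MEMORY_ONLY)
```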
I guess this could replace the accessor or complement it?