I am trying to run a UDF pipeline on a dataset using Weld (or Grizzly, I suppose).
As far as I know, however, Grizzly does not offer an optimized function for applying, say, a scalar UDF to a specific column of the dataset.
One way I found to do it is to access the internal data with to_pandas(), which gives an object that has an “apply” function, and use that function to run a Python UDF on a column.
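To make the workaround concrete, here is a minimal sketch of the pandas side of it. The DataFrame is only a stand-in for whatever to_pandas() returns, and the column name and the UDF (`double`) are made up for illustration; nothing here calls into Weld or Grizzly.

```python
import pandas as pd


def double(value: int) -> int:
    """Hypothetical scalar UDF: operates on one column value at a time."""
    return value * 2


# Stand-in for the frame obtained through to_pandas(); the column name is made up.
df = pd.DataFrame({"quantity": [1, 2, 3]})

# Series.apply calls the Python function once per element in the interpreter,
# so this step is ordinary pandas/Python execution.
df["quantity_doubled"] = df["quantity"].apply(double)
```

Timing this step measures pandas and the Python interpreter, not Weld.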
The problem is that I want to measure Weld’s performance on UDFs, and accessing the internal data and applying the functions just as a normal Python program would is not a fair way to measure Weld’s performance on (Python) UDF execution.
How can I apply a Python UDF to a column of the dataset in an optimized way using Weld?
Thanks in advance!