Running Python UDFs in Weld. #523

kchasialis · 2022-05-09T15:38:49Z

I am trying to run a UDF pipeline on a dataset using Weld (or Grizzly, I suppose).

As far as I know, however, Grizzly does not offer an optimized function for applying, for example, a scalar UDF to a specific column of the dataset.

One way I found is to access the internal data using to_pandas(), which returns an object with a function called “apply”, and use that function to run a Python UDF on a column.
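For reference, here is a minimal sketch of that workaround. It assumes the object returned by to_pandas() behaves like a regular pandas DataFrame; the DataFrame is built directly so the snippet runs without Weld, and the column name and the UDF `double_value` are hypothetical, made up for illustration.

```python
import pandas as pd


def double_value(x: float) -> float:
    """Hypothetical scalar UDF: doubles a single value."""
    return x * 2.0


# Stand-in for the result of to_pandas() (assumption: a plain pandas DataFrame).
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

# Series.apply calls the Python function once per element, inside the
# Python interpreter, so Weld takes no part in executing the UDF.
df["price_doubled"] = df["price"].apply(double_value)

print(df)
```

Since the UDF runs element by element in plain Python here, any timing taken around the apply call reflects pandas and the interpreter rather than Weld.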

The problem is that I want to measure Weld’s performance on UDFs. Accessing the internal data and applying the functions just as a normal Python program would is not a fair way to measure Weld’s performance on (Python) UDF execution.

How can I apply a Python UDF to a column of the dataset in an optimized way using Weld?

Thanks in advance!
