Hey there.
I'm currently experimenting with vaex for processing large datasets in Python. I ran into unexpected behavior when applying a custom function with vaex's apply: the value printed inside the function looks correct, but the value returned from apply seems to be incorrect. Here's a simplified version of my code:
import numpy as np
import pandas as pd
import vaex
from scipy.stats import gamma

# Creating a DataFrame
d = {'A': [i for i in range(1000000)]}
df = pd.DataFrame(data=d)
a, b = 0.09717545806463647, 407034.13749400195

# Setting up random seed
np.random.seed(1234)

# Defining a custom function
def my_func(A):
    f = np.random.poisson(lam=100)
    sim = np.random.uniform(low=0, high=1, size=f)
    lossx1 = np.sum(gamma.ppf(sim, a, scale=b))
    print(lossx1)  # Printing the loss value for debugging
    return np.array(lossx1)

# Converting DataFrame to vaex DataFrame
df_vaex = vaex.from_pandas(df)

# Applying the function using vaex
df_result = df_vaex.apply(my_func, arguments=[df_vaex["A"]], vectorize=True, multiprocessing=False).values
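To make the expected behavior concrete, here is a minimal NumPy/SciPy sketch (no vaex involved) of the output I would expect: one independently simulated loss per row of A. The helper names simulate_loss and expected_losses are made up for illustration, and the sketch uses np.random.default_rng instead of np.random.seed, so its numbers will not match the code above.

```python
import numpy as np
from scipy.stats import gamma

# Same gamma parameters as in the code above
a, b = 0.09717545806463647, 407034.13749400195


def simulate_loss(rng):
    """Draw a Poisson number of events and sum their gamma-distributed sizes."""
    f = rng.poisson(lam=100)
    sim = rng.uniform(low=0, high=1, size=f)
    # Inverse-transform sampling: map uniform draws through the gamma quantile function
    return float(np.sum(gamma.ppf(sim, a, scale=b)))


def expected_losses(A, seed=1234):
    """Return one independent simulated loss per element of A (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return np.array([simulate_loss(rng) for _ in range(len(A))])


# Small input for illustration; the real column has 1,000,000 rows
losses = expected_losses(np.arange(5))
print(losses.shape)
```

The point is only the shape of the result: an array with as many entries as A, each drawn separately, rather than a single value.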
Software information
Vaex version (import vaex; vaex.__version__):