Feature request: Use multithreading for simple operations on larger dfs #19924

Open
Chuck321123 opened this issue Nov 22, 2024 · 0 comments
Labels
enhancement New feature or an improvement of an existing feature

Description

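Polars seems to be about as fast as pandas for simple operations on larger DataFrames. In the benchmark below, adding a scalar to a single column of 200 million rows takes roughly 500 ms in both libraries. Can anything be done to make such operations faster, for example by using multiple threads?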

import numpy as np
import polars as pl

# Define the number of rows
n_rows = 200_000_000

df = pl.DataFrame({
    "col1": np.random.rand(n_rows)
})

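# Polars: add a scalar to the column
%timeit -r 1 -n 7 df.select(pl.col("col1") + 1)

# pandas: the same operation after converting the frame
df = df.to_pandas()
%timeit -r 1 -n 7 df["col1"] + 1

Output (Polars first, then pandas):

504 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 7 loops each)
484 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 7 loops each)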
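
To make the request concrete, here is a minimal sketch of what splitting a simple operation across threads could look like from user code. It is not how Polars works internally; `chunk_bounds` and `map_chunks` are hypothetical helper names, and the only assumption about the data is that it supports `len()` and slicing.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_rows, n_chunks):
    """Return (start, stop) pairs that split range(n_rows) into near-equal parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(n_rows, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty chunks when n_chunks > n_rows
            bounds.append((start, stop))
        start = stop
    return bounds


def map_chunks(data, fn, n_chunks=4):
    """Apply fn to each slice of data on a thread pool; results keep chunk order."""
    bounds = chunk_bounds(len(data), n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return list(pool.map(lambda b: fn(data[b[0]:b[1]]), bounds))
```

With the frame from the benchmark (before the pandas conversion), this could be tried as `pl.concat(map_chunks(df, lambda part: part.select(pl.col("col1") + 1), n_chunks=8))`. Whether it is any faster depends on the operation releasing the GIL and on the workload not being limited by memory bandwidth, which may well be the case for a single addition over 200 million floats.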
Chuck321123 added the "enhancement" label (New feature or an improvement of an existing feature) on Nov 22, 2024.
Chuck321123 changed the title from "Using multithreading for simple operations on larger dfs" to "Feature request: Use multithreading for simple operations on larger dfs" on Nov 22, 2024.