Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Use strict=True argument to zip once py39 support is dropped #15835

Open
wence- opened this issue May 23, 2024 · 2 comments
Open

[ENH] Use strict=True argument to zip once py39 support is dropped #15835

wence- opened this issue May 23, 2024 · 2 comments
Labels
improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.

Comments

@wence-
Copy link
Contributor

wence- commented May 23, 2024

In many places in the cudf code we zip two (or more) iterables together with the assumption/precondition that they are all of equal length. The pattern is (approximately):

names: list[str]
columns: list[Column]
data: dict[str, Column] = dict(zip(names, columns))

This has, by design of zip, a potential problem lurking in that if the two inputs are not of equal length, we only keep the first min(len(names), len(columns)) objects.

To avoid check this at user-input boundaries we need to be careful to check (and then raise if not) that the inputs we are zipping do have equal length. This is easy to forget.

Python 3.10 introduces a new keyword-argument to zip, strict, which can be used to assert that the inputs have the same length. We should consider using this across the code-base when no longer supporting Python 3.9.

@wence- wence- added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function labels May 23, 2024
@mroeschke
Copy link
Contributor

mroeschke commented May 23, 2024

+1. Also there's a ruff rule that would allow us to enforce this https://docs.astral.sh/ruff/rules/zip-without-explicit-strict/

@vyasr
Copy link
Contributor

vyasr commented Aug 29, 2024

#16637 partially addressed this. There remains more work to do since that PR only changed this in the cudf-polars package, whereas we still need to make this change throughout the rest of the repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function Python Affects Python cuDF API.
Projects
Status: Todo
Development

No branches or pull requests

3 participants