Improve querying #264

Merged
merged 11 commits into from
Dec 28, 2024

Conversation

olejandro
Member

@olejandro olejandro commented Dec 26, 2024

This PR allows other fields (in addition to process and commodity) to be passed as lists to query. It also allows column-wise control of which dataframe entries should be exploded if a comma is found.
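As a rough illustration of the two ideas in this description, here is a minimal sketch in plain Python, using dictionaries as a stand-in for dataframe rows. The names `split_columns` and `query`, and the example columns, are hypothetical and are not the project's actual API; the sketch only assumes that comma-separated entries are split in selected columns and that filters may be given as lists.

```python
from typing import Iterable

Row = dict[str, object]


def split_columns(rows: list[Row], columns: Iterable[str]) -> list[Row]:
    """Split comma-separated strings into lists, but only in the named columns."""
    cols = set(columns)
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in cols:
            value = new_row.get(col)
            if isinstance(value, str) and "," in value:
                new_row[col] = [part.strip() for part in value.split(",")]
        out.append(new_row)
    return out


def query(rows: list[Row], **filters: object) -> list[Row]:
    """Return the rows in which every filtered field matches.

    A filter may be a single value or a list of accepted values, and a row
    entry may itself be a single value or a list (hypothetical semantics).
    """
    def as_set(value: object) -> set:
        return set(value) if isinstance(value, (list, tuple, set)) else {value}

    return [
        row
        for row in rows
        if all(as_set(row.get(field)) & as_set(wanted) for field, wanted in filters.items())
    ]


rows = [
    {"process": "PRC1,PRC2", "commodity": "ELC", "region": "NI,SI", "description": "Coal, hard"},
    {"process": "PRC3", "commodity": "GAS", "region": "NI", "description": "Gas"},
]

# Only "process" and "region" are split; the comma in "description" is kept as text.
parsed = split_columns(rows, columns=["process", "region"])

# Fields other than process and commodity can also be passed as lists.
matches = query(parsed, region=["SI"], commodity=["ELC", "GAS"])
```

Here `matches` contains only the first row, and its `description` is still the single string `"Coal, hard"` because that column was not selected for splitting.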

@olejandro olejandro marked this pull request as ready for review December 27, 2024 05:20
Collaborator

@siddharth-krishna siddharth-krishna left a comment

Thanks, Olex, this looks good.

I'm wondering if we need to remove the old match_uc_wildcards test though? It's common practice to test a function by comparing its results to a slower-but-simpler implementation, and I think we don't have unit tests that test the new code on a series of small examples, do we? So why not leave it in.

(Ideally, eventually, we would have unit tests for more transforms using small examples.)
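For readers unfamiliar with the practice mentioned above, here is a minimal sketch of testing a faster function against a slower-but-simpler reference on many small random inputs. It uses shell-style wildcards from the standard library purely as an example; it is not the `match_uc_wildcards` test, and the function names below are hypothetical.

```python
import fnmatch
import random
import re


def match_slow(pattern: str, names: list[str]) -> list[str]:
    """Reference implementation: check each name against the pattern one by one."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def match_fast(pattern: str, names: list[str]) -> list[str]:
    """Candidate implementation: translate the pattern to a regex once and reuse it."""
    regex = re.compile(fnmatch.translate(pattern))
    return [name for name in names if regex.match(name)]


def check_agreement(trials: int = 200, seed: int = 0) -> int:
    """Compare both implementations on small random inputs; return the trial count."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        names = ["".join(rng.choices("AB", k=rng.randint(1, 4))) for _ in range(10)]
        pattern = "".join(rng.choices("AB*?", k=rng.randint(1, 3)))
        assert match_fast(pattern, names) == match_slow(pattern, names), pattern
    return trials


check_agreement()
```

The value of this style is that the reference needs no hand-written expected outputs, so it keeps working as long as both functions receive the same inputs.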

@siddharth-krishna
Collaborator

Wow, this PR also improves runtime by around 50%! Do you have any idea what is responsible for the speedup? 🤩

@olejandro
Member Author

> Wow, this PR also improves runtime by around 50%! Do you have any idea what is responsible for the speedup? 🤩

Good question! TIMES-NZ runs are responsible for the reported wild improvement. Its runtimes have varied a lot before. As far as I can see from the log, the time it takes to run the convert_to_string transform went down significantly with the changes in this PR. Let me check what changed for it...

@olejandro
Member Author

> I'm wondering if we need to remove the old match_uc_wildcards test though? It's common practice to test a function by comparing its results to a slower-but-simpler implementation, and I think we don't have unit tests that test the new code on a series of small examples, do we? So why not leave it in.

The changes in this PR break the test, because some inputs change. Since we have had the faster version running for a while now, without ever having any issues with it, I thought it was okay just to delete the test. I could try to fix it instead, if you think it is worth keeping?

@olejandro
Member Author

> Wow, this PR also improves runtime by around 50%! Do you have any idea what is responsible for the speedup? 🤩

Good question! TIMES-NZ runs are responsible for the reported wild improvement. Its runtimes have varied a lot before. As far as I can see from the log, the time it takes to run the convert_to_string transform went down significantly with the changes in this PR. Let me check what changed for it...

I've timed it: convert_to_string takes less time on the mig table now. It is probably because the table has fewer rows, since we keep previously exploded entries as lists for querying.
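To make the row-count argument concrete, here is a small illustrative sketch with made-up data (not the actual mig table, and not a measurement): exploding comma-separated entries yields one row per item, while keeping them as lists leaves one row per original record, so any per-row work runs fewer times.

```python
# Hypothetical records: (comma-separated processes, commodity).
records = [("PRC1,PRC2,PRC3", "ELC"), ("PRC4", "GAS")]

# Exploding: one output row per process name.
exploded = [(proc, com) for procs, com in records for proc in procs.split(",")]

# Keeping lists: one output row per original record.
as_lists = [(procs.split(","), com) for procs, com in records]

print(len(exploded), len(as_lists))  # row counts before and after the change in approach
```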

@siddharth-krishna
Copy link
Collaborator

Thanks for looking into it! And no, not worth spending too much time on the test. It might be easier to write unit tests once we've converted many transforms into methods of the TimesModel class, perhaps.

@olejandro olejandro merged commit 5302aa5 into main Dec 28, 2024
2 checks passed
@olejandro olejandro deleted the olex/improve-querying branch December 28, 2024 18:00
2 participants