Speed Improvement in binscatter Method #7

jameshgrn · 2023-03-05T03:59:12Z

Hey there! had to use this function on millions on points and decided to try and make it faster! The refactored code is expected to be faster due to several changes that eliminate unnecessary steps, reduce redundancy, and improve performance. Specifically, the changes include:

Replacing np.linspace with np.arange in _get_bins function to avoid casting to int, which can be computationally expensive for large arrays. Using np.arange also avoids the need to add 1 to n_bins, as np.arange already excludes the endpoint.
Using np.sort([x, y], axis=1) in place of separate sorts for x and y in get_binscatter_objects function. This change eliminates the need for a separate argsort step and reduces redundancy.
Using the same linear regression object for both x and y demeaning in get_binscatter_objects function reduces redundancy and avoids the need to fit two separate regression models.
Removing unnecessary checks for sortedness and using np.all instead of np.any in the check for negative differences in x eliminates unnecessary computation.
Passing numpy arrays directly to get_binscatter_objects rather than converting them via np.asarray avoids unnecessary copying and casting.

Overall, these changes make the code more efficient and faster without sacrificing functionality. The updated binscatter method passes all tests and is ready for review.

…mprove performance and readability. Replaced np.linspace with np.arange in _get_bins to avoid casting to int. Replaced np.sort with np.sort([x, y], axis=1) to sort both x and y arrays simultaneously, eliminating the need for a separate argsort step. Reduced redundancy by using the same linear regression object for both x and y demeaning. Removed unnecessary checks for sortedness and replaced np.any with np.all in the check for negative differences in x. Passed numpy arrays directly to get_binscatter_objects rather than converting them to arrays via np.asarray. Made minor documentation improvements. All tests pass.

jameshgrn added 2 commits March 4, 2023 22:51

added docstrings

6342961

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed Improvement in binscatter Method #7

Speed Improvement in binscatter Method #7

jameshgrn commented Mar 5, 2023

Speed Improvement in binscatter Method #7

Are you sure you want to change the base?

Speed Improvement in binscatter Method #7

Conversation

jameshgrn commented Mar 5, 2023