Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the performance of RollingWindow.GetEnumerator #8444

Conversation

starteleport
Copy link
Contributor

@starteleport starteleport commented Dec 4, 2024

Description

This optimises the RollingWindow.GetEnumerator method in terms of execution time and memory allocation.

Related Issue

Closes #8443

Motivation and Context

The RollingWindow.GetEnumerator method allocates a new list for each invocation and copies values to the list via this[int], which in turn enters/exits reader lock for each invocation. I was able to speed up this method by more than 50% in terms of execution time and achieve marginally better memory consumption by switching from List<T> to T[] and inlining the respective part of this[int].

Many built-in indicators as well as some reasonable real-world use cases for RollingWindow would benefit from this change in backtesting/optimisation.

Requires Documentation Change

No

How Has This Been Tested?

Benchmark code
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using QuantConnect.Indicators;

var summary = BenchmarkRunner.Run<Bench>();

[Config(typeof(BenchmarkConfig))]
public class Bench
{
    private readonly RollingWindow<decimal> _baseline = new RollingWindow<decimal>(100);
    private readonly RollingWindowToArray<decimal> _toArray = new RollingWindowToArray<decimal>(100);
    private readonly RollingWindowToArrayInlinedIndexCalculation<decimal> _toArrayInlinedIndexer = new
        RollingWindowToArrayInlinedIndexCalculation<decimal>(100);
    private readonly RollingWindowToArrayInlinedIndexerReversedFor<decimal> _toArrayInlinedIndexerReversed = new
        RollingWindowToArrayInlinedIndexerReversedFor<decimal>(100);

    [Benchmark]
    public decimal Baseline()
    {
        _baseline.Add(1);
        if (_baseline.IsReady)
        {
            return _baseline.Sum();
        }
        return _baseline.Max();
    }

    [Benchmark]
    public decimal ToArray()
    {
        _toArray.Add(1);
        if (_toArray.IsReady)
        {
            return _toArray.Sum();
        }

        return _toArray.Max();
    }

    [Benchmark]
    public decimal ToArrayInlinedIndexer()
    {
        _toArrayInlinedIndexer.Add(1);
        if (_toArrayInlinedIndexer.IsReady)
        {
            return _toArrayInlinedIndexer.Sum();
        }

        return _toArrayInlinedIndexer.Max();
    }

    [Benchmark]
    public decimal ToArrayInlinedIndexerReversed()
    {
        _toArrayInlinedIndexerReversed.Add(1);
        if (_toArrayInlinedIndexerReversed.IsReady)
        {
            return _toArrayInlinedIndexerReversed.Sum();
        }

        return _toArrayInlinedIndexerReversed.Max();
    }
}

public class BenchmarkConfig : ManualConfig
{
    public BenchmarkConfig()
    {
        // Configure your benchmarks, see for more details: https://benchmarkdotnet.org/articles/configs/configs.html.
        AddJob(Job.Default);
        AddDiagnoser(MemoryDiagnoser.Default);
    }
}

Several options were considered:

  • Baseline: current RollingWindow implementation in master
  • ToArray: changed List<T> to T[]
  • ToArrayInlinedIndexer: manually inlined the appropriate part of the this[int] indexer
  • ToArrayInlinedIndexerReversed: changed for loop to the reversed version (iterating with counter being decremented)

Benchmarking results for RollingWindow of size 100:

|                        Method |     Mean |     Error |    StdDev |   Gen0 | Allocated |
|------------------------------ |---------:|----------:|----------:|-------:|----------:|
|                      Baseline | 5.048 us | 0.0969 us | 0.1037 us | 0.1984 |   1.66 KB |
|                       ToArray | 5.099 us | 0.1017 us | 0.1808 us | 0.1907 |   1.62 KB |
|         ToArrayInlinedIndexer | 1.682 us | 0.0227 us | 0.0223 us | 0.1965 |   1.62 KB |
| ToArrayInlinedIndexerReversed | 1.680 us | 0.0281 us | 0.0312 us | 0.1965 |   1.62 KB |

Benchmarking results for RollingWindow of size 10:

|                        Method |     Mean |   Error |  StdDev |   Gen0 | Allocated |
|------------------------------ |---------:|--------:|--------:|-------:|----------:|
|                      Baseline | 625.4 ns | 8.17 ns | 6.82 ns | 0.0315 |     264 B |
|                       ToArray | 594.6 ns | 9.92 ns | 9.75 ns | 0.0257 |     216 B |
|         ToArrayInlinedIndexer | 275.7 ns | 5.47 ns | 8.83 ns | 0.0257 |     216 B |
| ToArrayInlinedIndexerReversed | 267.8 ns | 5.05 ns | 4.73 ns | 0.0257 |     216 B |

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor (non-breaking change which improves implementation)
  • Performance (non-breaking change which improves performance. Please add associated performance test and results)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Non-functional change (xml comments/documentation/etc)

Checklist:

  • My code follows the code style of this project.
  • I have read the CONTRIBUTING document.
  • I have NOT added tests to cover my changes because it's performance-only related change.
  • All new and existing tests passed.
  • My branch follows the naming convention bug-<issue#>-<description> or feature-<issue#>-<description>

@starteleport starteleport changed the title improve the performance of RollingWindow.GetEnumerator Improve the performance of RollingWindow.GetEnumerator Dec 4, 2024
@jaredbroad
Copy link
Member

💪💪 Nice one @starteleport , drop us an email [email protected] for a gift of cloud credit.

@starteleport
Copy link
Contributor Author

Thanks @jaredbroad! I wonder whether it's worth adding a performance benchmark for this case? There are not so many benchmarks now in the code base, hence the question.

@Martin-Molinero
Copy link
Member

Martin-Molinero commented Dec 5, 2024

Nice work @starteleport! Could you please add some unit tests at RollingWindowTests for the enumeration case, there aren't many really, I see only 1 EnumeratesAsExpected, think would be nice to add cases where we add beyond the rolling size and assert it works as expected, reset/resize. Your indexing logic for list is different than the one used in [] => would be nice to create maybe some private method to get the index for i and have it in a single location, avoid the duplication in get/set/enumerate & potential bugs

I wonder whether it's worth adding a performance benchmark for this case? There are not so many benchmarks now in the code base, hence the question.

A regression algorithm? I think for this case because it's a micro optimization we would need a benchmark unit test like the ones you've shared but atm we don't have support for those, believe we would need to add a new benchmark project and CI for them, out of scope for this PR 👍

@starteleport
Copy link
Contributor Author

would be nice to create maybe some private method to get the index for i and have it in a single location, avoid the duplication in get/set/enumerate & potential bugs

@Martin-Molinero, I would be happy to do so but sadly this method wouldn't get inlined even if I use [MethodImpl(MethodImplOptions.AggressiveInlining)]. That's why I left it as-is. I will try one more time though, I have an idea.

@starteleport
Copy link
Contributor Author

@Martin-Molinero done, I reverted custom indexing logic and extended EnumeratesAsExpected test case. Think it's enough. WDYT?

Copy link
Member

@Martin-Molinero Martin-Molinero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

@starteleport
Copy link
Contributor Author

There's more to come 😄

@Martin-Molinero Martin-Molinero merged commit 22e0491 into QuantConnect:master Dec 5, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve RollingWindow.GetEnumerator performance
3 participants