Add robustness to leading null values in series in 'ewma_by_time'. #72
Thanks for your PR. I think we need to distinguish NaN from null here.
…line with pl.ewma_mean. Added robustness to null values at the start of value arrays
Nice one, thanks! Something to discuss is
In [18]: >>> import polars as pl
...: >>> s = pl.Series([1.1, 2.5, 2.6, 2.1, None, 5.1])
...: >>> s.fill_nan(None).ewm_mean(alpha=.1, ignore_nulls=True)
Out[18]:
shape: (6,)
Series: '' [f64]
[
1.1
1.836842
2.11845
2.113085
2.113085
2.842473
]
In [19]: >>> import polars as pl
...: >>> s = pl.Series([1.1, 2.5, 2.6, 2.1, None, 5.1])
...: >>> s.fill_nan(None).ewm_mean(alpha=.1, ignore_nulls=False)
Out[19]:
shape: (6,)
Series: '' [f64]
[
1.1
1.836842
2.11845
2.113085
2.113085
2.902107
]
Here, on the other hand, the weights are determined based on
Maybe this is a chance to revisit the
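The difference between the two outputs above can be reproduced without Polars. The following is a minimal pure-Python sketch, not Polars' implementation: the function name `ewm_mean_sketch` is hypothetical, and it assumes the adjusted weighting (`adjust=True`, which is the `ewm_mean` default) with the previous result repeated at null positions, as the printed outputs show.

```python
from typing import List, Optional, Sequence


def ewm_mean_sketch(
    values: Sequence[Optional[float]], alpha: float, ignore_nulls: bool
) -> List[Optional[float]]:
    """Adjusted exponentially weighted mean with two null policies.

    Hypothetical helper for illustration only; assumes adjust=True weighting.
    """
    decay = 1.0 - alpha
    num = 0.0  # running weighted sum of the observed values
    den = 0.0  # running sum of the weights
    last: Optional[float] = None  # most recent result, repeated at nulls
    out: List[Optional[float]] = []
    for x in values:
        if x is None:
            if not ignore_nulls:
                # Weights follow absolute positions, so earlier observations
                # keep decaying across the gap.
                num *= decay
                den *= decay
            # With ignore_nulls=True the gap is skipped entirely, as if the
            # null row were not in the series.
            out.append(last)
            continue
        num = num * decay + x
        den = den * decay + 1.0
        last = num / den
        out.append(last)
    return out


s = [1.1, 2.5, 2.6, 2.1, None, 5.1]
print(ewm_mean_sketch(s, alpha=0.1, ignore_nulls=True))
print(ewm_mean_sketch(s, alpha=0.1, ignore_nulls=False))
```

For the series above, the last element comes out at roughly 2.842473 with `ignore_nulls=True` and roughly 2.902107 with `ignore_nulls=False`, in line with Out[18] and Out[19]. The only difference is whether the older observations are decayed one extra step for the null row.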
Yeah you're right about this inconsistency, I can see how this could be confusing... would it be better to call the kwarg
Does there need to be another kwarg for this? It's easy enough to just do
I've made an issue about this in Polars if you want to voice your opinion: pola-rs/polars#15258
I think you're right here, making the
That brings the question back to
sounds good to me, thanks!
…to check for consistent behaviour of ewma_by_time for series with leading nulls.
I've added the changes, and changed the heading to reflect this.
Nice! minor comment but looks good
polars_xdt/functions.py
Outdated
ignore_nulls
    Ignore missing values when calculating weights.
outdated?
Good catch, thanks!
src/ewma_by_time.rs
Outdated
for (idx, value) in values.iter().enumerate() {
    match value {
        Some(value) => {
            prev_time = times.get(idx).unwrap();
Isn't this going to panic if times[idx] is None? Do we need to zip over both values and times here?
Yes, it will; I've fixed.
thanks @wbeardall !
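The concern in this thread (looking up the timestamp by index and unwrapping it fails when the timestamp itself is missing) can be illustrated with a Python analogue. This is only a sketch of the pairing pattern, not the fix that was actually committed: the helper name `elapsed_since_last_valid` is hypothetical, and it assumes a row counts as missing whenever either its time or its value is null.

```python
from typing import List, Optional, Sequence


def elapsed_since_last_valid(
    times: Sequence[Optional[float]], values: Sequence[Optional[float]]
) -> List[Optional[float]]:
    """Time elapsed since the previous row where both time and value exist.

    Hypothetical helper: iterating over (time, value) pairs makes the
    "time is missing" case explicit instead of failing on a lookup.
    """
    out: List[Optional[float]] = []
    prev_time: Optional[float] = None
    for time, value in zip(times, values):
        if time is None or value is None:
            # Assumption: a row with a missing time or value carries no
            # information, so it never becomes the reference point.
            out.append(None)
            continue
        out.append(0.0 if prev_time is None else time - prev_time)
        prev_time = time
    return out


print(elapsed_since_last_valid([0.0, None, 2.0, 5.0], [1.0, 2.0, None, 4.0]))
```

Pairing the two columns up front means every branch sees both optional fields at once, which is the same idea as zipping `values` and `times` in the Rust loop.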
@wbeardall would you be interested in trying to upstream this to Polars itself?
Yes, schedule allowing! What is your timeline looking like for upstreaming polars-xdt to the main codebase?
There's no schedule; this was just created to quickly make available features that it's not clear should be included in Polars itself. There's appetite for upstreaming this feature, as well as business days, but the rest isn't as clear.
Following up #71, I've added an ignore_nulls flag to 'ewma_by_time' with a default value of True. The main design choices are as follows:
- When the sequence starts with a NaN, all values in the EWMA should also be NaN, up until the first non-NaN index, which should return its own value.
- A string of consecutive NaN values should all have the same output as the preceding non-NaN value, as no additional information has been added to the sequence.
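The two rules above can be sketched in pure Python. This is an illustration under stated assumptions, not the code in src/ewma_by_time.rs: the name `ewma_by_time_sketch` and the `half_life` parameter are hypothetical, the weight on the previous result is assumed to halve every `half_life` units of elapsed time, timestamps are plain floats that are never missing, and missing values are represented as `None` (the thread settles on null rather than NaN).

```python
from typing import List, Optional, Sequence


def ewma_by_time_sketch(
    times: Sequence[float],
    values: Sequence[Optional[float]],
    half_life: float,
) -> List[Optional[float]]:
    """Time-weighted EWMA that tolerates leading and interior nulls.

    Hypothetical helper; the half-life decay is an assumed weighting.
    """
    out: List[Optional[float]] = []
    prev_time: Optional[float] = None
    prev_result: Optional[float] = None
    for time, value in zip(times, values):
        if value is None:
            # Leading nulls stay null; later nulls repeat the previous
            # result, since no new information has arrived.
            out.append(prev_result)
            continue
        if prev_result is None:
            # The first non-null observation returns its own value.
            result = value
        else:
            # Assumed decay: the old result's weight halves per half_life.
            weight_prev = 0.5 ** ((time - prev_time) / half_life)
            result = weight_prev * prev_result + (1.0 - weight_prev) * value
        prev_time, prev_result = time, result
        out.append(result)
    return out


print(ewma_by_time_sketch([0.0, 1.0, 2.0, 4.0], [None, 1.0, None, 3.0], half_life=3.0))
```

In this example the elapsed time for the last row is measured from the last non-null observation (time 1.0), so the null at time 2.0 neither resets nor advances the reference point.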