-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Continuize: prevent crashing - column with equal and NaN values #2144
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2144 +/- ##
==========================================
+ Coverage 71.44% 71.65% +0.21%
==========================================
Files 268 318 +50
Lines 53093 54543 +1450
==========================================
+ Hits 37932 39085 +1153
- Misses 15161 15458 +297 Continue to review full report at Codecov.
|
Orange/widgets/data/owcontinuize.py
Outdated
@@ -344,7 +346,9 @@ def normalize_by_span(var, data_or_dist, zero_based=True): | |||
def normalize_by_sd(var, data_or_dist): | |||
dist = _ensure_dist(var, data_or_dist) | |||
mean, sd = dist.mean(), dist.standard_deviation() | |||
return normalized_var(var, mean, 1 / sd) | |||
with np.errstate(divide='ignore'): | |||
reciprocal_sd = np.divide(1, sd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to mute error instead of doing something more explicit, like sd[sd < 1e-10] = 1
(as an approximation for sd[sd==0] = 1
)?
Orange/widgets/data/owcontinuize.py
Outdated
if dist.shape[1] > 0: | ||
mean, sd = dist.mean(), dist.standard_deviation() | ||
else: | ||
mean, sd = 0, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default mean
should be 0, but the default sd
should be 1.
You can set it to 0 and the lines below will, if you follow my suggestion for the previous commit, set it to 1. But reading the code will be easier if you immediately set it to 1.
self.send_signal("Data", table) | ||
# Normalize.NormalizeBySD | ||
self.widget.continuous_treatment = 2 | ||
self.widget.unconditional_commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test where a single value is set to nan
. The expected behaviour is that this value is ignored, but the column is still normalized.
# Normalize.NormalizeBySpan | ||
self.widget.continuous_treatment = 1 | ||
self.widget.unconditional_commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here: try also with a single nan, which should do any harm.
Crashing prevented when dealing with equal values and choosing option normalize by standard deviation https://sentry.io/biolab/orange3/issues/242520198/
Crashing prevented when dealing with NaN values and choosing option normalize by standard deviation https://sentry.io/biolab/orange3/issues/227119466/
Crashing prevented when dealing with NaN values and choosing option normalize by span https://sentry.io/biolab/orange3/issues/242519777/
Issue
Crashing prevented when dealing with a column with NaN or equal values
and choosing option normalize by standard deviation or by span.
https://sentry.io/biolab/orange3/issues/242520198/
https://sentry.io/biolab/orange3/issues/227119466/
https://sentry.io/biolab/orange3/issues/242519777/
Includes