Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Continuize: prevent crashing - column with equal and NaN values #2144

Merged
merged 3 commits into from
Apr 21, 2017
Merged

[FIX] Continuize: prevent crashing - column with equal and NaN values #2144

merged 3 commits into from
Apr 21, 2017

Conversation

jerneju
Copy link
Contributor

@jerneju jerneju commented Mar 28, 2017

Issue

Crashing prevented when dealing with a column with NaN or equal values
and choosing option normalize by standard deviation or by span.
https://sentry.io/biolab/orange3/issues/242520198/
https://sentry.io/biolab/orange3/issues/227119466/
https://sentry.io/biolab/orange3/issues/242519777/

Includes
  • Code changes
  • Tests
  • Documentation

@jerneju
Copy link
Contributor Author

jerneju commented Mar 28, 2017

@jerneju
Copy link
Contributor Author

jerneju commented Mar 28, 2017

@codecov-io
Copy link

codecov-io commented Mar 28, 2017

Codecov Report

Merging #2144 into master will increase coverage by 0.21%.
The diff coverage is 95.23%.

@@            Coverage Diff             @@
##           master    #2144      +/-   ##
==========================================
+ Coverage   71.44%   71.65%   +0.21%     
==========================================
  Files         268      318      +50     
  Lines       53093    54543    +1450     
==========================================
+ Hits        37932    39085    +1153     
- Misses      15161    15458     +297

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 54d8498...e8d303f. Read the comment docs.

@@ -344,7 +346,9 @@ def normalize_by_span(var, data_or_dist, zero_based=True):
def normalize_by_sd(var, data_or_dist):
dist = _ensure_dist(var, data_or_dist)
mean, sd = dist.mean(), dist.standard_deviation()
return normalized_var(var, mean, 1 / sd)
with np.errstate(divide='ignore'):
reciprocal_sd = np.divide(1, sd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to mute error instead of doing something more explicit, like sd[sd < 1e-10] = 1 (as an approximation for sd[sd==0] = 1)?

if dist.shape[1] > 0:
mean, sd = dist.mean(), dist.standard_deviation()
else:
mean, sd = 0, 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default mean should be 0, but the default sd should be 1.

You can set it to 0 and the lines below will, if you follow my suggestion for the previous commit, set it to 1. But reading the code will be easier if you immediately set it to 1.

self.send_signal("Data", table)
# Normalize.NormalizeBySD
self.widget.continuous_treatment = 2
self.widget.unconditional_commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test where a single value is set to nan. The expected behaviour is that this value is ignored, but the column is still normalized.

# Normalize.NormalizeBySpan
self.widget.continuous_treatment = 1
self.widget.unconditional_commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here: try also with a single nan, which should do any harm.

Crashing prevented when dealing with equal values
and choosing option normalize by standard deviation
https://sentry.io/biolab/orange3/issues/242520198/
Crashing prevented when dealing with NaN values
and choosing option normalize by standard deviation
https://sentry.io/biolab/orange3/issues/227119466/
Crashing prevented when dealing with NaN values
and choosing option normalize by span
https://sentry.io/biolab/orange3/issues/242519777/
@astaric astaric added this to the 3.4.2 milestone Apr 7, 2017
@astaric astaric self-assigned this Apr 7, 2017
@astaric astaric modified the milestones: future, 3.4.2 Apr 19, 2017
@astaric astaric merged commit 1d37f63 into biolab:master Apr 21, 2017
@jerneju jerneju deleted the zerodivision_index-continuize branch April 24, 2017 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants