Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX]OWFreeViz: Fix optimization for data with missing values #3358

Merged
merged 1 commit into from
Nov 6, 2018

Conversation

VesnaT
Copy link
Contributor

@VesnaT VesnaT commented Nov 6, 2018

Issue

The widget crashed when received dataset with missing values on the input.

Description of changes
Includes
  • Code changes
  • Tests
  • Documentation

@VesnaT VesnaT changed the title OWFreeViz: Fix optimization for data with missing values [FIX]OWFreeViz: Fix optimization for data with missing values Nov 6, 2018
@janezd janezd merged commit dec74fb into biolab:master Nov 6, 2018
np.max(np.linalg.norm(embedding[self.valid_data], axis=1)) or 1
EX = np.dot(self._X, self.projection)
EX /= np.max(np.linalg.norm(EX, axis=1)) or 1
embedding = np.zeros((len(self.data), 2), dtype=np.float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the original data had missing values, and the embedding is not available for all points, I think it is better to assign missing values to those points as the embedding too (instead of zeros).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, while I doubt these projections will ever be generalized to more than 2D, I would still prefer to not make implicit assumptions like that in the code and use EX.shape[1] instead of 2

@lanzagar
Copy link
Contributor

lanzagar commented Nov 6, 2018

I see that @janezd was quicker than my comments... :)

@janezd
Copy link
Contributor

janezd commented Nov 6, 2018

I apologize for being over-eager. :)

np.zeros bothered me, too, and I started commenting, but then I saw it's filtered out immediately afterwards, so I ignored it.

You're right about 2D. On the other hand, if it would say EX.shape[1], it would make may wonder "Hey, isn't this always 2?!". 2 is more explicit and informative. Besides, Freeviz won't be generalized beyond 2D because it can't be plotted then (you'd need another projection over it). And if it was generalized, a lot of other code would have to be rewritten anyway.

@lanzagar
Copy link
Contributor

lanzagar commented Nov 6, 2018

I know it is not going to be generalized. I just really dislike hardcoded constants in code (numbers, strings) when they can be avoided :)
But I guess both options have their arguments. Using shape might not make it immediately obvious what the final size will be. But you know that EX will fit into the container without dimension errors, by just looking at these 2 lines (no need to be familiar with the widget, freeviz algorithm, what self.projections is, ...)
But I wouldn't spend time over it, either way is acceptable.

However, I would change the zeros to nans, as I think that is conceptually more correct.

@janezd
Copy link
Contributor

janezd commented Nov 6, 2018

I agree about nans.

I guess @VesnaT should make a change in a new PR. I promise I won't merge it. :)

@lanzagar lanzagar added this to the 3.18 milestone Nov 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants