[FIX]OWFreeViz: Fix optimization for data with missing values #3358

VesnaT · 2018-11-06T09:39:29Z

Issue

The widget crashed when it received a dataset with missing values on the input.

Description of changes

Includes

Code changes
Tests
Documentation

lanzagar · 2018-11-06T11:03:12Z

Orange/widgets/visualize/owfreeviz.py

-            np.max(np.linalg.norm(embedding[self.valid_data], axis=1)) or 1
+        EX = np.dot(self._X, self.projection)
+        EX /= np.max(np.linalg.norm(EX, axis=1)) or 1
+        embedding = np.zeros((len(self.data), 2), dtype=np.float)


If the original data had missing values, and the embedding is not available for all points, I think it is better to assign missing values to those points as the embedding too (instead of zeros).

Also, while I doubt these projections will ever be generalized to more than 2D, I would still prefer not to make implicit assumptions like that in the code, and to use EX.shape[1] instead of 2.
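For illustration, the two suggestions above could be combined roughly as follows. This is only a sketch, not the widget's actual code: the function name `embed_with_missing` and its arguments `X` and `projection` are hypothetical stand-ins for the widget's attributes, and the way valid rows are detected here is an assumption.

```python
import numpy as np

def embed_with_missing(X, projection):
    """Project rows of X that have no missing values; leave the rest as NaN.

    Hypothetical helper: X and projection stand in for the widget's data
    and projection matrix.
    """
    X = np.asarray(X, dtype=float)
    valid = ~np.isnan(X).any(axis=1)          # rows without missing values
    EX = np.dot(X[valid], projection)
    if len(EX):
        # Scale so the farthest point has norm 1; `or 1` avoids dividing
        # by zero when every projected point lies at the origin.
        EX /= np.max(np.linalg.norm(EX, axis=1)) or 1
    # Column count is taken from EX rather than hardcoded as 2,
    # and rows without an embedding stay NaN instead of zero.
    embedding = np.full((len(X), EX.shape[1]), np.nan)
    embedding[valid] = EX
    return embedding, valid
```

Allocating with `np.full(..., np.nan)` means any row that is never assigned is visibly missing, and sizing the container from `EX.shape[1]` guarantees that the assignment fits regardless of the projection's width.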

lanzagar · 2018-11-06T11:09:04Z

I see that @janezd was quicker than my comments... :)

janezd · 2018-11-06T11:16:03Z

I apologize for being over-eager. :)

np.zeros bothered me, too, and I started commenting, but then I saw it's filtered out immediately afterwards, so I ignored it.

You're right about 2D. On the other hand, if it said EX.shape[1], it would make many wonder "Hey, isn't this always 2?!". 2 is more explicit and informative. Besides, Freeviz won't be generalized beyond 2D because it can't be plotted then (you'd need another projection over it). And if it was generalized, a lot of other code would have to be rewritten anyway.

lanzagar · 2018-11-06T11:45:29Z

I know it is not going to be generalized. I just really dislike hardcoded constants in code (numbers, strings) when they can be avoided :)
But I guess both options have their arguments. Using shape might not make it immediately obvious what the final size will be. But you know that EX will fit into the container without dimension errors, by just looking at these 2 lines (no need to be familiar with the widget, freeviz algorithm, what self.projections is, ...)
But I wouldn't spend time over it, either way is acceptable.

However, I would change the zeros to nans, as I think that is conceptually more correct.
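A small sketch of the difference being discussed, using made-up values (the arrays below are illustrative and not taken from the widget): once the result is filtered by the validity mask the fill value does not matter, but without the mask only the NaN-filled array still shows which rows have no embedding.

```python
import numpy as np

valid = np.array([True, False, True])
points = np.array([[0.0, 0.0], [0.5, 0.5]])   # one genuine point sits at the origin

zeros_filled = np.zeros((3, 2))
nan_filled = np.full((3, 2), np.nan)
zeros_filled[valid] = points
nan_filled[valid] = points

# After filtering by the mask, the fill value makes no difference.
same_after_filter = np.array_equal(zeros_filled[valid], nan_filled[valid])

# Without the mask, the NaN version still identifies the rows with no embedding,
# while the zero version cannot tell a missing row from a point at the origin.
recovered_from_nan = ~np.isnan(nan_filled).any(axis=1)
at_origin_in_zeros = (zeros_filled == 0).all(axis=1)
```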

janezd · 2018-11-06T11:47:32Z

I agree about nans.

I guess @VesnaT should make a change in a new PR. I promise I won't merge it. :)

OWFreeViz: Fix optimization for data with missing values

3e2bd8d

VesnaT changed the title ~~OWFreeViz: Fix optimization for data with missing values~~ [FIX]OWFreeViz: Fix optimization for data with missing values Nov 6, 2018

janezd merged commit dec74fb into biolab:master Nov 6, 2018

lanzagar reviewed Nov 6, 2018

lanzagar added this to the 3.18 milestone Nov 7, 2018
