Real batchwise PCA #12

hantek · 2015-03-24T01:20:42Z

The current version of PCA loads the whole dataset at once and then does the computation. We need a truly batchwise PCA that pulls its input from some kind of iterator and does the computation without ever holding the whole dataset in memory.

The iterator can be a requirement for all dataset wrappers. (It yields a single batch if the dataset is not big, or N batches as specified by the user.)
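A minimal sketch of what such an iterator might look like. The name `iter_batches` and the `n_batches` parameter are hypothetical and not part of the existing code; the sketch assumes the wrapped dataset supports `len()` and slicing (a list or a NumPy array, for example).

```python
def iter_batches(data, n_batches=1):
    """Yield `data` in at most `n_batches` consecutive slices.

    Hypothetical helper: assumes `data` supports len() and slicing.
    With n_batches=1 the whole dataset comes back as a single batch.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    n = len(data)
    # Ceiling division so that every sample lands in some batch;
    # max(1, ...) keeps range() valid for an empty dataset.
    size = max(1, -(-n // n_batches))
    for start in range(0, n, size):
        yield data[start:start + size]
```

Because it is a generator, only the slice currently being consumed needs to be materialized, provided the underlying dataset itself is lazily loaded.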

hantek · 2015-05-12T15:28:07Z

Write the PCA class as a subclass of Layer, so that the PCA step can be placed in a pipeline before it is fitted to any data.

The ideal way to build a preprocessing pipeline is to first declare all preprocessing steps in a single line, concatenated with the "+" operator, and then fit each step (if needed) in a layerwise manner.

So preprocessing classes should have the following methods in addition to those inherited from the Layer class:

a fit() method
an init() method
an output() method that refuses to be called before the layer is fitted.
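A rough sketch of how this interface and the "+" concatenation could fit together. The real Layer and StackedLayer interfaces are not shown in this thread, so `Layer` below is a stand-in and `Pipeline`, `PreprocessLayer`, `SubtractMean`, and `_transform` are hypothetical names used only for illustration.

```python
class Layer(object):
    """Stand-in for the project's Layer class (real interface not shown)."""

    def __add__(self, other):
        # "+" concatenates two steps into a pipeline.
        return Pipeline([self, other])


class Pipeline(object):
    """Hypothetical ordered container of steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __add__(self, other):
        return Pipeline(self.steps + [other])

    def fit(self, data):
        # Layerwise fitting: each step is fitted on the output
        # of the steps that come before it.
        for step in self.steps:
            if hasattr(step, "fit"):
                step.fit(data)
            data = step.output(data)
        return self

    def output(self, data):
        for step in self.steps:
            data = step.output(data)
        return data


class PreprocessLayer(Layer):
    """Hypothetical base class for steps that must be fitted first."""

    def __init__(self):
        self.fitted = False

    def fit(self, data):
        raise NotImplementedError

    def output(self, data):
        # Guard: refuse to produce output before fit() has run.
        if not self.fitted:
            raise RuntimeError("fit() must be called before output()")
        return self._transform(data)


class SubtractMean(PreprocessLayer):
    """Toy step: subtracts the mean learned in fit()."""

    def fit(self, data):
        self.mean = sum(data) / float(len(data))
        self.fitted = True
        return self

    def _transform(self, data):
        return [x - self.mean for x in data]
```

With these definitions, `pipeline = SubtractMean() + SubtractMean()` builds the pipeline before any data is seen, and `pipeline.fit(data)` then fits the steps one after another.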

Specifically for PCA, it should look like this:

init(), which instantiates a forward and a backward layer
fit()
fit_partwise()
a method that computes the correlation matrix
a method for the remaining PCA steps, shared by both fit() and fit_partwise()
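One possible shape for this split, sketched with NumPy. Everything here is an assumption rather than the project's code: the class name `BatchPCA`, the helpers `_covariance` and `_finish`, and the choice to accumulate running sums and form the covariance matrix of the centered data (taken here to be what "correlation matrix" refers to). The forward and backward layers are left out.

```python
import numpy as np


class BatchPCA(object):
    """Hypothetical sketch; not the project's actual PCA class."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.fitted = False

    def fit(self, data):
        # Whole dataset in memory: treat it as a single batch.
        return self.fit_partwise([data])

    def fit_partwise(self, batches):
        # Accumulate the sample count, the sum of samples and the
        # sum of outer products, one batch at a time.
        n, total, outer = 0, None, None
        for batch in batches:
            batch = np.asarray(batch, dtype=np.float64)
            if total is None:
                dim = batch.shape[1]
                total = np.zeros(dim)
                outer = np.zeros((dim, dim))
            n += batch.shape[0]
            total += batch.sum(axis=0)
            outer += batch.T.dot(batch)
        self.mean, cov = self._covariance(n, total, outer)
        self._finish(cov)
        return self

    @staticmethod
    def _covariance(n, total, outer):
        # Unbiased covariance from the accumulated sums.
        # Simple, but less numerically stable than an online update.
        mean = total / n
        cov = (outer - n * np.outer(mean, mean)) / (n - 1)
        return mean, cov

    def _finish(self, cov):
        # Shared by fit() and fit_partwise(): eigendecomposition,
        # keeping the components with the largest eigenvalues.
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][:self.n_components]
        self.components = eigvecs[:, order]
        self.explained_variance = eigvals[order]
        self.fitted = True

    def output(self, data):
        if not self.fitted:
            raise RuntimeError("fit the layer before calling output()")
        centered = np.asarray(data, dtype=np.float64) - self.mean
        return centered.dot(self.components)
```

Since fit() is just fit_partwise() on a single batch, both paths go through the same covariance and eigendecomposition code, which is the sharing the list above asks for.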

Also, we need to come up with a way to deal with an unspecified output dimension. Currently the StackedLayer class doesn't accept layers with an unspecified output dimension.

The problem of having PCA in a higher layer also arises here, i.e., we need a mechanism that automatically prepares the input data for PCA on the fly. I don't know how to achieve that yet.
