Real batchwise PCA #12

Open
hantek opened this issue Mar 24, 2015 · 1 comment

Comments

@hantek
Owner

hantek commented Mar 24, 2015

The current version of PCA loads the whole dataset at once and then does the computation. We need a fully batchwise PCA, which pulls its input from some kind of iterator and does the computation without ever loading the whole dataset into memory.

The iterator can be a requirement for all dataset wrappers. (It yields a single batch if the dataset is not big, or N batches as specified by the user.)
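A minimal sketch of that iterator requirement, assuming a hypothetical wrapper class `ListDataset` with a `batches()` method (neither name comes from the project). It keeps its rows in memory only to keep the example short; a wrapper for a large dataset would presumably read each chunk lazily instead.

```python
class ListDataset:
    """Hypothetical dataset wrapper; the name and signature are illustrative."""

    def __init__(self, rows):
        self.rows = list(rows)

    def batches(self, n_batches=1):
        """Yield the data as n_batches contiguous chunks of near-equal size.

        With the default n_batches=1 the whole dataset comes back as a
        single batch, which covers the "dataset is not big" case.
        """
        if n_batches < 1:
            raise ValueError("n_batches must be at least 1")
        n = len(self.rows)
        for i in range(n_batches):
            start = i * n // n_batches
            stop = (i + 1) * n // n_batches
            yield self.rows[start:stop]
```

Because `batches()` is a generator, a consumer such as a batchwise PCA only ever holds one chunk at a time.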

@hantek
Owner Author

hantek commented May 12, 2015

Write the PCA class as a subclass of Layer, which allows one to put the PCA step into a pipeline before fitting it to some data.

The ideal way to build a preprocessing pipeline is to first assemble all preprocessing steps at once in a single line, concatenated with the "+" operator, and then fit each step (if needed) in a layerwise manner.
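One way this could look, as a rough sketch: `Step`, `Pipeline`, `SubtractMean` and `DivideByMaxAbs` below are hypothetical stand-ins written for illustration, not the project's Layer or StackedLayer classes. The "+" operator only assembles the chain; fitting happens afterwards, one step at a time, with each step fitted on the output of the steps before it.

```python
class Step:
    """Hypothetical stand-in for a Layer-like preprocessing step."""

    def fit(self, data):
        return self  # steps without parameters have nothing to fit

    def output(self, data):
        raise NotImplementedError

    def __add__(self, other):
        # Flatten so that a + b + c yields one pipeline with three steps.
        return Pipeline(_steps_of(self) + _steps_of(other))


class Pipeline(Step):
    def __init__(self, steps):
        self.steps = list(steps)

    def fit(self, data):
        # Layerwise fitting: fit a step, then feed its output to the next one.
        for step in self.steps:
            step.fit(data)
            data = step.output(data)
        return self

    def output(self, data):
        for step in self.steps:
            data = step.output(data)
        return data


def _steps_of(obj):
    return obj.steps if isinstance(obj, Pipeline) else [obj]


class SubtractMean(Step):
    def fit(self, data):
        self.mean = sum(data) / len(data)
        return self

    def output(self, data):
        return [x - self.mean for x in data]


class DivideByMaxAbs(Step):
    def fit(self, data):
        self.scale = max(abs(x) for x in data)
        return self

    def output(self, data):
        return [x / self.scale for x in data]
```

Usage would then be `pipeline = SubtractMean() + DivideByMaxAbs()` followed by `pipeline.fit(data)`, with `DivideByMaxAbs` seeing the already centered data during its own fit.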

So preprocessing classes should have the following methods in addition to those inherited from the Layer class.

  1. a fit() method
  2. an init() method
  3. an output() method that refuses to be called before the layer is fitted.
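The three methods above can be sketched as a small base class. This assumes that init() refers to Python's `__init__`; `Preprocessor`, `NotFittedError`, `_fit`, `_output` and `MeanCenter` are hypothetical names chosen for the example, and the real classes would additionally inherit from Layer.

```python
class NotFittedError(RuntimeError):
    """Raised when output() is requested from an unfitted step (hypothetical)."""


class Preprocessor:
    """Hypothetical base class for preprocessing steps."""

    def __init__(self):
        self.fitted = False

    def fit(self, data):
        self._fit(data)
        self.fitted = True
        return self

    def output(self, data):
        # Guard: an unfitted step has no parameters to apply yet.
        if not self.fitted:
            raise NotFittedError(
                type(self).__name__ + " must be fitted before output() is called"
            )
        return self._output(data)

    def _fit(self, data):
        raise NotImplementedError

    def _output(self, data):
        raise NotImplementedError


class MeanCenter(Preprocessor):
    def _fit(self, data):
        self.mean = sum(data) / len(data)

    def _output(self, data):
        return [x - self.mean for x in data]
```

Putting the guard in the base class means subclasses only implement `_fit` and `_output` and cannot forget the check.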

Specifically for PCA, it should look like this:

  1. init()
    instantiates a forward and a backward layer,
  2. fit()
  3. fit_partwise()
  4. a method for computing the correlation matrix
  5. a method for the remaining steps of PCA, which can be shared by both fit() and fit_partwise()
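A rough NumPy sketch of how these pieces could fit together. Everything here is an assumption made for illustration: the class name `BatchPCA`, the method names other than fit() and fit_partwise(), and the use of plain `forward()`/`backward()` methods in place of real forward and backward layers. The sketch also uses the covariance matrix of the mean-centered data where the list above says "correlation matrix", which may not be what is intended.

```python
import numpy as np


class BatchPCA:
    """Hypothetical sketch, not the project's PCA class."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.mean = None
        self.components = None  # shape (n_features, n_components) after fitting

    def fit(self, X):
        """Fit on a full in-memory array by treating it as a single batch."""
        return self.fit_partwise([X])

    def fit_partwise(self, batches):
        """Fit from an iterable of 2-D arrays, holding one batch at a time."""
        n, total, outer = 0, None, None
        for batch in batches:
            batch = np.asarray(batch, dtype=float)
            if total is None:
                d = batch.shape[1]
                total = np.zeros(d)
                outer = np.zeros((d, d))
            n += batch.shape[0]
            total += batch.sum(axis=0)   # running sum of the rows
            outer += batch.T @ batch     # running sum of outer products
        if n < 2:
            raise ValueError("need at least two samples to fit PCA")
        self._finish(self._covariance(n, total, outer))
        return self

    def _covariance(self, n, total, outer):
        """Covariance matrix recovered from the accumulated sums."""
        self.mean = total / n
        return (outer - n * np.outer(self.mean, self.mean)) / (n - 1)

    def _finish(self, cov):
        """Steps shared by fit() and fit_partwise(): eigendecomposition."""
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.explained_variance = eigvals[order]
        self.components = eigvecs[:, order]

    def forward(self, X):
        return (np.asarray(X, dtype=float) - self.mean) @ self.components

    def backward(self, Z):
        return np.asarray(Z, dtype=float) @ self.components.T + self.mean
```

Only a d-by-d matrix and a length-d vector are kept between batches, so memory use does not grow with the number of samples. Forming the covariance from raw sums can lose precision when the mean is large relative to the spread of the data; a numerically safer running update might be preferable in practice.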

Also, we need to come up with a way to deal with an unspecified output dimension. Currently the StackedLayer class doesn't accept layers with an unspecified output dimension.

The problem of placing PCA in a higher layer also arises here, i.e., we need a mechanism that automatically prepares the input data for PCA on the fly. I don't know how to achieve that yet.
