The current version of PCA loads the whole dataset at once and then does the computation. We need a fully batchwise PCA that pulls its input from some kind of iterator and computes without ever holding the whole dataset in memory.
The iterator can be a requirement for all dataset wrappers. (It yields a single batch if the dataset is small, or N batches as requested by the user.)
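A minimal sketch of what such an iterator could look like, assuming the wrapped dataset supports `len()` and slicing. The name `iter_batches` and the `n_parts` parameter are hypothetical and not part of the existing code.

```python
import math


def iter_batches(data, n_parts=1):
    """Yield `data` as at most `n_parts` consecutive slices (hypothetical helper).

    n_parts=1 yields the whole dataset once; a larger value yields
    roughly equal slices, so a consumer only touches one slice at a time.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    # Slice length, rounded up so that no rows are dropped.
    size = max(1, math.ceil(len(data) / n_parts))
    for start in range(0, len(data), size):
        yield data[start:start + size]


# Small dataset: one batch. Larger dataset: the user asks for three.
whole = list(iter_batches([1, 2, 3]))
parts = list(iter_batches(list(range(10)), n_parts=3))
```

A real dataset wrapper would presumably read each slice from disk inside the loop instead of slicing an in-memory list; the point here is only the single interface for both the small and the large case.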
Rewrite the PCA class as a subclass of Layer, so that the PCA step can be placed in a pipeline before it has been fitted to any data.
The ideal way to build a preprocessing pipeline is to declare all the preprocessing steps at once in a single line, concatenated with the "+" operator, and then fit each step (if needed) in a layerwise manner.
Preprocessing classes should therefore have the following methods in addition to those inherited from the Layer class:
fit()
an init() method
an output() method that refuses to run before the layer has been fitted.
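The interface described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the `Layer` stand-in, the `Pipeline` and `PreprocessLayer` classes, the `_fit`/`_transform` hooks, and the toy `Center` and `Scale` steps are hypothetical and do not reflect the project's actual classes.

```python
class Layer:
    """Stand-in for the library's Layer base class (assumed interface)."""

    def __add__(self, other):
        # "a + b" builds a pipeline; chained "+" flattens into one list.
        return Pipeline(self._steps() + other._steps())

    def _steps(self):
        return [self]


class Pipeline(Layer):
    def __init__(self, steps):
        self.steps = list(steps)

    def _steps(self):
        return self.steps

    def fit(self, data):
        # Layerwise fitting: each step is fitted on the output of the
        # already-fitted steps that precede it.
        for step in self.steps:
            if isinstance(step, PreprocessLayer):
                step.fit(data)
            data = step.output(data)
        return self

    def output(self, data):
        for step in self.steps:
            data = step.output(data)
        return data


class PreprocessLayer(Layer):
    def __init__(self):
        self.fitted = False

    def fit(self, data):
        self._fit(data)
        self.fitted = True
        return self

    def output(self, data):
        # Guard: output() must not be used before the layer is fitted.
        if not self.fitted:
            raise RuntimeError(type(self).__name__ + " must be fitted before output()")
        return self._transform(data)


class Center(PreprocessLayer):
    """Toy step: subtract the mean learned in fit()."""

    def _fit(self, data):
        self.mean = sum(data) / len(data)

    def _transform(self, data):
        return [x - self.mean for x in data]


class Scale(PreprocessLayer):
    """Toy step: divide by the largest absolute value seen in fit()."""

    def _fit(self, data):
        self.peak = max(abs(x) for x in data) or 1.0

    def _transform(self, data):
        return [x / self.peak for x in data]


# Declare the pipeline first, fit it afterwards.
pipeline = Center() + Scale()
pipeline.fit([2.0, 4.0, 6.0])
result = pipeline.output([2.0, 4.0, 6.0])  # [-1.0, 0.0, 1.0]
```

Keeping the guard in the shared `output()` and the real work in `_transform()` means every preprocessing step gets the "not fitted yet" check without repeating it.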
For PCA specifically, it should look like this:
init()
instantiates a forward and a backward layer,
fit()
fit_partwise()
a method that computes the correlation matrix
a method for the remaining PCA steps, which can be shared by both fit() and fit_partwise()
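One way the split between fit(), fit_partwise() and the shared tail could look is sketched below with NumPy. This is an illustration under stated assumptions, not the project's implementation: the names `n_components`, `_covariance`, `_finish` and `output` are hypothetical, the forward and backward layers are left out, and the sketch diagonalises the covariance matrix (the usual choice for PCA) where the list above says "correlation matrix".

```python
import numpy as np


class PCA:
    """Illustrative PCA with a partwise fit; all names are assumptions."""

    def __init__(self, n_components=None):
        self.n_components = n_components  # None keeps every component
        self.fitted = False

    def fit(self, data):
        # Whole dataset in memory: treat it as a single part.
        return self.fit_partwise([data])

    def fit_partwise(self, parts):
        # Accumulate count, column sums and X^T X one part at a time,
        # so only one part has to be in memory at once.
        n, total, outer = 0, 0.0, 0.0
        for part in parts:
            part = np.asarray(part, dtype=float)
            n += part.shape[0]
            total = total + part.sum(axis=0)
            outer = outer + part.T @ part
        mean = total / n
        return self._finish(self._covariance(n, mean, outer), mean)

    @staticmethod
    def _covariance(n, mean, outer):
        # Population covariance: E[x x^T] - mean mean^T.
        return outer / n - np.outer(mean, mean)

    def _finish(self, cov, mean):
        # Shared tail: eigendecomposition, sorted by decreasing variance.
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.mean = mean
        self.variances = eigvals[order]
        self.components = eigvecs[:, order]
        self.fitted = True
        return self

    def output(self, data):
        if not self.fitted:
            raise RuntimeError("PCA must be fitted before output()")
        return (np.asarray(data, dtype=float) - self.mean) @ self.components
```

Because fit() simply delegates to fit_partwise() with a one-element list, both paths go through the same accumulation and the same `_finish()` step. The one-pass formula used in `_covariance` is simple but can lose precision when the mean is large relative to the spread; a two-pass or incremental-update scheme would be a more careful choice.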
Also, we need a way to deal with an unspecified output dimension. Currently the StackedLayer class doesn't accept layers whose output dimension is unspecified.
The problem of using PCA in a higher layer also arises here: we need a mechanism that automatically prepares the input data for PCA on the fly. I don't know how to achieve that yet.