Skip to content

udellgroup/pyglrm

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pyglrm

Pyglrm is a python package for modeling and fitting generalized low rank models (GLRMs), based on the Julia package LowRankModels.jl.

GLRMs model a data array by a low rank matrix, and include many well known models in data analysis, such as principal components analysis (PCA), matrix completion, robust PCA, nonnegative matrix factorization, k-means, and many more.

For more information on GLRMs, see our paper.

Requirements

  • OS: Any linux or MacOS
  • Python (2.7 or 3.x)
  • Working installation of git

This package has been tested on Ubuntu linux and may not work on non-linux OSes.

This package also relies on Julia. On linux systems this package can install the most recent version of Julia automatically. The version of Julia distributed by most linux distributions is may not be recent enough to run this package. We recommend you use the official binary from the Julia website.

Note If you use the version of Julia installed by this package, you may need to run

export PATH=$PATH:$HOME/.julia/julia-903644385b/bin

in order to access Julia.

Installation

Windows

Windows based installations are not supported yet.

MacOS Installation

  1. Install the most recent version of Julia (0.6) by following downloading the appropriate installer from the Julia website and following the direction for your operating system on the instructions page.

  2. Check that Julia runs on the command line by running the command julia on the command line.

  3. Using your choice of pip, pip2, or pip3 depending on the version of Python you intend on using, run the command

    pip install git+https://github.com/udellgroup/pyglrm --user
    

    The installation will get the package via git - you may need to enter you password for gitlab.

Linux

  1. Note that the default distribution of Julia included in most package managers is not sufficiently up to date to run this package. We instead recommend using the version of Julia from the Julia website. The installer for this package can install Julia for you.

  2. Using your choice of pip, pip2, or pip3 depending on the version of Python you intend on using, run the command

    pip install git+https://github.com/udellgroup/pyglrm --user
    

    The installation will get the package via git - you may need to enter you password for gitlab.

  3. If you let pip install Julia, you may need to run the command

    export PATH=$PATH:$HOME/.julia/julia-903644385b/bin
    

Common Troubleshooting

  1. Segmentation faults

    This sometimes corresponds to the error message

    SystemError: initialization of _heapq did not return an extension module.
    

    The underlying software that runs the package compiles itself for one version of Python at a time. For example, if you install the package using Python 2.7 and then use Python 3.6 you will get a segmentation fault.

    If switching between versions of Python is your problem, there is a simple solution. Each time you switch version of Python first run

    whereis python
    whereis python3
    

    or

    which python
    which python3
    

    to find the absolute path to the version of Python you plan to use. Then run the following commands in Julia

    ENV["PYTHON"] = path_to_python_binary
    Pkg.build("PyCall")
    exit()
    

    This should resolve the issue.

  2. On linux, after installation "Julia" cannot be found.

    You may need to run the command

    export PATH=$PATH:$HOME/.julia/julia-903644385b/bin
    

Generalized Low Rank Models

GLRMs form a low rank model for tabular data A with m rows and n columns, which can be input as an array or any array-like object (for example, a data frame). The desired model is specified by choosing a rank k for the model, an array of loss functions losses, and two regularizers, rx and ry. The data is modeled as X'*Y, where X is a kxm matrix and Y is a kxn matrix. X and Y are found by solving the optimization problem

minimize sum_{(i,j) in obs} losses[j]((X'*Y)[i,j], A[i,j]) + sum_i rx(X[:,i]) + sum_j ry(Y[:,j])

The basic type used by LowRankModels.jl is the GLRM. To form a GLRM, the user specifies

  • the data A
  • the array of loss functions losses
  • the regularizers rx and ry
  • the rank k

Losses and regularizers must be of type Loss and Regularizer, respectively, and may be chosen from a list of supported losses and regularizers, which include

Losses:

  • quadratic loss QuadLoss
  • hinge loss HingeLoss
  • logistic loss LogisticLoss
  • poisson loss PoissonLoss (not yet implemented)
  • weighted hinge loss WeightedHingeLoss
  • l1 loss L1Loss
  • ordinal hinge loss OrdinalHingeLoss
  • periodic loss PeriodicLoss
  • multinomial categorical loss MultinomialLoss
  • multinomial ordinal (aka ordered logit) loss MultinomialOrdinalLoss

Regularizers:

  • quadratic regularization QuadReg
  • constrained squared euclidean norm QuadConstraint
  • l1 regularization OneReg
  • no regularization ZeroReg
  • nonnegative constraint NonNegConstraint (eg, for nonnegative matrix factorization)
  • 1-sparse constraint OneSparseConstraint (eg, for orthogonal NNMF)
  • unit 1-sparse constraint UnitOneSparseConstraint (eg, for k-means)
  • simplex constraint SimplexConstraint
  • l1 regularization, combined with nonnegative constraint NonNegOneReg
  • fix features at values y0 FixedLatentFeaturesConstraint(y0)

Each of these losses and regularizers can be scaled (for example, to increase the importance of the loss relative to the regularizer) by having arguments like QuadLoss(scale=1.0) in class initializations.

Example

For example, the following code performs PCA with n_components=2 (corresponds to the target rank k) on the 3x4 matrix A:

import numpy as np
from pyglrm import *
A = np.array([[1, 2, 3, 4], [2, 4, 6, 8], [4, 5, 6, 7]])
losses = QuadLoss()
rx = ZeroReg()
ry = ZeroReg()
g = glrm(losses, rx, ry, n_components=2) #create a class for GLRM (Here it does PCA), in which n_components is the number of target dimensions
g.set_training_data(inputs=A)
g.fit()
a_new = np.array([6, 7, 8, 9]) #initialize a new row to be tested
x = g.transform(inputs=a_new) #get the latent representation of a_new

which runs an alternating directions proximal gradient method on g to find the X and Y minimizing the objective function. (ch gives the convergence history; see Technical details below for more information.)

The losses argument can also be an array of loss functions, with one for each column (in order). For example, for a data set with 3 columns, you could use

losses = [QuadLoss(), LogisticLoss(), HingeLoss()]

Similiarly, the ry argument can be an array of regularizers, with one for each column (in order). For example, for a data set with 3 columns, you could use

ry = [QuadReg(1), QuadReg(10), FixedLatentFeaturesConstraint([1.,2.,3.])]

This regularizes the first to columns of Y with ||Y[:,1]||^2 + 10||Y[:,2]||^2 and constrains the third (and last) column of Y to be equal to [1,2,3].

More examples here.

Standard low rank models

Low rank models can easily be used to fit standard models such as PCA, k-means, and nonnegative matrix factorization. The following functions are available:

  • pca: principal components analysis
  • qpca: quadratically regularized principal components analysis
  • rpca: robust principal components analysis
  • nnmf: nonnegative matrix factorization

To create a class for one of these standard models, replace glrm in the above example with the model name above. Any keyword argument valid for a GLRM object, such as an initial value for X or Y or a list of observations, can also be used with these standard low rank models.

Citing this package

If you use LowRankModels for published work, we encourage you to cite the software.

Use the following BibTeX citation:

@article{glrm,
  title = {Generalized Low Rank Models},
  author ={Madeleine Udell and Horn, Corinne and Zadeh, Reza and Boyd, Stephen},
  doi = {10.1561/2200000055},
  year = {2016},
  archivePrefix = "arXiv",
  eprint = {1410.0342},
  primaryClass = "stat-ml",
  journal = {Foundations and Trends in Machine Learning},
  number = {1},
  volume = {9},
  issn = {1935-8237},
  url = {http://dx.doi.org/10.1561/2200000055},
}

About

The Python wrapper for LowRankModels.jl.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •