Metric learning #12
Open · maxentile opened this issue May 16, 2015 · 15 comments

@maxentile

Kinetic discriminatory metric learning: http://pubs.acs.org/doi/abs/10.1021/ct400132h

Questions:

  • Can any aspects of the learned distance metric generalize across protein examples?
@maxentile

"Large-margin component analysis" -- combining dimensionality reduction and metric learning-- http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_791.pdf

@maxentile

Nonlinear metric learning: http://www1.cse.wustl.edu/~kilian/papers/chilmnn.pdf

@maxentile

Other ideas for follow-on work:

  • "Neural-net-based metric learning" e.g. Siamese networks

@maxentile

Bayesian active distance metric learning: http://www.cs.cmu.edu/~liuy/uai2007_bayesian.pdf

Given two arbitrary points in configuration space, the "kinetic distance" between them can be approximately determined, but at some cost. We effectively want to learn a good kinetic distance metric over all of configuration space. Seems like it could be a good fit for this approach...

@maxentile

Metric learning for kernel regression: http://jmlr.org/proceedings/papers/v2/weinberger07a/weinberger07a.pdf
e.g. regressing on time lag

@maxentile

Kernel regression with sparse metric learning: http://www.cst.ecnu.edu.cn/~slsun/pubs/KRSML.pdf

@maxentile

Large-margin multitask metric learning: http://papers.nips.cc/paper/3935-large-margin-multi-task-metric-learning.pdf

@maxentile

Idea: "Tree-preserving metric learning"

In LMNN, we consider triplets (i,j,k) of points where i and j belong to the same class and k belongs to a different class, and we want to learn a metric under which the pair (i,j) is closer together than the pair (i,k) by at least a constant margin. We could modify this approach to accept hierarchical cluster assignments instead of class labelings.
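
As a minimal sketch of the triplet construction this would need -- assuming X is an (n, d) array of feature vectors, and with the hypothetical helper name make_tree_triplets -- something like:

import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

def make_tree_triplets(X, n_triplets=1000, seed=0):
    """Sample triplets (i, j, k) where i and j are closer in the cluster tree than i and k."""
    rng = np.random.RandomState(seed)
    Z = linkage(X, method='average')      # hierarchical cluster tree
    C = squareform(cophenet(Z))           # pairwise cophenetic distances
    n = len(X)
    triplets = []
    while len(triplets) < n_triplets:
        i, j, k = rng.choice(n, size=3, replace=False)
        if C[i, j] < C[i, k]:
            triplets.append((i, j, k))
    return np.array(triplets), C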

@maxentile

For LMNN, is there a way to supply an arbitrary loss function, rather than 0/1 loss? E.g. if we have 3 classes A,B,C, can I say that mistakenly putting A close to B is worse than putting A close to C? If so, that would be nice: "tree-preserving metric learning" would just be a special case where the loss function is "cophenetic distance" or something.

Tactic: one approach might be simply to replace the "1" in the SDP constraint with the desired margin for each triple:
[image: the LMNN triplet constraint, of the form d_M(x_i, x_k) - d_M(x_i, x_j) >= 1 - xi_ijk, with xi_ijk >= 0]
(ref: http://www.cse.wustl.edu/~kilian/papers/spm2010.pdf p. 4)
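
To make that concrete, here's a rough sketch of the resulting unconstrained hinge-loss form (rather than the SDP), assuming triplets from a cluster tree as above; triplet_loss, mahalanobis_sq, and the particular margin choice are hypothetical:

import numpy as np

def mahalanobis_sq(M, x, y):
    # squared distance under the learned metric M
    d = x - y
    return d.dot(M).dot(d)

def triplet_loss(M, X, triplets, margins):
    # hinge loss with a per-triplet margin in place of LMNN's constant "1"
    loss = 0.0
    for (i, j, k), m in zip(triplets, margins):
        loss += max(0.0, m + mahalanobis_sq(M, X[i], X[j]) - mahalanobis_sq(M, X[i], X[k]))
    return loss

# e.g. take the margin from the tree: how much farther k is from i than j is
# margins = C[triplets[:, 0], triplets[:, 2]] - C[triplets[:, 0], triplets[:, 1]]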

@maxentile

There would be several cool applications for solving this general problem.
(1) Kinetic discriminatory metric learning: http://pubs.acs.org/doi/abs/10.1021/ct400132h - Given a time-ordered sequence of feature vectors, we want to find a distance metric on the feature vectors that preserves "kinetic distance", i.e. observations that are close in time should be close according to the learned metric. Previous approaches used a 0/1 loss at fixed lag times, but we could presumably do better if errors were weighted by the magnitude of the time lag (see the sketch after this list).
(2) "Tree-preserving metric learning" - Given a hierarchical cluster tree, find a distance metric such that points far apart in the tree are far apart in the metric.

@maxentile

Could apply this to learning the weight matrix M of what I'll call the "weighted Binet-Cauchy kernel".

The Binet-Cauchy kernel is:

from numpy import sqrt, eye
from numpy.linalg import det

def BC(X, Y):
    # Binet-Cauchy kernel between two matrices with the same number of rows
    return det(X.T.dot(Y)) / sqrt(det(X.T.dot(X)) * det(Y.T.dot(Y)))

where X and Y are matrices.

When X and Y are n×3 matrices containing residue coordinates, this computes something similar to RMSD, but it has a few advantages: it's faster and simpler, and it's definitely a kernel. (RMSD is probably also a kernel, but I'm not sure.)

We can weight pairs of residues using a matrix M:

def weighted_BC(X, Y, M=None):
    # default to the identity weighting, which recovers the unweighted kernel
    if M is None:
        M = eye(len(X))

    return det(X.T.dot(M).dot(Y)) / sqrt(det(X.T.dot(M).dot(X)) * det(Y.T.dot(M).dot(Y)))

Question: what constraints must M satisfy for weighted_BC to be a valid kernel?
Guess: as long as M is positive semidefinite (PSD), it should be okay?
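
Quick usage sketch of the two functions above on random coordinates; building M as A.T.dot(A) is one simple way to get a PSD weighting (per the guess above), and the sizes here are made up just to show the call pattern:

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 3)          # residue coordinates of structure 1
Y = rng.randn(10, 3)          # residue coordinates of structure 2

A = rng.randn(10, 10)
M = A.T.dot(A)                # PSD weighting over residue pairs

print(BC(X, Y))               # unweighted Binet-Cauchy kernel
print(weighted_BC(X, Y, M))   # weighted variant
print(weighted_BC(X, Y))      # M defaults to the identity, recovering BC(X, Y)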

@maxentile

Neighborhood component analysis:
http://papers.nips.cc/paper/2566-neighbourhood-components-analysis.pdf -- see section 5 for extensions to continuous labels (e.g. time-lags)

@maxentile

For fast tinkering I'm just writing down objective functions and then using autograd to fit them by gradient descent, rather than committing to a fancier learning technique early on.
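
A minimal sketch of that workflow, reusing the triplet/margin setup from above: write the objective with autograd.numpy, get its gradient with autograd.grad, and take plain gradient steps. fit_metric, the L.T-times-L parametrization, and the fixed step size are placeholder choices, not a tuned implementation:

import autograd.numpy as np
from autograd import grad

def objective(L, X, triplets, margins):
    # parametrize M = L.T L so the learned metric stays PSD
    M = np.dot(L.T, L)
    d_ij = X[triplets[:, 0]] - X[triplets[:, 1]]
    d_ik = X[triplets[:, 0]] - X[triplets[:, 2]]
    dist_ij = np.sum(np.dot(d_ij, M) * d_ij, axis=1)
    dist_ik = np.sum(np.dot(d_ik, M) * d_ik, axis=1)
    return np.sum(np.maximum(0.0, margins + dist_ij - dist_ik))

grad_objective = grad(objective)      # gradient with respect to L

def fit_metric(X, triplets, margins, n_steps=500, step_size=1e-3):
    L = np.eye(X.shape[1])
    for _ in range(n_steps):
        L = L - step_size * grad_objective(L, X, triplets, margins)
    return np.dot(L.T, L)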
