convergence confusion #311
Comments
This reproduces the issue. I'm using mostly defaults; in particular, the "batch-gradient" solver. I assume I'm using the development version as well: git log dates this as a June 3, 2019 commit.

# $ git log
# commit c962c840c4669ae69e623a2595eba40ae50e460d
# (HEAD -> master, origin/master, origin/HEAD)
# Author: Mainak Jas <[email protected]>
# Date:   Mon Jun 3 23:56:36 2019 -0400
#
#     FIX test and tweak to whats_new
import pandas as pd
import numpy as np
from numpy.linalg import norm
import itertools
from pyglmnet import GLM
# read the data
df = pd.read_csv(r"https://dlennon.org/assets/repro/bddae97a.gz")
y = df['Churn']
X = df.drop('Churn', axis=1)
# group ids
pref = [c.split('_')[0] for c in X.columns]
grp_id = [j for j,r in enumerate(itertools.groupby(pref), start=1) for k in r[1]]
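# (illustrative note, not from the original post: with hypothetical columns
# ['Contract_Monthly', 'Contract_TwoYear', 'Tenure'], pref would be
# ['Contract', 'Contract', 'Tenure'] and grp_id would be [1, 1, 2]:
# one id per consecutive run of columns sharing a dummy-encoding prefix)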
# fit models as a function of lambda
n_mod = 13
models = []
lam = np.logspace(-6,6,num=n_mod)
model_params = dict(
    distr='binomial',
    group=grp_id,
    tol=0.5,     # 0.5 for the first set of results below; 0.1 for the second
    verbose=3
)
for i, l in enumerate(lam):
    clf = GLM(reg_lambda=l, **model_params)
    # warm start from the previous fit, if there is one
    try:
        clf.beta0_ = models[i - 1].beta0_
        clf.beta_ = models[i - 1].beta_
    except IndexError:
        pass
    clf.fit(X.values, y.values)
    models.append(clf)
# ------------------------------------------------------------------------------
# These are the tol=0.5 results. For large lambda, beta is zero, but beta0
# is not constant across lambda and converges only slowly toward -1.016114.
# pd.DataFrame({
#     'lambda': lam,
#     'beta0': [m.beta0_ for m in models],
#     'norm_beta': [norm(m.beta_) for m in models]
# })
# lambda beta0 norm_beta
# 0 0.000001 -0.042286 0.265220
# 1 0.000010 -0.075504 0.370958
# 2 0.000100 -0.092875 0.452565
# 3 0.001000 -0.104283 0.518814
# 4 0.010000 -0.114575 0.541784
# 5 0.100000 -0.139812 0.238829
# 6 1.000000 -0.372410 0.000000
# 7 10.000000 -0.427910 0.000000
# 8 100.000000 -0.478218 0.000000
# 9 1000.000000 -0.523872 0.000000
# 10 10000.000000 -0.565348 0.000000
# 11 100000.000000 -0.603068 0.000000
# 12 1000000.000000 -0.637410 0.000000
# ------------------------------------------------------------------------------
# Taking tol = 0.1 gives the expected behavior, but it is much slower, and
# no convergence is reported (per the verbose=True output) for lambda > 0.1.
# pd.DataFrame({
#     'lambda': lam,
#     'beta0': [m.beta0_ for m in models],
#     'norm_beta': [norm(m.beta_) for m in models]
# })
# lambda beta0 norm_beta
# 0 0.000001 -0.139408 0.798327
# 1 0.000010 -0.143974 0.834305
# 2 0.000100 -0.148182 0.866839
# 3 0.001000 -0.152151 0.892953
# 4 0.010000 -0.156752 0.879674
# 5 0.100000 -1.016114 0.000000
# 6 1.000000 -1.016114 0.000000
# 7 10.000000 -1.016114 0.000000
# 8 100.000000 -1.016114 0.000000
# 9 1000.000000 -1.016114 0.000000
# 10 10000.000000 -1.016114 0.000000
# 11 100000.000000 -1.016114 0.000000
# 12 1000000.000000 -1.016114 0.000000
Hi @inferentialist, sorry for the delay in getting back to you. I finally had a moment to look at this, but I can't reproduce your problem. See my code here:

import pandas as pd
import numpy as np
from numpy.linalg import norm
import itertools
from pyglmnet import GLM
# read the data
df = pd.read_csv(r"https://dlennon.org/assets/repro/bddae97a.gz")
y = df['Churn']
X = df.drop('Churn', axis=1)
# group ids
pref = [c.split('_')[0] for c in X.columns]
grp_id = [j for j,r in enumerate(itertools.groupby(pref), start=1) for k in r[1]]
# fit models as a function of lambda
n_mod = 13
models = []
lam = np.logspace(-6,6,num=n_mod)
model_params = dict(
    distr='binomial',
    group=grp_id,
    tol=0.1,
    verbose=3,
    max_iter=1000
)
for i, l in enumerate(lam):
    clf = GLM(reg_lambda=l, **model_params)
    # warm start from the previous fit, if there is one
    try:
        clf.beta0_ = models[i - 1].beta0_
        clf.beta_ = models[i - 1].beta_
    except IndexError:
        pass
    clf.fit(X.values, y.values)
    models.append(clf)
    print(clf.beta0_)

It seems that convergence is just fine even for ...
@jasmainak, thanks for taking a look. I was hoping you'd have a quick fix; sorry to have handed you a difficult-to-reproduce result. For what it's worth, I love the idea of what you're doing here with pyglmnet, and I'd like to get involved. In terms of references, I'd probably be inclined to start with the Lukas Meier JRSSB 2008 paper, if only because I seem to recall that the R version of glmnet is based on it. If you have suggestions, beyond what's on the documentation webpage, on how to get from Meier's paper to the current codebase, please feel free to share them.
If the R version of glmnet is based on it, it sounds like a great idea. What solver does it use? By the way, if you are working with the group lasso, it would be great if you could also suggest an example dataset. Our current example takes ages to finish ... I'm not sure how much a better solver would help.
Hi Jas,
Sorry to have sat on this for the last two weeks. I've been caught up in other projects.
So, to get back to your question about solvers: my understanding is that the Meier 2008 paper puts forward an argument for block coordinate solvers because, for this class of problem, the penalty term is separable, and so each block sub-problem is relatively simple (a rough sketch follows below). I think, at the time of publication, this was a departure from the idea of using a homotopy method to get solutions across all lambdas.
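To make that concrete, here is a minimal sketch of the block update for the squared-error group lasso (the logistic case in Meier et al. wraps a quadratic approximation around the same update). This is illustrative code, not anything from pyglmnet or the paper's reference implementation: the function names are invented, and it assumes each group's columns have been orthonormalized so that Xg'Xg = n*I, which is what makes the sub-problem closed-form.

import numpy as np

def group_soft_threshold(z, thresh):
    # block soft-thresholding: shrink the group's coefficient vector,
    # zeroing it out entirely once its norm drops below `thresh`
    nz = np.linalg.norm(z)
    return np.zeros_like(z) if nz <= thresh else (1.0 - thresh / nz) * z

def group_lasso_bcd(X, y, groups, lam, n_iter=100):
    # naive block coordinate descent for (1/2n)||y - X beta||^2
    # + lam * sum_g sqrt(p_g) * ||beta_g||, assuming Xg'Xg = n*I
    n, p = X.shape
    groups = np.asarray(groups)
    beta = np.zeros(p)
    for _ in range(n_iter):
        for g in np.unique(groups):
            idx = groups == g
            Xg = X[:, idx]
            # partial residual with the current group's contribution removed
            r = y - X @ beta + Xg @ beta[idx]
            z = Xg.T @ r / n
            # penalty scaled by sqrt(group size), as in the paper
            beta[idx] = group_soft_threshold(z, lam * np.sqrt(idx.sum()))
    return beta

Because the penalty is separable across groups, each pass touches one group at a time while everything else stays fixed; that's the whole appeal.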
In terms of examples, one thing I've found helpful in the past is to start with synthetic data. For these types of problems, I can think of two characteristics worth replicating in a simulation: first, the lasso is often applied to categorical data, which leads to sparse design matrices; second, lasso methods are nice when p > n and most of the covariates are noise relative to the response variable. (A toy generator along those lines is sketched below.)
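Something like the following is what I have in mind (purely a sketch, with made-up sizes and names): a dummy-encoded design with p > n where only the first two groups carry any signal.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, n_factors, n_levels = 200, 60, 5   # p = 300 columns > n = 200 rows

# categorical factors, dummy-encoded into a sparse 0/1 design matrix
factors = pd.DataFrame(
    {f'f{j}': rng.integers(0, n_levels, size=n) for j in range(n_factors)}
).astype('category')
X = pd.get_dummies(factors)           # columns named like 'f0_3'
grp_id = [int(c.split('_')[0][1:]) + 1 for c in X.columns]

# only the first two groups drive the response; the rest are noise
beta = np.zeros(X.shape[1])
signal = np.asarray(grp_id) <= 2
beta[signal] = rng.normal(scale=1.5, size=signal.sum())
logits = X.values.astype(float) @ beta - 1.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))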
I suppose the last thing that comes to mind is to ask about your current working example. There's the community crime dataset in the repo, so that'd be my default assumption. If it's something else, and you're in a place to share it, let's sync on that. I also feel like I saw somewhere that you were considering a JSS submission. I haven't tracked that journal in a while, so maybe they welcome v2.0 articles, but if we can make some additional progress here, I'd expect to be involved in that publication as a contributing author.
Looking forward to hearing your thoughts!
Dustin
Hi @inferentialist, I had a brief look at the paper you mention, and it sounds promising, but it will take some work to integrate into the current codebase. I'd be happy to add you as a contributing author, of course; the criteria will be quite liberal anyway, and if you help integrate these block coordinate descent solvers, it would certainly be a sizable contribution. I would start small with a WIP pull request so you don't spend too much time going in the wrong direction and we can iterate together. The current group lasso example (originally implemented by @evadyer) takes over an hour to run! I think we use the same data as in the paper you cite, so comparison between packages should be easy. Before you dig any further, let me also point you to this open PR: https://github.com/glm-tools/pyglmnet/pull/226/files. Feel free to review and/or try it. I see that the paper uses a line search; I wonder if that alone already solves some of the problems (a generic sketch of the idea follows below).
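For reference, backtracking (Armijo) line search just shrinks the step size until the objective actually decreases enough. This is a generic sketch with invented names, not code from the PR:

import numpy as np

def backtracking_step(loss, beta, grad, step=1.0, shrink=0.5, c=1e-4):
    # Armijo backtracking for a gradient step: halve the step until the
    # objective drops by at least c * step * ||grad||^2
    f0 = loss(beta)
    g2 = float(np.dot(grad, grad))
    while loss(beta - step * grad) > f0 - c * step * g2:
        step *= shrink
        if step < 1e-12:   # bail out; grad may not be a descent direction
            break
    return beta - step * grad

A fixed learning rate can overshoot badly at large reg_lambda, which is one plausible source of the non-convergence reported above.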
Hey @inferentialist, thanks for your thoughtful suggestions regarding alternate solvers. We've recently improved our convergence criteria and made a new release on PyPI along with various other improvements; you can pip install the latest version (see the snippet below). Let us know whether these changes solve your convergence problems. As far as contributions for future releases go, we'd love to make our coordinate descent solver faster: in theory it has better convergence properties than batch gradient descent, but that advantage doesn't show up in practice, simply because our current implementation is pure Python. Translating the solver to Cython is a potential enhancement. We're also happy to explore the block-coordinate direction if that's your interest.
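For example (assuming the solver keyword follows the same naming as the "batch-gradient" solver mentioned earlier in this thread; worth double-checking against the current docs):

# pip install --upgrade pyglmnet
from pyglmnet import GLM

# same model as above, but with the coordinate descent solver
clf = GLM(distr='binomial', reg_lambda=0.1, solver='cdfast', tol=1e-3)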
Hey there. I'm really excited about this package, but I've been seeing some unexpected convergence behavior and was hoping this might be a good place to ask for clarity. I'm happy to write this up in more detail, or dig into the code myself if necessary; let me know what would be most helpful.
Here's the short version of my scenario. I'm running a logistic group lasso with the deviance loss. I'm just ramping up the penalty parameter (lambda) across several orders of magnitude and, in the high-lambda regimes where the non-constant coefficients are forced to zero, I'm not seeing any convergence of the constant-term coefficient.
Tinkering with the tol parameter gets me the numerical behavior I would have expected, but it is orders of magnitude slower and (with the verbose parameter set to True) fails to report convergence.
I would have thought that, with the non-constant terms forced to zero, estimation of the constant-term coefficient would be trivial. So, even with a loose tolerance, I suppose I'm surprised that my options are (a) the constant-term coefficient doesn't converge or (b) everything slows down dramatically. A sanity check on what that coefficient should be is below.
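(A back-of-the-envelope check, using the standard logistic-regression fact rather than anything pyglmnet-specific, and the y loaded in the reproduction script above: when every non-constant coefficient is zero and the intercept is unpenalized, the optimal intercept is just the log-odds of the base rate.)

import numpy as np

# with beta = 0, the unpenalized-intercept optimum is logit(mean(y));
# for this dataset it should land close to the -1.016114 that the
# tol = 0.1 runs settle on
p = y.mean()
print(np.log(p / (1.0 - p)))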
As a point of comparison, the R glmnet package, while less fully featured, doesn't seem to suffer from the issue.
Let me know if you have any thoughts about how I might help or where I should start digging around in the code. Thanks!