GH-8487: implement HGLM gaussian [nocheck] #16403
Hi @wendycwong. Thanks for this big contribution.
I reviewed 80/110 files and will continue tomorrow. It would be nice if this PR kept only the HGLM-related changes. For example, the implementation of HGLM can be one PR, and removing old code from GLM can be another. The implementation of the Python and R APIs can also be a separate PR.
That would make the review process much easier and leave less room for bugs.
h2o-py/tests/testdir_algos/glm/pyunit_GH_6722_separate_linear_beta_gaussian.py
h2o-py/tests/testdir_algos/glm/pyunit_link_functions_gaussian_glm.py
@wendycwong, I finished my review. I found only minor bugs. I tried to check all the math, and everything looks good. Tests passed. Have you tried running your tests on a multinode cluster, just to be sure? Thanks for this huge contribution!
Hi @wendycwong. Thanks for incorporating the suggestions. There are still two HGLM tests failing; once all the tests pass, I can approve the PR.
GH-8487: crafting HGLM parameters.
GH-8487: implement EM algo.
GH-8487: forming the fixed matrices and vectors.
GH-8487: add test to make sure correct initialization of fixed, random coefficients, sigma values and T matrix.
GH-8487: Finished implementing EM to estimate fixed coefficients, random coefficients, tmat and tauEVar.
GH-8487: finished implementing prediction but still need to figure out the model metrics calculation.
GH-8487: Adding support for models without random intercept.
GH-8487: adding normalization and denormalization of coefficients for fixed and random.
GH-8487: Completed prediction implementation and added tests to make sure prediction is correct when standardize=true/false, random_intercept = true/false.
GH-8487: fixing model metric classes.
GH-8487: add python and R tests.
GH-8487: adding hooks to generate synthetic data.
GH-8487: added scoring history, model summary, coefficient tables.
GH-8487: added modelmetrics for validation frame.
GH-8487: From experiment to find best tauEVar calculation process. The one in equation 10 is best.
GH-8487: add capability in Python client to extract scoring history, model summary, model metrics, model coefficients (fixed and random), icc, T matrix, residual variance.
GH-8487: done checking scoring history, model summary and model metrics.
GH-8487: added R client test for utility functions.
GH-8487: use lambda_ instead of Lambda in pyunit_benign_glm.py
GH-8487: remove standardize from HGLM as the convention does not do standardization.
Co-authored-by: Veronika Maurerová <[email protected]>
Move test to check init values are set correctly to Python from Java. I was not able to find a good combination of initial betas/ubetas and t matrix to make it work.
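
The commits above mention extracting the icc, the T matrix and the residual variance. As a rough illustration of how those quantities relate, here is a minimal sketch of the textbook intraclass correlation for a Gaussian model with only a random intercept. The function name and arguments are hypothetical and are not taken from this PR; the PR's own icc computation may differ, for example when the T matrix has more than one entry.

```python
def intraclass_correlation(tau_intercept: float, residual_variance: float) -> float:
    """Textbook ICC for a random-intercept Gaussian model.

    tau_intercept     -- variance of the random intercept (a 1x1 T matrix); hypothetical name
    residual_variance -- variance of the residual noise; hypothetical name
    """
    if tau_intercept < 0 or residual_variance < 0:
        raise ValueError("variances must be non-negative")
    total = tau_intercept + residual_variance
    if total == 0:
        raise ValueError("total variance is zero; ICC is undefined")
    # Share of the total variance explained by differences between groups.
    return tau_intercept / total


if __name__ == "__main__":
    # Illustrative values only, not results from the PR.
    print(intraclass_correlation(2.0, 6.0))
```

An ICC near 0 suggests the grouping explains little of the variance, while a value near 1 suggests most of the variance lies between groups.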
This PR fixes this issue: #8487
I have separated HGLM from GLM as its own toolbox. The only family that is supported now is Gaussian. I still need to do the following:
HGLM_H2O_Implementation.pdf