Feature request: Terminal Leaves Cubist #47
Hi! Thanks for bringing this suggestion. I wasn't aware of that Cubist package. It seems to be similar to what is performed by the piecewise-linear library. There are some aspects of the implementation I'd need to figure out.
The papers on the original implementation seem to require paid registration, but if the general idea of how the trees should be grown can be worked out, my best guess is that it should be feasible to implement, subject to some specific tricks for the prediction as noted above. That is something Julia's multiple dispatch should help simplify. Let me know what you think!
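As a rough illustration of the multiple-dispatch point (a hypothetical sketch only; the type and function names below are made up for illustration and are not part of EvoTrees' API):

using Statistics

# Hypothetical leaf-model types; dispatch selects how a leaf is fitted and predicted.
abstract type LeafModel end
struct MeanLeaf   <: LeafModel end   # plain average, as in current boosted trees
struct LinearLeaf <: LeafModel end   # Cubist-style linear regression in the leaf

# Constant leaf: store the mean of the targets, predict it for every row.
fit_leaf(::MeanLeaf, x, y) = mean(y)
predict_leaf(::MeanLeaf, θ, x) = fill(θ, size(x, 1))

# Linear leaf: store least-squares coefficients (with an intercept column).
fit_leaf(::LinearLeaf, x, y) = [ones(size(x, 1)) x] \ y
predict_leaf(::LinearLeaf, θ, x) = [ones(size(x, 1)) x] * θ

The tree-growing code would only ever call fit_leaf/predict_leaf, so swapping the leaf behaviour would amount to passing a different type.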
Here is Quinlan 1992 & Quinlan 1993, a short presentation & a longer one. I'm not an expert on the underlying algorithm, but when I run all Caret models on different datasets, Cubist tends to do well, sometimes outperforming XGBoost. Yes, the idea is similar to piecewise-linear. AnyBoost.jl is boosting on steroids.
Here's a trivial example of a tree with two leaves where the user can define a custom loss/leaf model:
using Statistics
function AZ_Tree(x, y; j = 1)
    splits = sort(unique(x[:, j]))[2:end]   # skip the smallest value so both leaves are non-empty
    sse = Float64[]
    for sp in splits
        il = x[:, j] .< sp    # rows going to the left leaf
        ig = x[:, j] .>= sp   # rows going to the right leaf
        ℓ = L(y[il], f(x[il, :], y[il])) + L(y[ig], f(x[ig, :], y[ig]))
        push!(sse, ℓ)
    end
    split_at = splits[argmin(sse)]
    return (sc = minimum(sse), split = split_at)
end
X = rand(10, 2);
y = X * rand(2) .+ rand(10);            # linear signal plus noise
#
L(y, ŷ) = sum((y .- ŷ).^2);             # L2 loss
#
f(x, y) = mean(y) * ones(size(y, 1));   # constant prediction in the leaf
AZ_Tree(X, y, j = 1)
#
f(x, y) = x * (x \ y);                  # least-squares linear model in the leaf
AZ_Tree(X, y, j = 1)
# Boston housing data
using MLJ
X, y = @load_boston; X = hcat(X...);    # assemble the feature columns into a matrix
#
f(x, y) = mean(y) * ones(size(y, 1));   # constant prediction in the leaf
AZ_Tree(X, y, j = 1)
sc = [];   # score the best split on every variable
for j in 1:size(X, 2)
    s = AZ_Tree(X, y, j = j)
    push!(sc, s)
end
sc
using LinearAlgebra   # for det
f(x, y) = det(x'x) ≈ 0.0 ? 0.0 : x * (x \ y);   # linear leaf; fall back to predicting zero if x'x is singular
AZ_Tree(X, y, j = 1)
sc = [];
for j in 1:size(X, 2)
    s = AZ_Tree(X, y, j = j)
    push!(sc, s)
end
sc
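To make the example above actually predict on new rows, rather than only score splits, one still needs to refit the leaf model on each side and route new observations through the chosen split. A minimal sketch, assuming the linear-leaf setting above; two_leaf_predict is just an illustrative name, not part of any package:

function two_leaf_predict(x, y, xnew; j = 1)
    sp = AZ_Tree(x, y, j = j).split
    il = x[:, j] .< sp                   # training rows in the left leaf
    βl = x[il, :]   \ y[il]              # least-squares fit on the left leaf
    βr = x[.!il, :] \ y[.!il]            # least-squares fit on the right leaf
    inl = xnew[:, j] .< sp               # route new rows through the split
    ŷ = similar(y, size(xnew, 1))
    ŷ[inl]   .= xnew[inl, :]   * βl
    ŷ[.!inl] .= xnew[.!inl, :] * βr
    return ŷ
end

two_leaf_predict(X, y, X, j = 1)         # e.g. in-sample predictions on Boston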
I can see that such an approach would be functional for a linear predictor. From what I can see, however, it would be a costly operation, since evaluating each split candidate involves a full fit on the observations on the left and right sides, rather than going through an online approach as used in gradient-based methods like EvoTrees.
In particular, I see challenges relating to the aggregation of observations performed by the histogram method, which is what allows fast training on large datasets. Aggregating the first- and second-order derivatives by a simple sum works for the Taylor approximation currently used, but could such a trick be used for an arbitrarily defined function?
These concerns apply because my priority for now is performance (training speed), so being able to train with the histogram method is needed. I can see applicability, though, in a tree algorithm working on the exact method, where no prior aggregation is performed.
A potential linear effect I could see being feasible to implement within the EvoTrees context would be a dual pass, where a linear fit is performed on each variable in addition to the regular split checks, allowing linear effects to be captured where their gain is larger than that of the regular binary split.
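For context on the histogram point above, here is a rough sketch of the standard second-order (Taylor) split search on binned data, shown for a plain L2 loss. This is the textbook gradient-boosting formulation, not EvoTrees' internal code, and best_split_hist, nbins and λ are illustrative names. The key property is that every split candidate is scored from the per-bin sums of the derivatives g and h alone, with no refit on the raw observations, which is exactly what an arbitrary model-in-leaf fit would break:

using Statistics

function best_split_hist(xj, y, ŷ; nbins = 32, λ = 1e-8)
    edges = quantile(xj, range(0, 1, length = nbins + 1))[2:end-1]
    bins  = [searchsortedfirst(edges, v) for v in xj]   # bin index of every row
    g = ŷ .- y                                          # first derivative of ½(ŷ - y)²
    h = ones(length(y))                                 # second derivative (constant for L2)
    Gb, Hb = zeros(nbins), zeros(nbins)
    for i in eachindex(bins)                            # one pass: sum g and h per bin
        Gb[bins[i]] += g[i]
        Hb[bins[i]] += h[i]
    end
    G, H = sum(Gb), sum(Hb)
    gain(GL, HL) = GL^2 / (HL + λ) + (G - GL)^2 / (H - HL + λ) - G^2 / (H + λ)
    best, bestbin, GL, HL = -Inf, 0, 0.0, 0.0
    for b in 1:nbins-1                                  # scan cumulative left-side sums
        GL += Gb[b]; HL += Hb[b]
        gb = gain(GL, HL)
        if gb > best
            best, bestbin = gb, b
        end
    end
    return (gain = best, bin = bestbin)
end

Replacing the constant leaf value by a fitted linear model means the leaf can no longer be recovered from these bin-level sums, which is the aggregation concern described above.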
Hi, and thank you for your package!
I love how flexible it is.
I've had good results with Cubist models, in which terminal leaves contain linear regression models rather than simple averages.
Would it be hard to include this type of option in EvoTrees?