Posterior model of random forest #254

Open
sgbaird opened this issue Dec 10, 2021 · 2 comments

Comments

@sgbaird

sgbaird commented Dec 10, 2021

Are there facilities for sampling from the posterior distribution of the random forest? (e.g. for integration with Ax/BoTorch).

@bfolie

bfolie commented Dec 13, 2021

Currently there is no method to sample from an RF distribution, though one could be easily created.

A BoTorch model only needs to produce a posterior, and a posterior only needs to implement rsample. The prediction of a random forest model in Lolo implements getUncertainty, whose output can reasonably be treated as the standard deviation of a normal distribution centered on the predicted value, so you could draw samples that way.
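A minimal sketch of that sampling step, assuming the predicted values and uncertainties have already been pulled out of Lolo into plain arrays. The function name and the placeholder numbers are hypothetical, and no Lolo or BoTorch API is called here; the draw is written as mean plus scaled standard normal noise, which is the form a reparameterized sampler would take.

```python
import numpy as np


def sample_independent_normal(mean, std, n_samples, seed=None):
    """Draw samples treating each prediction as an independent normal.

    mean, std: arrays of shape (n_points,), standing in for the predicted
    values and the getUncertainty output (assumed to be standard deviations).
    Returns an array of shape (n_samples, n_points).
    """
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    # Reparameterized draw: mean + std * eps, with eps ~ N(0, 1).
    eps = rng.standard_normal((n_samples,) + mean.shape)
    return mean + std * eps


# Placeholder values, not output from a real model.
mean = np.array([1.0, 2.5, 0.3])
std = np.array([0.1, 0.4, 0.05])
samples = sample_independent_normal(mean, std, n_samples=1000, seed=0)
```

Each column of samples is drawn independently of the others, so any correlation between predictions at different input points is ignored.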

However, this might not be sufficient for your needs, because you'd be treating a potentially multivariate distribution as a product of several independent univariate distributions, which ignores any correlation between predictions. This is in contrast to Gaussian process regression, which naturally produces a rich posterior that takes covariance into account.

We are about to release functionality inspired by some studies into correlations between random forest predictions in a multi-output setting. A similar approach (correlation over trees) would likely work to estimate the correlation coefficient between predictions made by the same RF model at distinct input points. You could then construct a covariance matrix from those correlations and sample from the corresponding multivariate normal. However, this is not implemented, or even thoroughly studied, at this time.
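One way the correlation-over-trees idea could look, as a rough sketch only. It assumes a hypothetical array tree_preds of per-tree predictions with shape (n_trees, n_points) is available, which is not something this thread confirms Lolo exposes, and it uses synthetic data in place of a real forest. The function names are made up for illustration, and this is not the functionality referred to above.

```python
import numpy as np


def covariance_from_trees(tree_preds, std):
    """Build a covariance matrix from correlations across trees.

    tree_preds: (n_trees, n_points) per-tree predictions (hypothetical input).
    std: (n_points,) per-point standard deviations, e.g. from getUncertainty.
    Note: a point where every tree agrees exactly gives an undefined (NaN)
    correlation and would need separate handling.
    """
    corr = np.corrcoef(tree_preds, rowvar=False)  # (n_points, n_points)
    return corr * np.outer(std, std)


def sample_joint(mean, cov, n_samples, seed=None, jitter=1e-10):
    """Sample from the multivariate normal N(mean, cov)."""
    rng = np.random.default_rng(seed)
    # A small diagonal jitter guards against numerical non-positive-definiteness.
    cov = cov + jitter * np.eye(len(mean))
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Synthetic stand-in for per-tree predictions at 3 input points:
# a shared component makes the points correlated across trees.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
tree_preds = 1.0 + 0.3 * shared + 0.1 * rng.normal(size=(200, 3))

mean = tree_preds.mean(axis=0)
std = tree_preds.std(axis=0, ddof=1)  # placeholder for calibrated uncertainty
cov = covariance_from_trees(tree_preds, std)
joint_samples = sample_joint(mean, cov, n_samples=500, seed=1)
```

Scaling the correlation matrix by the outer product of the standard deviations keeps the per-point uncertainties on the diagonal while filling in the off-diagonal terms that the independent-normal approach drops.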

@sgbaird
Author

sgbaird commented Dec 14, 2021

@bfolie thank you for the quick response and thorough reply! Treating it as a sample from the normal distribution could work, though I agree that GPR is "richer" in terms of accounting for covariance. That is interesting to hear about the multi-output study.
