Predictive Distribution vs. Posterior Predictive Distribution #261
-
Hello,

First, I really appreciate the work that went into this algorithm. I find it quite interesting and appreciate the nice documentation.

I have a question about whether (and how) ngboost accounts for parameter uncertainty. It appears the model outputs a single set of parameters for each observation to produce the predictive distributions; the graphical curves you see in the examples are just the density functions implied by the chosen likelihood and the parameters output by the model. Am I correct in saying that this ignores uncertainty about the parameters/model itself? How can we call this a posterior predictive distribution when we aren't marginalizing over the uncertainty in the distribution's parameters?

If the above interpretation is correct, I suppose one time-consuming option could be to use bootstrapping to get the parameter uncertainty, or some sort of Laplace/Gaussian approximation. I am just wondering 1) whether what I am saying is even true, and 2) what suggestions you may have to address this. Thanks!
Replies: 2 comments 4 replies
-
Thanks @braydentang1! That is a great question :)

As you have surmised, NGBoost says absolutely nothing about the statistical uncertainty in the predicted distribution. You can certainly estimate resampling uncertainty in the parameters (or any functional of the predicted distribution) using the bootstrap. That's not on 100% firm theoretical ground as far as I know, because the bootstrap does fail on rare occasions (e.g. matching estimators), but I don't see a reason to expect that would happen here, so I think you're good.

That said, you have to think carefully about what such an estimate would mean. If you don't believe the parametric assumptions hold, then the estimates of the parameter functions don't really mean anything, because the parameters themselves don't exist in the real world. Moreover, there is no proof that I know of that NGBoost is actually statistically consistent for all parameters even when the assumptions are true. I suspect it should be, and moreover that it should approach the distribution in the parametric model with minimum divergence to the true distribution; but that is conjecture. Still, estimates of the resampling uncertainty of generic functionals of the predicted distribution (e.g. quantiles) may be of some value even if the equivalents for the predicted parameters are not.

While NGBoost may appear to be a tool for inference because of its probabilistic nature, it is really a tool for prediction, and it should be evaluated like any other prediction model. For example, if I'm a patient and a doctor gives me a prognosis, I care a lot less about how that prediction might have changed had their original sample been different, and a lot more about how often that prediction is actually right for any given patient. And the latter is very easy to estimate using a test set (calculate RMSE, sensitivity, whatever).
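For concreteness, the bootstrap procedure described above might be sketched as follows. This is a minimal stand-in using plain numpy: the "model fit" here is just estimating a mean and standard deviation from a sample, but with NGBoost you would instead refit the model on each resampled training set and collect its predicted parameters for the observations of interest. All names and data here are hypothetical illustrations, not part of the ngboost API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for a training set; truly N(5, 2) here.
y = rng.normal(loc=5.0, scale=2.0, size=500)

def fit_params(sample):
    # Stand-in for refitting the model: the "parameters" are just the
    # sample mean and standard deviation. With NGBoost you would refit
    # on the resampled rows and collect the predicted distribution's
    # parameters for a fixed test point instead.
    return sample.mean(), sample.std(ddof=1)

# Nonparametric bootstrap: resample rows with replacement, refit each time.
B = 1000
boot = np.array([fit_params(rng.choice(y, size=y.size, replace=True))
                 for _ in range(B)])

# Percentile-bootstrap 95% intervals for each parameter.
lo_mu, hi_mu = np.percentile(boot[:, 0], [2.5, 97.5])
lo_sd, hi_sd = np.percentile(boot[:, 1], [2.5, 97.5])
print(f"mean: [{lo_mu:.2f}, {hi_mu:.2f}]  sd: [{lo_sd:.2f}, {hi_sd:.2f}]")
```

The same loop works for any functional of the predicted distribution (a quantile, a tail probability), which, per the caveat above, is often the more meaningful target than the parameters themselves.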
In NGBoost it's the same, except that instead of evaluating only its prediction of the mean, you are free to also evaluate its prediction of quantiles, etc. For instance, if you cared about predicting the 90th percentile, you would calculate something like the percentage of data points in the test set that exceed their predicted 90th percentile (ideally 10%) and report that measure of calibration. The point is that you treat the model as fixed and ask how it performs in practice.

There certainly are ways to use NGBoost for inference (i.e. to "say something" about the world), but as with all inference, that requires a boatload of assumptions that should not be made lightly.

Lastly, a note about posterior distributions: NGBoost sort of masquerades as a Bayesian method, but that's really because Bayesian methods have been one of the only ways to do probabilistic regression up until now. There isn't anything inherently Bayesian about NGBoost, so I usually shy away from calling its output a "posterior" conditional distribution of the target given the features. You can, however, interpret it in all the same ways if you imagine that the conditional posteriors of the parameters are all delta functions at their predicted values.
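The 90th-percentile calibration check described above can be sketched with simulated data. Here the "predicted" parameters are taken to be the true ones (so calibration should come out near the ideal 10%); in practice the per-observation `mu` and `sigma` would come from the fitted model's predicted distributions. The data and names are illustrative assumptions, not ngboost code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated test set: true conditional distribution is Normal(mu_i, sigma_i),
# heteroscedastic across observations.
n = 20_000
mu = rng.uniform(-1.0, 1.0, size=n)
sigma = rng.uniform(0.5, 2.0, size=n)
y_test = rng.normal(mu, sigma)

# Pretend these are the per-observation predicted parameters; with a real
# model they would come from its predicted distribution for each test row.
z90 = 1.2815515655446004  # standard-normal 90th percentile (norm.ppf(0.9))
pred_q90 = mu + z90 * sigma

# Calibration: ~10% of test points should exceed their predicted 90th
# percentile if the predicted distributions are well calibrated.
exceed = float(np.mean(y_test > pred_q90))
print(f"fraction above predicted 90th percentile: {exceed:.3f}")
```

Reporting this exceedance fraction (and the analogous checks at other quantiles) is exactly the kind of fixed-model evaluation described above: no resampling of the training data, just performance on held-out points.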