
Given a prediction \(p\) and a label \(y\), a loss function \(\ell(p,y)\) measures the discrepancy between the algorithm's prediction and the desired output. VW currently supports the following loss functions:

1. Squared loss \[\ell(p,y)=\frac{1}{2}(p-y)^2\]
2. Logistic loss \[\ell(p,y)=\log(1+\exp(-yp))\]
3. Hinge loss \[\ell(p,y)=\max(0,1-yp)\]
4. \(\tau\)-Quantile loss \[\ell(p,y)=\tau(y-p)\mathbb{I}(y \geq p)+(1-\tau)(p-y)\mathbb{I}(y < p)\]
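
To make these definitions concrete, here is a worked evaluation of each loss at a single point, taking \(p=0.4\), \(y=1\), and \(\tau=0.5\) for the quantile loss:

\[
\begin{aligned}
\text{Squared:}\ & \tfrac{1}{2}(0.4-1)^2 = 0.18 \\
\text{Logistic:}\ & \log(1+e^{-0.4}) \approx 0.513 \\
\text{Hinge:}\ & \max(0,\,1-0.4) = 0.6 \\
\text{Quantile:}\ & \tau(y-p) = 0.5 \cdot 0.6 = 0.3
\end{aligned}
\]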

To select a loss function in VW, see the Command line arguments guide. The logistic and hinge losses are for binary classification only, so every example must be labeled "-1" or "1".
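
For example, assuming a training file in VW format named `train.dat` (the filename here is just a placeholder), the loss is chosen with the `--loss_function` option:

```
# Logistic loss for binary classification; labels must be -1 or 1.
vw -d train.dat --loss_function logistic -f model.vw

# Squared loss (the default) for regression on real-valued labels.
vw -d train.dat --loss_function squared -f model.vw
```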

## Which loss function should I use?

* If the problem is a binary classification problem, your choices are logistic or hinge loss. Examples: spam vs. non-spam, click vs. no-click. Roughly, prefer logistic loss when you want probability-like estimates of how likely the positive class is; hinge loss only tries to get the sign of the prediction right, so it suffices when all you need is the predicted class.
* If the problem is a regression problem (the target label you're trying to predict is a real value), you should be using squared or quantile loss. If you're trying to predict the conditional mean, pick squared loss; if on the other hand you're trying to predict the median (or any other quantile), use quantile loss. Examples: revenue, height, weight. See the command-line sketch after this list.
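
As a concrete sketch of both cases (the data files, feature names, and model names below are hypothetical):

```
# Binary classification (click vs. no-click); labels are -1 or 1:
#    1 | user_age:23 ad_size:300
#   -1 | user_age:41 ad_size:160
vw -d clicks.dat --loss_function logistic -f click_model.vw

# Regression on revenue: squared loss predicts the mean...
vw -d revenue.dat --loss_function squared -f mean_model.vw

# ...while quantile loss with --quantile_tau 0.5 predicts the median,
vw -d revenue.dat --loss_function quantile --quantile_tau 0.5 -f median_model.vw

# and a higher tau (e.g. 0.9) predicts the 90th percentile.
vw -d revenue.dat --loss_function quantile --quantile_tau 0.9 -f p90_model.vw
```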