
Given a prediction \(p\) and a label \(y\), a loss function \(\ell(p,y)\) measures the discrepancy between the algorithm's prediction and the desired output. VW currently supports the following loss functions:

1. Squared loss \[\ell(p,y)=\frac{1}{2}(p-y)^2\]
2. Logistic loss \[\ell(p,y)=\log(1+\exp(-yp))\]
3. Hinge loss \[\ell(p,y)=\max(0,1-yp)\]
4. \(\tau\)-Quantile loss \[\ell(p,y)=\tau(y-p)\mathbb{I}(y \geq p)+(1-\tau)(p-y)\mathbb{I}(y < p)\]
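
To make these definitions concrete, here is a worked evaluation of each loss at a single point, taking \(p=0.4\), \(y=1\), and \(\tau=0.5\) for the quantile loss:

\[
\begin{aligned}
\text{Squared:}\ & \tfrac{1}{2}(0.4-1)^2 = 0.18 \\
\text{Logistic:}\ & \log(1+e^{-0.4}) \approx 0.513 \\
\text{Hinge:}\ & \max(0,\,1-0.4) = 0.6 \\
\text{Quantile:}\ & \tau(y-p) = 0.5 \cdot 0.6 = 0.3
\end{aligned}
\]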

To select a loss function in VW, see the Command line arguments guide. The logistic and hinge losses are for binary classification only, so every example must be labeled "-1" or "1".
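
For example, assuming a training file in VW format named `train.dat` (the filename here is just a placeholder), the loss is chosen with the `--loss_function` option:

```
# Logistic loss for binary classification; labels must be -1 or 1.
vw -d train.dat --loss_function logistic -f model.vw

# Squared loss (the default) for regression on real-valued labels.
vw -d train.dat --loss_function squared -f model.vw
```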

## Which loss function should I use?

* If the problem is a binary classification problem, your choices are logistic or hinge loss. Examples: spam vs. non-spam, click vs. no-click. Roughly, prefer logistic loss when you want probability-like estimates of how likely the positive class is; hinge loss only tries to get the sign of the prediction right, so it suffices when all you need is the predicted class.
* If the problem is a regression problem (the target label you're trying to predict is a real value), you should be using squared or quantile loss. If you're trying to predict the conditional mean, pick squared loss; if on the other hand you're trying to predict the median (or any other quantile), use quantile loss. Examples: revenue, height, weight. See the command-line sketch after this list.
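
As a concrete sketch of both cases (the data files, feature names, and model names below are hypothetical):

```
# Binary classification (click vs. no-click); labels are -1 or 1:
#    1 | user_age:23 ad_size:300
#   -1 | user_age:41 ad_size:160
vw -d clicks.dat --loss_function logistic -f click_model.vw

# Regression on revenue: squared loss predicts the mean...
vw -d revenue.dat --loss_function squared -f mean_model.vw

# ...while quantile loss with --quantile_tau 0.5 predicts the median,
vw -d revenue.dat --loss_function quantile --quantile_tau 0.5 -f median_model.vw

# and a higher tau (e.g. 0.9) predicts the 90th percentile.
vw -d revenue.dat --loss_function quantile --quantile_tau 0.9 -f p90_model.vw
```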