
On Bandits and Swipes – Gamification of Search

Stefan Otte

https://goo.gl/8JgirR

./img/qr.png

Active Learning or: How I Learned to Stop Worrying and Love Small Data

Stefan Otte

https://github.com/sotte

./img/cinder.png

“Nothing Is Quite So Practical as a Good Theory”

– Kurt Lewin

Active Learning

“The key idea behind *active learning* is that a machine learning algorithm can achieve *greater accuracy* with *fewer training labels* if it is allowed to *choose the data* from which it learns.

An active learner may *pose queries*, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator).

Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but *labels are difficult, time-consuming, or expensive to obtain*.”

– Burr Settles, Active Learning Literature Survey

greater accuracy with fewer training labels

→ “good data”™

actively query for data

→ sequential decision making

${\huge \textbf{X} \rightarrow} \begin{bmatrix} \text{cat} \\ \text{dog} \\ \vdots \\ \text{cat} \end{bmatrix}$

${\huge \textbf{X} \rightarrow} \begin{bmatrix} ? \\ ? \\ \vdots \\ ? \end{bmatrix}$
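A minimal pool-based loop makes the picture above concrete: train on the small labeled set, score the unlabeled pool, and query the most uncertain instance. This is a sketch, assuming a scikit-learn style classifier and a hypothetical `oracle` callable that returns a label on request:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_labeled, y_labeled, X_pool, oracle, budget=20):
    """Repeatedly train, pick the most uncertain pool instance, query it."""
    model = LogisticRegression()
    for _ in range(budget):
        model.fit(X_labeled, y_labeled)
        proba = model.predict_proba(X_pool)
        # least-confident sampling: query where the top class probability is lowest
        idx = int(np.argmin(proba.max(axis=1)))
        x, y = X_pool[idx], oracle(X_pool[idx])
        X_labeled = np.vstack([X_labeled, x])
        y_labeled = np.append(y_labeled, y)
        X_pool = np.delete(X_pool, idx, axis=0)
    return model
```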

What is Interesting?

  • uncertainty
    • least confident
    • margin
    • entropy
  • query-by-committee
  • expected model change (decision theory)
  • expected error reduction
  • expected variance reduction
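The three uncertainty measures from the list above are one-liners over predicted class probabilities. A sketch, where `proba` is an `(n_samples, n_classes)` array such as the output of scikit-learn's `predict_proba`:

```python
import numpy as np

def least_confident(proba):
    # high when even the most likely class gets a low probability
    return 1.0 - proba.max(axis=1)

def margin(proba):
    # gap between the two most likely classes; a *small* margin means uncertain
    part = np.sort(proba, axis=1)
    return part[:, -1] - part[:, -2]

def entropy(proba):
    # high when probability mass is spread over many classes
    return -(proba * np.log(proba + 1e-12)).sum(axis=1)
```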

Gamification of Search

Multi-Armed Bandits

Problem statement

  1. Find a multi-armed bandit
  2. Play arms using bandit theory
  3. Profit $$$

Problem statement

  • given a bandit with $n$ arms
  • each arm $i \in \{1, \dots, n\}$ returns a reward

$$y \sim P(y; \theta_i)$$

Goal: find a policy that maximizes the cumulative reward $$\max \sum_{t=1}^{T} y_t$$
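As a toy instance of this setup (a sketch, assuming Bernoulli rewards): each arm $i$ pays out 1 with an unknown probability $\theta_i$.

```python
import numpy as np

class BernoulliBandit:
    """n-armed bandit where arm i returns reward 1 with probability thetas[i]."""

    def __init__(self, thetas, seed=0):
        self.thetas = np.asarray(thetas)
        self.rng = np.random.default_rng(seed)

    def pull(self, i):
        return float(self.rng.random() < self.thetas[i])
```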

UCB

past performance + exploration bonus

UCB1

Play each arm once.

Then play the arm $$\Large \arg\max_i \; \bar\mu_i + \sqrt{\frac{2 \ln n}{n_i}}$$

  • $\bar\mu_i$: mean reward of arm $i$
  • $n$: total rounds played
  • $n_i$: rounds arm $i$ was played
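A straightforward sketch of UCB1 following the formula above (not the talk's production code):

```python
import numpy as np

class UCB1:
    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)  # n_i: rounds arm i was played
        self.sums = np.zeros(n_arms)    # cumulative reward of arm i
        self.t = 0                      # n: total rounds played

    def select(self):
        if self.t < len(self.counts):   # play each arm once first
            return self.t
        means = self.sums / self.counts                    # mean reward per arm
        bonus = np.sqrt(2 * np.log(self.t) / self.counts)  # exploration bonus
        return int(np.argmax(means + bonus))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        self.t += 1
```

Run against the toy Bernoulli bandit from before:

```python
bandit = BernoulliBandit([0.2, 0.5, 0.8])
policy = UCB1(n_arms=3)
for _ in range(1000):
    arm = policy.select()
    policy.update(arm, bandit.pull(arm))
# policy.counts now concentrates on the best arm (index 2)
```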

Demo

One Bandit per Feature

  • brand bandit
  • car body bandit
  • segment bandit
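One way to wire the bandits above together (a sketch; the feature values and the swipe-to-reward mapping are illustrative assumptions, reusing the `UCB1` class from the earlier sketch):

```python
feature_arms = {
    "brand":   ["audi", "bmw", "vw"],     # hypothetical arm values
    "body":    ["suv", "sedan", "wagon"],
    "segment": ["economy", "luxury"],
}
bandits = {f: UCB1(len(arms)) for f, arms in feature_arms.items()}

def record_swipe(shown_car, liked):
    """Right swipe = reward 1 for the matching arm of every feature bandit."""
    for feature, bandit in bandits.items():
        arm = feature_arms[feature].index(shown_car[feature])
        bandit.update(arm, reward=1.0 if liked else 0.0)
```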

Ranking with Elasticsearch

./img/es_ranking.png ./img/es.png
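The bandit scores can be turned into ranking boosts with Elasticsearch's `function_score` query. A sketch: the `cars` index and the weight mapping are assumptions, and the client call matches older `elasticsearch-py` versions:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a reachable cluster

def rank_query(feature_weights):
    """feature_weights: {(field, value): weight}, e.g. from the bandits' means."""
    functions = [
        {"filter": {"term": {field: value}}, "weight": w}
        for (field, value), w in feature_weights.items()
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",  # add up the matching feature boosts
            }
        }
    }

hits = es.search(index="cars", body=rank_query({("brand", "audi"): 2.3}))
```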

Popularity Bias

Practical Remarks

  • Pythons all the way down ;D
  • sklearn
  • Flask REST API
  • Elasticsearch
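Gluing it together, swipe feedback can arrive through a small Flask endpoint. A sketch; the route name and payload shape are assumptions:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/swipe", methods=["POST"])
def swipe():
    payload = request.get_json()  # e.g. {"car": {...}, "liked": true}
    record_swipe(payload["car"], payload["liked"])  # from the sketch above
    return jsonify(status="ok")
```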

Conclusion

Active Learning or: How I Learned to Stop Worrying and Love Small Data

Related Topics

  • Sequential Decision Making
  • Global Optimization
  • Experimental Design
  • (Bayesian) Reinforcement Learning
  • An optimal solution exists (planning in belief space), but it is computationally infeasible
  • Tuning hyperparams with Hyperband

Thanks!

Questions?

Stefan Otte

https://goo.gl/8JgirR

./img/qr.png

References

  • Burr Settles. Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
