% Pasturel_etal2019.tex

%!TeX TS-program = pdflatex
%!TeX encoding = UTF-8 Unicode
%!TeX spellcheck = en-US
%!BIB TS-program = bibtex
% -*- coding: UTF-8; -*-
% vim: set fenc=utf-8
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\AuthorA}{Chlo\'e Pasturel}
\newcommand{\AuthorB}{Anna Montagnini}%
\newcommand{\AuthorC}{Laurent Perrinet}%
\newcommand{\Address}{Institut de Neurosciences de la Timone (UMR 7289), Aix Marseille Univ, CNRS - Marseille, France}%
\newcommand{\Website}{https://laurentperrinet.github.io}%
\newcommand{\EmailC}{Laurent.Perrinet@univ-amu.fr}%chloe.pasturel@univ-amu.fr
\newcommand{\Title}{
%Principles and psychophysics of Active Inference in anticipating a dynamic probabilistic bias
% Should I stay or should I go?
%Humans adapt their eye movements to the volatility of visual motion properties, and know about it
Humans adapt their anticipatory eye movements to the volatility of visual motion properties
%Anticipating a volatile probabilistic bias in visual motion direction
%Humans adapt to the volatility of visual motion properties :  eye movements and explicit guesses
}
\newcommand{\Acknowledgments}{This work was supported by EU Marie-Sklodowska-Curie Grant No 642961 (PACE-ITN) and by the Fondation pour la Recherche M\'edicale, under the program \textit{Equipe FRM} (DEQ20180339203/PredictEye/G Masson). Code and material are available on the \href{\Website/publication/pasturel-montagnini-perrinet-19}{corresponding author's website}. We thank Doctor Jean-Bernard Damasse, Guillaume S Masson and Professor Laurent Madelain for insightful discussions. }
\newcommand{\Abstract}{
Animal behavior must constantly adapt to changes, for example when the statistical properties of the environment change unexpectedly. For an agent that interacts with this volatile setting, it is important to react accurately and as quickly as possible. It has already been shown that when a random sequence of motion ramps of a visual target is biased toward one direction (e.g. right or left), human observers adapt to accurately anticipate the expected direction with their eye movements. Here, we show that this ability extends to a volatile environment where the probability bias could change at random switching times. In addition, we recorded the explicit direction predictions that observers reported on a rating scale. Both results were compared to the estimates of a probabilistic agent that is optimal with respect to the event-switching generative model. We found that our probabilistic agent matched the behavioral responses better than the classical leaky integrator model, both for the anticipatory eye movements and the explicit task. Furthermore, by titrating the level of preference between exploration and exploitation in the model, we were able to fit each individual experimental dataset with different levels of estimated volatility and derive a common marker for the inter-individual variability of participants. These results show that in such an unstable environment, human observers can still represent an internal belief about the environmental contingencies, and use this representation both for sensory-motor control and for explicit judgments. This work offers an innovative approach to more generically test the diversity of human cognitive abilities in uncertain and dynamic environments.}
\newcommand{\AuthorSummary}{
Understanding how humans adapt to changing environments to make judgments or plan motor responses based on time-varying sensory information is crucial for psychology, neuroscience and artificial intelligence. Current theories of how we deal with the environment's uncertainty mostly rely on the equilibrium behavior observed in response to the introduction of some change in randomness. Here we show that in the more ecological case where the context switches at random times throughout the experiment, an adaptation to this volatility can be performed online. In particular, we show in two behavioral experiments that humans can adapt to such volatility at the early sensorimotor level, through their anticipatory eye movements, but also at a higher cognitive level, through explicit ratings. Our results suggest that humans (and future artificial systems) can use much richer adaptive strategies than previously assumed.
}
\newcommand{\KeyWords}{eye movements, decision making, volatility, Bayesian model, adaptation, perception}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\documentclass[profile,final,english,draft,24pt]{article}%
\documentclass[10pt,letterpaper]{article}
\usepackage[top=0.85in,left=2.75in,footskip=0.75in]{geometry}

% amsmath and amssymb packages, useful for mathematical formulas and symbols
\usepackage{amsmath,amssymb}

% Use adjustwidth environment to exceed column width (see example table in text)
\usepackage{changepage}

% Use Unicode characters when possible
\usepackage[utf8x]{inputenc}

% textcomp package and marvosym package for additional characters
\usepackage{textcomp,marvosym}

% cite package, to clean up citations in the main text. Do not remove.
\usepackage{cite}

% Use nameref to cite supporting information files (see Supporting Information section for more info)
\usepackage{nameref}

% line numbers
\usepackage[right]{lineno}

% ligatures disabled
\usepackage{microtype}
\DisableLigatures[f]{encoding = *, family = * }

% color can be used to apply background shading to table cells only
\usepackage[table]{xcolor}

% array package and thick rules for tables
\usepackage{array}

% create "+" rule type for thick vertical lines
\newcolumntype{+}{!{\vrule width 2pt}}

% create \thickcline for thick horizontal lines of variable length
\newlength\savedwidth
\newcommand\thickcline[1]{%
  \noalign{\global\savedwidth\arrayrulewidth\global\arrayrulewidth 2pt}%
  \cline{#1}%
  \noalign{\vskip\arrayrulewidth}%
  \noalign{\global\arrayrulewidth\savedwidth}%
}

% \thickhline command for thick horizontal lines that span the table
\newcommand\thickhline{\noalign{\global\savedwidth\arrayrulewidth\global\arrayrulewidth 2pt}%
\hline
\noalign{\global\arrayrulewidth\savedwidth}}


% Remove comment for double spacing
%\usepackage{setspace}
%\doublespacing

% Text layout
\raggedright
\setlength{\parindent}{0.5cm}
\textwidth 5.25in
\textheight 8.75in

% Bold the 'Figure #' in the caption and separate it from the title/caption with a period
% Captions will be left justified
\usepackage[aboveskip=1pt,labelfont=bf,labelsep=period,justification=raggedright,singlelinecheck=off]{caption}
\renewcommand{\figurename}{Fig}

% Use the PLoS provided BiBTeX style
\bibliographystyle{plos2015}

% Remove brackets from numbering in List of References
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother


% Header and Footer with logo
\usepackage{lastpage,fancyhdr,graphicx}
\usepackage{epstopdf}
%\pagestyle{myheadings}
\pagestyle{fancy}
\fancyhf{}
%\setlength{\headheight}{27.023pt}
%\lhead{\includegraphics[width=2.0in]{PLOS-submission.eps}}
\rfoot{\thepage/\pageref{LastPage}}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrule}{\hrule height 2pt \vspace{2mm}}
\fancyheadoffset[L]{2.25in}
\fancyfootoffset[L]{2.25in}
\lfoot{\today}

%% Include all macros below

\newcommand{\lorem}{{\bf LOREM}}
\newcommand{\ipsum}{{\bf IPSUM}}

%% END MACROS SECTION

%
%%\documentclass[12pt,english]{article}%
%\usepackage{fullpage}
%% https://www.overleaf.com/learn/latex/Page_size_and_margins
%%\usepackage[pass]{geometry}
%\usepackage{babel}
%\usepackage{csquotes}
%\usepackage{gensymb}
%% MATHS (AMS)
%\usepackage{amsmath}
%\usepackage{amsfonts}
%\usepackage{amssymb}
%\usepackage{amsthm}
\newcommand{\KL}[2]{\text{KL}( #1 | #2 )}
%% parenthesis
\newcommand{\pa}[1]{\left( #1 \right)}
\newcommand{\bpa}[1]{\big( #1 \big)}
\newcommand{\choice}[1]{ %
	\left\{ %
		\begin{array}{l} #1 \end{array} %
	\right. }
% sets
\newcommand{\ens}[1]{ \{ #1 \} }
\newcommand{\enscond}[2]{ \left\{ #1 \;;\; #2 \right\} }
% equal by definition
\newcommand{\eqdef}{\ensuremath{\stackrel{\mbox{\upshape\tiny def.}}{=}}}
\newcommand{\eqset}{\ensuremath{\stackrel{\mbox{\upshape\tiny set}}{=}}}
\newcommand{\eq}[1]{\begin{equation*}#1\end{equation*}}
\newcommand{\eql}[1]{\begin{equation}#1\end{equation}}
\newcommand{\eqs}[1]{\begin{align*}#1\end{align*}}
\newcommand{\eqa}[1]{\begin{align}#1\end{align}}

\DeclareMathOperator{\argmin}{argmin}
\DeclareMathOperator{\argmax}{argmax}
\newcommand{\uargmin}[1]{\underset{#1}{\argmin}\;}
\newcommand{\uargmax}[1]{\underset{#1}{\argmax}\;}
\newcommand{\umin}[1]{\underset{#1}{\min}\;}
\newcommand{\umax}[1]{\underset{#1}{\max}\;}
\newcommand{\usup}[1]{\underset{#1}{\sup}\;}
% for units
\usepackage{siunitx}%
\newcommand{\ms}{\si{\milli\second}}%

%% Calligraphic symbols
\newcommand{\Aa}{\mathcal{A}}
\newcommand{\Bb}{\mathcal{B}}
\newcommand{\Cc}{\mathcal{C}}
\newcommand{\Dd}{\mathcal{D}}
\newcommand{\Ee}{\mathcal{E}}
\newcommand{\Ff}{\mathcal{F}}
\newcommand{\Gg}{\mathcal{G}}
\newcommand{\Hh}{\mathcal{H}}
\newcommand{\Ii}{\mathcal{I}}
\newcommand{\Jj}{\mathcal{J}}
\newcommand{\Kk}{\mathcal{K}}
\newcommand{\Ll}{\mathcal{L}}
\newcommand{\Mm}{\mathcal{M}}
\newcommand{\Nn}{\mathcal{N}}
\newcommand{\Oo}{\mathcal{O}}
\newcommand{\Pp}{\mathcal{P}}
\newcommand{\Qq}{\mathcal{Q}}
\newcommand{\Rr}{\mathcal{R}}
\newcommand{\Ss}{\mathcal{S}}
\newcommand{\Tt}{\mathcal{T}}
\newcommand{\Uu}{\mathcal{U}}
\newcommand{\Vv}{\mathcal{V}}
\newcommand{\Ww}{\mathcal{W}}
\newcommand{\Xx}{\mathcal{X}}
\newcommand{\Yy}{\mathcal{Y}}
\newcommand{\Zz}{\mathcal{Z}}
\usepackage{gensymb} % \degree
%% ========  fonts =============
\usepackage[T1]{fontenc}%
\usepackage{lmodern}%
\usepackage{t1enc}
\usepackage{ragged2e}
%============ graphics ===================
\usepackage{graphicx}%
\DeclareGraphicsExtensions{.pdf,.png,.jpg}%
%\graphicspath{{./2019_figures/}}% TODO remove {./figures/},  at the end
%============ bibliography ===================
%\usepackage[numbers,comma,sort&compress,round]{natbib} %
%\usepackage[
%%style=alphabetic-verb,
%style=plos2015, %authoryear-comp,
%%style=apa,
%maxcitenames=2,
%maxnames=2,
%%minnames=3,
%maxbibnames=99,
%giveninits=true,
%uniquename=init,
%url=false,
%isbn=false,
%eprint=false,
%texencoding=utf8,
%bibencoding=utf8,
%autocite=superscript,
%backend=bibtex,
%%sorting=none,
%sorting=nty,
%sortcites=false,
%%articletitle=false
%]{biblatex}%
%%\addbibresource{Pasturel_etal2018.bib}%
%\bibliography{Pasturel_etal2019.bib} % the ref.bib file
%\newcommand{\citep}[1]{\parencite{#1}}
%\newcommand{\citet}[1]{\textcite{#1}}
\newcommand{\citep}[1]{\cite{#1}}
\newcommand{\citet}[1]{\cite{#1}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% OPTIONAL MACRO FILES
\usepackage{tikz}
%\usepackage{tikz,tkz-euclide} \usetkzobj{all} % loading all objects
%\usetikzlibrary{positioning} \usetikzlibrary{calc}
%\usepackage{sfmath}
%\usepackage[noabbrev]{cleveref}
\newcommand{\seeFig}[1]{Figure~\ref{fig:#1}}
\newcommand{\seeEq}[1]{Equation~\ref{eq:#1}}
\newcommand{\seeApp}[1]{Appendix~\ref{app:#1}}
\newcommand{\seeSec}[1]{Section~\ref{sec:#1}}
%============ hyperref ===================
\usepackage[unicode,linkcolor=red,citecolor=red,filecolor=red,urlcolor=red,pdfborder={0 0 0}]{hyperref}%
%\hypersetup{%
%pdftitle={\Title},%
%pdfauthor={\AuthorA }%\ < \Email > \Address},%
%}%
\usepackage{color}%
\newcommand{\LP}[1]{\textbf{\textcolor{red}{[LP: #1]}}}
\newcommand{\AM}[1]{\textbf{\textcolor{blue}{[AM: #1]}}}
\newcommand{\CP}[1]{\textbf{\textcolor{green}{[CP: #1]}}}

\usepackage{listings}
\usepackage{color}

\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}
%============ code ===================
\usepackage{listings}
\lstset{frame=tb,
  language=Python,
  aboveskip=3mm,
  belowskip=3mm,
  showstringspaces=false,
  columns=flexible,
  basicstyle={\small\ttfamily},
  numbers=none,
  numberstyle=\tiny\color{gray},
  keywordstyle=\color{blue},
  commentstyle=\color{dkgreen},
  stringstyle=\color{mauve},
  breaklines=true,
  breakatwhitespace=true,
  tabsize=3
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\title{\Title}%
\author{\AuthorA,
\AuthorB,
\AuthorC\thanks{\Address} }

% GUIDELINES : https://submit.elifesciences.org/html/elife_author_instructions.html#initial

\usepackage{lineno}
\linenumbers

%%%%%%%%%%%% The document itself begins here %%%%%%%%%%%%%%%
\begin{document}%
\maketitle%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%: Abstract
\section*{Abstract}
Animal behavior must constantly adapt to changes, for example when the statistical properties of the environment change unexpectedly. For an agent that interacts with this volatile setting, it is important to react accurately and as quickly as possible. It has already been shown that when a random sequence of motion ramps of a visual target is biased toward one direction (e.g. right or left), human observers adapt to accurately anticipate the expected direction with their eye movements. Here, we show that this ability extends to a volatile environment where the probability bias could change at random switching times. In addition, we recorded the explicit direction predictions that observers reported on a rating scale. Both results were compared to the estimates of a probabilistic agent that is optimal with respect to the event-switching generative model. We found that our probabilistic agent matched the behavioral responses better than the classical leaky integrator model, both for the anticipatory eye movements and the explicit task. Furthermore, by titrating the level of preference between exploration and exploitation in the model, we were able to fit each individual experimental dataset with different levels of estimated volatility and derive a common marker for the inter-individual variability of participants. These results show that in such an unstable environment, human observers can still represent an internal belief about the environmental contingencies, and use this representation both for sensory-motor control and for explicit judgments. This work offers an innovative approach to more generically test the diversity of human cognitive abilities in uncertain and dynamic environments.
% Please keep the Author Summary between 150 and 200 words
% Use first person. PLOS ONE authors please skip this step.
% Author Summary not valid for PLOS ONE submissions.
\section*{Author summary}
Understanding how humans adapt to changing environments to make judgments or plan motor responses based on time-varying sensory information is crucial for psychology, neuroscience and artificial intelligence. Current theories of how we deal with the environment's uncertainty mostly rely on the equilibrium behavior observed in response to the introduction of some change in randomness. Here we show that in the more ecological case where the context switches at random times throughout the experiment, an adaptation to this volatility can be performed online. In particular, we show in two behavioral experiments that humans can adapt to such volatility at the early sensorimotor level, through their anticipatory eye movements, but also at a higher cognitive level, through explicit ratings. Our results suggest that humans (and future artificial systems) can use much richer adaptive strategies than previously assumed.%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Motivation}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\label{sec:intro}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Volatility of sensory contingencies and the adaptation of cognitive systems}
%: 1A : cognitive adaptation to volatility; general volatility and perceptual learning
%-------------------------------------------------------------%
% ------------------------------------------------------------------
% * the evolution  of prices on the stock market: Any Socio-economic contextual index may make the price evolve up or down, slowly or more Rapidly
% * ecological change
% * the side (left or right of the field) in which the ball is on a soccer field
% \LP{Anna, I found a better example which was less dramatic than ``
% Think for instance of the variability of environmental contingencies
% present in global climate and
% the probability of a change in its dynamic
% since the switch of civilization in an industrialized organization:
% We have access to (possibly noisy and heterogeneous) measurements
% of some (few) markers, such as carbon dioxide concentration and
% wish to predict the range of the mean temperature on Earth.
% This reveals some of the dynamics of
% the complex system constituted by the atmosphere
% and of acceptable levels of temperature for civilization.
% Based on past history and prior knowledge about
% the effect of our own emissions of carbon dioxide
% (for instance priming evidence of the link between carbon dioxide
% concentration and an elevation of temperature),
% one should be able to predict at best if either
% one could continue to exploit a similar strategy (emit gases)
% or if it is necessary to explore different paradigms (limit emissions).
% '' Hope you like it :-) }
% \AM{ I like the green example too: it only demands to define the variable that would correspond to the time series to be tracked: carbon dioxide measurements at a given critical location? Also, the fact that a switch detection may inform important decision is nice for the example but we do not really have an analogy to it in our task. Yet I am very happy with both examples, both very much on fashion, are we more ecology, or pro-vax activists?!}
We live in a fundamentally volatile world to which
our cognitive system has to constantly adapt.
In particular, this volatility may be generated
by processes with different time scales.
%Imagine for instance you are a general practitioner and that you usually report an average number of three person infected by measles per week. However, this rate is variable and over the past week you observe that the rate increased to ten cases. As such, two alternate interpretations are available: Either, there is an outbreak of measles and one should estimate its incidence (as measured as the rate of new cases) since an estimate of the outbreak’s onset and which defines a new infection rate of this outbreak but also an updated value of volatility (as given by the probability of a new outbreak) at a longer time scale. Alternatively, these cases are “unlucky” coincidences that originate from the variability of the process which drives patients to the doctor and which are in a shorter term an instance of values drawn from a stationary random process. In that option, it may be possible to readjust the estimated baseline rate of infection with this new data.  This example illustrates one fundamental problem with which our cognitive system is faced: when observing new sensory evidence, should I stay and continue to exploit this novel data with respect to my current beliefs about the environment’s state or should I go and explore a new hypothesis about the random process generating the observations since the detection of a switch in the environment?
Imagine for instance you are a general practitioner and
that you usually report an average of
three people infected with measles per week.
However, this rate is variable and
over the past week you observe that the rate increased to ten cases.
Two alternative interpretations are then available:
the first possibility is that there is an outbreak of measles and
one should then estimate its incidence
(i.e. the rate of new cases)
since the inferred outbreak's onset, in order
to quantify the infection rate specific to this outbreak,
but also to update the value of the environmental volatility (as given by the probability of a new outbreak)
at a longer time scale.
Alternatively, these cases are
``unlucky'' coincidences that originate from the natural variability
of the underlying statistical process which drives patients to the doctor,
and are merely instances drawn from a stationary random process.
In that case, it may be possible to readjust
the estimated baseline rate of infection with this new data.
This example illustrates one fundamental problem
with which our cognitive system is faced:
when observing new sensory evidence,
\emph{should I stay} and continue to exploit this novel data
with respect to my current beliefs about the environment's state
or \emph{should I go} and explore a new hypothesis
about the random process generating the observations
since the detection of a switch in the environment?

By definition, volatility measures the temporal variability
of the sufficient parameters of a random variable.
Such \emph{meta-analysis} of the environment's statistical properties
is an effective strategy at the large scale level of our example,
but also at all levels which are behaviorally relevant,
such as contextual changes in our everyday life.
Inferring near-future states in a dynamic environment,
such that one can prepare to act upon them
ahead of their occurrence~\citep{PerrinetAdamasFriston2014} ---
or at least forming beliefs as precise as possible
about a future environmental context ---
is a ubiquitous challenge for cognitive systems~\citep{Barack16}.
In the long term, how the human brain dynamically manages
this trade-off between exploitation and exploration
is essential to the adaptation
of the behavior through reinforcement learning~\citep{Cohen2007}.
In controlled experimental settings which challenge visual perception or sensorimotor associations,
such adaptive processes have mostly been revealed
by precisely analyzing the participants' behavior in a sequence of experimental trials,
typically highlighting sequential effects
at the time scale of several seconds to minutes
or even hours in the case of the adaptation to a persistent sensorimotor relation.
% TODO: talk about Gallistel / Sugrue / Brody

%: Past history of sensory event integration in vision
Indeed, the history of sensory events influences
how the current stimulus is perceived~\citep{Sotiropoulos2011,Adams12,ChopinMamassian2012,FischerWhitney2014,Cicchini_PRSB_2018} and
acted upon~\citep{WallmanFuchs1998,Carpenter1995, Maus2015,Damasse18}.
Two qualitatively opposite effects of the stimulus history have been described:
negative (adaptation), and positive (priming-like) effects.
Adaptation reduces the sensitivity to recurrently presented stimuli,
thus yielding a re-calibrated perceptual experience~\citep{Clifford2007, Webster2011, Kohn2007}. Examples of negative biases in perceptual discrimination are numerous (see for instance~\citep{KanaiVerstraten2005,ChopinMamassian2012}) and show that the visual system tends
to favor temporal and spatial stability of the stimulus.
On the other hand, priming is a facilitatory effect that
enhances the identification of repeated stimuli~\citep{Verstraten1994, Tiest2009}.
%\AM{I WOULD SKIP THIS This type of perceptual learning leads to improvements %in discrimination
%with long-term training on a perceptual judgment~\citep{Lu2009}.}
In sensorimotor control,
the same stimulus presented several times could indeed
lead to faster and more accurate responses and,
at the same time, lead to critically suboptimal behavior
when a presented stimulus is not coherent
with the participant's expectations~\citep{Hyman1953, Yu2009}. This process is highly dynamic especially in complex environments
where new contingencies can arise at every moment.
Interestingly, priming effects at cognitive levels
are sometimes paralleled by anticipatory motor responses which are positively correlated with the repetition of stimulus properties.
A well-known example of this behavior
is given by anticipatory smooth pursuit eye movements (aSPEM),
as we will illustrate in the next section.

%: Bayesian methods & role of predictive processing for this adaptive response
Overall, the ability to detect
statistical regularities in the event sequence appears to be fundamental
for the adaptive behavior of living species.
Importantly, few studies have addressed the question of whether
the estimate of such regularities is explicit,
and whether verbal reports of the dynamic statistical
estimates would correlate with the measures of behavioral adaptation or priming.
Here we aim to investigate this question
in the specific case of the processing of a target's motion direction.
In addition, we attempt to address the lack of a solid modeling approach
for understanding the computation underlying behavioral adaptation to the environment's statistics,
and in particular how sequential effects are integrated
within a hierarchical statistical framework.
% TODO : clarify this paragraph
To this end, Bayesian inference offers an effective methodology
to address this question.
In general, Bayesian methods make it possible to define and quantitatively assess
a range of hypotheses about the processing of (possibly noisy) information by some formal agents~\citep{Deneve1999, Diaconescu2014, Daunizeau10a}.
A key principle in the Bayesian inference approach is
to introduce so-called latent variables
which formalize how different hypotheses predict synthetic or experimental measurements.
Each stated hypothesis is quantitatively formalized
by defining a graph of probabilistic dependencies between specific variables
using a generative model for the prior knowledge about its structure.
In practice, the generative model is parameterized by structural variables
(such as weights or non-linear gain functions)
such that, knowing incoming measurements, beliefs about latent variables
may be represented as probabilities.
Then, using the rules of probability calculus
one can progressively update beliefs about the latent variables,
such that one can finally infer the hidden structure of received inputs~\citep{Hoyer2003, Ma2014}.
For instance, using Bayes's rule, one can combine
the likelihood of observations given the generative model and
the prior of these latent variables~\citep{Jaynes2014}.
Of particular interest for us is the possibility to
quantitatively represent in this kind of probabilistic model
the predictive and iterative nature of a sequence of events.
Indeed, once the belief about latent variables
is formed from the sensory input,
this belief can be used to update
the prior over future beliefs~\citep{Montagnini2007}.
In such models, the comparison between expectations and actual data produces
constant updates to the estimates of the latent variables,
but also to the estimated validity of the model.
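As a minimal illustration of this iterative scheme
(assuming, for simplicity, a static latent variable $z$ and
observations $x_1, \ldots, x_t$ that are conditionally independent given $z$;
this notation is introduced here for illustration only and is not tied to a specific model),
Bayes's rule applied trial after trial reads
\begin{equation*}
p(z \mid x_{1:t}) \propto p(x_t \mid z) \, p(z \mid x_{1:t-1}),
\end{equation*}
such that the posterior obtained after trial $t-1$ plays the role of the prior at trial $t$.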
%Such a process is formalized in all generality
%within the~\textit{active inference} framework~\citep{Friston2003, Friston2010}.
%In summary, Active Inference allows to predict latent variables
%but also to understand longer time effects such as adaptation and learning.
%
There are numerous examples of Bayesian approaches
applied to the study of the adaptation to volatility.
For instance,~\citet{Meyniel16} simulated a hierarchical Bayesian model
over five previously published datasets~\citep{Squires1976, Huettel2002, Kolossa2013, Cho2002, Falk1997} in the domain of cognitive neuroscience.
Their main conclusion was that
learning the local transition probabilities
was sufficient to explain the large repertoire
of experimental effects reported in all these studies.
%
%As a consequence, Bayesian inference allows to compare the explanatory power
%of different models...
Here we focus on an extension of this approach to the study of motion processing and eye movements.
% ------------------------------------------------------------------
\subsection{Anticipatory Smooth Pursuit Eye Movements (aSPEM)}
% ------------------------------------------------------------------
%: 1B : particular case of aSPEM
%: adaptation to volatility in EMs : seen as an anticipation in SPEM - principle and function
Humans are able to accurately track a moving object
with a combination of saccades and
Smooth Pursuit Eye Movements (SPEM, for a review see~\citet{Krauzlis2008}).
These movements allow us to align and
stabilize the object on the fovea,
thus enabling high-resolution visual processing.
This process is delayed by different factors such as axonal conduction,
neural processing latencies and the inertia of the oculomotor system~\citep{Krauzlis89}.
When predictive information is available about target motion,
anticipatory SPEM (aSPEM) are
efficiently generated before the target's appearance~\citep{Westheimer1954, Kowler1979a, Kowler1979b} thereby reducing visuomotor latency.
Moreover, some experiments have demonstrated the existence
of prediction-based smooth pursuit during
the transient disappearance of a moving target~\citep{Badler2006,BeckerFuchs1985,OrbandeXivryMissalLefevre_JOV2012}.
Overall, although the initiation of SPEM is almost always driven by a visual motion signal, it is now clear that smooth pursuit behavior
can be modulated by extra-retinal, predictive information even in the absence of a direct visual stimulation.
The anticipatory smooth pursuit behavior is remarkable
in several respects.
First, its buildup is relatively fast, such that only a few trials are sufficient
to pick up some regularity in the properties of visual motion, such as speed or direction~\citep{Kowler1984,Maus2015,Deravet_JOV2018}.
Second, it is in general an unconscious process
of which participants are not aware.
As such, this behavior is potentially a useful marker
to study the internal representation of motion expectancy %(or Prior)
and in particular to analyze how sensorimotor expectancy
interacts dynamically with contextual contingencies in shaping oculomotor behavior.

%: linear relationship (talk about santos & kowler and others)
Typically, an aSPEM is observed after a temporal cue and
before target motion onset~\citep{Kowler1979a,Kowler1979b, Kowler1984}. %~(see \seeFig{intro}-A).
It is generally assumed that the role of aSPEMs is
to minimize, as quickly as possible, the visual impairment caused
by the mismatch in position and velocity between eye and target.
Overall, anticipation can potentially reduce the typical sensorimotor delay
between target motion onset and foveation. In a previous study~\citep{Montagnini2010},
we analyzed how forthcoming motion properties,
such as target speed or direction, can be
predicted and anticipated with coherently oriented eye movements. %~(see \seeFig{intro}-A).
It has been observed that the strength of anticipation,
as measured by the mean anticipatory eye velocity,
increases when the target repeatedly moves in the same direction~\citep{Kowler1984, Kowler1989, Heinen2005}.
We similarly found a graded effect of both the speed and the direction-bias
on the strength of aSPEM. % (see \seeFig{intro}-B).
In particular, this effect is linearly related
to the probability of the target's speed or direction. %~(see \seeFig{intro}-B).
These results are consistent with previous oculomotor findings
by our group and others~\citep{SantosKowler2017}.
These results imply that the probability bias over a target's direction is
one additional factor beyond other physical and cognitive cues~\citep{Kowler2014, SantosKowler2017,Damasse18}
that modulate the common predictive framework
driving anticipatory behavior.
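To make the notion of a volatile probability bias concrete,
the switching process used in this study (detailed in \seeFig{intro}-A)
can be sketched as a three-layer model.
The symbols below are introduced here for illustration only:
$s_t$ denotes the switch event on trial $t$, $h$ the hazard rate,
$p_t$ the probability bias, $\Pp$ the prior from which a new bias is drawn
and $x_t$ the binary target direction.
\begin{align*}
s_t &\sim \mathrm{Bernoulli}(h), \quad h = 1/40, \\
p_t &= p_{t-1} \text{ if } s_t = 0, \quad p_t \sim \Pp \text{ if } s_t = 1, \\
x_t &\sim \mathrm{Bernoulli}(p_t).
\end{align*}
Under this sketch, the number of trials between two successive switches
follows a geometric distribution with mean $1/h = 40$ trials.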
%
%-------------------------------------------------------------%
%: FIGURE 1 fig:intro~\seeFig{intro}
\begin{figure}%[b!]
\centering{
\begin{tikzpicture}%[thick,scale=1, every node/.style={scale=1} ]
\node [anchor=north west]  (imgA) at (0.000\linewidth,.600\linewidth){\includegraphics[width=0.325\linewidth]{2019_figures/1_A_Experiment_randomblock}};
\node [anchor=north west]  (imgB) at (0.335\linewidth,.595\linewidth){\includegraphics[width=0.350\linewidth]{2019_figures/1_B_protocol_recording}};
\node [anchor=north west]  (imgC) at (0.650\linewidth,.595\linewidth){\includegraphics[width=0.350\linewidth]{2019_figures/1_C_protocol_bet}};
\draw [anchor=north west] (0.000\linewidth, .62\linewidth) node {$\mathsf{(A)}$};
\draw [anchor=north west] (0.350\linewidth, .62\linewidth) node {$\mathsf{(B)}$};
\draw [anchor=north west] (0.665\linewidth, .62\linewidth) node {$\mathsf{(C)}$};
\end{tikzpicture}
}
\caption{
\textbf{Smooth pursuit eye movements and explicit direction predictions in a volatile switching environment.}
\textit{(A)}~
We tested the capacity of human participants to adapt to a volatile environment
by using a simple, 3-layered generative model of fluctuations in target directions (TD)
that we call the Binary Switching Model (BSM).
This TD binary variable is chosen using a Bernoulli trial of a given probability bias.
This probability bias remains constant until a switch is generated.
At a switch, the bias is chosen at random from a given prior.
Switches are generated in the third layer as binary events drawn from a Bernoulli trial
with a given hazard rate (defined here as $1/40$ per trial).
\textit{(B)}~
The eye-movements task was an adapted version of a task developed by~\citet{Montagnini2010}.
Each of the $600$ trials consisted sequentially of
a fixation dot (of random duration between $400$ and $800$~\ms),
a blank screen (of fixed duration $300$~\ms) and
a moving ring-shaped target (with $15~\degree/s$ velocity) which the observers were instructed to follow.
The direction of the target (right or left) was drawn pseudo-randomly
according to the generative model defined above.
\textit{(C)}~In order to titrate the adaptation
to the environmental volatility of target direction at the conscious level,
we invited each observer to perform, on a different day, a new variant of the direction-biased experiment,
in which we asked participants to report, before each trial, %the level of confidence for
their estimate of the forthcoming direction of the target.
As shown in this sample screenshot,
this was performed by moving a mouse cursor (black triangle) on a continuous rating scale
ranging from ``sure left'' through ``unsure'' to ``sure right''.
}
\label{fig:intro}
\end{figure}
%-------------------------------------------------------------%
%: limits of the previous method
%In order to generalize such results to more ecological conditions,
%it is thus necessary to extend the experimental protocol of~\citet{Montagnini2010} in three aspects that will be illustrated in the next section.
% ------------------------------------------------------------------
\subsection{Contributions}%Outline}
% ------------------------------------------------------------------
%: 1C : what is novel in our work
% ------------------------------------------------------------------
%: 1Ca how we do it : or rather why we do it this way (and not like Matthys)
The goal of this study is to generalize the adaptive process
observed in the aSPEM response in previous studies~\citep{Montagnini2010,SantosKowler2017} to more ecological settings and
also to broaden its scope by showing that such adaptive processes
occur at the conscious level as well.
%The equations for this protocol will be detailed below~(\seeSec{Bayesian_change_point}).
We already mentioned that by manipulating the probability bias for target motion direction,
it is possible to modulate the direction and mean velocity of aSPEM.
This suggests that probabilistic information may be used
to inform the internal representation of motion prediction
for the initiation of anticipatory movements.
However, it is still unclear what generative model to use
to dynamically manipulate the probability bias
and generate an ecologically relevant input sequence of target directions.
First, a possible confound comes from the fact that
previous studies have used trial sequences (\textit{blocks}) of fixed lengths,
stacked in a sequence of conditions defined by the different probability biases.
Indeed, observers may potentially pick up
the information on the fixed block length
to predict the occurrence of a switch (a change in probability bias) during the experiment.
Second, we observed qualitatively that following a switch,
the strength of aSPEM changed gradually,
consistent with other adaptation paradigms~\citep{Fukushima1996,Kahlon1996,Souto13}.
Estimating the characteristic temporal parameters of this adaptation mechanism
may become particularly challenging in a dynamic context,
where the probabilistic contingencies vary in time in an unpredictable way.
Finally, whether and how the information processing underlying
the buildup of aSPEM and its dynamics is linked to
an explicit estimate of probabilities is still largely unknown.

%%%-------------------------------------------------------------%
%: 1Cb design of the binary switching generative model
To assess the dynamics of the adaptive processes
which compensate for the variability within sensory sequences,
one may generate random sequences of Target Directions (TDs)
using a dynamic value for the probability bias $p = \text{Pr}(\text{TD is 'right'})$,
with a parametric mechanism controlling for the volatility at each trial.
In the Hierarchical Gaussian Filter model~\citep{Mathys11}, for instance,
volatility is controlled as a non-linear transformation
of a random walk (modeled itself by a Brownian motion with a given diffusion coefficient).
Ultimately, this hierarchical model makes it possible to generate a sequence of binary choices
where the variability fluctuates along a given trajectory.
Such a forward probabilistic model is invertible
using some simplifying assumptions and allows one
to extract a time-varying inference of the agent's belief about volatility~\citep{Vossel14}.
Herein, to analyze the effect of history length in all generality,
we extended the protocol of~\citet{Montagnini2010} such that the probability bias
is still fixed within blocks but these blocks have variable lengths,
that is, by introducing switches occurring at random times.
Therefore, similarly to~\citet{Meyniel13}, we will use a model where
the bias $p$ in target direction varies according to a piecewise-constant function.
%We expect that within each sub-block, the uncertainty about of the value of $p$
%will progressively decrease as we accumulate samples.
In addition, in our previous study
the range of possible biases was finite.
In the present work, we extended the paradigm
by drawing $p$ as a continuous random variable
within the whole range of possible probability biases (that is, the segment $[ 0, 1 ]$).
In summary, we first draw random events (that we denote as ``switches'')
with a given mean frequency, which controls the strength of the volatility.
Second, the value $p$ of the bias only changes at the moment of a switch,
independently of the previous bias value,
and is stationary between two switches, forming what we call an ``epoch''.
Third, target direction is drawn as a Bernoulli trial using the current value of $p$.
Such a hierarchical structure is presented in~\seeFig{intro}-A,
where we show the realization of the sequence of target directions,
the trajectory of the underlying probability bias (hidden to the observer), and
the occurrences of switches.

%: 1Cc  equations
Mathematically, this can be considered as a three-layered hierarchical model
defining the evolution of the model at each trial $t$ as the vector $(x_2^t, x_1^t, x_0^t)$.
At the topmost layer,
the occurrence $x_2^t \in \{ 0, 1 \}$ of a switch ($1$ for true, $0$ for false)
is drawn from a Bernoulli trial $\Bb$ parameterized by its frequency $h$, or \emph{hazard rate}.
The value of $\tau=\frac 1 h$ thus gives the average duration (in number of trials)
between two successive switches.
In the middle layer, the probability bias $p$ of target direction
is a random variable that we define as $x_1^t \in [0, 1]$.
It is chosen at random from a prior distribution $\Pp$
%(that will be described in more detail in the following sections)
at the moment of a switch,
and otherwise remains constant until the next occurrence of a switch.
The prior distribution $\Pp$ can be for instance
the uniform distribution $\Uu$ on $ [ 0, 1 ] $ or
Jeffreys prior $\Jj$~(see \seeApp{bcp}).
Finally, a target moves either to the left or to the right,
and we denote this variable (target direction, TD) as $x_0^t \in \{ 0, 1 \}$.
This direction is drawn from a Bernoulli trial
parameterized by the direction bias $p=x_1^t$.
In summary, this is described by the following equations:
%\begin{itemize}
%    \item Occurrence of a switch: $x_2^t \propto \Bb(h)$
%    \item Dynamics of probabilistic bias: \eql{\choice{\text{if} \quad x_2^t=0 \quad \text{then} \quad  x_1^t = x_1^{t-1} \\
%\text{else} \quad x_1^t \propto \Pp  }\label{eq:bsm}}
%    \item Sequence of directions:  $x_0^t \propto \Bb(x_1^t)$
%\end{itemize}
 \eql{\choice{
\text{Occurrence of a switch: } x_2^t \sim \Bb(1/\tau) \\
 % TODO: nest the choice
\text{Dynamics of probabilistic bias $p=x_1^t$: }
 \choice{\text{if} \quad x_2^t=0 \quad \text{then} \quad  x_1^t = x_1^{t-1} \\
 \text{else} \quad x_1^t \sim \Pp  \\
 } \\
\text{Sequence of directions: } x_0^t \sim \Bb(x_1^t)
 }\label{eq:bsm}}
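As a worked illustration of the topmost layer of \seeEq{bsm} (a sketch which ignores the switches imposed by construction in our sequence, and where the symbol $L$ is introduced only for this example), the length $L$ of an epoch, in number of trials, follows a geometric distribution:
\[
Pr(L = k) = h \cdot (1-h)^{k-1} \quad \text{for} \quad k \geq 1, \qquad \text{E}[L] = \frac{1}{h} = \tau .
\]
For instance, with a hazard rate $h=1/40$, a sequence of $600$ trials contains on average $600 \cdot h = 15$ randomly drawn switches, and an epoch lasts longer than $\tau=40$ trials with probability $(1-h)^{40} \approx 0.36$.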
In practice, we generated a sequence of $600$ trials,
and there is by construction a switch at $t=0$ (that is, $x_2^0=1$).
In addition, we imposed in our sequence that a switch
occurs after trial numbers $200$ and $400$,
in order to be able to compare adaptation properties
across different chunks of the trial sequence.
%The model generating the experimental sequence of trial directions, as well as the experimental protocol are illustrated in~\seeFig{intro}-A.
With such a three-layered structure, the model generates the randomized occurrence of switches,
itself generating epochs with constant direction probability %between two switches separated by a random length
%and chosen in the continuous range of possible biases' values,
and finally the random sequence of Target Direction (TD) occurrences at each trial.
To sum up, the system of three equations in~\seeEq{bsm}
defines the Binary Switching Model (BSM),
which we used to generate the experimental sequences presented to human participants.
We will use this generative model as the basis of an ideal observer model
that inverts it to predict probability biases from the observations (TDs), and
which we will test as a model for the adaptation of human behavior.

%: 1Cd outline
This paper is organized in five parts.
After this introduction where we presented the motivation for this study,
the next section~(\seeSec{Bayesian_change_point}) will present
an inversion of the BSM forward probabilistic model,
coined the Binary Bayesian Change Point (BBCP) model.
To our knowledge, such an algorithm was not yet available, and
we provide here an exact analytical solution
by extending previous results from~\citet{AdamsMackay2007}
to the case of binary data as in the BSM presented above (see~\seeEq{bsm}).
In addition, the proposed algorithm is biologically realistic
as it uses simple computations and is \emph{online},
that is, all computations on the sequence may be done
using solely a set of variables available at the present trial,
compactly representing all the sequence history seen in previous trials.
We will also provide a computational implementation
and a quantitative evaluation of this algorithm.
Then, we will present in~\seeSec{results_psycho} the analysis of experimental evidence
to validate the generalization of previous results %.
%In a first session, participants observe a target moving horizontally
%with constant speed from the center
%either to the right or left across trials
with this novel protocol. %~(see \seeFig{intro}-A \& B).
%The probability of either motion direction changes randomly in time.
In one session, participants were asked to estimate
``how much they are confident that
the target will move to the right or left in the next trial'' and
to adjust the cursor's position on the screen accordingly~(see \seeFig{intro}-C).
In the other experimental session on a different day,
we showed the same sequence of target directions and
recorded participants' eye movements~(see \seeFig{intro}-B).
Indeed, in order to understand the nature of
the representation of motion regularities underlying this adaptive behavior,
it is crucial to collect both
the recording of eye movements
and the explicit judgments about expectations on motion direction.
%In such an explicit judgment task, we evaluated for each participant their confidence for the next trial direction
%(\emph{prior} to the appearance of the target).
%on a rating scale
%between "sure left", to "unsure" and finally "sure right"~(see \seeFig{intro}-C).
%These results will be compared to the results for eye movements.
Another novelty of our approach is to use that agent as a regressor,
which will allow us to match experimental results with the BBCP
and to compare its predictive power with that of classical models such as the leaky integrator model.
Hence, we will show that behavioral results match well
with the BBCP model.
In~\seeSec{inter}, we will synthesize these results
by inferring the volatility parameters inherent to the models,
fitting them to each individual participant.
This will allow the analysis of inter-individual behavioral responses for each session.
In particular, we will test if one could predict observers' prior (preferred) volatility,
that is, a measure of the dynamic compromise between exploration (``should I go?'')
and exploitation (``should I stay?'')
across the two different sessions challenging predictive adaptive processes
at the unconscious and conscious levels.
Finally, we will summarize and conclude this study and
offer some perspectives for future work in~\seeSec{outro}.
%
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results: Binary Bayesian Change Point (BBCP) detection model}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\label{sec:Bayesian_change_point}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%: 2 short intro
%
As we saw above, Bayesian methods provide a powerful framework for studying human behavior and adaptive  processes in particular.
For instance,~\citet{Mathys11} first defined a multi-layered generative model for
sequences of input stimuli.
By inverting this stochastic forward process,
they could extract relevant descriptors at the different levels of the model
and fit these parameters with the recorded behavior.
Here, we use a similar approach, focusing specifically on the BSM generative model,
as defined in~\seeEq{bsm}.
To begin, we define a first ideal observer as a control, the \textit{leaky integrator} (or \textit{forgetful agent}),
which has an exponentially-decaying memory for the events that occurred in the past trials.
This agent can equivalently be described as one
which assumes that volatility is stationary with a fixed characteristic frequency of switches.
Then, we will extend this model to an agent
which assumes the existence of (randomly occurring) switches, that is,
that the value of the probabilistic bias may change
at specific (yet randomly drawn) trials,
as defined by the forward probabilistic model in~\seeEq{bsm}.
%
% ------------------------------------------------------------------
\subsection{Forgetful agent model (Leaky integrator)}%
% ------------------------------------------------------------------
%: 2Aa justification from previous studies
The leaky integrator ideal observer represents a classical, widespread and
realistic model of how trial-history shapes
adaptive processes in human behavior.
It is also well suited to modeling motion expectation in the direction-biased experiment which leads to anticipatory SPEMs.
In this model, given the sequence of observations $x_0^t$ from trial $0$ to $t$,
the expectation $p=\hat{x_1}^{t+1}$ of the probability for the next trial direction can be modeled using a simple heuristic:
this probability is the weighted average of
the previously estimated probability, $\hat{x_1}^{t}$, with the new information $x_0^t$,
where the weight corresponds to a leak term (or discount)
by a factor $(1 - h)$, with $h \in [0, 1]$~\citep{Anderson2006}.
At trial $t$, this model can be expressed with the following equation:
\eql{
\hat{x_1}^{t+1} = (1 - h) \cdot \hat{x_1}^{t} + h \cdot x_0^t
\label{eq:leaky}}
where $\hat{x_1}^{t=0}$ is equal to some prior value ($0.5$ in the unbiased case),
corresponding to the best guess at $t=0$ (prior to the observation of any data).
% NOTE: it's an AR(1) process https://stats.stackexchange.com/questions/358162/writing-ar1-as-a-ma-infty-process

%: from heuristics to ideal observer
In other words, the estimated probability $\hat{x_1}^{t+1}$ is computed
from the integration of previous instances
with a progressive discount of past information.
The value of the scalar $h$ represents
a compromise between responding rapidly
to changes in the environment ($h \approx 1$) and
not prematurely discarding information still of value
for slowly changing contexts  ($h \approx 0$).
As such, we will call this scalar the hazard rate.
Similarly, one can define $\tau = 1 / h$ as
a characteristic time (in units of number of trials)
for the integration of information.
Looking more closely at this expression,
the ``forgetful agent'' computed in \seeEq{leaky}
consists of an exponentially-weighted moving average (see \seeApp{leaky}).
It may thus be equivalently written in the form of a time-weighted average:
\eql{
\hat{x_1}^{t+1} = (1-h)^{t+1} \cdot \hat{x_1}^{t=0} + h \cdot \sum_{0\leq i \leq t} (1 - h)^{i} \cdot x_0^{t-i}
\label{eq:leaky2}}
The first term corresponds to the discounted effect of the prior value before any observation and it tends to $0$ when $t$ increases.
More importantly, as $1-h < 1$, the second term corresponds to the \emph{leaky} integration of novel observations.
Conversely, let us now assume that
the true probability bias for direction changes randomly with a mean rate of once
every $\tau$ trials.
As a consequence, the probability that the bias does not change is $Pr(x_2^t=0)=1-h$ at each trial.
Assuming independence of these occurrences, the estimated probability $p=\hat{x_1}^{t+1}$ is thus proportional to the sum
of the past observations weighted by the belief that the bias has not changed during $i$ trials in the past, that is, exactly as defined by the second term of the right-hand side in~\seeEq{leaky2}.
This shows that
assuming that changes occur at a constant rate ($\hat{x_2}^t=h$)
but ignoring the variability in the temporal occurrence of the switch,
the optimal solution to this inference problem is the
ideal observer defined in~\seeEq{leaky2},
which finds an online recursive solution in~\seeEq{leaky}.
We have therefore shown here that the heuristic derived from~\citet{Anderson2006}
is an ideal inversion of the two-layered generative model
which assumes a constant hazard rate for the probability bias.
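The equivalence between \seeEq{leaky} and \seeEq{leaky2} can be checked by unrolling the recursion over the first trials:
\[
\hat{x_1}^{1} = (1-h) \cdot \hat{x_1}^{t=0} + h \cdot x_0^0 , \qquad
\hat{x_1}^{2} = (1-h)^2 \cdot \hat{x_1}^{t=0} + h \cdot \left( x_0^1 + (1-h) \cdot x_0^0 \right) ,
\]
and so on by induction.
For $h > 0$, the weights form a proper average since they sum to one:
\[
(1-h)^{t+1} + h \cdot \sum_{0\leq i \leq t} (1 - h)^{i} = (1-h)^{t+1} + \left( 1 - (1-h)^{t+1} \right) = 1 .
\]
As an illustrative numerical example (the value $h=1/40$ is chosen here only for illustration), an observation made $\tau=40$ trials in the past is discounted by $(1-h)^{40} \approx 0.36$, which is close to $e^{-1}$ and supports the interpretation of $\tau$ as a characteristic time.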

%: 2Ac  using \hat{p} as a regressor & limits of the leaky integrator
The correspondence that we proved between the weighted moving average heuristic
and the forgetful agent model as an ideal solution to that generative model leads
us to several interim conclusions.
First, the time series of inferred $\hat{x_1}^{t+1}$ values
can serve as a regressor for behavioral data
to test whether human observers follow a similar strategy.
In particular, the free parameter $h$
may be fitted to variations of the behavioral data across the sequence,
which itself is assumed to depend on the agent's belief in the weight decay.
%for instance to the data shown in~\seeFig{results_intro}.
Now, since we have defined a first generative model
and the corresponding ideal observer (the forgetful agent),
we next define a more complex model,
in order to overcome some of the limits of the leaky integrator.
Indeed, a first criticism could be that
this model is too rigid and does not sufficiently
account for the dynamics of contextual changes~\citep{Behrens07}
as the weight decay corresponds to assuming \emph{a priori} a constant precision in the data sequence, contrary to more elaborate Bayesian models~\citep{Vilares2011}.
It seems plausible that the memory size (or history length) used by the brain
to infer any event probability can vary, and that this variation could be related
to the environmental volatility inferred from past data.
The model presented in~\seeEq{leaky2} uses a constant weight
(decaying with the distance to the current trial)
for all trials, while the actual precision of each trial
can be potentially evaluated and used
for precision-weighted estimation of the probability bias.
To address this hypothesis, our next model is inspired
by the Bayesian Change-point detection model~\citep{AdamsMackay2007}
of an ideal agent inferring
both the trajectory in time of the probability bias ($x_1^t$)
and the probability $Pr(x_2^t=1)$ of the occurrence of switches.
% ------------------------------------------------------------------
\subsection{Binary Bayesian Change Point (BBCP) detection model}
% ------------------------------------------------------------------
\label{sec:Binary_Bayesian_change_point}
%-------------------------------------------------------------%
%: FIGURE 3 fig:Bayesianchangepoint \seeFig{Bayesianchangepoint}
\begin{figure}%[b!]
% cf 3_Results_2.ipynb
\centering{
\begin{tikzpicture}[thick,scale=.95]
\node [anchor=north west]  (imgA) at (0.\linewidth,.55\linewidth){\includegraphics[width=0.33
\linewidth]{2019_figures/3_BCP_model}};
\node [anchor=north west]  (imgB) at (0.36\linewidth,.580\linewidth){\includegraphics[width=0.64\linewidth]{2019_figures/3_BCP_readouts}};
\draw [anchor=north west] (0.000\linewidth, .62\linewidth) node {$\mathsf{(A)}$};
\draw [anchor=north west] (0.382\linewidth, .62\linewidth) node {$\mathsf{(B)}$};
\end{tikzpicture}
}
\caption{\textbf{Binary Bayesian Change Point (BBCP) detection model.} ~\textit{(A)} This plot shows a synthesized sequence of $13$ events,
%each of them corresponding to a binary choice,
either a leftward or rightward movement of the target (TD).
Run-length estimates are expressed as hypotheses about the length of a sub-block over which the probability bias was constant,
that is, the number of trials since the last switch.
Here, the true probability bias switched from a value of $.5$ to $.9$ at trial $7$,
as can be seen by the trajectory of the true run-length (blue line).
The BBCP model tries to capture the occurrences of a switch
by inferring the probability of different possible run lengths.
At any new datum (trial), this defines a Hidden Markov Model
as a graph (trellis), where % of possible run lengths.
edges indicate that a message is being passed
to update each node's probability (as represented by arrows from trial $13$ to $14$).
Black lines denote a progression of the run length at the next step (no switch),
while gray lines stand for the possibility that a switch happened:
In this case the run length would fall back to zero.
The probability for each node is represented by the grayscale (darker gray colors denote higher probability)
and the distribution is shown in the inset for two representative trials: $5$ and $11$.
Overall, this graph shows how the model integrates information to accurately identify a switch
and produce a prediction for the next trial (e.g. for $t=14$).
%The black \CP{Bleu} [and green] \CP{plus de courbe verte} curve respectively represent
%the actual [and inferred] run length of the simulated data
%as a function of trial number.
%In this instance, the inferred switch is delayed
%by one trial with respect to the true switch.
%\CP{representation des valeurs du run length en haut a gauche pour les essais 5 (en gris) et 8 (en bleu), on peut voir que pour l'essais 5 et 8 la probabilit\'e qu'il n'y est pas eu de switch depuis le d\'ebut est la plus importante (peut \^etre prendre un autre essais plus parlant - trial 10: proba switch à l'essais 5 est plus importante (notebook3) ?)}
~\textit{(B)} On a longer sequence of $200$ trials,
representative of a sub-block of our experimental sequence (see~\seeFig{intro}-A), % and~\seeFig{results_raw}),
we show %in the top plot
the actual events which are observed by the agent (TD),
along with the (hidden) dynamics of the true probability bias $P_{\text{true}}$ (blue line),
the value inferred by a leaky integrator ($P_{\text{leaky}}$, orange line)
and the results of the BBCP model
in estimating the probability bias $P_{\text{BBCP}}$ (green line),
along with $.05$ and $.95$ quantiles (shaded area).
This shows that for the BBCP model,
the accuracy of the estimated value of the probability bias
is higher than for the leaky integrator.
Below we show the belief (as grayscales) for the different possible run lengths.
%as a function of the trial number.
%A darker color denotes a higher probability.
The green and orange lines correspond to the mean run-length which is inferred,
respectively, by the BBCP and leaky models:
Note that in the BBCP, while it takes some trials to detect switches,
they are in general correctly identified (transitions between diagonal lines) and
that integration is thus faster than for the leaky integrator,
as illustrated by the inferred value of the probability bias.
}
\label{fig:Bayesianchangepoint}
\end{figure}
%-------------------------------------------------------------%
%-------------------------------------------------------------%
%: 2Ba precision in our belief of \hat{p}
%-------------------------------------------------------------%
There is a crucial difference between the forgetful agent presented above
%\AM{WE CAN SKIP THIS:which believes that changes occur at a constant rate ($\hat{x_2}^t=h$, see~\seeEq{leaky2})}
and an ideal agent which would invert the Binary Switching Model (BSM, see~\seeEq{bsm}).
Indeed, at any trial during the experiment,
the agent may infer beliefs about the volatility $x_2^t$,
which itself drives the trajectory of the probability bias $x_1^t$.
Knowing that the latter is piecewise constant,
an agent may have a belief over the number of trials since the last switch.
This number, called the \emph{run-length} $r^t$, is useful in two ways.
First, it allows the agent to base the prediction $\hat{x_1}^{t+1}$ of $x_1^{t+1}$
only on those samples produced since the last switch, from $t-r^t$ until $t$.
% and which we denote as $x_0^{(r^t)}=x_0^{r^t:t}$ .
Indeed, the samples $x_0^t$ which occurred before the last switch
were drawn independently of the present true value $x_1^t$
and thus cannot help to estimate the latter.
Second, it is known that for this estimate, the precision
(the inverse of variance) on the estimate $\hat{x_1}^{t+1}$
grows linearly with the number of samples:
The longer the run-length, the sharper the corresponding (probabilistic) belief.
We have designed an agent inverting the BSM by extending
the Bayesian Change-Point (BCP) detection model~\citep{AdamsMackay2007}.
The latter model defines the agent as an inversion of a switching generative model
for which the observed data (input) is Gaussian.
We present here an exact solution for the case of the BSM, where the input is binary. % as in the BSM (see~\seeEq{bsm}).
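To make the link between run-length and precision concrete, consider the following sketch, which assumes $n$ samples drawn independently from a Bernoulli trial with a fixed bias $p$ (the symbols $n$ and $\bar{x}$ are introduced only for this illustration).
The empirical mean $\bar{x}$ of these samples has variance
\[
\text{Var}(\bar{x}) = \frac{p \cdot (1-p)}{n} ,
\]
such that its precision $n / (p \cdot (1-p))$ is proportional to $n$.
For instance, for $p=0.5$, the standard deviation of $\bar{x}$ is $0.25$ after $n=4$ samples and $0.125$ after $n=16$ samples.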

%-------------------------------------------------------------%
%: 2Bb prediction / update cycle
%-------------------------------------------------------------%
In order to define in all generality the switch detection model,
we will initially describe the fundamental steps leading to its construction,
while providing the full algorithmic details in~\seeApp{bcp}.
%  by~\seeEq{run_length} (more details on this derivation in~\seeApp{bcp})
The goal of predictive processing
is to infer the probability $Pr(x_0^{t+1} | x_0^{0:t})$ of the next datum
knowing what has been observed until trial $t$
(that we denote by $x_0^{0:t} = \{ x_0^0, \ldots, x_0^t \}$), as well as the agent's prior knowledge
that data is the output of a given (stochastic) generative model (here, the BSM).
To derive a Bayesian predictive model, we introduce
the run-length as a latent variable which allows the agent to represent
different parallel hypotheses about the input.
We therefore draw a computational graph (see \seeFig{Bayesianchangepoint}-A) where, at any trial,
a hypothesis is formed on as many ``nodes'' as there are run-lengths
(limited for instance by the total number of trials).
%
%This is for instance the task that we defined
%for the bet experiment, where each participant
%was asked to report their level of confidence for the next outcome
%for the next trial.
%
%Using the run length $r^t$ as a latent variable at each trial $t$,
%we define the distribution which represents our belief
%for all different hypotheses on $r^t$.
%which
As a readout, we can use the predictive probability conditioned on the run-length
to compute the marginal predictive distribution:
\eql{
Pr(x_0^{t+1} | x_0^{0:t}) =
%\sum_{r^{t}} \beta^{(r)}_t \cdot Pr(x_0^t | r^{t}, x_0^{0:t-1})
%\sum_{r^{t}} Pr(x_0^{t+1} | r^{t}, x_0^{0:t}) \cdot Pr(r^t | x_0^{0:t})
%\sum_{r^{t}} Pr(x_0^{t+1} | x_0^{(r^{t})}) \cdot \beta^{(r)}_t
\sum_{r^{t}\geq 0} Pr(x_0^{t+1} | r^{t}, x_0^{0:t}) \cdot \beta^{(r)}_t
\label{eq:pred}
}
where $Pr(x_0^{t+1} | r^{t}, x_0^{0:t})$ is
the Bernoulli trial modeling the probability of a future datum $x_0^{t+1}$
conditioned on the run-length and
$\beta^{(r)}_t=Pr(r^t | x_0^{0:t})$ is the probability for each possible run-length given the observed data.
Note that $\beta^{(r)}_t$ is scaled such that $\sum_{r \geq 0 } \beta^{(r)}_t = 1$.
Indeed, we know that, at any trial, there is a single true value for this variable $r^{t}$
and that $\beta^{(r)}_t$ thus represents the agent's inferred probability distribution over the run-length $r$. % (given the data).
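As a hypothetical numerical illustration of \seeEq{pred} (all values below are invented for the example and are not taken from our data), suppose that the belief is concentrated on two run-lengths, with $\beta^{(2)}_t=0.3$ and $\beta^{(10)}_t=0.7$, and that the corresponding conditional predictions of a rightward movement are $0.9$ and $0.5$, respectively.
The marginal prediction is then the belief-weighted average
\[
Pr(x_0^{t+1}=1 | x_0^{0:t}) = 0.3 \cdot 0.9 + 0.7 \cdot 0.5 = 0.62 .
\]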
%a previous value $Pr(x_0^{t} | r^{t-1}, x_0^{0:t-1})$
%and the likelihood associated to the new value $x_0^t$.
%-------------------------------------------------------------%
%: 2Bc prediction cycle
%-------------------------------------------------------------%
%-------------------------------------------------------------%
% Computing sufficient statistics
%-------------------------------------------------------------%

With these premises, we define the BBCP
as a prediction / update cycle
which connects nodes from the previous trial to those at the current trial.
Indeed, we will \emph{predict} the probability
$\beta^{(r)}_t$ at each node, knowing either an initial prior, or its value on a previous trial.
In particular, at the occurrence of the first trial, we know for certain that there is a switch and
initial beliefs are thus set to the values $\beta^{(0)}_0=Pr(r^0=0)=1$ and
$\forall r>0$, $\beta^{(r)}_0=Pr(r^0=r)=0$.
Then, at any trial $t>0$, as we observe a new datum $x_0^t$,
we use the knowledge of $\beta^{(r)}_{t-1}$ at trial $t-1$,
the likelihood $\pi^{(r)}_{t}=Pr(x_0^{t} | r^{t-1}, x_0^{0:t-1})$ and
the transition probabilities defined by the generative model
to predict the beliefs over all nodes: %

\eqa{
\beta^{(r)}_t \propto \sum_{r' \geq 0} \beta^{(r')}_{t-1} \cdot Pr(r^t = r | r^{t-1} = r') \cdot  \pi^{(r')}_{t}
\label{eq:pred_node}
}
where $r'$ indexes the possible run-lengths at trial $t-1$.
In the computational graph, % (\seeFig{Bayesianchangepoint}-A),
\seeEq{pred_node} corresponds to a message passing from the nodes at time $t-1$
to those at time $t$. % and formalized by the transition matrix $Pr(r^t | r^{t-1})$.
We will now detail how to compute the transition probabilities and the likelihood.

%In the second step, one can perform prediction
%using the graph defined in \seeFig{Bayesianchangepoint}-A.
%Now that we have the vector of likelihoods $\pi^{(r)}_t=\Ll(x_0^t |  \mu^{(r)}_{t}, \nu^{(r)}_{t})$,
%one can update probabilities and perform the next prediction for trial $t+1$.
First, knowing that the data is generated by a switching model such as the BSM (see~\seeEq{bsm}),
the run-length is either zero at the moment of a switch,
or its length (in number of trials) is incremented by $1$ if no switch occurred:

\eql{\choice{
\text{if} \quad x_2^t=1 \text{,} \quad r^t = 0\\
\text{if} \quad x_2^t=0 \text{,} \quad r^t = r^{t-1} +1 }\label{eq:run_length}}%see~\seeEq{run_length}
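
As a brief illustration of~\seeEq{run_length}, consider a hypothetical sequence of switch events (these values are chosen for illustration only and are not taken from the experimental sequence):
\[
x_2^{0:6} = (1, 0, 0, 0, 1, 0, 0) \quad \Rightarrow \quad r^{0:6} = (0, 1, 2, 3, 0, 1, 2)
\]
The run-length grows by one at each trial within an epoch and is reset to zero at each switch.
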
%\text{and else} \quad r^t = r^{t-1} +1 }\label{eq:run_length}}%see~\seeEq{run_length}
This may be illustrated by a graph
in which information will be represented at the different nodes for each trial $t$.
%In a switching model like the BSM, the transition matrix % defined by the graph,
This defines the transition matrix $Pr(r^t | r^{t-1})$
as a partition in two exclusive possibilities:
Either there was a switch or not.
It allows us to compute the \emph{growth probability} for each run-length. % $r \geq 0$.
On the one hand, the belief of an increment of the run-length at the next trial is: %, before observing a new datum:

\eqa{
\beta^{(r+1)}_t = \frac{1}{B} \cdot \beta^{(r)}_{t-1} \cdot \pi^{(r)}_{t} \cdot (1-h)
\label{eq:beta_noswitch}
}
where $h$ is the scalar defining the hazard rate.
On the other hand, it also allows us to express the change-point probability as:

\eqa{
\beta^{(0)}_t  = \frac{1}{B} \cdot \sum_{r \geq 0} \beta^{(r)}_{t-1} \cdot \pi^{(r)}_{t} \cdot h
\label{eq:beta_switch}
}
with $B$ such that $\sum_{r \geq 0} \beta^{(r)}_{t} = 1$.
Note that $\beta^{(0)}_t=h$ and thus $B=\sum_{r \geq 0} \beta^{(r)}_{t-1} \cdot \pi^{(r)}_{t}$.
%This finalizes the prediction step.
Knowing this probability strength and the previous value of the prediction, % an estimate for our belief on the different variables at the previous trial $t-1$,
we can therefore make a prediction for our belief of the probability bias at the next trial $t+1$,
prior to the observation of a new datum $x_0^{t+1}$ and resume the prediction / update cycle (see Equations~\ref{eq:pred},~\ref{eq:beta_noswitch} and~\ref{eq:beta_switch}).
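
To make this cycle concrete, the following worked example applies Equations~\ref{eq:beta_noswitch} and~\ref{eq:beta_switch} over the first two trials.
It assumes, for illustration, a hazard rate $h=1/40$ and purely hypothetical likelihood values $\pi^{(0)}_1 = \pi^{(0)}_2 = 0.5$ and $\pi^{(1)}_2 = 0.7$.
Starting from $\beta^{(0)}_0=1$, the first trial gives $B = 1 \cdot 0.5 = 0.5$ and
\[
\beta^{(0)}_1 = h = 0.025, \quad \beta^{(1)}_1 = 1-h = 0.975 .
\]
At the second trial, $B = 0.025 \cdot 0.5 + 0.975 \cdot 0.7 = 0.695$ and
\[
\beta^{(0)}_2 = 0.025, \quad
\beta^{(1)}_2 = \frac{0.025 \cdot 0.5 \cdot 0.975}{0.695} \approx 0.0175, \quad
\beta^{(2)}_2 = \frac{0.975 \cdot 0.7 \cdot 0.975}{0.695} \approx 0.9575 .
\]
These three beliefs sum to one, and most of the probability mass follows the run-length that keeps growing since the first trial.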

%-------------------------------------------------------------%
%: 2Bd update cycle
%-------------------------------------------------------------%
Integrated in our cycle, we \emph{update} beliefs on all nodes
by computing the likelihood $\pi^{(r)}_t$ of the current datum $x_0^{t}$
knowing the current belief at each node,
that is, based on observations from trials $0$ to $t-1$. %~\seeEq{pred}
A major algorithmic difference with the BCP model~\citep{AdamsMackay2007},
is that here the observed data is a Bernoulli trial and not a Gaussian random variable.
The random variable $x_1^t$ is the probability bias used
to generate the sequence of events $x_0^t$.
We will infer it for all different hypotheses on $r^t$,
that is, knowing there was a sequence of $r^t$ Bernoulli trials
with a fixed probability bias in that epoch.
Such a hypothesis will allow us to compute the distribution
$Pr(x_0^{t+1} | r^{t}, x_0^{0:t})$
by a simple parameterization.
Mathematically, a belief on the random variable $x_1^t$ is represented
by the conjugate probability distribution of the binomial distribution,
that is, by the beta-distribution $B(x_1^t; \mu^{(r)}_{t}, \nu^{(r)}_{t})$.
It is parameterized here by its sufficient statistics,
the mean $\mu^{(r)}_{t}$ and sample size $\nu^{(r)}_{t}$ % which in our a case is the run-length $r^t$
(see~\seeApp{beta} for our choice of parameterization).
First, at the occurrence of a switch (for the node $r^t=0$)
beliefs are set to prior values (before observing any datum)
$\mu^{(0)}_{t} = \mu_{prior}$ and $\nu^{(0)}_{t} = \nu_{prior}$.
By recurrence %(see \seeFig{Bayesianchangepoint}-A),
one can show that at any trial $t>0$,
the sufficient statistics $(\mu^{(r)}_{t}, \nu^{(r)}_{t})$
can be updated from the previous trial following:
\eql{
\nu^{(r+1)}_{t} = \nu^{(r)}_{t-1} + 1
\label{eq:update_nu}
}
As a consequence, $\forall r, t$, $\nu^{(r)}_{t}$ is the sample size corrected by the initial condition:
$\nu^{(r)}_{t} = r + \nu_{prior}$. For the mean, the series defined by $\mu^{(r+1)}_{t}$ is the average at trial $t$ over the $r+1$ last samples, which can also be written in a recursive fashion:
\eql{
%\mu^{(r+1)}_{t} = \frac{\nu^{(r)}_{t-1}}{\nu^{(r+1)}_{t}} \cdot \mu^{(r)}_{t-1} + \frac{1}{\nu^{(r+1)}_{t}} \cdot x_0^{t}
%\nu^{(r+1)}_{t} \cdot \mu^{(r+1)}_{t} = \nu^{(r)}_{t-1} \cdot \mu^{(r)}_{t-1} +  x_0^{t}
\mu^{(r+1)}_{t} = \frac{1}{\nu^{(r+1)}_{t}} \cdot (\nu^{(r)}_{t-1} \cdot \mu^{(r)}_{t-1} +  x_0^{t})
%\mu^{(r+1)}_{t} = (1 - \frac{1}{r + 1 + \nu_{prior}}) \cdot \mu^{(r)}_{t-1} + \frac{1}{r + 1 + \nu_{prior}} \cdot x_0^{t}
\label{eq:update_mu}
}
This updates for each node the sufficient statistics of the probability density function at the current trial.
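
As a worked illustration of~\seeEq{update_nu} and~\seeEq{update_mu}, assume the hypothetical prior values $\mu_{prior}=0.5$ and $\nu_{prior}=1$ (chosen here for simplicity; they need not match the parameterization of~\seeApp{beta}) and follow a single run along which the successive observations are $1$, $1$ and $0$.
The pair $(\mu, \nu)$ then evolves from $(0.5, 1)$ as
\[
\left(\frac{1 \cdot 0.5 + 1}{2}, 2\right) = (0.75, 2), \quad
\left(\frac{2 \cdot 0.75 + 1}{3}, 3\right) \approx (0.833, 3), \quad
\left(\frac{2.5 + 0}{4}, 4\right) = (0.625, 4) .
\]
Each new datum is thus weighted by the inverse of the updated sample size, so that its influence decreases as the run-length grows.
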
%-------------------------------------------------------------%
% Computing the likelihood
% cf p.52 de 2017-10-05 chloe inverting the process rem jb
% https://en.wikipedia.org/wiki/Beta_distribution#Effect_of_different_prior_probability_choices_on_the_posterior_beta_distribution
%-------------------------------------------------------------%
We can now detail the computation of the likelihood of the current datum $x_0^{t}$ with respect to
the current beliefs: $\pi^{(r)}_t = Pr( x_0^{t} |  \mu^{(r)}_{t-1}, \nu^{(r)}_{t-1})$. %}
This scalar is returned by the binary function
$\Ll(r | o)$ which evaluates at each node $r$ the likelihood of the parameters of each node
whenever we observe a counterfactual alternative outcome $o=1$ or $o=0$
knowing a mean bias $p=\mu^{(r)}_{t-1}$
and a sample size $r=\nu^{(r)}_{t-1}$.
For each outcome, the likelihood of observing an occurrence of $o$,
is the probability of a binomial random variable knowing
an updated probability bias of $\frac{p \cdot r + o}{r+1}$,
a number $p \cdot r + o$ of trials going to the right and
a number $(1-p) \cdot r + 1 - o$ of trials to the left.
After some algebra, this defines the likelihood as:
\eql{
\Ll(r | o) = \frac{1}{Z} \cdot {(p \cdot r + o)}^{p \cdot r + o} \cdot {((1- p)\cdot r + 1- o)}^{(1- p)\cdot r + 1- o}
\label{eq:likelihood}
}
with $Z$ such that $\Ll(r | o=1) + \Ll(r | o=0)=1$. % where
%\eq{
%Z = {(\mu \cdot \nu + 1)}^{\mu \cdot \nu + 1}  \cdot {((1- \mu)\cdot \nu )}^{(1- \mu)\cdot \nu }  +
%    {(\mu \cdot \nu )}^{\mu \cdot \nu }  \cdot {((1- \mu)\cdot \nu + 1)}^{(1- \mu)\cdot \nu + 1}
%}
The full derivation of this function is detailed in~\seeApp{likelihood}.
This provides us with the likelihood function
and finally the scalar value $\pi^{(r)}_t = \Ll(r | x_0^{t})$.
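
As a numerical check of~\seeEq{likelihood}, take the hypothetical node values $p=0.75$ and $r=2$ (illustrative only).
The unnormalized terms are
\[
o=1: \; {2.5}^{2.5} \cdot {0.5}^{0.5} \approx 6.988, \quad
o=0: \; {1.5}^{1.5} \cdot {1.5}^{1.5} = 3.375 ,
\]
so that $Z \approx 10.363$ and
\[
\Ll(r | o=1) \approx 0.674, \quad \Ll(r | o=0) \approx 0.326 .
\]
The outcome that was more frequent in this run is thus assigned the higher likelihood, and both values sum to one by construction.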

%-------------------------------------------------------------%
%: 2Be online estimation: prediction
%-------------------------------------------------------------%
Finally, the agent infers at each trial the belief and parameters at each node
%and use for instance the maximum a posteriori or the expected value as readouts.
and uses the marginal predictive probability (see~\seeEq{pred}) as a readout.
%More precisely, we define these two strategies as following.
%For the maximum a posteriori readout
%at each trial the run-length with maximal probability is selected
%and then the predicted probability bias is estimated
%as the probability bias for that run-length:
%\eql{
%\hat{x_1}^t = \mu^{(r^\ast)}_{t} \quad \text{with} \quad r^\ast = \argmax_r \beta^{(r)}_{t}
%}
This probability bias is best estimated by its expected value $\hat{x_1}^{t+1}=Pr(x_0^{t+1} | x_0^{0:t})$
as it is marginalized over all run-lengths:
\eql{
\hat{x_1}^{t+1} = \sum_{r \geq 0} \mu^{(r)}_{t} \cdot \beta^{(r)}_{t}
\label{eq:readout}
}
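
For instance, with three nodes carrying the hypothetical beliefs $(0.025, 0.02, 0.955)$ and means $(0.5, 0.75, 0.8)$ for $r=0, 1, 2$ (values chosen for illustration only), \seeEq{readout} gives
\[
\hat{x_1}^{t+1} = 0.025 \cdot 0.5 + 0.02 \cdot 0.75 + 0.955 \cdot 0.8 = 0.7915 ,
\]
a value dominated by the most probable run-length but slightly pulled toward the means of the other nodes.
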
Interestingly, it can be proven that if,
instead of updating beliefs with Equations~\ref{eq:beta_noswitch} and~\ref{eq:beta_switch},
we set nodes' beliefs to the constant vector $\beta^{(r)}_t = h \cdot (1 -h) ^r$,
then the marginal probability is equal to that obtained with the leaky integrator (see~\seeEq{leaky}).
This highlights again that, contrary to the leaky integrator, % for which the inference $\hat{x_2}=h$ was fixed,
the BBCP model uses a dynamical model for the estimation of the volatility.
Still, as for the latter, there is only one parameter~$h=\frac 1 \tau$ which informs the BBCP model
that the probability bias switches \emph{on average} every~$\tau$ trials.
%As in \seeEq{leaky}, this defines the \emph{hazard rate}.
Moreover, note that the resulting operations
(see Equations~\ref{eq:pred},~\ref{eq:beta_noswitch},~\ref{eq:beta_switch},~\ref{eq:likelihood} and~\ref{eq:readout})
which constitute the BBCP algorithm
can be implemented \textit{online}, that is,
only the state at trial $t$ and the new datum $x_0^t$
are sufficient to predict all probabilities for the next trial.
%at the next trial $t+1$: $Pr(r_t | x_0^{0:t})$, $\mu^{(r)}_{t+1}$ and $\nu^{(r)}_{t+1}$.
%projecting beliefs backwards and forward  : the algorithm is online et the price of memory
Finally, this prediction/update cycle applied to the BSM and using~\seeEq{bsm}
constitutes the Binary Bayesian Change Point (BBCP) detection model.
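
The relation $h=\frac 1 \tau$ used above can be checked with a short derivation, sketched here under the assumption that switches occur independently at each trial with probability $h$, as in the BSM.
The fixed belief profile $h \cdot (1-h)^r$ is then a properly normalized geometric distribution,
\[
\sum_{r \geq 0} h \cdot (1-h)^r = \frac{h}{1-(1-h)} = 1 ,
\]
and the number $k \geq 1$ of trials separating two successive switches has the expected value
\[
\sum_{k \geq 1} k \cdot h \cdot (1-h)^{k-1} = \frac{1}{h} = \tau ,
\]
that is, $\tau = 40$ trials for a hazard rate $h=1/40$.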

% ------------------------------------------------------------------
\subsection{Quantitative analysis of the Binary Bayesian Change Point detection algorithm}
% ------------------------------------------------------------------
%-------------------------------------------------------------%
%: 2Ca python scripts : qualitative analysis
%-------------------------------------------------------------%
We have implemented
the BBCP algorithm % is detailed in~\seeApp{bcp} and
using a set of Python scripts.
This implementation also provides some control scripts
to test the behavior of the algorithm with synthetic data.
Indeed, this allows us to qualitatively and quantitatively assess
this ideal observer model against a ground truth before applying it
on the trial sequence that was used for the experiments and
ultimately comparing it to the human behavior. % (see~\seeFig{results_raw}).
\seeFig{Bayesianchangepoint}-A shows a graph-based representation of the BBCP estimate of the run-length for one instance of a short sequence ($14$ trials) of simulated data $x_0^t$
of leftward and rightward trials, with a switch in the probability bias
of moving rightward occurring at trial $7$ (see figure caption for a detailed explanation).
%\textbf{It also shows the predicted probability $\hat{x_1}^t$
%as computed using the BBCP algorithm.} \CP{pas dans \seeFig{Bayesianchangepoint}-A}
%The bottom panel illustrates
%the belief on the predicted probability $\hat{p}$ at a given trial.
\seeFig{Bayesianchangepoint}-B illustrates the predicted probability $\hat{x_1}^t$, as well as the corresponding uncertainty (the shaded areas span the $.05$ to $.95$ quantile range), when
we applied the BBCP model (green curve) and the forgetful agent model (orange curve), respectively, to
a longer sequence of $200$ trials,
characteristic of our behavioral experiments.
In the bottom panel,
we show the dynamical evolution of the belief on the latent variable (run length),
corresponding to the same sequence of $200$ trials.
The BBCP model achieves a correct detection of the switches after a short delay of a few trials.

Two main observations are noteworthy. First, after each detected switch, beliefs align along a linear ridge,
as our model's best estimate of the current run-length is steadily incremented by $1$ at each trial until a new switch,
and the probability $\hat{x_1}^t$ is estimated  by integrating sensory evidence in this epoch (it 'stays').
Then, we observe that shortly after a switch (hidden to the agent),
the belief diffuses until the relative probability of a continuously increasing run-length
is lower than that assigned to a smaller run-length:
There is a transition to a new state (the model 'goes').
Such adaptation is similar to the slow / fast heuristic model proposed in other studies~\citep{Schutz14}.
Second, we can use this information to read out the most likely probability bias and
use it as a regressor for the behavioral data.
Note that the leaky integrator model is implemented
by the agent assuming a fixed run-length profile (see orange line in \seeFig{Bayesianchangepoint}-B),
allowing for a simple comparison of the BBCP model with the leaky integrator.
Again, we see that a fixed length model gives qualitatively a similar output
but with two disadvantages compared to the BBCP model, namely that
there is a stronger inertia in the dynamics of the model estimates and
that there is no improvement in the precision of the estimates after a switch.
%1/ the delay after the occurrence of a switch will always be similar,
%2/ there is no dynamic update of the inferred probability while in the BBCP model, the precision of each new datum is higher after a switch.
In contrast, after a correct switch detection in the BBCP model,
the value of the inferred probability converges rapidly to the true probability
as the number of observations steadily increases after a switch.

%-------------------------------------------------------------%
%: 2Cb quantitative analysis / different read-outs
% see 2018-02-12 journal club Bayesian changepoint chloe.pdf p.33/ p.42
% TODO include these quantitative results in a figure
%-------------------------------------------------------------%
In order to quantitatively evaluate the algorithm and following a similar strategy as~\citet{Norton18},
we computed an overall cost $\Cc$ as the negative log-likelihood (in bits) of the estimated probability bias, knowing the true probability
% - isn't H the joint entropy and C something related to the mutual information? H and C should both be defines},
and averaged over all $T$ trials:
\eql{
%\Cc =  \sum_{t} -\log_2 B(x_1^t ; \hat{x_1}^t, r^t )
%}
%\eql{
\choice{
 \Cc = \frac 1 T  \sum_t \Cc(x_1^t, \hat{x_1}^t)
 \text{ with }
 \Cc(x_1^t, \hat{x_1}^t) = H(x_1^t, \hat{x_1}^t ) - H(x_1^t, x_1^t ) \\
 \text{where } H(x_1^t, \hat{x_1}^t ) = - {x_1}^t \log_2( \hat{x_1}^t ) - (1-{x_1}^t) \log_2( 1- \hat{x_1}^t)
}
\label{eq:KL}
}
The measure $\Cc(x_1^t, \hat{x_1}^t)$ explicitly corresponds to the average score of our model,
%where this score is the log likelihood (in bits) of the inferred belief about the probability bias
as the Kullback-Leibler divergence of $\hat{x_1}^t$ %along with its precision $r^t$ and
compared to the hidden true probability bias $x_1^t$.
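
As a worked example of~\seeEq{KL}, take the illustrative values $x_1^t = 0.9$ and $\hat{x_1}^t = 0.8$ (hypothetical, not taken from the simulations reported here):
\[
H(x_1^t, \hat{x_1}^t) = -0.9 \log_2 (0.8) - 0.1 \log_2 (0.2) \approx 0.522, \quad
H(x_1^t, x_1^t) = -0.9 \log_2 (0.9) - 0.1 \log_2 (0.1) \approx 0.469 ,
\]
so that $\Cc(x_1^t, \hat{x_1}^t) \approx 0.053$ bits.
The same value is obtained from the usual expression of the Kullback-Leibler divergence between two Bernoulli distributions,
\[
0.9 \log_2 \frac{0.9}{0.8} + 0.1 \log_2 \frac{0.1}{0.2} \approx 0.053 ,
\]
and the cost vanishes only when $\hat{x_1}^t = x_1^t$.
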
%Similar measures based on the predicted readout $\hat{x_0}^t$
%gave similar results but would need more data to converge.
We have tested $100$ blocks of $2000$ trials for each read-out.
In general, we found that the inference is better for the BBCP algorithm ($\Cc = 0.171 \pm 0.030$)
than for the leaky integrator ($\Cc = 0.522 \pm 0.128$), % cf  notebook 3},
confirming that it provides overall a better description of the data.
Note that the only free parameter of this model is the hazard rate $h$
assumed by the agent (as in the fixed-length agent).
Although more generic solutions exist~\citep{Nassar10,Wilson13,Glaze15}, % ,Wilson18% ~\citep{Glaze15}
we decided as a first step to keep this parameter fixed for our agent,
and to evaluate how well it matches the experimental outcomes at the different scales of the protocol:
averaged over all observers, for each individual observer or independently in all individual sub-blocks.
In a second step, by testing different values of $h$ assumed by the agent
but for a fixed hazard rate $h=1/40$ in the BSM,
we found that the distance given by~\seeEq{KL} is minimal
for the true hazard rate used to generate the data.
In other words, this analysis shows that the agent's inference is best for a hazard rate
equal to that implemented in the generative model and which is actually hidden to the BBCP agent.
This property will be important in a following section
to estimate the hazard rate implicitly assumed by an individual participant
on the basis of the set of responses given to the  sequence of stimuli
(see~\seeSec{inter}).
%
%-------------------------------------------------------------%
%: 2Cc perspectives
%-------------------------------------------------------------%
As a summary, for each trial of any given sequence,
we obtain an estimate of the probability bias assumed by the ideal observer
and which we may use as a regressor.
We will now present the analysis of this model's match
to our experimental measures of anticipatory eye movements and
explicit guesses about target motion direction.
%\AM{I WOULD DELETE THIS In particular, we expect intuitively the results to be more variable
%compared to the previous protocol with fixed-length blocks (see~\seeFig{intro}-C).}
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\section{Results: psychophysics}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results: Anticipatory eye movements and explicit ratings}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\label{sec:results_psycho}
%: FIGURE 2 fig:results_psycho~\seeFig{results_psycho}
\begin{figure}%[b!]
\centering{\includegraphics[width=\linewidth]{2019_figures/2_results_enregistrement}}
\caption{
\textbf{Raw behavioral results, qualitative overview.} %
The top row represents the sequence of target directions (TD)
that were presented to observers
in one sub-block of $200$ trials,
as generated by the binary switching model (see~\seeFig{intro}-A).
Bottom two rows show the raw behavioral results
for two representative observers:
The recorded aSPEM strength as measured
by the horizontal eye velocity estimated right before
the onset of the visually-driven SPEM (dark gray line);
and the explicit ratings about the expected target direction (or \textit{bet scores}, red line).
We also show the evolution of the value
of the probability bias $P_{\text{true}}$ (blue line)
which is hidden to observers
and used to generate the TD sequence above.
We have overlaid the results of the BBCP model
(see~\seeFig{Bayesianchangepoint}-B, green line).
This shows qualitatively a good match between
the experimental evidence and the model.
Note that short pauses occurred every $50$ trials
(as denoted by vertical black lines, see main text),
and we added the assumption in the model
that there was a switch at each pause.
This is reflected by the reset of the green curve close to the $0.5$ level and
the increase of the uncertainty after each pause. %multiple of $50$ trials.
}
\label{fig:results_psycho}
\end{figure}
%-------------------------------------------------------------%
%-------------------------------------------------------------%
%: 3A raw results in aSPEM
%-------------------------------------------------------------%
We used the BSM to generate the (pseudo-)random sequence of
the dot's directions (the alternation of leftward/rightward trials)
as the sequence of observations that were used in both sessions
(see~\seeFig{results_psycho}).
In one session, we recorded the participants' eye movements and
%extracted the aSPEM velocity. (for details, see~\seeApp{em}).
we show the anticipatory smooth pursuit velocity
for two representative participants (out of $12$ participants), throughout a sub-block of $200$ trials of the experimental sequence.
Note that these participants were chosen as those
whose fitting score was nearest to the median score in the quantitative analysis
that will be illustrated below in~\seeSec{inter}.
In the top panel  of~\seeFig{results_psycho} we show the actual sequence of binary choices
(TD, leftward or rightward) of the Bernoulli trials,
whereas in the bottom panels,
we compare, for each of the two participants,
the evolution of the recorded aSPEM (grey line) with
the true value of the hidden probability bias $x_1$
(step-like blue curve),
and the value inferred using the BBCP model (green line),
along with the $.05$ to $.95$ quantile range (green shaded area).
Comparing the raw aSPEM results with the BBCP agent predictions,
it appears qualitatively that both traces evolve in good agreement.
First, one can observe a trend in the polarity of aSPEM velocity
to be negative for probability bias values below~$.5$ and positive for values above~$.5$.
%Moreover, both curves seem to lag by few trials the
%occurrence of a switch of the probability bias (also hidden to the observers).
Moreover, both curves (aSPEM and model) unveil similar delays in detecting and
taking into account a switch of the probability bias (which is hidden to the observers),
reflecting the time (on the order of a few trials) taken to integrate enough information
to build up the estimation of a novel expectation about the probability bias value
parameterizing the Bernoulli trial.
In general, results are more variable when the bias is weak ($p\approx .5$)
than when it is strong (close to zero or one),
consistent with the well-known dependence of the variance of a Bernoulli trial
upon the probabilistic bias ($\textrm{Var}(p)= p \cdot (1-p)$).
In addition, the precision (i.e. the inverse of the variance)
of the inferred probability bias $\hat{x_1}$ seems to increase
in longer epochs (inter-switch blocks) as information is integrated over more trials.
As a result, the inferred probability as a function of time
seems qualitatively to constitute a reliable regressor
for predicting the strength of aSPEM.
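
To give an order of magnitude for the dependence of the variance upon the bias mentioned above, the variance of a single Bernoulli trial for two illustrative bias values is
\[
\textrm{Var}(0.5) = 0.5 \cdot 0.5 = 0.25 \quad \text{and} \quad \textrm{Var}(0.9) = 0.9 \cdot 0.1 = 0.09 ,
\]
that is, standard deviations of $0.5$ and $0.3$, respectively.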

%-------------------------------------------------------------%
%: 3B Results: bias rating scale measurements
%-------------------------------------------------------------%
%\label{sec:rating_scale}
In addition, the explicit ratings
for the next trial's expected motion direction (or \textit{bet scores}, red curve in~\seeFig{results_psycho})
provided in the other experimental session
seem to qualitatively follow the same trend.
Indeed, similarly to the strength of aSPEM,
we qualitatively compare in~\seeFig{results_psycho}
the trace of the bet scores
with the inferred probability bias $\hat{x_1}$.
As with aSPEM, the series of the participants' bias guesses
exhibits a positive correlation with the true probability bias:
The next outcome of $x_{0}^{t}$ will in general be correctly inferred,
as compared to a random choice, as reported previously~\citep{Meyniel15}. %\AM{statistics to cite for this?}
%\AM{I WOULD SKIP THIS Equipped with this generative BSM model and the ideal observer given by the BBCP model,
%the agent is able to predict the next outcome better than chance level.}
Moreover, we observe again that a stronger probability bias leads
to a lower variability in the bet scores, compared to bias values close to $0.5$.
Again, a (hidden) switch in the value of the bias is
most of the time correctly identified after only a few trials.
Finally, note that at every pause (black vertical bar in~\seeFig{results_psycho}),
participants tended to favor unbiased guesses, closer to $0.5$
than at the end of a sub-block of trials.
We can speculate that this phenomenon could correspond
to a spontaneous resetting mechanism of the internal belief on the probability bias
and indeed we can introduce such an assumption in the model, %in the definition of the BBCP agent,
as a reset of the internal belief after each pause.
%along with an increased uncertainty,.
To conclude, the experiment performed in this session
shows that the probability bias values that are explicitly estimated by participants
are qualitatively similar to the implicit (and largely unconscious) ones
which supposedly underlie the generation of anticipatory aSPEM with variable strength.

%-------------------------------------------------------------%
%: FIGURE 4  fig:results_psycho_all \seeFig{results_psycho_all}
\begin{figure}%[b!]
\centering{
\begin{tikzpicture}%[thick,scale=1, every node/.style={scale=1} ]
\node [anchor=north west]  (img4) at (0.000\linewidth,.618\linewidth){\includegraphics[width=0.49\linewidth]{2019_figures/4_A_result_psycho_aSPEM}};
\node [anchor=north west]  (img4) at (0.51\linewidth,.618\linewidth){\includegraphics[width=0.49\linewidth]{2019_figures/4_B_result_psycho_bet}};
%
\draw [anchor=north west] (0.000\linewidth, .638\linewidth) node {$\mathsf{(A)}$};
\draw [anchor=north west] (0.5\linewidth, .638\linewidth) node {$\mathsf{(B)}$};
%
\end{tikzpicture}
}
\caption{%
\textbf{Behavioral results, quantitative analysis across participants.} %
For all participants and for all trials, we collected an estimate of
the strength of aSPEM and a bet score value.
We analyze the relation between these experimental data and the corresponding prediction $P_{\text{BBCP}}$
of the probability bias as inferred by the BBCP model.
We display these functional relations
using an error-bar plot showing the median with $.25$ and $.75$ quantiles
over 5 equal partitions of the $[0, 1]$ probability segment.
%contour plot over the Kernel Density Estimation (KDE)
The green regression line illustrates the relationship between the BBCP regressor in abscissa
and in ordinate \textit{(A)} the strength of aSPEM and \textit{(B)} the bet score, respectively.
%The density of strokes represent increasing probability strengths.
As a comparison, we have plotted in blue and orange colors the regression lines
with respectively
the true probability ($P_{\text{true}}=x_1^t$) and
the probability bias estimates $P_{\text{leaky}}$ obtained with a leaky integrator.
Insets summarize the quantitative measure of this match
by computing the Pearson correlation coefficient $r$
and the mutual information (MI) over the whole data set.
Dots correspond to these measures for each individual observer.
This shows quantitatively that for both experimental measures
there is a strong statistical dependency between
the behavioral results and the prediction of the BBCP model,
but also that this dependency is significantly stronger than that obtained
with the true probability and with the estimates obtained with the leaky integrator
(see text). %(respectively $p=XXX$ and $p=XXX$ for $r$ and  $p=XXX$ and $p=XXX$ for MI).
}
\label{fig:results_psycho_all}
\end{figure}
%%-------------------------------------------------------------%
%-------------------------------------------------------------%
%: 3C quantitative analysis
%-------------------------------------------------------------%
Quantitatively, we now compare the experimental results
with the value of the probability bias $\hat{x_1}$
computed by the BBCP algorithm.
Compiling results %(see~\seeFig{results_psycho})
from all participants,
we have plotted in~\seeFig{results_psycho_all}
the aSPEM strength (panel A) and the bet scores (panel B) as a function of the BBCP-inferred probability bias
(recall that the true value of the probability bias was coded at the second layer of the BSM generative model and is hidden both to the agents and to the human observers).
All trials from all participants were pooled together
and we show this joint data as an error bar plot
showing the median along with the $.25$ and $.75$ quantiles
as computed for 5 equal partitions of the $[0, 1]$ probability segment. % to generate a scatter plot
As a comparison, the same method was applied to the true value $P_{\text{true}}$ and
to the estimate obtained by the leaky integrator.
%which was then  converted to a kernel density estimate
%for a better visualization (kernel size of $XX$).

We quantitatively estimated the Pearson correlation coefficient
and the mutual information %$\Cc$ (\seeEq{MI}) OR RATHER $MI$, as defined by ... NEW EQUATION}
between the raw data and the different models.
%Details of this analysis is presented in~\seeApp{bcp}.
First, as we can see in~\seeFig{results_psycho_all}-A,
the probability bias $P_{\text{BBCP}}$ estimated by the BBCP algorithm
is linearly correlated with the aSPEM velocity, both as computed on the whole data or
for each observer individually (see insets in the Figure).
The respective values %($r = XXX \pm $)
for the whole dataset
%\CP{aSPEM / BCP :
($r = 0.657$ and $MI = 0.687$) %pour les sujet -
and across subjects
($r = 0.673 \pm 0.079$ and $MI =  0.707 \pm 0.134$) %notebook 4}
are slightly higher than those found by~\citet{Montagnini2010} and~\citet{Damasse18}
for aSPEM measures gathered across experimental blocks with fixed direction biases %($r = XXX \pm $),
and significantly\footnote{All following $p$-values are obtained from the \href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html}{Wilcoxon signed-rank test}.} better than those estimated
with the true probability
($r = 0.613 \pm 0.069$ with $p=0.002$ and $MI = 0.562 \pm 0.107$ with $p=0.002$) %
%\CP{aSPEM / true : ($r = 0.597$ $MI = 0.529$) pour les sujet - ($r = 0.6138 \pm 0.0698 $  $p=0.0022$ - $MI = 0.5622 \pm 0.1079$ $p=0.0022$)}
and with the leaky-integrator model
($r = 0.600 \pm 0.079$ with $p=0.003$ and $MI =  0.622 \pm 0.102$ with $p=0.004$; see inset).
%\CP{aSPEM / Leaky - ($r = 0.582$ $MI = 0.609$) pour les sujet - ($r = 0.6005 \pm 0.0790$ $p=0.0037$ - $MI =  0.6221 \pm 0.1025$  $p=0.0047$)}
A similar analysis
illustrates the relationship between
the model-estimated probability bias
and the rating value, or bet score, about the expected outcome, which was provided at each trial
by participants
and is shown in~\seeFig{results_psycho}.
Similarly to the aSPEM strength, the rating values are nicely correlated
with the probability bias given by the model,
as quantified by the Pearson correlation coefficient and mutual information
across subjects ($r = 0.813 \pm 0.091$ and $MI = 1.312 \pm 0.364$).
%\CP{synchroniser ces valeurs entre figure et texte... --> valeurs correctes -- sur la figure : un r et un MI pour tous les sujets en m\^eme temps, dans le texte : moyenne de r et MI trouv\'e pour chaque sujets individuellement}
%\CP{BET / BCP : ($r = 0.795$ $MI = 1.134$) pour les sujet - ($r = 0.8134 \pm 0.0916 $ $MI = 1.3126 \pm 0.3644$)}
Importantly, these values are again higher for the BBCP model than
for the leaky integrator ($r = 0.731 \pm 0.129$ with $p=0.007$ and $MI =  1.117 \pm 0.409$ with $p=0.028$),
%\CP{BET / Leaky : ($r = 0.712$ $MI = 0.968$) pour les sujet - ($r = 0.7311 \pm 0.1298 $ $p=0.0076$ - $MI =  1.1178 \pm 0.4092$ $p=0.0280$)
or with the true probability ($r = 0.694 \pm 0.086$ with $p=0.002$ and $MI =  0.940 \pm 0.255$ with $p=0.002$). % : ($r = 0.679$ $MI = 0.795$) pour les sujet -  }
Note also that, in order to account for some specific changes
observed in the behavioral data after the short pauses
occurring every $50$ trials,
we added the assumption %for both models,
that there was a switch at each pause.
However, removing this assumption did not significantly change the conclusions about the match of the model
compared to $P_{\text{true}}$ or $P_{\text{leaky}}$
both for eye movements %\AM{I would skip all these tests, too long here, if they do not add anything significant!!
($P_{\text{BBCP}}$: $r = 0.667 \pm 0.078$ and $MI =  0.712 \pm 0.125$; $P_{\text{leaky}}$: $r = 0.548 \pm 0.074$ with $p=0.003$ and $MI =  0.577 \pm 0.096$ with $p=0.003$; $P_{\text{true}}$: $r = 0.613 \pm 0.069$ with $p=0.002$ and $MI =  0.562 \pm 0.107$ with $p=0.002$) % }
and the bet experiment
($P_{\text{BBCP}}$: $r = 0.802 \pm 0.090$ and $MI =  1.255 \pm 0.349$; $P_{\text{leaky}}$: $r = 0.641 \pm 0.120$ with $p=0.002$ and $MI =  0.966 \pm 0.300$ with $p=0.002$; $P_{\text{true}}$: $r = 0.694 \pm 0.086$ with $p=0.002$ and $MI =  0.940 \pm 0.255$ with $p=0.002$).
%\CP{SANS PAUSE -- BET  / BCP :($r = 0.785$ $MI = 1.100$) pour les sujet - ($r = 0.8027 \pm 0.0905 $ $MI =  1.2557 \pm 0.3498$)
%/ Leaky :($r = 0.625$ $MI = 0.879$) pour les sujet - ($r = 0.6413 \pm 0.1209$ $p=0.0028$ - $MI = 0.9666 \pm 0.3008$ $p=0.0022$)
%/ true probability : ($r = 0.679$ $MI = 0.795$) pour les sujet - ($r = 0.6949 \pm 0.0861 $ $p=0.0022$ - $MI = 0.9400 \pm 0.2550$ $p=0.0022$) }
To conclude, we deduce that the dynamic estimate of the probability bias produced by the BBCP model
is a powerful regressor to explain
both the strength of anticipatory smooth pursuit eye movements
and the explicit ratings of human observers experiencing a volatile context for visual motion.

This relatively strong correlation is surprising at first sight
as the epochs with constant probability bias (between two switches) have random lengths,
and participants have to adapt to such a volatile environment.
However, adaptivity to a volatile environment is one of the most exquisite human skills:
When faced with some new observations,
the observer has to constantly adapt his/her response
to either exploit this information by considering that
this observation belongs to the same context of the previous observations, or to explore
a novel hypothesis about the context.
This compromise is one of the crucial component that we wished to explore
and which is well captured by the BBCP model.
In particular, the model predicts different aspects
of the experimental results,
from the variability as a function of the inferred probability,
to the dynamics of the behavior following a (hidden) switch.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results: Analyzing inter-individual differences}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\label{sec:inter}
%-------------------------------------------------------------%
%: 4A separate above analysis
%-------------------------------------------------------------%
So far, we have presented the qualitative behavior of individual participants and
we have reported the quantitative analysis of the group-pooled data
for the fit between experimental and model-inferred estimates of the hidden probability bias.
For instance, the experimental measures for two representative participants in~\seeFig{results_psycho}
support the qualitative match between behavioral data and model predictions,
which we then confirmed quantitatively on the whole group of participants.
It is important to note that no model fitting procedure has been used so far:
we only evaluated the match between the behavioral data and the output of the BBCP model
applied to the sequence of binary target directions
presented to the human participants,
as shown in~\seeFig{Bayesianchangepoint}-B.
However, we observed that in both sessions the qualitative match between model and data varied across participants.
This was best characterized by differences
in the variability of the responses, but also, for instance,
by the different characteristic delays after a switch.
This reflects the spectrum of individual behavioral choices
between exploration and exploitation~\citep{Behrens07}.
As a consequence, we were interested in characterizing these preferences
for each individual participant,
and in investigating whether they co-varied
across the two experimental sessions (i.e.~across implicit vs explicit response modalities).
Crucially, we have seen that the BBCP model is controlled by a single parameter,
the hazard rate $h$, or equivalently by its inverse, the characteristic time $\tau$.
Also, we have shown that, given an observed sequence of behavioral responses,
we could fit the value of $h$ which would best explain the observations,
as quantified by Pearson's correlation coefficient or by the mutual information.
Thus, by extracting the best-fit parameters for each participant and experimental session,
we expect to better understand the variety of inter-individual differences. % and the covariation of these across response modalities.
%-------------------------------------------------------------%
%: FIGURE 5 fig:results_inter \seeFig{results_inter}
% cf https://github.com/laurentperrinet/Bayesianchangepoint/blob/master/notebooks/test_hazardrate.ipynb
% cf : 4_Meta_analysis.ipynb
\begin{figure}%[b!]
\centering{\includegraphics[width=0.618\linewidth]{2019_figures/5_inter-individual_differences_fit}}
\caption{\textbf{Analysis of inter-individual differences.} %
We analyzed each participant's behavior individually, by searching
the individual best value of the model's single free parameter, the hazard rate $h$.
Estimates were performed independently on both experiments,
such that we extracted different estimates of $h_{\text{aSPEM}}$ and $h_{\text{bet}}$
respectively for the aSPEM strength and the rating value.
The dots correspond to independent estimates of the hazard rate in each $200$-trial sub-block, and data belonging to
each individual participant are joined by dotted lines.
Dashed lines correspond to the median for the full dataset (black line)
or for each individual sub-block (colored line).
These should be compared to the values obtained for the BBCP model,
showing a slight variability over sub-blocks.
Stars correspond to the observers displayed in~\seeFig{results_psycho}.
This plot shows that best-fit hazard rates are in general higher than the ground truth (blue line),
and in general higher for eye movements (below the diagonal).
Note that the histogram of hazard-rate best-fit estimates (grey shaded areas) is much narrower
for the eye movement session than for the bet experiment,
as also illustrated by the cumulative distributions (solid lines in black or colors).
Such an analysis suggests that participants may rely on
different mechanisms at the unconscious and conscious levels
for guiding their tendency toward exploration versus exploitation.
 }
\label{fig:results_inter}
\end{figure}
%-------------------------------------------------------------%

%-------------------------------------------------------------%
%: 4B results
%-------------------------------------------------------------%
Hence, we fitted the sequence of responses generated by each participant
in each experimental session, that is, for the eye movement and the rating scale experiments.
To avoid any possible bias from the fitting procedure,
we tested $1600$ linearly spaced values of $\tau$ from $1$ to $1600$ trials.
For each value, we computed the correlation coefficient between the behavioral responses
and the output of the BBCP model parameterized by the hazard rate $h = \frac 1 \tau$.
We then extracted different estimates of $h_{\text{aSPEM}}$ and $h_{\text{bet}}$,
respectively for aSPEM and the rating scale,
by choosing the hazard rate value that maximized the correlation coefficient.
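As a compact summary of this procedure (the notation is introduced here for illustration only: $y^{s}$ denotes the sequence of responses of a given participant in session $s$, $P_{\text{BBCP}}(h)$ the sequence of estimates of the BBCP model run with hazard rate $h$ on the same target sequence, and $r(\cdot,\cdot)$ Pearson's correlation coefficient), the best-fit hazard rate can be written as
\[
h_{s} = \frac{1}{\tau_{s}}, \qquad \tau_{s} = \arg\max_{\tau \in \{1, 2, \ldots, 1600\}} r\left( y^{s}, P_{\text{BBCP}}(1/\tau) \right), \qquad s \in \{\text{aSPEM}, \text{bet}\}.
\]
Note that, because the grid is uniform in $\tau$ rather than in $h$, the candidate hazard rates range from $1/1600 = 6.25 \times 10^{-4}$ to $1$ and are sampled more densely at low values: half of the candidates lie below $h = 1/800$.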
To cross-validate our results for each individual participant,
we fitted the BBCP model separately to each of the $3$ sub-blocks of $200$ trials.
This provides $3$ values of the best-fit hazard rate for each session and observer.
The scatter plot of the best-fit values is shown in~\seeFig{results_inter}.
This figure suggests, in the first place, that there is some variability
in the best-fit value of the hazard rate in both sessions.
Overall, the correlation coefficient obtained with the best-fit hazard rate
was slightly higher than that computed in~\seeFig{results_psycho},
with $r = 0.682 \pm 0.080 $ for the eye movement session
%$\text{MI} = XX~ \pm XX~$
%\CP{$h_{va}$: un r par sujet : $0.6826033619391142 \pm 0.08006027539364491$ //
%un MI par sujet : $ 0.479930369669249 \pm 0.12328555167152054$ //
%moyenne de tous les r trouv\'e : $0.639744367840892 \pm 0.11445851404708901$
%quand on prend un r par sujet il semble que ce soit un peut plus grand que quand on garde un h fixe, pas pour le MI ni pour quand on calcule la moyenne des 3 r par sujet (notebook 5)}
 and $r = 0.811 \pm 0.089 $ for the rating scale session.
%$\text{MI} = XX \pm XX$
%\CP{$h_{bet}$ un r par sujet : $0.8117056662563028 \pm 0.08911550300859927$ //
%un MI par sujet :$0.8341219853735895 \pm 0.25926861033243437$ //
%moyenne de tous les r trouv\'e : $0.7975617439810163 \pm 0.14310508337570083$
%il semble que se soit moin bon que quand on garde un h fixe} for the second.
Part of the variability in the estimated hazard rates comes
from the limited length of the data blocks,
while another part is due to intra-individual and inter-individual variability.
Overall, the medians (with $25\%$ and $75\%$ quantiles) are $h_{\text{aSPEM}} = 0.069 ~ (0.038, 0.093)$ %\CP{moyenne : $0.06692999125125686 \pm 0.03628253053391533$, mediane : 0.06904761904761905 	25\% 0.038461538461538464 	75\% 0.09318181818181817}
for the aSPEM session and
$h_{\text{bet}} = 0.025 ~ (0.011, 0.093)$ %\CP{moyenne : $0.06631464647744617 \pm 0.09970025861060718$, mediane : 0.02564102564102564 	25\% 0.011951909476661951 	75\% 0.09318181818181817
%}
for the rating scale.
We observe that these values are of the same order of magnitude as the (hidden) ground truth value ($h=1/40=0.025$) used to generate the sequence, the median for the rating scale being the closest to it.
In addition, the median best-fit hazard rate for aSPEM is higher than both the true value and the median for the rating scale.
%Quantitatively, the ratio of both hazard rates computed across participants was \CP{with a ratio $1.6460031261743702~ \pm 2.779003512890947~$ ($h_{bet} / h_{va}$),  $16.12429536779007~ \pm 40.37883398426319~$ ($h_{va} / h_{bet}$)},
This tendency for the hazard rate to be higher in the eye movement recording session is also visible in~\seeFig{results_inter}, where the estimates lie in general below the diagonal.
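For illustration, the medians reported above can be converted into characteristic times using the relation $\tau = 1/h$ introduced earlier (the rounded values below are simple arithmetic on the reported medians, not additional fitted quantities):
\[
\tau_{\text{aSPEM}} \approx \frac{1}{0.069} \approx 14.5 \text{ trials}, \qquad \tau_{\text{bet}} \approx \frac{1}{0.025} = 40 \text{ trials},
\]
to be compared with the characteristic time of $40$ trials of the generative model.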
%as was previously suggested~\citep{Meyniel??}.
%Importantly, there is a part of the variability
%which seems characteristic of the spectrum of individual choices.
%\LP{
%Indeed, each cluster of individual measures is qualitatively well separated and
%a simple k-means cluster analysis proves that
%given a couple of hazard rate values chosen at random,
%one could identify the participant with an accuracy of $XX\%$.
%As a comparison, the same analysis done based on the leaky integrator predictions
%showed that such accuracy would drop to $XX\%$,
%close to the chance level of $8.33\%$.
%} \CP{n'a pas fonctionn\'e !}
As a consequence, this analysis shows
that relaxing the free parameter of the BBCP model
slightly improves the match of the model to the behavioral data and
that we can represent the distribution of individual differences in the choice behavior
between exploration and exploitation
in both sessions for each participant.

%-------------------------------------------------------------%
%: 4C analysis
%-------------------------------------------------------------%
The distribution of best-fit values for each individual participant seemed to cluster qualitatively,
but the dataset is still too small to support the significance of such an observation
at a quantitative level.
Moreover, the marginal distribution of best-fit hazard rates differs between the two sessions,
with the distribution in the aSPEM session being narrower than
that observed for the rating scale session.
We also observed the same pattern for each sub-block taken independently,
suggesting that this variability mainly originates from inter-subject differences.
%In comparison if, similarly to bootstrap resampling,
%we would shuffle the joint fitted values across participants,
%we would obtain a distribution of hazard rate ratios of $XX \pm XX~\ms$.
Such an analysis suggests that even though the predictive processes
at work in both sessions may reflect a common origin for the evaluation of volatility,
this estimation is then more strongly modulated by individual preferences
when a more conscious cognitive process is at stake.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\label{sec:outro}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The capacity to adapt our behavior to the environmental regularities has been investigated in different research fields, from motor priming and sensory adaptation to reinforcement learning, machine learning and economics. Several studies have aimed at characterizing the typical time scale over which such adaptation occurs. However, the pattern of environmental regularities could very well change in time, thereby making a fixed time scale for adaptation a suboptimal cognitive strategy. In addition, different behaviors are subject to different constraints and respond to different challenges, so it is reasonable to expect some differences in the way (and the time scales over which) they adapt to the changing environment. This study is an attempt to address these crucial open questions. We have taken an original approach, by assuming a theoretically-defined volatility in the properties of the environment (in the specific context of visual motion tracking) and we have developed an optimal inferential agent, which best captures the hidden properties of the generative model solely based on the trial sequence of target motion. We have then compared the predictions of the optimal agent, as well as those of a more classical \textit{forgetful agent}, to two sets of behavioral data, one rooted in the early visuomotor loop underlying anticipatory ocular tracking, and the other related to the explicit, conscious estimate of the likelihood of a future event. Our results point to a flexible adaptation strategy in humans, taking into account the volatility of the environmental statistics. The time scale of this dynamic adaptive process would thus vary across time, but it would also be modulated by the specific behavioral task and by inter-individual differences. In this section we discuss the present work and its implications in view of the existing literature and some general open questions.

%OVERVIEW OF MAIN FINDINGS

% From psychophysics at equilibrium to study the whole dynamics

%%%
%%%% FOR DISCUSSION?
%%%%This analysis -the correlation between predicted bias and bet scores- exhibits a consistent relationship
%%%%but which now follows a non-linear trace.
%%%%We modeled this behavior as
%%%%a logistic regression over
%%%%the log-odd ratio estimated by the agent.
%%%%TODO model this relationship by a logistic regression
%%%%This analysis provides with regression factors for each participant.
%%%%It shows that the bias (intercept)
%%%%was negligible, while the slope (logistic factor)
%%%%was always positive.
%%%%This indicates a generic aversion to risk~\citep{Kahneman13}
%%%%such that the value of a possible outcome
%%%%is down-weighted by the precision of the inference.
%%%%Our logistic regression analyses
%%%%suggests that this may come from a multiplicative weight
%%%%on the (log-) odd-ratio which is chosen as a rating scaled
%%%%compared to the (log-)odd-ratio of the inferred probability. %\AM{Is it worth to discuss more in depth the different functional dependence on p of the two experimental measures? Cite Santos and Kowler discussion about this!} - LP: that would certainly be a great addition...
%%%%%%%
%%%
%%%The compromise between speed and accuracy is shaped
%%%by the variability introduced by the neural noise
%%%at the neuro-motor junction~\citep{Harris98}.
%%%

\subsection{Environmental regularities, cognitive properties and visual perception}

%%%
%%%PERCEPTION:
%%%\AM{I PROPOSE to skip this, we detail this stuff in the next section and the part on perception is not appropriate here... First, it seems important to investigate the integration of environmental statistical regularities
%%%in this smooth tracking task with
%%%a large family of probability biases for the direction of target motion.
%%%As such, instead of a finite set of probability biases, % ($.25$, $.5$, $.75$, $.9$ and $1.$, see \seeFig{intro}-B),
%%%we decided to use a continuous range of probability biases.
%%%%which we were sampling from a fixed prior distribution.
%%%Second,~\citet{Maus2015} have recently shown that
%%%both perceptual adaptation for speed estimation
%%%and priming of aSPEM could occur simultaneously.
%%%They found a robust repulsive adaptation effect
%%%with perceptual judgements biased in favor of faster percepts
%%%after seeing stimuli that were slower and~\textit{vice-versa}.
%%%Concurrently, these authors also found
%%%a positive effect on anticipatory smooth pursuit,
%%%with faster anticipation after faster stimuli.
%%%Indeed, both priming and adaptation can hypothetically share
%%%a common internal representation of stimulus' speed,
%%%reasonably built according to the mean velocity of the last observed movements.
%%%The comparison of the internal representation of speed
%%%with the current stimulus velocity could explain repulsive aftereffects and,
%%%at the same time, be used to elicit
%%%an aSPEM component at the appropriate velocity
%%%for next stimulus occurrences.
%%%\citet{Maus2015} estimated the past history effects over different times scales,
%%%with the priming effects being maximized
%%%for short stimulus histories (around $2$ trials) and
%%%adaptation for longer stimulus history, around $15$ trials.
%%%Their main conclusion was that
%%%perceptual adaptation and oculomotor priming
%%%are the result of two distinct readout processes
%%%using the same internal representation of motion regularities.
%%%Both these history lengths can be considered
%%%short in comparison to the several hundreds
%%%of trials that are commonly used in psychophysics and sensorimotor adaptation studies.}
%%%In general, it seems \AM{important to disentangle the different cognitive processes and time scales}
%%%and their respective role in adaptive processes.
%%%
%%%

The time-varying statistical regularities that characterize the environment are likely to influence several cognitive functions. In this study, we have made the choice to focus on a largely unconscious motor behavior (aSPEM), as well as on the explicit rating of expectation for the forthcoming motion direction. In contrast, we have not addressed the question of whether and how statistical learning affects visual motion perception throughout our model-generated volatile sequences.
In an empirical context similar to ours,~\citet{Maus2015} have recently shown that
perceptual adaptation for speed estimation occurs concurrently with
priming-based aSPEM throughout a sequence of motion tracking trials with randomly varying speed. They found a robust \emph{repulsive} adaptation effect,
with perceptual judgements biased in favor of faster percepts
after seeing stimuli that were slower and~\textit{vice-versa}.
At the same time, these authors also found
a positive effect on anticipatory smooth pursuit,
with faster anticipation after faster stimuli, broadly in agreement with the adaptive properties of aSPEM that we also report here.
\citet{Maus2015} quantified the trial-history effects on aSPEM and speed perception by fitting a fixed-size memory model similar to the forgetful agent. They found that aSPEM and speed perception change over different time scales,
with the priming effects being maximized
for short-term stimulus history (around $2$ trials) and
adaptation for longer stimulus history, around $15$ trials.
Their main conclusion was that
perceptual adaptation and oculomotor priming
are the result of two distinct readout processes
using the same internal representation of motion regularities.
Note that both these history lengths can be considered
short in comparison to the several hundreds
of trials that are commonly used in psychophysics and sensorimotor adaptation studies and that, similar to the present study, the inferred characteristic times are even shorter for the buildup of anticipatory eye movements. However, it is also important to note that in the study by \citet{Maus2015}, the generative model underlying the random sequence of motion trials was different and much simpler than in the present study: In particular the role of environmental volatility was not directly addressed there. This makes a direct comparison between their results and ours difficult beyond a qualitative level.

%%%%The oculomotor system has to constantly update its knowledge about the environment. An ordeal is then to adapt to changes with the shortest delays. Early studies have proposed that stimuli provides information to modulate reaction times within sequences~\citep{Hyman1953, Tune1964, Schvaneveldt}. This theoretical approach is coherent with the notion of local transition probabilities that quantifies at which extent an observation deviates from the preceding ones~\citep{Meyniel16}. The way expectations act on cognitive processes has been investigated by a wide range of domains such as predictive coding~\citep{Wolpert2000, Wacongne2012}, active inference~\citep{Friston2010}, motor control~\citep{Sutton1998, Behrens07} and reinforcement learning~\citep{Nassar2012}. Non-stationary observations can also explain why both local and global effects emerge and why local effects persist in the long run even within purely random sequences~\citep{Cho2002, Yu2009}. This constant update of a general belief on the world can be a consequence of the constant attempt to learn the non-stationary structure of the environment that can change at unpredictable times~\citep{Yu2009}. Many studies have actually already pointed the brain's ability to apprehend non-stationary states in environments~\citep{Ossmy2013, Meyniel15}. As explained by~\citet{Meyniel16}, the belief upon an environment can be divided in two different ways:
In spite of a multitude of existing studies investigating the dynamics of sequential effects on visual perception (see for example~\citealp{Cicchini_PRSB_2018,ChopinMamassian2012}), only a few of them have directly addressed the role of the environmental volatility on the different behavioral outcomes. \citet{Meyniel16} have compared the predictions of different models, featuring a dynamic adaptation to the environment's volatility (equivalent to our \textit{forgetful agent model}) versus a fixed belief model, on five sets of previously acquired data, including reaction time, explicit reports and neurophysiological measures. Interestingly,~\citet{Meyniel16} concluded that the estimation of a time-varying transition probability matrix constitutes
a core building block of sequence knowledge in the brain,
which then applies to a variety of sensory modalities and
experimental situations.
As such, sequential effects in binary sequences would be better explained
by learning and updating transition probabilities
compared to the absolute item frequencies (as in the present work) or the frequencies of their alternations.
The critical difference lies in the content
of what is learned (transition probabilities versus item frequencies)
in an attempt to capture human behavior. Rather than on transition probabilities, here we focused on the analysis and modeling of human behavior as a function of the frequency of presentation (and its fluctuations in time) of a given event in a binary sequence of alternating visual motion direction. We can speculate that different statistics can play a different role depending on the context, but altogether the study by~\citet{Meyniel16} and the present one converge to highlight the importance of a dynamic estimate of the hierarchical statistical properties of the environment for efficient behavior.
There are also other limits to the agent that we have defined. In this study, we assume that data are provided as a sequence of discrete steps.
A similar approach using a Poisson point process
would allow extending our model to the continuous time domain, as addressed by~\citet{RadilloBrady2017}.
In their experiments, the authors analyzed the licking behavior of rats in a dynamic environment.
The generalization to the time-continuous case is beyond the scope of our current protocol,
but it would constitute a natural extension of it
to more complex and ecological settings.

The way expectations act on cognitive processes in general has been investigated in a wide range of domains such as predictive coding~\citep{Wacongne2012}, active inference~\citep{Friston2010}, motor control~\citep{WolpertGhahramani2000} and reinforcement learning~\citep{Behrens07,Wilson13,Damasse18}. Non-stationary observations can also explain why both local and global effects emerge and why local effects persist in the long run even within purely random sequences~\citep{Cho2002, Yu2009}. This constant update of a general belief on the world can be a consequence of the constant attempt to learn the non-stationary structure of the environment that can change at unpredictable times~\citep{Yu2009}. Many studies have actually already pointed out the brain's ability to apprehend non-stationary states in environments~\citep{Ossmy2013, Meyniel15}.
Future work will be needed to address the amplitude and dynamics of modulations of visual perception and other cognitive functions in a model-based volatile environment like the one we formally defined in this study, and to compare them to other implicit and explicit behavioral measures (like anticipatory eye movements and explicit expectation ratings).
%%%\AM{I WOULD PUT ALL THIS IN THE DISCUSSION: They concluded that transition probabilities constitute
%%%a core building block of sequence knowledge in the brain,
%%%which then applies to a variety of sensory modalities and
%%%experimental situations.
%%%As such, sequential effects in binary sequences would be better explained
%%%by learning transition probabilities
%%%compared to the absolute item frequencies or the frequencies of their alternations.
%%%The critical difference lies in the content
%%%of what is learned (transition probabilities versus item frequencies)
%%%in an attempt to capture human behavior.}
%%%%TODO say it is an optimization that is, a measure of the dynamic compromise between exploration and exploitation
%%%%Chopin and %Mamassian and the use of visual confidence (see Ann rev of Vision 2016)}
%%%\AM{Rather than on transition probabilities, here we focus on the analysis and modeling of human behavior as a function of the frequency of presentation (and its fluctuations in time) of a given event in a binary sequence of alternating visual motion direction.}
%%%
%%%

\subsection{Hierarchical Bayesian inference in the brain}
When we perceive the physical world, make a decision or take an action to interact with it,
our brain must deal with a ubiquitous property of it: uncertainty. Uncertainty can arise at different levels and be structured around different characteristic time scales. During the past decades, modern science seems to have completed an epistemological transition, from struggling to reduce or neglect uncertainty to engaging in understanding it as a crucial constituent of the world. In the cognitive neurosciences,
this transition has been formalized in the theoretical framework of Bayesian probabilistic inference, which has become very popular as a
benchmark of optimal behavior in perceptual, sensorimotor and cognitive tasks~\citep{KnillPouget2004} and gives a unified framework for studying the brain~\citep{Friston2010}. Furthermore, plausible hypotheses about the implementation of Bayesian computations (or approximations of them) in the activity of neuronal populations have been proposed~\citep{Bastos12, Fetsch2012,Ma2006}.

However, one should be careful when evaluating the quality of fit of Bayesian inference models for behavioral data, and avoid any over-interpretation of the results.
Note that, if we assume that the inversion of the generative model is perfect
(that is, if no algorithmic approximation has been made, as in the present study),
then by fitting different ideal observers
to the data, one evaluates as a matter of fact the adequacy of
a specific generative model, not of the probabilistic calculus in its detailed implementation.
There is a common confusion around the idea of a ``Bayesian brain''.
We actually believe that the challenge here is not to validate the hypothesis that the brain does or does not use Bayes' theorem, or more complex hierarchical combinations of inferential computations, but rather to test different hypotheses about the different generative models
that agents may use. This methodological point will be essential in designing future experimental protocols, and in evaluating quantitatively the results. The brain is probably only ``weakly Bayesian'' (it does not care about equations but more about sugar, after all!). One remaining question, though, is to understand why the adaptation to hierarchical probabilistic fluctuations occurs in cognitive systems, and in particular why it may deviate in some pathological disorders such as schizophrenia~\citep{Adams12, Jardri2017} or across the natural variability of autistic traits~\citep{Karvelis2018}.

%\subsection{Computational phenotyping of human participants}
While it was not our original objective, we have analyzed in this study the individual best-fit parameters (hazard rates) of the BBCP model: despite a substantial variability of such parameters across sub-blocks of the trial sequence, we highlighted some noteworthy tendencies for participants to cluster around specific properties of the dynamic adaptation to a volatile probabilistic environment. Most importantly, this analysis corroborates and strengthens some recent attempts to realize a \textit{computational phenotyping} of human participants. However, more extensive studies should be conducted to be able to quantitatively titrate inter-individual tendencies.

\subsection{Non-linearities in the adaptation to probabilistic environments}
Finally, neuroeconomists have pointed out a generic aversion to risk~\citep{Kahneman13}
such that the value of a possible outcome
is weighted by the precision of the inference, leading in general to an under-weighting of high gains and losses. Importantly,~\citet{WuDelgadoMaloney2012} compared a classical economic decision task with a motor decision task: they found that participants were more risk-seeking in the motor task than in the economic one. More recently, in a task similar to ours, where the behavioral choice was not specifically associated with a reward schedule,
\citet{SantosKowler2017} found a weak non-linearity in the dependence of aSPEM upon the probability of motion direction, yielding an overweighting of the extreme values of probability, whereas an opposite non-linearity (underweighting) was observed when the target direction was visually cued with a given probability of validity. In our data, we have not found consistent evidence suggesting a clear non-linearity in either direction. Further work is needed to disentangle the possible specificities (e.g. non-linearities), in this respect, of different cognitive tasks, as well as to investigate the dependence of non-linearities upon the environmental volatility.


%%%%\begin{enumerate}[label=\Alph*)]
%%%%\item Update the~\textit{a priori} likelihood of a sudden change, also known as the volatility and taken into account by the model~\citep{Behrens07}
%%%%\item A leaky integrator factor imbedded in the model~\citep{Anderson2006, Yu2009, Ossmy2013, Wang2002}
%%%%\end{enumerate}
%%%%
%%%
%%%
%%%However, one should be careful with conclusions based on this kind of data fitting.
%%%For instance, if the inversion of the forward model is exact in the present case,
%%%one has to use approximations in more complex models,
%%%such as with the HGF~\citep{Mathys11}
%%%or with the models developed by~\citet{Wilson13,Wilson18}.
%%%As a result, some discrepancy could originate from these approximations
%%%at the algorithmic level.
%%%As such, it is essential to carefully control for these sources of discrepancy (intrinsic versus extrinsic) independently~\citep{Beck12}.
%%%Second, if we assume that the inversion of the model is perfect
%%%(that is, no algorithmic approximation has been done),
%%%this means that by fitting different ideal observers
%%%to the data, one evaluates as a matter of fact the adequacy of
%%%a generative model, not of the probabilistic calculus in its detailed implementation.
%%%This is a common confusion around the idea of a ``Bayesian brain''.
%%%We believe here that the challenge is not to validate the hypothesis that the brain uses or not the Bayes' theorem,
%%%but rather to test different hypotheses
%%%about the different generative models
%%%that agents may use.
%%%This methodological point may be essential in designing the experimental protocol,
%%%or in evaluating quantitatively the results.
%%%
%%%%: theory / computationnally-driven experiments
%%%% it's a main novelty
%%%generative models for changing environments allows to know the ground truth compared to natural stimulation (see Rust eand Movshon)%
%%%Let's remember our hierarchical generative model.
%%%
%%%At any given trial, we wish to construct an algorithm which
%%%
%%%We will introduce a fundamental component of Bayesian models : a latent variable
%%%
%%%this new variable will be used to test different hypothesis which will be evaluated to predict future states. it is called latent because it aims at representing a variable that is latent (or hidden) to the observer
%%%
%%%in our case, we will assume that the Bayesian model knows about the structure of the generative model and we will set it to the current run-length $r$, that is, at any given trial, the hypothesis that the past r observations belong to the same block. of course a wrong choice of a latent variables (let's say the temperture in the experimental room) may give unexpected results, even is the Bayesian model is ``optimal'' - an essential point to understand in Bayesian inference
%%%
%%%extension to multi-nomioal( daniele + fred danion)
%%%
%%%
%%%
%%%% Still, only Bayesian models recover an explicit probabilistic representation of change in likelihood. Recent experimental studies suggest, indeed, that the brain is able of estimating a hierarchical model of the environment and that humans can explicitly report sudden changes in sequences~\citep{Meyniel15, Gallistel2014}. Ultimately, we passed over one of leaky integrator models' main default, having a too fixed and rigid memory parameter. In our work the memory parameter is constantly inferred by the BBCP algorithm over the observation of the number of trials where this inference stayed reliable and then globally represented probabilistic representation of changes in likelihood and actualization of~\textit{a priori} knowledge.
%%%
%%%
%%%perspectives:
%%%- RL : use hindsight example of localization for saccades: get the changepoints then improve estimate of reward allows to optimize the association between the set of measures and their utility (compared to Q-learning where it is a fixed length)
%%%- interindividual differences : markers for the berhaviour traces - traces of the network implementation / testing different h
%%%- the brain is weakly Bayesian (it does not care about equations but more about sugar)
%%%
%%%
%%%One remaining question though, is to understand why in cognitive systems
%%%this adaptation occurs and
%%%in particular why it may deviate
%%%in some pathological disorders such as schizophrenia~\citep{Adams12}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{itemize}\setlength{\itemsep}{0ex}
\item We have developed a Bayesian model of an agent estimating the probability bias of a volatile environment with change points (switches), such that the agent may decide \textit{to stay} on the current hypothesis about the environment, or \textit{to go} for a novel one. This allows us to dynamically infer the probability bias across time and to directly compare model predictions with experimental data, such as measures of human behavior.
\item We applied this framework to the case of a probability bias in a visual motion task where we manipulated the target direction probability. We observed a good match between the largely unconscious anticipatory smooth eye movements and the results of the model, replicating and providing a novel, solid theoretical framework for previous findings~\citep{Montagnini2010, SantosKowler2017, Damasse18}.
\item We also found a good match between model predictions and the explicit rating of the expected target motion direction, a novel result suggesting that this model captures some of the brain computations underlying expectancy-based motion prediction, at both the unconscious and conscious levels.
\item Finally, we found that the experimental data of each participant matched different types of belief about the volatile environment, some participants being more conservative than others. Interestingly, each of the two experiments (namely for the unconscious anticipatory eye movements and the conscious rating) provided different distributions, opening the perspective for future \emph{computational phenotyping} using such a volatile setting.
\end{itemize}
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Material and Methods}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Participants, visual stimuli and experimental design}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Twelve observers (aged $29 \pm 5.15$ years, $7$ female) with normal or corrected-to-normal vision took part in these experiments. They gave their informed consent and the experiments received ethical approval from the Aix-Marseille Ethics Committee (approval 2014-12-3-05), in accordance with the Declaration of Helsinki.

Visual stimuli were generated using PsychoPy 1.85.2~\citep{Peirce19} on a Mac running OS 10.6.8 and displayed on a 22-inch Samsung SyncMaster 2233 monitor with a resolution of $1680\times 1050$ pixels at a 100~\si{\Hz} refresh rate. Experimental routines, also written using PsychoPy, controlled the stimulus display. Observers sat 57~\si{\cm} from the screen in a dark room.

The moving target used in our experiments was a white ring ($0.35\degree$ outer diameter and $0.27\degree$ inner diameter) with a luminance of $102~cd/m^2$ that moved horizontally on a grey background (luminance $42~cd/m^2$). Each trial started with a central fixation point displayed for a random duration drawn from a uniform distribution ranging between $400$ and $800~\ms$. Then a fixed-duration $300~\ms$ gap occurred between the offset of the fixation point and the onset of the moving target, which was presented slightly offset from the fixation location~\citep{Rashbass1961} and immediately started moving horizontally at a constant speed of $15\degree/s$, either to the right or to the left for $1000~\ms$. The probability $p$ of rightward trials was a time-varying random variable which was constant within an epoch of the sequence of a given random size (see main text for the description of the generative model).

The paradigm included two experimental sessions performed on two distinct (in general consecutive) days by each participant. The two sessions involved the presentation of the same sequence of trials, while collecting a different behavioral response: explicit rating judgments in the first session (the \textit{bet} experiment), and eye movement recordings in the second session. When asked after the experiment, no observer reported having noticed that the same (pseudo-)random sequence of target directions was used in both experiments.

\subsection{Eye movements experiment}
Eye movements were recorded continuously with an eye tracking system (Eyelink 1000, SR Research Ltd., sampled at 1000~\si{\Hz}), using the Python module Pylink 0.1.0 provided by PsychoPy. Horizontal and vertical eye position data were transferred, stored, and analyzed offline using programs written in Jupyter notebooks.
To minimize measurement errors, the participant's head movements were restrained using a chin and forehead rest, so that the eyes in primary gaze position were directed towards the center of the screen. In order to enforce accuracy in gaze position and tracking, we implemented an automatic procedure of fixation control. If the distance between the gaze position and the central fixation point during the fixation epoch exceeded $2\degree$ of visual angle, the fixation point started flickering and the counter for the fixation duration was reset to $0$.

The recorded horizontal and vertical raw gaze position data were numerically differentiated to obtain velocity measures. We adopted an automatic conjoint acceleration and velocity threshold method (the default saccade detection implemented by SR Research) to detect ocular saccades. Saccades and eye-blinks were excluded from eye velocity traces (and replaced by \textit{Not-a-Number} values in the numerical arrays) before trial averaging and data fitting for the extraction of the oculomotor parameters of interest.
In order to extract the relevant parameters of the oculomotor responses, we developed new tools based on a best-fitting procedure of predefined oculomotor patterns, in particular the typical smooth pursuit velocity profile that was recorded in our experiment. A piecewise-defined function was fitted to the different epochs of the eye velocity traces: a constant function during fixation, a ramp-like linear function during smooth pursuit anticipation, and an increasing sigmoid function during the initiation of visually-guided smooth pursuit, reaching its saturating value during the pursuit steady-state. This analysis was applied to each trial individually and allowed us, in particular, to estimate the anticipatory smooth pursuit velocity.
%are provided in Appendix \ref{app:em}
Some trials were excluded from the analysis because the proportion of missing data-points, due to eye blinks or saccades, was considered too large, namely when the missing data exceeded $45~\ms$ during the gap or one third of the total target motion epoch ($4.36\%$ of all trials). In addition, trials were excluded when the eye-movement fitting procedure did not converge, as assessed by visual inspection, to a satisfactory match with the data ($3.25\%$ of all trials). The Python scripts used to analyze eye movements are available at \url{https://github.com/invibe/ANEMO}.

\subsection{The Bet experiment}
The aim of the Bet experiment was to collect data related to the individual conscious estimates of the probability of target motion direction. At the beginning of each trial, before the presentation of the moving target, participants had to answer the question \textit{``How sure are you that the target will go left or right''}. This was performed by adjusting a cursor on the screen using the mouse (see Figure~\seeFig{intro}-C). The cursor could be placed at any point along a horizontal segment representing a linear rating scale with three ticks labeled as \textit{``Left''}, \textit{``Right''} (at the extreme left and right ends of the segment, respectively), and \textit{``Unsure''} in the middle. Participants had to validate their choice by clicking the left mouse button, and the actual target motion was shown thereafter. The rationale for collecting rating responses on a continuous scale instead of a simple binary prediction (Right/Left) was to be able to infer the individual estimate of the direction bias at the single trial scale (in analogy to the continuous interval for the strength of aSPEM velocity).
%
We called this experiment the \textit{ ``Bet''} experiment, as participants were explicitly encouraged to make reasonable rating estimates, as though they had to bet money on the next trial outcome. Every $50$ trials, a \textit{``score''} was displayed on the screen, corresponding to the proportion of correct direction predictions (Right or Left of the \textit{``Unsure''} tick) weighted by the confidence attributed to each answer (the distance of the cursor from the center).


%\subsection{Eye movements analysis}
%The data analyses were implemented using the Python libraries numpy, pandas and pylab. All the scripts for data analysis, as well as for stimulus presentation, data collection, and preparation of figures are available on github at \url{https://github.com/chloepasturel/AnticipatorySPEM}.

\section*{Acknowledgments}
\Acknowledgments
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%{\tiny
%\printbibliography
%}
\bibliography{Pasturel_etal2020}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Supporting information}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\subsection{Appendix 1: Analysis of eye movements}
%\label{app:em}
%\AM{I would eliminate this appendix}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%
%%: FIGURE 1B fig:introB~\seeFig{introB}
%\begin{figure}%[b!]
%\centering{
%%\includegraphics[width=\linewidth]{figure1}
%\begin{tikzpicture}%[thick,scale=1, every node/.style={scale=1} ]
%\node [anchor=north west] (img2) at (0.51\linewidth,.33\linewidth){\includegraphics[width=0.33\linewidth]{1_B_Trace_moyenne}};
%%\draw [anchor=north west] (0.000\linewidth, .62\linewidth) node {$\mathsf{(A)}$};
%%\draw [anchor=north west] (0.505\linewidth, .62\linewidth) node {$\mathsf{(B)}$};
%\end{tikzpicture}
%}
%\caption{
%%\emph{Anticipatory SPEM (aSPEM): experimental design and results.}
%%\textbf{(A)}~TODO: MOVE THIS PART IN THE MAIN TEXT:
%\emph{\AM{I GUESS THIS FIGURE WILL BE ELIMINATED...?} Behavioral experiments: anticipatory Smooth Pursuit Eye Movements}
%% TODO : dans notre plot on a des range de valeurs pour p :-/ -> C'EST PARCE QUE LE PLOT REPRESENTE DEJA LA VITESSE MOYENNE OBTENUE DANS LA TACHE AVEC SWITCHES!
%%\textbf{(B)}~
%We first replicated the results of~\citet{Montagnini2010},
%in which human observers were presented with several $500$ trials blocks of horizontal target motion with a block-dependent direction probability bias, and they were asked to track the target with their gaze.
%An important difference with that study is that their experiment was made of several blocks of target motion with a block-dependent direction probability bias selected among a few predetermined values, whereas in the present study $p$ can change in each different block as a random number between $0$ and $1$.
%Horizontal eye velocity traces
%averaged, over rightward trials and all observers, for five different intervals of the direction bias.
%These traces are aligned to the onset of the moving dot (the $0$ on the $x-axis$).
%Saccades were removed using a thresholding method~(see~\seeApp{em}) and
%the shaded area around the traces represent one standard deviation over all velocity samples.
%In the unbiased or weakly biased condition ($0.4\leq p\leq 0.6 $), one can distinguish
%a visually-driven component (after a latency of $\approx 100~\ms$)
%which corresponds to the standard Smooth Pursuit Eye Movement (SPEM) initiation.
%When introducing a bias in the direction,
%the average eye velocity progressively ramps
%in the direction of the expected velocity, starting during the GAP phase and well before the visually-driven component:
%This phase is the anticipatory SPEM (aSPEM).
%As previously reported~\citep{Montagnini2010, SantosKowler2017,Damasse18},
%the slope of this ramp correlates with the strength of the bias.
%In this study, we extended this experiment in three aspects.
%First, we used probability biases in a continuous space,
%as drawn from a prior distribution for the values of $p$.
%Second, we generated the random sequence of trials
%by concatenating random-length blocks (see \seeFig{results_psycho}),
%to avoid potential confounds related to the previously used blocked-design.
%}
%\label{fig:introB}
%\end{figure}
%
%I show here a typical velocity traces for one participant / 2 trials
%
%- x-axis is time in milliseconds aligned on target onset,
%and we show respectively from left to right the fixation in gray,
%the GAP in pink (300\ms) and the run in light gray.
%
%- y-axis is the velocity as computed as the gradient of position.
%Remark that the eyelink provides with the periods of saccades or
% blinks that we removed from the signal. it is quite noisy and
% to complement existing signal processing methods,
% Chloe implemented a robust
%
%- fitting method which allows to extract some key components of
%the velocity traces: maximum speed, latency, temporal inertia ($\tau$)
% and most interestingly acceleration before motion onset.
% We cross-validated that this method was giving similar results
% to other classical methods but in a more robust fashion/
%
%While being sensible to recording errors, this allows us to extract the
% anticipatory component of SPEMs and..

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Appendix: leaky integrator}
\label{app:leaky}
Given a series of observations $\{x_0^i\}_{0\leq i \leq t}$
with $\forall i, x_0^i \in \{0, 1 \}$, we defined (writing $h = 1/\tau$)

\eqs{
\hat{x_1}^{t} &= (1-1/\tau)^{t+1} \cdot \hat{x_1}^{t=0} + 1/\tau \cdot \sum_{0\leq i \leq t} (1 - 1/\tau)^{i} \cdot x_0^{t-i}\\
			  &= (1-h)^{t+1} \cdot \hat{x_1}^{t=0} + h \cdot \sum_{0\leq i \leq t} (1 - h)^{i} \cdot x_0^{t-i}
}
%is true for $t=1$: by definition $\hat{x_1}^{0}=x_0^0$ and
%\eq{
%\hat{x_1}^{1} = (1 - \rho) \cdot \hat{x_1}^{0} + \rho \cdot x_0^1
%}
If we write it for trial $t-1$, we have

\eqs{
\hat{x_1}^{t-1}	&= (1-h)^{t} \cdot \hat{x_1}^{t=0} + h \cdot \sum_{0\leq i \leq t-1} (1 - h)^{i} \cdot x_0^{t-1-i} \\
                &= (1-h)^{t} \cdot \hat{x_1}^{t=0} + h \cdot \sum_{1\leq j \leq t} (1 - h)^{j-1} \cdot x_0^{t-j} \\ % j = i+1
(1 - h) \cdot \hat{x_1}^{t-1} &= (1-h)^{t+1} \cdot \hat{x_1}^{t=0} +  h \cdot \sum_{1\leq i \leq t} (1 - h)^{i} \cdot x_0^{t-i}
                }
As such, the integrative formula above becomes an iterative relation:

\eqs{
\hat{x_1}^{t}	&= (1-h)^{t+1} \cdot \hat{x_1}^{t=0} + h \cdot \sum_{0\leq i \leq t} (1 - h)^{i} \cdot x_0^{t-i} \\
				&= (1-h)^{t+1} \cdot \hat{x_1}^{t=0} + h \cdot x_0^{t} + h \cdot \sum_{1\leq i \leq t} (1 - h)^{i} \cdot x_0^{t-i} \\
				&= h \cdot x_0^{t} + (1 - h) \cdot \hat{x_1}^{t-1}
}
such that finally

\eq{
\hat{x_1}^{t} = (1 - h) \cdot \hat{x_1}^{t-1} + h \cdot x_0^t
}
As such, the definitions in~\seeEq{leaky} and~\seeEq{leaky2} are equivalent.
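
As a quick numerical check of this equivalence, the following minimal Python sketch compares the closed-form sum with the iterative update. The function names (\texttt{leaky\_closed\_form}, \texttt{leaky\_recursive}), the argument \texttt{x\_init} (standing for the initial value $\hat{x_1}^{t=0}$) and the example sequence are illustrative assumptions and are not part of the released code.

\begin{lstlisting}
def leaky_closed_form(x, h, x_init=0.5):
    """Closed-form estimate after observing x[0], ..., x[t]."""
    t = len(x) - 1
    weighted_sum = sum((1 - h) ** i * x[t - i] for i in range(t + 1))
    return (1 - h) ** (t + 1) * x_init + h * weighted_sum


def leaky_recursive(x, h, x_init=0.5):
    """Same estimate obtained by iterating the update rule."""
    x_hat = x_init
    for x_t in x:
        x_hat = (1 - h) * x_hat + h * x_t
    return x_hat


# hypothetical binary sequence (1 = rightward trial) and time constant
x = [1, 0, 1, 1, 0, 1, 1, 1]
h = 1 / 4.0  # h = 1 / tau
assert abs(leaky_closed_form(x, h) - leaky_recursive(x, h)) < 1e-12
\end{lstlisting}

Both functions return the same value up to floating-point precision for any prefix of the sequence, which is what the derivation above states.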


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The Bernoulli, binomial and Beta distributions}
\label{app:beta}

Let us define some basic concepts. A Bernoulli trial is the outcome of a binary random variable $x$ given a probability bias $\mu$ (with $0 \leq \mu \leq 1$) and can be formalized as:

\eq{
Pr(x | \mu) = \mu^x \cdot (1-\mu)^{1-x}
}

The binomial distribution is defined as the probability that the sum $X$ of $\nu$ independent Bernoulli trials is $k$:

\eq{
\Pr(k;\nu,\mu) = \Pr(X = k) = {\nu\choose k} \cdot \mu^k \cdot (1-\mu)^{\nu-k}
}

Given such a model for $X$, it can be of interest to estimate the parameter of the Bernoulli trial, that is, the probability bias $\mu$. The distribution describing this estimate is the conjugate prior of the binomial distribution, namely the Beta distribution. For example, the Beta distribution can be used in Bayesian analysis to describe initial knowledge concerning the probability of success, such as the probability that a product will successfully complete a stress test. More generally, the Beta distribution is a suitable model for the random behavior of percentages and proportions.

It is usually defined using shape parameters $\alpha$ and $\beta$:

\eq{
Pr(p | \alpha, \beta ) = \frac{1}{B(\alpha, \beta)} \cdot p^{\alpha -1} \cdot (1-p)^{\beta - 1}
}
Note that here, the variable is the probability bias $p$. The normalization constant $B(\alpha, \beta)$ is given by the beta function. By definition:

\eqs{
        \alpha &= \mu \cdot \nu \\
        \beta  &= (1-\mu) \cdot \nu
    }
Conversely, $\alpha + \beta = \nu$ and $\mu = \frac{\alpha}{\alpha +\beta} = 1- \frac{\beta}{\alpha + \beta}$.
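
The correspondence between the two parameterizations, together with the conjugate update after one Bernoulli outcome, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the sequence of outcomes are hypothetical, and the update $\alpha \leftarrow \alpha + x$, $\beta \leftarrow \beta + 1 - x$ is the standard Beta-Bernoulli conjugacy rule.

\begin{lstlisting}
def shape_from_mean(mu, nu):
    """(mu, nu) -> (alpha, beta)."""
    return mu * nu, (1 - mu) * nu


def mean_from_shape(alpha, beta):
    """(alpha, beta) -> (mu, nu)."""
    nu = alpha + beta
    return alpha / nu, nu


def observe(mu, nu, x):
    """Conjugate update of (mu, nu) after a binary outcome x."""
    alpha, beta = shape_from_mean(mu, nu)
    return mean_from_shape(alpha + x, beta + 1 - x)


mu, nu = 0.5, 2.0        # uniform prior: alpha = beta = 1
for x in [1, 1, 0, 1]:   # hypothetical outcomes
    mu, nu = observe(mu, nu, x)
\end{lstlisting}

In the $(\mu, \nu)$ parameterization, each observation increments $\nu$ by one and adds $x$ to the product $\nu \cdot \mu$.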


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Appendix 2: BBCP algorithm}
\label{app:bcp}

To summarize, the algorithm that we presented is an implementation of the ``Bayesian Online Changepoint Detection'' algorithm by~\citet{AdamsMackay2007},
extended to the class of binary inputs. Using the definition of the run-length~\seeSec{Binary_Bayesian_change_point}, the flow-chart of the algorithm is:

\begin{enumerate}
	\item Initialize
	\begin{itemize}
		\item    $P(r_0>0)= 0$ or $P(r_0=0)=1$ and
		\item    $\mu^{(0)}_0 = \mu_{prior}$ and $\nu^{(0)}_0 = \nu_{prior}$
	\end{itemize}

	\item     Observe New Datum $x_0^t \in \{ 0, 1 \}$,

	\begin{enumerate}

		\item    Evaluate Predictive Probability $\pi^{(r)}_{t} = P(x_0^t |\mu^{(r)}_t,\nu^{(r)}_t)$.
	    \item    Calculate Growth Probabilities $P(r_t=r_{t-1}+1, x_{0:t}) = P(r_{t-1}, x_{0:t-1}) \pi^{(r)}_t (1-h)$,
	    \item    Calculate Changepoint Probabilities $P(r_t=0, x_{0:t})= \sum_{r_{t-1}} P(r_{t-1}, x_{0:t-1}) \pi^{(r)}_t \cdot h$,
	    \item    Calculate Evidence $P(x_{0:t}) = \sum_{r_{t}} P (r_t, x_{0:t})$,
	    \item    Determine Run Length Distribution $P (r_t | x_{0:t}) = P (r_t, x_{0:t})/P (x_{0:t}) $.
	\end{enumerate}

	\item     Update sufficient statistics
		\begin{itemize}
			\item  at a switch  $\mu^{(0)}_{t+1} = \mu_{prior}$, $\nu^{(0)}_{t+1} = \nu_{prior}$,
			\item  else, $\nu^{(r+1)}_{t+1} = \nu^{(r)}_{t} + 1$ and $\nu^{(r+1)}_{t+1} \cdot \mu^{(r+1)}_{t+1} = \nu^{(r)}_{t} \cdot \mu^{(r)}_{t} + x_0^t$.
		\end{itemize}
	\item     Return to step $2$.
\end{enumerate}


In the following, we detail some intermediate steps and highlight some key differences with their implementation. We also provide a Python implementation of the algorithm, which is openly available on \href{https://github.com/laurentperrinet/Bayesianchangepoint}{GitHub}.
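
As a compact illustration of the flow-chart above, one iteration of steps 2 and 3 may be sketched as follows. This is not the released implementation: the function name \texttt{bbcp\_step}, the hazard rate $h=1/40$ and the input sequence are hypothetical, and the predictive probability is taken here as the mean of the Beta distribution (the standard Beta-Bernoulli posterior predictive), a simplifying assumption of this sketch. The run-length distribution is normalized at each step, which leaves the result unchanged since it is proportional to the joint probability.

\begin{lstlisting}
def bbcp_step(x, belief, mu, nu, h, mu_prior=0.5, nu_prior=1.0):
    """Steps 2 and 3 of the flow-chart for one binary datum x.

    belief[r] is the run-length distribution of the previous trial,
    mu[r] and nu[r] are the sufficient statistics of node r.
    """
    # 2(a) predictive probability under each run-length hypothesis
    # (assumption of this sketch: mean of the Beta distribution)
    pi = [m if x == 1 else 1.0 - m for m in mu]
    # 2(b) growth probabilities: the current run continues
    growth = [b * p * (1.0 - h) for b, p in zip(belief, pi)]
    # 2(c) changepoint probability: the run-length is reset to 0
    change = sum(b * p * h for b, p in zip(belief, pi))
    joint = [change] + growth
    # 2(d) evidence and 2(e) run-length distribution
    evidence = sum(joint)
    new_belief = [j / evidence for j in joint]
    # 3. sufficient statistics: prior at r=0, incremented otherwise
    new_nu = [nu_prior] + [n + 1.0 for n in nu]
    new_mu = [mu_prior] + [(n * m + x) / (n + 1.0)
                           for m, n in zip(mu, nu)]
    return new_belief, new_mu, new_nu


# step 1: initialization, P(r_0 = 0) = 1 and prior statistics
belief, mu, nu = [1.0], [0.5], [1.0]
for x in [1, 1, 1, 0, 0]:  # hypothetical sequence of outcomes
    belief, mu, nu = bbcp_step(x, belief, mu, nu, h=1 / 40)
\end{lstlisting}

After each call, the lists grow by one element, reflecting the additional run-length hypothesis introduced at every trial.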

\subsubsection{Initialization}
%to represent our belief at trial $t$
%or to determine the pdf for $x_1^t$ as a mixture of Beta distributions:
%\eqa{
%\hat{x_1}^{t} = \sum_{r^{t}} Pr(x_1^t | r^{t}, x_0^{0:t}) \cdot Pr(r^{t} | x_0^{0:t})
%}

% in Python.
%
%
%* adapted from https://github.com/JackKelly/Bayesianchangepoint by Jack Kelly (2013)

%
%* This code is based on the [MATLAB implementation](http://www.inference.phy.cam.ac.uk/rpa23/changepoint.php) provided by Ryan Adam. Was available at http://hips.seas.harvard.edu/content/Bayesian-online-changepoint-detection
%
% * full code @ https://github.com/laurentperrinet/bayesianchangepoint are available .


Note that the prior distribution is itself a Beta distribution:
$\Pp\propto B(p; \mu_{prior}, \nu_{prior})$.
By symmetry, it is unbiased: $\mu_{prior}=.5$.
Concerning the shape, it can be for instance
the uniform distribution $\Uu$ on $ [ 0, 1 ] $, that is $\nu_{prior}=2$, or
Jeffreys' prior $\Jj$, that is $\nu_{prior}=1$.
We chose the latter for the generation of trials,
as the uniform distribution would yield more samples around $.5$.
Qualitatively, this would result in a more difficult task when discriminating one probability bias from another.
Jeffreys' prior was thus better suited to that task.
%Wikipedia: Beta(1/2, 1/2): The arcsine distribution probability density was proposed by Harold Jeffreys to represent uncertainty for a Bernoulli or a binomial distribution in Bayesian inference, and is now commonly referred to as Jeffreys prior: p−1/2(1 − p)−1/2. This distribution also appears in several random walk fundamental theorems


\subsubsection{Prediction: run-length distribution}

The steps to achieve the update rule are:

 \eqs{
%\hat{x_1}^{t} =
Pr(x_0^t | x_0^{0:t-1}) &= \sum_{r^{t}} Pr(x_0^t | r^{t}, x_0^{0:t-1}) \cdot  \beta^{(r)}_t \\
Pr(x_0^t | x_0^{0:t-1}) &= \sum_{r^{t}} Pr(x_0^t | r^{t}, x_0^{0:t-1}) \cdot  Pr(r^{t} | x_0^{0:t-1})\\
\text{with} \quad Pr(r^{t} | x_0^{0:t-1}) &\propto \sum_{r^{t-1}}  Pr(r^t | r^{t-1}) \cdot  Pr(x_0^t | r^{t-1}, x_0^{0:t-1}) \cdot  Pr(r^{t-1} | x_0^{0:t-2})
}
Finally we obtain~\seeEq{pred_node}:
\eq{
\beta^{(r)}_t \propto \sum_{r^{t-1}}  Pr(r^t | r^{t-1}) \cdot  Pr(x_0^t | r^{t-1}, x_0^{0:t-1}) \cdot  \beta^{(r)}_{t-1}
}


\subsubsection{Prediction: sufficient statistics}

The recursive formulation in~\seeEq{update_nu} and~\seeEq{update_mu} comes from the expression

 \eq{
\nu^{(r)}_{t} \cdot \mu^{(r)}_{t} = \sum_{i=t-r-1}^{t-1} x_0^i % + 2 - \nu_{prior}
}
and therefore

\eqs{
\nu^{(r+1)}_{t+1} \cdot \mu^{(r+1)}_{t+1} 	&= \sum_{i=t+1-r-1-1}^{t+1-1} x_0^i  \\% + 2 - \nu_{prior} \\
											&=  \sum_{i=t-r-1}^{t} x_0^i  \\% + 2 - \nu_{prior}\\
											&= \nu^{(r)}_{t} \cdot  \mu^{(r)}_{t} +  x_0^t
}

%\subsubsection{Readout}
%\label{app:readout}
%
%Perform Prediction $P (x_0^{t+1} | x_{0:t}) = P (x_0^{t+1}|x_{0:t} , r_t) P (r_t|x_{0:t})$,
%
%Can we get  $P (x_2^{t+1} | x_{0:t}) $ ? would be nice to see the inferrence of surprise / would fit with pupil size...

\subsubsection{Quantitative evaluation}

To quantitatively evaluate our results with respect to another probability bias, we computed in~\seeEq{KL} the cost as the Kullback-Leibler divergence  $\KL{\hat p}{p}$ between samples $\hat p$ and model $p$ under the hypothesis of a Bernoulli trial:

\begin{equation}
\KL{\hat p}{p} = \hat{p} \cdot\log\pa{\frac{\hat p}{p}} + (1-\hat p)\cdot \log\pa{\frac{1-\hat p}{1-p}}.
\end{equation}
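
For reference, this cost can be computed with the short Python function below. It is a minimal sketch: the function name \texttt{kl\_bernoulli}, the use of the natural logarithm and the clipping constant \texttt{eps} (which avoids evaluating $\log(0)$ when a probability is exactly $0$ or $1$) are assumptions made for illustration.

\begin{lstlisting}
import math


def kl_bernoulli(p_hat, p, eps=1e-12):
    """KL divergence between Bernoulli(p_hat) and Bernoulli(p), in nats."""
    # clip both probabilities to the open interval (0, 1)
    p_hat = min(max(p_hat, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return (p_hat * math.log(p_hat / p)
            + (1 - p_hat) * math.log((1 - p_hat) / (1 - p)))
\end{lstlisting}

The divergence is zero when $\hat p = p$ and strictly positive otherwise. It is not symmetric in its two arguments.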


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Appendix: likelihood function}
\label{app:likelihood}
%\seeApp{likelihood}
% TODO : check http://www.princeton.edu/~rcw2/papers/WilsonEtAl_PLOSCompBiol2013.pdf and Bernoulli case + evaluation
%cf p33 de 2018-02-12 journal club Bayesian changepoint chloe.pdf
%cf p52 de 2017-10-05 chloe inverting the process rem jb.pdf


We want to compute $\Ll(r | o) = Pr(o | p, r)$ where $o \in \{ 0, 1 \}$, such that we can evaluate the Predictive Probability $\pi^{(r)}_{t} = P(x_0^t  |\mu^{(r)}_t,\nu^{(r)}_t)$ in the algorithm above, with $\mu^{(r)}_t$ and $\nu^{(r)}_t$ the sufficient statistics at trial $t$ for node $(r)$.
The likelihood of observing $o=1$ is that of a binomial (conjugate of a Beta distribution) of
	\begin{itemize}
		\item  mean rate of choosing hypothesis $o=1$ equal to $\frac{p\cdot r + o}{r+1}$,
		\item number of choices where $o=1$ equal to $p\cdot r+1$.
	\end{itemize}
More generally, by observing $o$, the new rate is $p^{'} = \frac{p\cdot r + o}{r+1}$.

\subsubsection{Mathematical derivation}

The likelihood will give the probability of this novel rate given the known parameters and their update (in particular $r^{'}=r+1$):

\eqs{
L(r | o)&={(\frac{p\cdot r + o}{r+1})}^{p\cdot r + o} \cdot (1-\frac{p\cdot r + o}{r+1})^{r + 1 - (p\cdot r + o)} \\
&= \frac{1}{({r+1})^{r+1}} \cdot {(p\cdot r + o)}^{p\cdot r + o}  \cdot {((1- p)\cdot r + 1- o)}^{(1- p)\cdot r + 1- o}
%&= \frac{ (1-o) \cdot {(p\cdot r)}^{p\cdot r}  \cdot {((1- p)\cdot r + 1)}^{(1- p)\cdot r + 1}
%+ o \cdot {(p\cdot r + 1)}^{p\cdot r + 1}  \cdot {((1- p)\cdot r)}^{(1- p)\cdot r}
% }{
% {(p\cdot r + 1)}^{p\cdot r + 1}  \cdot {((1- p)\cdot r )}^{(1- p)\cdot r }  +
%  {(p\cdot r )}^{p\cdot r }  \cdot {((1- p)\cdot r + 1)}^{(1- p)\cdot r + 1}
%}  \\
}
Since both likelihoods must sum to 1, the likelihood of drawing $o$ in the set $\{ 0, 1 \}$ is equal to

%\AM{the change in the argument of $\Ll$}
\eqs{
\Ll(r | o)&=\frac{L(r | o)}{L(r | o=1) + L(r | o=0)}  \\
&= \frac{ {(p\cdot r + o)}^{p\cdot r + o}  \cdot {((1- p)\cdot r + 1- o)}^{(1- p)\cdot r + 1- o} }{
 {(p\cdot r + 1)}^{p\cdot r + 1}  \cdot {((1- p)\cdot r )}^{(1- p)\cdot r }  +
  {(p\cdot r )}^{p\cdot r }  \cdot {((1- p)\cdot r + 1)}^{(1- p)\cdot r + 1}
}  \\
&= \frac{ (1-o) \cdot {(p\cdot r)}^{p\cdot r}  \cdot {((1- p)\cdot r + 1)}^{(1- p)\cdot r + 1}
+ o \cdot {(p\cdot r + 1)}^{p\cdot r + 1}  \cdot {((1- p)\cdot r)}^{(1- p)\cdot r}
 }{
 {(p\cdot r + 1)}^{p\cdot r + 1}  \cdot {((1- p)\cdot r )}^{(1- p)\cdot r }  +
  {(p\cdot r )}^{p\cdot r }  \cdot {((1- p)\cdot r + 1)}^{(1- p)\cdot r + 1}
}
}
This can also be written by isolating the part which depends on $o$, for a given run-length $r$ and knowing the sufficient statistics at each node $r$:

\eql{
\Ll(r | o) = \frac{1}{Z} \cdot {(p \cdot r + o)}^{p \cdot r + o} \cdot {((1- p)\cdot r + 1- o)}^{(1- p)\cdot r + 1- o}
}
with $Z$ such that $\Ll(r | o=1) + \Ll(r | o=0)=1$, that is~\seeEq{likelihood}.

\subsubsection{Python code}

\begin{lstlisting}
def likelihood(o, p, r):
    r"""
    Likelihood of observing the binary outcome o at a node of run-length r.

    Knowing $p$ and $r$, the sufficient statistics of the beta distribution $B(\alpha, \beta)$ :
    $$
        alpha = p*r
        beta  = (1-p)*r
    $$
    the likelihood of observing o=1 is that of a binomial of

        - mean rate of choosing hypothesis "o=1" = (p*r + o)/(r+1)
        - number of choices where "o=1" equal to p*r+1

    Since both likelihoods sum to 1, the likelihood of drawing o in the set {0, 1}
    is equal to L(o) / (L(o) + L(1-o)).

    """
    def L(o, p, r):
        # unnormalized likelihood, divided by the factor common to both outcomes
        # (p*r + 1)**(p*r) * ((1-p)*r + 1)**((1-p)*r) to keep values bounded
        P =  (1-o) * ( 1. - 1 / (p * r + 1) )**(p*r) * ((1-p) * r + 1)
        P +=  o * ( 1. - 1 / ((1-p) * r + 1) )**((1-p)*r) * (p * r + 1)
        return  P

    L_yes = L(o, p, r)   # observed outcome o
    L_no = L(1-o, p, r)  # complementary outcome 1-o
    return L_yes / (L_yes + L_no)

\end{lstlisting}

\subsubsection{Properties}
This function has some properties, notably symmetries:
	\begin{itemize}
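		\item for certain outcomes, the likelihood becomes deterministic as $r$ grows: $\Ll(o|p=0, r) \to 1-o$ and $\Ll(o|p=1, r) \to o$ when $r \to \infty$ (for a finite run-length it remains graded, for instance $\Ll(o=0|p=0, r=1)=4/5$),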
		\item if $r=0$, the likelihood is uniform $\Ll(o)=1/2$,
		\item $Pr(o | p, r)=Pr(1-o | 1-p, r)$.
	\end{itemize}

Note also that as $r$ grows, the likelihood gets sharper.
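
Some of these properties can be checked numerically with the short sketch below, which restates the function of the listing above in a compact form so that the snippet is self-contained. The probed values of $p$ and $r$ are arbitrary choices made for illustration.

\begin{lstlisting}
def likelihood(o, p, r):
    """Normalized likelihood of outcome o given bias p and run-length r."""
    def L(o, p, r):
        P = (1 - o) * (1. - 1 / (p * r + 1))**(p * r) * ((1 - p) * r + 1)
        P += o * (1. - 1 / ((1 - p) * r + 1))**((1 - p) * r) * (p * r + 1)
        return P
    return L(o, p, r) / (L(o, p, r) + L(1 - o, p, r))


# for r = 0, the likelihood is uniform
assert abs(likelihood(1, 0.7, 0) - 0.5) < 1e-12
# symmetry: Pr(o | p, r) = Pr(1-o | 1-p, r)
assert abs(likelihood(1, 0.3, 10) - likelihood(0, 0.7, 10)) < 1e-12
# sharpening: for p = 0.8, the likelihood of o = 1 moves away from 1/2
values = [likelihood(1, 0.8, r) for r in (1, 10, 100)]
assert 0.5 < values[0] < values[1] < values[2] < 0.8
\end{lstlisting}

In this example, the likelihood of $o=1$ increases with $r$ while remaining below $p$.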

% TODO : put figure from https://github.com/laurentperrinet/Bayesianchangepoint/blob/master/notebooks/test_tracebase.ipynb

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\subsection{Appendix 4: Supplementary psychophysical results}
%\label{app:results_psycho}
%%\seeApp{results_psycho}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}%