\chapter{LFT: Computer Implementation}\label{ch:MCMC}
As discussed in \secref{sec:haar}, expectation values of
physical observables $X$ in pure $\SU(N_c)$ LGT
are given by functional integrals
\begin{equation}\label{eq:exobs}
\ev{X}=\frac{1}{Z}\int\DD{U}e^{-S(U)}X(U).
\end{equation}
Even though the integral~\eqref{eq:exobs} is well-defined on a lattice
because there are
finitely many sites, it is not feasible to evaluate it by direct numerical
quadrature; even relatively small lattices have $4\times10^4$ links, each
carrying several integration variables. The goal of our
simulation is to estimate $\ev{X}$ by randomly generating configurations,
distributed with probability $e^{-S}$,
and on each configuration, making a measurement $X_i$. The average
\begin{equation}\label{eq:arithmeticaverage}
\bar{X}=\frac{1}{\nconf}\sum_{i=1}^{\nconf} X_i
\end{equation}
serves as the estimator.
The strategy of carrying out a statistical computation using random sampling
goes under the name of {\it Monte Carlo}\index{Monte Carlo}
algorithms\footnote{The idea of Monte Carlo algorithms dates back to the 1940s.
Stanislaw Ulam wanted to know the probability of winning a game of Solitaire.
The calculation turned out to be too complicated to do by hand, so he decided to
estimate it by playing repeated games, then calculating
$$
{\rm win\ chance}\approx\frac{\rm number\ of\ wins}{\rm number\ of\ games}.
$$
Of course this is tremendously tedious; it is much more appropriate for a
computer. Other scientists working with him at Los Alamos, such as John von
Neumann and Nicholas Metropolis, were early pioneers of this method.}.
The strategy of drawing these random configurations is to use a
Markov chain, and hence the backbone of a lattice calculation is
often referred to as {\it Markov chain Monte Carlo} (MCMC).
Further details on MCMC methods can be found in, for instance,
Refs.~\cite{berg_markov_2004,gattringer_quantum_2010}.
\section{Markov chain Monte Carlo}\label{sec:MCMCintro}
To generate our configurations, we start from some arbitrary configuration
$C_0$ and construct a stochastic sequence of configurations.
Configuration $C_i$ is generated based on\index{update}
configuration $C_{i-1}$; this step is called an {\it update} or {\it Markov
step}.
The result is a {\it Markov chain}
\begin{equation}
C_0\to C_1\to C_2\to...
\end{equation}
of configurations.
\index{MCMC}
{\it Markov chain Monte Carlo} (MCMC) is characterized by the probability
$W^{CC'}\equiv\pr{C'|C}$, the probability to jump to configuration
$C'$ given that the system started in configuration $C$.
The MCMC {\it transition matrix}\index{transition matrix}
\begin{equation}
W\equiv\Big(W^{CC'}\Big)
\end{equation}
is constructed to bring the system to {\it equilibrium}.
In equilibrium, the chain should have no sinks or sources of probability,
which means that the probability of jumping into a configuration $C'$
should be the same as jumping out of $C'$. This property
is called {\it balance}\index{balance}
\begin{equation}\label{eq:balance}
\sum\limits_CW^{CC'}\pr{C}=
\sum\limits_CW^{C'C}\pr{C'},
\end{equation}
with the LHS representing the total probability to end up in $C'$ and
the RHS representing the probability to transition out of $C'$.
If $W$ satisfies
\begin{enumerate}
\item {\it ergodicity},\index{ergodic} i.e.
\begin{equation}\label{eq:ergodicity}
\pr{C}>0\;\;\text{and}\;\;\pr{C'}>0\;\;\Rightarrow\;\;
\exists\;n\in\N\;\;\text{s.t.}\;\;\big(W^n\big)^{CC'}>0,
\end{equation}
which means that every possible configuration is accessible from
any other configuration in a finite number\footnote{In practice,
it's important that $n$ is not too large, otherwise some configurations
will be sampled poorly. In a sense, this is the problem with
topological freezing\index{topological!freezing}
in \secref{sec:toplat}.} of Markov steps;
\item {\it normalization},\index{normalization} i.e.
\begin{equation}
\sum\limits_{C'}W^{CC'}=1,
\end{equation}
which means that starting from configuration $C$ you have
to move somewhere;
\item and balance,
\end{enumerate}
then the Markov process is guaranteed to bring the ensemble toward
equilibrium. Using normalization, one finds from \equatref{eq:balance}
\begin{equation}\label{eq:eigenProb}
\sum\limits_CW^{CC'}\pr{C}=\pr{C'},
\end{equation}
which shows that the equilibrium distribution is a fixed point of
the Markov chain\footnote{Above, we called $W^{CC'}$ a transition matrix.
Equation~\eqref{eq:eigenProb} shows that we can think of the equilibrium
distribution as an eigenvector of $W$ with eigenvalue 1. Each component
of this vector thus yields the probability of a particular configuration.}.
\index{Metropolis step}
Here and in the following two sections,
we omit the Lorentz index and space-time point from link variables to
avoid clutter. We use $U$ to indicate the to-be-updated link,
$U^\sqcup$ to indicate the staple matrix attached to $U$, and
$U'$ to indicate a trial link. We will use the Boltzmann
distribution $\pr{C}\propto e^{-S_C}$.
One trivial way to satisfy the balance condition~\eqref{eq:balance} is
to find an update that satisfies it term-by-term. For such an update,
\begin{equation}\label{eq:detailedbalance}\index{detailed balance}
W^{CC'}\pr{C}=W^{C'C}\pr{C'}.
\end{equation}
This property is known as {\it detailed balance}.
One of the most well-known Monte Carlo updates that satisfies detailed
balance is the {\it Metropolis algorithm}~\cite{metropolis_equation_1953}.
In the Metropolis algorithm, a trial configuration $C'$ is selected
with some probability distribution $\prt{C'|C}$. Then $C'$
is accepted with likelihood
\begin{equation}\label{eq:metupdate}
\pr{C\to C'}=\min\left[1,\frac{\prt{C|C'}e^{-S_{C'}}}
{\prt{C'|C}e^{-S_C}}\right],
\end{equation}
where $S_C$ is the action corresponding to $C$.
If $C'$ is rejected, the unchanged configuration is counted
in the Markov chain. The total probability
to transition from $C$ to $C'$ using this strategy is
\begin{equation}\label{eq:MCMCtransition}
W^{CC'}=\prt{C'|C}\pr{C\to C'},
\end{equation}
and using this fact, one
can show that this update satisfies detailed balance.
The Metropolis algorithm is perhaps the most standard Markov chain
algorithm used in lattice calculations. The acceptance step
\equatref{eq:metupdate} is its most crucial part: it guarantees
that configurations are drawn with the correct probability.
A Markov algorithm that concludes with the acceptance step is
said to be an\index{exact algorithm} {\it exact algorithm}\footnote{One
could imagine update steps that may not be guaranteed to draw configurations
from the desired distribution. This can be corrected by simply adding
the acceptance step at the end, and the algorithm is still exact.
One may neglect this acceptance step at the end, if one has reason to
believe the configuration is still drawn appropriately, but in such
instances the algorithm is not exact anymore.}.
In general, one could choose a trial configuration uniformly at random
from $\SU(N_c)$. On the other hand, as discussed above, if the trial
configuration increases the action substantially, then by
\equatref{eq:metupdate}, this trial is exponentially unlikely to
be accepted. If we are constantly suggesting such configurations,
we will make little movement in configuration space.
What makes an update efficient then is to suggest configurations that
have reasonable acceptance probabilities to begin with. This
strategy is often called {\it importance sampling}\index{importance sampling}.
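To make the acceptance step~\eqref{eq:metupdate} concrete, here is a minimal
Python sketch for a toy one-dimensional ``action'' $S(x)=x^2/2$ with a
symmetric proposal, for which the trial probabilities $\prt{C|C'}$ and
$\prt{C'|C}$ cancel; the step size and the action are illustrative only.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(42)

def S(x):                          # toy action; the chain samples exp(-S)
    return 0.5 * x**2

def metropolis_step(x, step=1.0):
    xp = x + rng.uniform(-step, step)          # symmetric trial proposal
    # accept with probability min[1, exp(S(x) - S(x'))]
    if rng.uniform() < np.exp(min(0.0, S(x) - S(xp))):
        return xp                              # accept the trial
    return x                                   # reject: count old state again

chain = [0.0]
for _ in range(100000):
    chain.append(metropolis_step(chain[-1]))
print(np.mean(chain), np.var(chain))           # should be near 0 and 1
\end{verbatim}
The same skeleton carries over to LGT, with $x$ replaced by a link variable
and $S$ by the local contribution to the gauge action.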
\section{Update: Heat bath}\index{heat bath}
The {\it heat bath} (HB) algorithm takes this idea to the extreme. Here,
we construct our trial probability distribution to draw from the
Boltzmann distribution to begin with. Thus no formal accept/reject
step is required\footnote{The Kennedy-Pendleton algorithm has a sort
of accept/reject step on one of the random variables to ensure that
variable is drawn from the correct distribution.}. For pure gauge
simulations, it is common practice to generate a new configuration from
an old one by updating one link.
For the $\SU(N_c)$ HB algorithm, the trial link distribution is
\begin{equation}\label{eq:PTdist}
\dd{\prt{U'}}\propto \dd U'\exp\left(\frac{\beta}{N_c}\,\tr\,
U'U^\sqcup\right).
\end{equation}
This construction also satisfies detailed balance.
Numerically, the trick is to map \equatref{eq:PTdist} to a set
of real, random variables, which have to be specially
generated to reproduce the desired distribution.
A common strategy for doing this for the gauge group $\SU(2)$,
often referred to as the {\it Kennedy-Pendleton}\index{heat
bath!Kennedy-Pendleton} algorithm, is explained in an elementary way
in Ref.~\cite{gattringer_quantum_2010}; the original papers are
Refs.~\cite{fabricius_heat_1984,kennedy_improved_1985}.
When $N_c>2$, this strategy is extended by leveraging the fact
that $\SU(2)\leq\SU(N_c)$; in general, an $\SU(N_c)$ matrix
can be broken down into $N_c(N_c-1)/2$
$\SU(2)$ submatrices~\cite{cabibbo_new_1982}. One generates each
$\SU(2)$ submatrix using the above HB, then reconstructs
the $\SU(N_c)$ matrix from those.
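As an illustration of the submatrix idea, the following sketch embeds a
$2\times2$ $\SU(2)$ element (generated here from a random unit quaternion,
purely for illustration) into an $\SU(N_c)$ matrix at rows and columns
$(i,j)$; the updated link is then obtained by multiplying the old link with
such embedded matrices, one for each of the $N_c(N_c-1)/2$ subgroups.
\begin{verbatim}
import numpy as np

def embed_su2(V, i, j, n):
    """Embed 2x2 V into an n x n identity at rows/columns (i, j).
    For SU(3) the subgroup choices are (0,1), (0,2) and (1,2)."""
    W = np.eye(n, dtype=complex)
    W[i, i], W[i, j] = V[0, 0], V[0, 1]
    W[j, i], W[j, j] = V[1, 0], V[1, 1]
    return W

def random_su2():
    a = np.random.randn(4)
    a /= np.linalg.norm(a)         # unit quaternion <-> SU(2) element
    return np.array([[ a[0] + 1j*a[3], a[2] + 1j*a[1]],
                     [-a[2] + 1j*a[1], a[0] - 1j*a[3]]])

W = embed_su2(random_su2(), 0, 2, 3)
print(np.allclose(W.conj().T @ W, np.eye(3)))  # W is unitary
print(np.isclose(np.linalg.det(W), 1.0))       # with unit determinant
\end{verbatim}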
%Single link Metropolis or HB updates of links carried out in a systematic
%(as opposed to random) order fulfill balance, but do not
%fulfill detailed balance.
\section{Update: Over-relaxation}\index{over-relaxation}
An additional useful update for $\SU(N_c)$ is the
{\it over-relaxation}\index{over-relaxation} (OR) update.
Adler introduced OR algorithms \cite{adler_over-relaxation_1981} and
they were further developed by Creutz \cite{creutz_overrelaxation_1987}
and others. The idea of the OR algorithm is to speed up relaxation
by generating a group element ``far away" from $U$ without
destroying equilibrium, which is here achieved by keeping the action constant.
This update is not random in any way, but rather serves to move you around
in configuration space while keeping the action, and hence the acceptance
probability, unchanged.
More precisely, let $U\in\SU(N_c)$ and suppose we have some method of
choosing another link variable $U_0$ that maximizes
the action for this staple. We assume that this method of selection has no
dependence on $U$. Pick some element $V\in\SU(N_c)$ such that $U=VU_0$; viewed
in this way, $U$ is ``on one side of $U_0$," and the element
``on the other side" is $U'=V^{-1}U_0$. Note that
\begin{equation}
V=U U_0^{-1},
\end{equation}
which implies
\begin{equation}
U' = U_0 U^{-1} U_0.
\end{equation}
This manner of constructing a new link variable $U'$, which generates
a group element ``far away" from $U$ without changing the action,
is what we mean by over-relaxation.
In principle an OR update should be more efficient than a Monte
Carlo update. This is because we chose the new link variable to be two
group elements away from the old one, moving us farther
through configuration space. However, unlike Metropolis updates, OR updates
only sample the subspace of constant action, and are therefore not ergodic.
Hence to ensure an approach to equilibrium, they must be supplemented with,
for instance, HB updates.
For $\SU(2)$, the update is given by
\begin{equation}\label{eq:ORupdate}
U\to U'=\frac{1}{\det U^\sqcup}\left(U^\sqcup UU^\sqcup\right)^\dagger.
\end{equation}
\begin{proposition}{}{OR}\label{prp:OR}
This update does not change the $\SU(2)$ Wilson action.
\begin{proof}
Since $\det(kA)=k^n\det A$ for any constant $k$ and $n\times n$ matrix $A$,
one can show that the sum of two $\SU(2)$ matrices is proportional
to an $\SU(2)$ matrix. Hence we can write
$$
U^\sqcup=u^\sqcup\sqrt{\det U^\sqcup}
$$
where $u^\sqcup\in\SU(2)$. After updating, the local contribution
to the Wilson action becomes
\begin{equation*}
\tr U'U^\sqcup =\frac{1}{\det U^\sqcup}\tr\left[
\left(U^\sqcup UU^\sqcup\right)^\dagger U^\sqcup\right]
=\tr\left(UU^\sqcup\right)^\dagger
=\tr U^\sqcup U,
\end{equation*}
where the second step uses $\left(U^\sqcup\right)^\dagger U^\sqcup
=\id\det U^\sqcup$, which follows from writing
$U^\sqcup=u^\sqcup\sqrt{\det U^\sqcup}$, and the last step uses that
the trace of a matrix proportional to an $\SU(2)$ element is real.
The local contribution is thus what it was originally.
\end{proof}
\end{proposition}
This simple
behavior is special to $\U(1)$ and $\SU(2)$ LGT.
For $N_c>2$, we again choose to do our efficient updates on
$\SU(2)$ submatrices of the original $\SU(N_c)$ matrix.
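A minimal Python sketch of the $\SU(2)$ update~\eqref{eq:ORupdate} follows;
the staple is mimicked here by a sum of random $\SU(2)$ matrices, and the
last line checks numerically that the local action is unchanged.
\begin{verbatim}
import numpy as np

def random_su2():
    a = np.random.randn(4)
    a /= np.linalg.norm(a)
    return np.array([[ a[0] + 1j*a[3], a[2] + 1j*a[1]],
                     [-a[2] + 1j*a[1], a[0] - 1j*a[3]]])

def overrelax_su2(U, staple):
    """U -> (staple U staple)^dagger / det(staple), the SU(2) OR update."""
    return (staple @ U @ staple).conj().T / np.linalg.det(staple)

U = random_su2()
staple = sum(random_su2() for _ in range(4))   # stand-in for the staple sum
Up = overrelax_su2(U, staple)
print(np.allclose(np.trace(Up @ staple), np.trace(U @ staple)))
\end{verbatim}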
\section{Update: Over-relaxation gauge fixing}\label{sec:ORgaugefix}
\index{gauge!fixing}
In general, expectation values of gauge-dependent quantities are zero.
Nevertheless there are some rare occasions where it is useful to calculate
a quantity in a particular gauge, which can then be related to some
other physical quantity. In this section we cover an OR
algorithm that fixes gauge configurations to the {\it Coulomb} or
{\it Landau} gauges.\index{gauge!Coulomb}\index{gauge!Landau}
In the continuum such gauges are given by the condition
\begin{equation}
\partial_\mu A_\mu=0,
\end{equation}
where $\mu$ runs over only spatial directions for the Coulomb gauge, and
over all four directions for the Landau gauge. On the lattice, these
{\it gauge conditions}\index{gauge!condition} are discretized as
\begin{equation}
\Delta(x)\equiv A_\mu(x)-A_\mu(x-a\hat\mu) =0.
\end{equation}
\begin{proposition}{}{ORgf}
A gauge satisfying $\Delta(x)=0$ is achieved by extremizing
$$
F\equiv-\Re\tr\sum_{x,\mu} g(x) U_\mu(x) g^\dagger(x+a\hat{\mu})
$$
\begin{proof}
Any gauge matrix can be written
$$
g=e^{is^aT^a},
$$
where $s^a\in\R$ and $T^a$ are the generators of the Lie group. We want
to find the special $g_0$ that extremizes $F$; this is equivalent to finding
the $s_0$ that extremizes it. We will look in the vicinity of such a
gauge, i.e. look at gauges
$$e^{i\tau^a T^a}, \qquad\tau\equiv s-s_0$$
with $\tau$ small. Then
\begin{equation*}\begin{aligned}
F&=-\Re\tr\sum_{x,\mu}
e^{i\tau^a(x)T^a}\,e^{iaA^b_\mu(x)T^b}\,e^{-i\tau^c(x+a\hat\mu)T^c}\\
&\approx-\Re\tr\sum_{x,\mu}\big(\id+i\tau^a(x)T^a\big)
\big(\id+iaA_\mu^b(x)T^b\big)
\big(\id-i\tau^c(x+a\hat\mu)T^c\big)\\
&\approx-\Re\tr\sum_{x,\mu}\big(\id+a\tau^c(x+a\hat\mu)A_\mu^b(x)T^bT^c
-a\tau^a(x)A_\mu^b(x)T^aT^b\big),
\end{aligned}\end{equation*}
where we have neglected $\order{a^2,\tau^2}$ terms and used that
the $T^a$ are traceless. At the extremum, derivatives with respect to $\tau$
will vanish. Exchanging our dummy group summation indices and shifting
the first term by one site (which is allowed because of
periodic BCs) we find
\begin{equation*}\begin{aligned}
0=\pdv{F}{\tau^d(x)}
&=-\Re\tr\sum_{x,\mu}a\pdv{\tau^d(x)}
\left[A_\mu^b(x-a\hat\mu)\tau^c(x)-A_\mu^b(x)\tau^c(x)\right]T^bT^c\\
&=-\Re\tr\sum_{\mu}a\left[A_\mu^b(x-a\hat\mu)-A_\mu^b(x)\right]T^bT^d.
\end{aligned}\end{equation*}
In order for the above to vanish, the term in brackets has to vanish,
which completes the proof.
\end{proof}
\end{proposition}
The strategy from here is to minimize $F$. As you can clearly see from
its definition, this can be accomplished through the replacement
\begin{equation}\label{eq:gfORupdate}
U_\mu(x)\to U'_\mu(x)=g(x)U_\mu(x)g^\dagger(x+a\hat\mu),
\end{equation}
which is reminiscent of the OR update~\eqref{eq:ORupdate}. Hence we again
employ OR, i.e. we find the $g$ minimizing $F$ and make the
replacement~\eqref{eq:gfORupdate}. In a strategy similar to
\propref{prp:OR}, we look at the local update to $F$,
\begin{equation}
\Re\tr g(x)\sum_\mu\left[U_\mu(x)g^\dagger(x+a\hat\mu)
+U_\mu^\dagger(x-a\hat\mu)g^\dagger(x-a\hat\mu)\right]
\equiv\Re\tr g(x)K(x).
\end{equation}
As before, $K(x)=\sqrt{\det K(x)}\hat K(x)$, where $\hat K(x)\in\SU(2)$.
To extremize $F$, we search for the $g$ rotating everything under the
trace to $\id$, which is clearly
\begin{equation}
g(x)=K^\dagger(x)/\sqrt{\det K(x)}.
\end{equation}
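As a small numerical check of this formula (for $\SU(2)$, with $K(x)$
mimicked by a sum of random $\SU(2)$ matrices), the sketch below verifies
that $g(x)$ rotates $K(x)$ to a multiple of the identity, thereby maximizing
$\Re\tr g(x)K(x)$; assembling the true $K(x)$ from the links and sweeping
over sites is left implicit.
\begin{verbatim}
import numpy as np

def random_su2():
    a = np.random.randn(4)
    a /= np.linalg.norm(a)
    return np.array([[ a[0] + 1j*a[3], a[2] + 1j*a[1]],
                     [-a[2] + 1j*a[1], a[0] - 1j*a[3]]])

K = sum(random_su2() for _ in range(8))        # stand-in for K(x)
g = K.conj().T / np.sqrt(np.linalg.det(K))     # g(x) = K^dagger/sqrt(det K)
print(np.allclose(g @ K, np.sqrt(np.linalg.det(K)) * np.eye(2)))
\end{verbatim}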
We will never achieve the condition $\Delta(x)=0$ on the computer exactly.
A measure of how close we are to satisfying the gauge condition is
\begin{equation}
\theta\equiv\frac{1}{N_c N_s^3}\sum_x\tr\Delta(x)\Delta^\dagger(x).
\end{equation}
This combination is constructed so that $\theta\in\R$.
For more information
about using OR to gauge fix, see Ref.~\cite{mandula_efficient_1990}.
For information about how to implement this on multiple GPUs, see
e.g. Ref.~\cite{schrock_coulomb_2013}.
\section{Update: Molecular dynamics}\label{sec:MD}\index{molecular dynamics}
For this updating strategy, we leverage a correspondence between
lattice field theory and a classical statistical mechanical
ensemble. We will use this
correspondence to find a clever way of suggesting a gauge
configuration, and then ultimately accept or reject it via
a standard Metropolis step.
To start, let's imagine a theory with a scalar field $\phi$ on
the lattice. Then expectation values are computed as
\begin{equation}
\ev{X}=Z^{-1}\int\DD\phi e^{-S[\phi]}X[\phi],~~~~Z\equiv\int\DD\phi
e^{-S[\phi]}.
\end{equation}
Now let
\begin{equation}
H\equiv\frac{1}{2}\sum_{x\in\LAT}P(x)^2+S[\phi],
\end{equation}
where $P$ is a scalar field we will define in a moment. Since the Gaussian
integral over $P$ only contributes an overall constant, which cancels
between numerator and denominator, it follows that
\begin{equation}
\ev{X}=Z^{-1}\int\DD\phi\DD P e^{-H[\phi,P]}X[\phi],
~~~~Z\equiv\int\DD\phi\DD P e^{-H[\phi,P]}.
\end{equation}
Now let these fields evolve in Markov time $\tau$ and define
\begin{equation}
\dv{\phi(x)}{\tau}\equiv P(x).
\end{equation}
Then we have defined a classical system where $P$ is the momentum
conjugate to $\phi$. If we evolve $\phi$ and $P$ in $\tau$ according
to Hamilton's equations of motion, the energy is guaranteed to
be a constant. This procedure would yield 100\% acceptance if we
were able to integrate the equations of motion exactly. In practice
the numerical integration introduces discretization and round-off errors,
and the Metropolis step serves to protect
the probability distribution.
The above example is for a scalar field; it is a simplification of
Ref.~\cite{callaway_microcanonical_1982},
which treats $\U(1)$. This was shortly thereafter extended
to fermions~\cite{polonyi_microcanonical_1983}.
This integration is completely deterministic, and hence does not
capture quantum fluctuations. To capture such quantum behavior,
one can intuitively imagine disturbing Hamilton's equations
with a random noise term. This is an example of a
stochastic differential equation\footnote{A stochastic differential
equation is a differential equation where you include one
or more random forces. A simple example is the Langevin equation
$$
m \dv{v}{t}=-\lambda v+\eta(t)
$$
describing the Brownian motion of a particle of mass $m$
and velocity $v$ in a fluid. Here $\lambda$ is the drag, and
$\eta$ is a Gaussian-distributed noise term representing
random collisions of the particle with other particles in the fluid.
At a fixed value of the noise, such equations can be solved.
It becomes challenging when one wants to average over $\eta$.
}
like the Langevin equation, which can be used to model the inherent
randomness of the underlying quantum field
theory~\cite{parisi_perturbation_1981}.
Inspired by this, we might consider giving the conjugate momentum
random kicks, following the microcanonical path in between.
This strategy is often called a
{\it hybrid}~\cite{duane_hybrid_1985} strategy.
If not for numerical integration errors, this strategy would be expected
to generate the correct
distribution~\cite{duane_theory_1986}.
To correct for these errors, one can again apply the Metropolis
step as before, which is referred to as
{\it hybrid Monte Carlo}\index{Monte Carlo!hybrid} (HMC).
A series of molecular dynamics steps is usually called
a {\it trajectory}\index{trajectory}, after which comes the
Metropolis step. To fulfill detailed balance\footnote{Remember
we need this in order to ensure that the typical accept/reject
based on the configuration generated by the molecular dynamics
is a valid Metropolis step.}, we need to ensure
that our integration scheme
\begin{enumerate}
\item preserves the integration measure $\DD\phi\DD P$; and
\item is reversible.
\end{enumerate}
A commonly used scheme that fulfills these demands is known as the
{\it leapfrog} algorithm. Under this scheme, $\phi$ is integrated
in $N$ steps of Monte Carlo length $\epsilon$, while $P$
starts with a step of length $\epsilon/2$, followed by $N-1$
steps of length $\epsilon$, and a final step of length $\epsilon/2$.
You can find an elementary proof that the leapfrog preserves area
and is reversible in Ref.~\cite{gattringer_quantum_2010}.
There one also finds a proof that the Metropolis fulfills detailed balance.
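A minimal sketch of this integration pattern for a single degree of freedom
with the toy action $S(\phi)=\phi^2/2$ follows; the last lines check the two
properties we need, approximate energy conservation and exact reversibility
under a momentum flip.
\begin{verbatim}
import numpy as np

def grad_S(phi):                   # toy action S = phi^2/2
    return phi

def leapfrog(phi, p, eps, N):
    p = p - 0.5 * eps * grad_S(phi)        # initial half step for p
    for _ in range(N - 1):
        phi = phi + eps * p                # full step for phi
        p = p - eps * grad_S(phi)          # full step for p
    phi = phi + eps * p                    # N-th full step for phi
    p = p - 0.5 * eps * grad_S(phi)        # final half step for p
    return phi, p

H = lambda phi, p: 0.5 * p**2 + 0.5 * phi**2
phi0, p0 = 1.0, 0.3
phi1, p1 = leapfrog(phi0, p0, eps=0.1, N=20)
print(H(phi1, p1) - H(phi0, p0))           # small: O(eps^2)
phi2, p2 = leapfrog(phi1, -p1, eps=0.1, N=20)
print(phi2 - phi0, p2 + p0)                # both ~0: reversibility
\end{verbatim}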
Let us synthesize what we have discussed to introduce an updating scheme that
represents a system of two identical fermions with gauge interaction
at zero chemical potential.
This scheme for staggered fermions was first made explicit
in Ref.~\cite{gottlieb_hybrid-molecular-dynamics_1987}.
As discussed in \secref{sec:pseudofermions}, the most efficient way
to treat the determinant of the Dirac matrix is using pseudofermions.
Hence our path integral is\index{pseudofermion}
\begin{equation}\begin{aligned}\label{eq:twoFlavorRHMC}
Z&=\int\DD{U}\left(\det D\right)^2e^{-S_G}\\
&=
\int\DD{\Phi_R}\DD{\Phi_I}\DD{U}
e^{-S_G-\Phi^\dagger\left(D^\dagger D\right)^{-1}\Phi},
\end{aligned}\end{equation}
where we have introduced a three-color, complex scalar field
$\Phi=\Phi_R+i\Phi_I$, the pseudofermion field.
In \secref{sec:pseudofermions} we stressed that the treatment of two identical
fermions was important since it lets us put a $D^\dagger D$ in the
exponential, which makes it a well defined probability distribution.
We also saw that we can draw our pseudofermions according to the
above distribution if we define $\Phi=D^\dagger R$ and draw $R$ from
$\exp\left(-R^\dagger R\right)$. The strategy then goes:
\begin{enumerate}
\item Generate Gaussian $R$.
\item Compute $\Phi=D^\dagger R$.
\item Given your starting configuration $U$ generate conjugate
momenta.
\item Keeping $\Phi$ fixed, use the molecular dynamics strategy
to integrate along a short trajectory, yielding a proposal for $U$.
\item Do a Metropolis accept/reject at the end of the trajectory for $U$.
\end{enumerate}
Let us discuss steps 2--4 in a little more detail; in particular we would like to
understand how the molecular dynamics works for $\SU(3)$. We express our gauge
field as
\begin{equation}
U_\mu=e^{i\omega_\mu^aT^a}=e^{iA_\mu}.
\end{equation}
We will therefore think of our Hamiltonian as a function of the
generalized coordinates $A_\mu$, or equivalently when convenient,
$\omega_\mu^a$. The corresponding conjugate momenta are
\begin{equation}
P_\mu=P^a_\mu T^a.
\end{equation}
The steps taken in the leapfrog change the momentum. With our fictitious
Hamiltonian in mind, what we seek is a force, i.e. a gradient of
the effective action. Looking at \equatref{eq:twoFlavorRHMC} our effective
action is
\begin{equation}
S_{\rm eff}=-S_G-\Phi^\dagger\left(D^\dagger D\right)^{-1}\Phi,
\end{equation}
and the force is
\begin{equation}
f=T^a\nabla^aS_{\rm eff}.
\end{equation}
The leapfrog thus proceeds in steps $\epsilon f$.
Based on the effective action, the force can be divided into the
{\it gauge force} and {\it fermion force}. The fermion force
depends on the chosen fermion discretization. Consider for example a
straightforward Wilson gauge action, whose contribution at $U$ has the form
\begin{equation}
-\frac{\beta}{6}\tr\left(UA+A^\dagger U^\dagger\right),
\end{equation}
where $A$ is the staple sum. Using \equatref{eq:lieDeriv},
the fact that the derivative lies in the Lie algebra,
which lets you express it as a linear combination of the generators,
and \eqref{eq:SUNnorm}, the gauge force comes out to be
\begin{equation}
f_G=-\frac{i\beta}{12}\left(UA-A^\dagger U^\dagger\right).
\end{equation}
Meanwhile, using \propref{prp:MATwrtSCLinv},
the fermion force can be written
\begin{equation}
f_{\rm ferm}^a=
-T^a\left(\left(D^\dagger D\right)^{-1}\Phi\right)^\dagger
\left(\pdv{D}{\omega^a}D^\dagger+D\pdv{D^\dagger}{\omega^a}\right)
\left(\left(D^\dagger D\right)^{-1}\Phi\right).
\end{equation}
This shows that the inverse of $D^\dagger D$ must be computed
at each step of the molecular dynamics, which is far and away
the most expensive part.
Both here and, in the rational formulation discussed below, in Step 2,
we need to compute inverses of extremely
large matrices. One of the most efficient ways to approach this,
and indeed the strategy taken by codes such as
QUDA,
Grid, and
$\simulat$, is to use the conjugate gradient.
The difficulty of computing the inverse increases
substantially as the smallest eigenvalue of $D$ decreases,
as mentioned in \secref{sec:numerics}. As a result
one may need to employ an SVD or a
{\it force filter}\index{force!filter}, which artificially
shifts low eigenmodes by an amount $\delta$.
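The conjugate gradient itself fits in a few lines; below is a minimal Python
sketch for a hermitian positive-definite operator, supplied only through its
matrix-vector product, which is all one has for $D^\dagger D$ in practice.
This is not the production implementation of any of the codes above; the
tolerance and the toy test matrix are illustrative.
\begin{verbatim}
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for hermitian positive-definite A."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rr = np.vdot(r, r).real
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rr / np.vdot(p, Ap).real
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = np.vdot(r, r).real
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

n = 50
M = np.random.randn(n, n) + 1j * np.random.randn(n, n)
A = M.conj().T @ M + np.eye(n)               # hermitian positive definite
b = np.random.randn(n) + 1j * np.random.randn(n)
x = conjugate_gradient(lambda v: A @ v, b)
print(np.linalg.norm(A @ x - b))             # ~0
\end{verbatim}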
\section{The HISQ force}\label{sec:HISQforce}
In this section we review aspects of computing the HISQ force specifically.
The HISQ action, discussed in \secref{sec:stagg} and \secref{sec:HISQ},
is widely used in state-of-the-art lattice calculations. Constructing its
force is a bit complicated, as it is a synthesis of many constructs in
those sections along with \secref{sec:MD}.
We begin by writing down the action itself in a way that's helpful
for computer implementation. We write
\begin{equation}
S_f=\Phi^\dagger\left(D^\dagger D\right)^{-N_f/4}\Phi
=\bra{\Phi}\left(D^{\dagger} D\right)^{-N_f/4}\ket{\Phi},
\end{equation}
where $N_f$ is the number of degenerate fermion flavors and $\ket\Phi$ is the pseudofermion field.
For this presentation of the HISQ force, we follow
Ref.~\cite{Wong:2007uz}, and hence switch to bra-ket notation.
Remember, we chose this pseudofermion rewrite to be able to compute the
determinant of the Dirac matrix efficiently. We want a
$D^\dagger D$ in the exponential since it has positive eigenvalues,
and hence makes for a well defined probability distribution.
The $D^\dagger D$ would correspond to two identical quarks of that flavor, since
by the Matthews-Salam formula
\begin{equation}
\det D^\dagger D = \det D^\dagger \det D
=\int \dd{\bar{\psi}_f}\dd{\psi_f}\dd{\bar{\psi}_g}\dd{\psi_g}
\exp\left(\bar{\psi}_fD^\dagger \psi_f+\bar{\psi}_gD \psi_g\right).
\end{equation}
The trick to get around this, as done in
Ref.~\cite{gottlieb_hybrid-molecular-dynamics_1987}, is to define the
$\Phi$ fields on even sites only. This works because $D^\dagger D$ has no
matrix elements connecting even and odd sites. More precisely, if
$D_0$ is the massless part of the Dirac operator, then
\begin{equation}\begin{aligned}
\left(D^\dagger D\right)_{xy}
&=\left(2m\delta_{xz}+D^\dagger_{0,xz}\right)
\left(2m\delta_{zy}+D_{0,zy}\right)\\
&=4m^2\delta_{xy}+2m\left(D^\dagger_{0,xy}+D_{0,xy}\right)+D^\dagger_{0,xz}D_{0,zy}.
\end{aligned}\end{equation}
The second term in the last line vanishes since $D_0$ is antihermitian.
The first term is diagonal in the site basis. Similarly $D_0$ itself always sends
odd to even or vice versa, so $D_0^\dagger D_0$ must preserve parity.
Hence if we write $D^\dagger D$ in a basis organized so even indices come first
followed by odd, we have a block-diagonal matrix with two copies of
$4m^2+D_0^\dagger D_0$. Restricting to one parity halves the number of degrees
of freedom, returning us once again to our interpretation of having a single
fermion species.
Finally we have the issue, discussed in
\secref{sec:HISQ}, that staggered fermions lead to four degenerate tastes.
The solution offered in that section is to take the fourth root. In practice,
this fourth root is implemented as a rational
approximation~\cite{clark_rhmc_2004,clark_rational_2006},
\begin{equation}
\left(D^\dagger D\right)^{-N_f/4}\approx\alpha_0+\sum_l\frac{\alpha_l}{D^\dagger
D+\beta_l}.
\end{equation}
Using the rational approximation to estimate the fourth root in this hybrid
Monte Carlo scheme is called {\it rational hybrid Monte Carlo}\index{RHMC}
(RHMC).
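Schematically, applying the rational approximation reduces to a handful of
shifted inversions, as in the sketch below; the coefficients $\alpha_l$ and
$\beta_l$ shown are placeholders for the actual Remez output, and in
production one uses a multi-shift solver to obtain all shifts from a single
Krylov space.
\begin{verbatim}
import numpy as np

def apply_rational(phi, alpha0, alphas, betas, solve):
    """alpha0*phi + sum_l alphas[l] * (DdagD + betas[l])^{-1} phi,
    where solve(shift, b) returns (DdagD + shift)^{-1} b, e.g. via CG."""
    psi = alpha0 * phi
    for a_l, b_l in zip(alphas, betas):
        psi = psi + a_l * solve(b_l, phi)
    return psi

n = 20
M = np.random.randn(n, n)
A = M.T @ M + np.eye(n)                      # stand-in for D^dagger D
solve = lambda shift, b: np.linalg.solve(A + shift * np.eye(n), b)
phi = np.random.randn(n)
# placeholder coefficients; real ones come from the Remez algorithm
psi = apply_rational(phi, 0.3, [0.5, 0.2], [0.1, 1.0], solve)
\end{verbatim}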
Next we need to open up the $D$. We write\footnote{It is common practice in
lattice publications to use $M$ rather than $D$ to indicate the Dirac matrix.
In that case, $D$ is sometimes used to indicate the (covariant) derivative part.
I took the notation of Gattringer and Lang~\cite{gattringer_quantum_2010}.}
\begin{equation}\begin{aligned}
D(y|x)&=2m\delta_{x,y}
+\sum_\alpha \eta_\alpha(x)\left(\UHISQ_\alpha(x)\delta_{x,y-\hat\alpha}
-\left(\UHISQ_\alpha(x-\hat\alpha)\right)^\dagger\delta_{x,y+\hat\alpha}\right)\\
&\equiv 2m\delta_{x,y}+D_0(y|x),
\end{aligned}\end{equation}
where $\UHISQ$ is given by \equatref{eq:HISQsmear}:
\begin{equation}
\UHISQ=\mathcal{F}^{\rm Fat7}_{\rm corr}\mathcal{U}\mathcal{F}^{\rm Fat7}U,
\end{equation}
i.e. we Fat7-smear the thin link, followed
by a reunitarization and then a second-level smear that has some lattice-spacing
corrections built in. The $\eta_\alpha$ are the
staggered phases defined in \equatref{eq:staggeredPhase}.
\subsection{The Hasenbusch trick}\index{preconditioning!Hasenbusch}
At this point it has been mentioned multiple times that a major bottleneck
of the RHMC algorithm, which as of now is the fastest scheme for
generating configurations with dynamical quarks, is inverting the
Dirac matrix. As discussed in \secref{sec:numerics}, if this matrix
has a poor condition number, it is difficult to invert.
One way to alleviate this problem is to {\it
precondition}\index{preconditioning} the matrix. With this strategy,
we look for a preconditioner $P$ that transforms our to-be-solved equation into
\begin{equation}
DP^{-1}Px=b
\end{equation}
with the hope that $DP^{-1}$ has a better condition number than $D$ had.
Inverting $D$ becomes more costly as the bare quark mass is dialed down.
The {\it Hasenbusch trick}~\cite{hasenbusch_speeding_2001,urbach_hmc_2006}
notes that
\begin{equation}
\det D^{\dagger} D\left(m_l\right)=\det \frac{D^{\dagger}
D\left(m_l\right)}{D^{\dagger} D\left(m_h\right)} \det D^{\dagger}
D\left(m_h\right),
\end{equation}
where $m_l$ indicates some light quark mass and $m_h$ some heavier one.
The heavier masses are then treated with their own pseudofermion integrals.
\section{Choosing a random number generator}
As far as we know\footnote{See for instance \apref{ap:bell}.},
the only truly random or stochastic things in nature are
quantum mechanical measurements. This poses a problem for our strategy, since
we need a random number for the Metropolis step \equatref{eq:metupdate}.
In practice, one wants to generate a number $X$ uniformly at random
in the interval $X\in[0,1)$ and compute the number on the RHS
of \equatref{eq:metupdate}. If $X$ turns out to be greater, you reject $C'$.
In order to generate $X$, we use a {\it random number
generator}\index{generator!random number} or RNG.
Since we cannot have a true random number\footnote{For
this reason, these algorithms are sometimes more aptly called
{\it pseudorandom number generators}\index{generator!pseudorandom number}
or PRNGs.}, we instead settle for an algorithm that produces a sequence of numbers
\begin{equation}
X_{\rm seed},~X_1,~X_2,~...,~X_n,~...,
\end{equation}
by using some rule to generate $X_{n+1}$ given $X_n$. The starting number is
called the {\it seed}\index{seed}. The goal is to choose your rule so that
the resulting sequence is, for all practical purposes, statistically
indistinguishable from a truly random one. Again, this is not a truly random
process, but unfortunately it
is not yet practical to harness direct quantum-mechanical measurements to that
end.
One of the earliest popular RNGs was the {\it linear congruence
generator}\index{generator!linear congruence}~\cite{lehmer_mathematical_1951,thomson_modified_1958}.
Here the rule is pretty simple
to understand: You choose three tuning parameters $a$, $b$, $m\in\N$
and then generate according to
\begin{equation}\label{eq:linearCongruence}
X_{n+1}=(aX_n+b)\mod m.
\end{equation}
One can then convert a number $X_n$ generated in this way to a number $X$
in the unit interval as required for the Metropolis step by normalizing it by $m$.
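A linear congruence generator is only a few lines of Python; the parameters
below are deliberately small and purely illustrative, which makes the
correlations discussed next easy to see.
\begin{verbatim}
import numpy as np

def lcg(seed, a=1229, b=1, m=2048, n=10000):
    """Generate n numbers in [0,1) via X_{k+1} = (a X_k + b) mod m."""
    xs = np.empty(n)
    x = seed
    for k in range(n):
        x = (a * x + b) % m
        xs[k] = x / m                        # normalize to the unit interval
    return xs

u = lcg(seed=1)
pairs = np.column_stack((u[:-1], u[1:]))     # plot these to see the planes
\end{verbatim}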
\begin{figure}
\centering
\includegraphics[width=0.9\linewidth]{figs/linear_congruence.pdf}
\caption{A strong correlation hidden in the linear congruence generator for
poorly chosen $a$, $b$, and $m$. Things seem to improve for a better choice of
these parameters, but as Marsaglia proved, this generator ought to be avoided
regardless. Image made by C. Schmidt.}
\label{fig:LCG}
\end{figure}
The linear congruence generator is extremely easy to implement and is quite
fast. Unfortunately it suffers from one fatal flaw: It sucks. To see that it
sucks, one can plot $X_{n+1}$ against $X_n$. An example of such a plot for
poorly chosen $a$, $b$, and $m$ is given in \figref{fig:LCG}. One sees the
pairs organize themselves into planes. Changing $a$, $b$, and $m$ seems
to improve the generator, but this flaw of the linear congruence generator
is fundamental. Indeed it was shown by Marsaglia~\cite{marsaglia_random_1968}
that $k$-tuples of successive outcomes of the linear congruence generator
always arrange themselves in a relatively small number of hyperplanes.
Hence one should never use this generator for scientific purposes.
Looking for these hidden correlations is a bit challenging, requiring some care
and experience. There are a variety of tests, but if you want to be thorough,
it is likely best to use already existing batteries, such as the {\it Diehard}
battery written by Marsaglia himself. More modern batteries,
such as {\it Dieharder}, can be found online~\cite{dieharder}.
Besides lacking hidden correlations, you would like numbers produced by your
RNG to match the target distribution. This can be quantitatively assessed
using, for instance, the Kolmogorov test of \secref{sec:kolm}.
Some modern RNGs that are known to be suitable for scientific purposes,
in particular passing some of the tests above, include
the\index{generator!Tausworthe}
{\it hybrid Tausworthe}~\cite{tausworthe_random_1965,lecuyer_maximally_1996}
and\index{generator!Mersenne twister}
the {\it Mersenne twister}~\cite{matsumoto_mersenne_1998}.
\section{Statistical analysis}\label{sec:statana}
Since $C_i$ is generated based on $C_{i-1}$, measurements on subsequent
configurations are correlated. In our simulations, these
correlations are reduced in two ways:
\begin{enumerate}
\item Subsequent configurations are separated by multiple updating sweeps;
and then
\item configurations are grouped into $\nconf$ blocks or bins.
\end{enumerate}
The final measurements $X_i$ used in data analysis are obtained by averaging
within each block.
To check whether the final data are effectively independent, one can use
the integrated autocorrelation time.
For statistically independent measurements, we expect the variance
$\sigma^2_{\bar{X}}$ of $\bar{X}$ to be
\begin{equation}
\sigma^2_{\bar{X}}=\frac{\sigma^2}{\nconf}
\end{equation}
due to the CLT. In practice, however, one finds
\begin{equation}
\sigma^2_{\bar{X}}=\frac{\sigma^2}{\nconf}\tauint.
\end{equation}\index{autocorrelation time!integrated}
The factor $\tauint$ is the integrated autocorrelation time. It is the
ratio between the estimated variance of the sample
mean and what this variance would have been if the data were independent.
For effectively independent data, $\tauint=1$.
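A simple truncated estimator of $\tauint$ from a time series reads as
follows; the summation window is an assumption of this sketch, and in
careful work one chooses it self-consistently, e.g. with an automatic
windowing procedure.
\begin{verbatim}
import numpy as np

def tau_int(x, window):
    """Estimate the integrated autocorrelation time of the series x,
    summing the normalized autocorrelation rho(t) for t < window."""
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    var = np.mean(dx**2)
    n = len(x)
    rho = [np.mean(dx[:n-t] * dx[t:]) / var for t in range(1, window)]
    return 1.0 + 2.0 * np.sum(rho)

print(tau_int(np.random.randn(100000), window=50))   # ~1: independent data
\end{verbatim}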
So, the final measurements are drawn from some
distribution with mean $\ev{X}$ and variance $\sigma^2$
and are effectively independent. The
estimator $\bar{X}$ of the mean is the average~\eqref{eq:arithmeticaverage},
while the unbiased estimator $\bar{\sigma}^2$ of the variance is
\begin{equation}\label{eq:estimatorvariance}
\bar{\sigma}^2=\frac{1}{\nconf-1}\sum_{i=1}^{\nconf}
\left(X_i-\bar{X}\right)^2.
\end{equation}
An estimator is biased if its mean for finite $\nconf$
does not agree with the exact result;
the bias is the difference. Generally, problems with bias emerge whenever
one wishes to estimate some non-linear function $f$ of the mean $\ev{X}$.
Naively one might guess
\begin{equation}
\bar{f}_\text{bad}=\frac{1}{\nconf}\sum_{i=1}^{\nconf}f(X_i)
\end{equation}
as an estimator; however $\bar{f}_\text{bad}$ is not guaranteed
to converge to the exact result\footnote{See for instance the discussion
in \secref{sec:bias}.}.
An estimator for $f(\ev{X})$ that converges to its true value is
\begin{equation}\label{eq:MCMCconsistentEst}
\bar{f}=f(\bar{X});
\end{equation}
in particular, the bias of this estimator is $\order{1/\nconf}$.
Therefore in the large $\nconf$ limit, the bias vanishes faster than the
statistical error bar.
At this point it is worth commenting on when it is appropriate to use
the sample average~\eqref{eq:arithmeticaverage} and the estimator
\eqref{eq:MCMCconsistentEst}. The sample average is the preferred
method to estimate an observable that can be directly calculated from
the configurations, i.e. when $X_i=X(C_i)$.
Such random variables are called {\it primary quantities},
\index{primary quantity} and they include many simple observables such
as the action, the absolute value of the Polyakov loop, and so on.
However some quantities can only be calculated
as functions of the sample average; these are called {\it secondary quantities}.
\index{secondary quantity} An obvious example is the variance, which is defined
only in terms of averages. A less obvious example is particle masses,
which cannot be directly calculated on any single configuration. In this
case we are forced to use \equatref{eq:MCMCconsistentEst}.
We have introduced a way to estimate the mean and variance of some
operator, as well as a way to estimate the mean of some function of
that operator. Now we need a way to estimate the error bar of that
function. We cannot use
\begin{equation}
\bar{\sigma}^2_{\bar{f}}=\frac{\bar{\sigma}^2_{f}}{\nconf}
=\frac{1}{\nconf\,(\nconf-1)}\sum_{i=1}^{\nconf}
\left(f(X_i)-\bar{f}\right)^2
\end{equation}
because $f(X_i)$ is not a valid sample point: its expectation value is not $f(\ev{X})$.
One could analytically produce an error bar for $\bar{f}$
using error propagation. However when the function is complicated,
error propagation becomes extremely unwieldy.
Jackknifing\index{jackknife} allows one to extract a mean and error bar,
and it is straightforward to implement;
therefore it makes sense to use the jackknife method generally.
The idea of jackknifing is to throw away the first measurement,
leaving $\nconf-1$ resampled values. Then we resample again, this
time throwing out the second point, and so on. The resulting
jackknife bins are
\begin{equation}
X_{J,i}=\frac{1}{\nconf-1}\sum_{j\neq i}X_j.
\end{equation}
The jackknife estimator for $f(\ev{X})$ is then
\begin{equation}
\bar{f}_J=\frac{1}{\nconf}\sum_{i=1}^{\nconf}f(X_{J,i}),
\end{equation}
while the estimator for the variance of $\bar{f}_J$ is
\begin{equation}
\bar{\sigma}^2_{f_J}=\frac{\nconf-1}{\nconf}\sum_{i=1}^{\nconf}
\left(f(X_{J,i})-\bar{f}_J\right)^2.
\end{equation}
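The whole procedure is a few lines of Python, implementing exactly the three
formulas above; the toy data and the choice $f(x)=x^2$ are illustrative.
\begin{verbatim}
import numpy as np

def jackknife(x, f):
    """Jackknife estimate and error bar of f(<X>) from samples x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    bins = (x.sum() - x) / (n - 1)          # leave-one-out averages X_{J,i}
    fj = np.array([f(b) for b in bins])
    fbar = fj.mean()
    err2 = (n - 1) / n * np.sum((fj - fbar)**2)
    return fbar, np.sqrt(err2)

x = 5.0 + np.random.randn(1000)
print(jackknife(x, lambda m: m**2))         # ~25 with an error bar
\end{verbatim}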
In many instances, we will need to compare two estimates
of the same quantity against each other and decide whether the difference
between them is significant. This can happen, for example, if we want to
compare another group's results with our own. Let their result
be $\bar{X}$ with uncertainty $\sigma_{\bar{X}}$ and ours be
$\bar{Y}$ with uncertainty $\sigma_{\bar{Y}}$. Then the probability
that these two estimates differ by at least $D$ is
\begin{equation}\index{Gaussian difference test}
q=\pr{|\bar{X}-\bar{Y}|>D}
=1-\erf\left(\frac{D}{\sqrt{2
\left(\sigma_{\bar{X}}^2+\sigma_{\bar{Y}}^2\right)}}\right)
\end{equation}
assuming $\bar{X}$ and $\bar{Y}$ are normally distributed with
the same mean. This is called a Gaussian difference test.
The quantity $q$ is called the q-value.
In practice we take $q\leq0.05$ to be an indication of a possible
discrepancy between $\bar{X}$ and $\bar{Y}$, keeping in mind that
$q\leq0.05$ occurs by chance one time out of twenty.
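In Python this test is a one-liner; the numbers in the example are made up
for illustration.
\begin{verbatim}
from math import erf, sqrt

def gaussian_difference_test(xbar, sigx, ybar, sigy):
    """q-value for the difference of two Gaussian estimates."""
    D = abs(xbar - ybar)
    return 1.0 - erf(D / sqrt(2.0 * (sigx**2 + sigy**2)))

print(gaussian_difference_test(1.0, 0.05, 1.2, 0.05))   # ~0.005
\end{verbatim}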
In practice, the true variances $\sigma_{\bar{X}}$ and $\sigma_{\bar{Y}}$
are not known. If one wishes to use the estimators $\bar{\sigma}_{\bar{X}}$
and $\bar{\sigma}_{\bar{Y}}$ instead, one can perform a {\it Student
difference test} or {\it t-test}\index{t-test} to investigate whether the
discrepancy $D$ is due to chance. Suppose the estimate $\bar{X}$ comes from
$M_{\text{conf}}$ data, while $\bar{Y}$ comes from $\nconf$ data.
Assume $\sigma_{\bar{X}}=\sigma_{\bar{Y}}$, which
happens when the sampling methods used are identical.
We introduce the random variable
\begin{equation}
t=\frac{D}{\bar{\sigma}_D},
\end{equation}
where $D=\bar{X}-\bar{Y}$, and
\begin{equation}
\bar{\sigma}^2_D=\left(\frac{1}{M_{\text{conf}}}+\frac{1}{\nconf}\right)
\frac{(M_{\text{conf}}-1)\,\bar{\sigma}_{\bar{X}}^2
+(\nconf-1)\,\bar{\sigma}_{\bar{Y}}^2}
{M_{\text{conf}}+\nconf-2}.
\end{equation}
Then the probability that a discrepancy at least as large as $|D|$ arises
by chance is
\begin{equation}
q=I\left(z,\frac{\nu}{2},\frac{1}{2}\right),
\end{equation}
where $I$ is the incomplete beta function, $\nu=M_{\text{conf}}+\nconf-2$,
and
\begin{equation}
z=\frac{\nu}{\nu+t^2}.
\end{equation}
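Putting the above formulas together, a sketch of the Student difference test
might look as follows; it assumes SciPy's regularized incomplete beta
function \texttt{betainc}, and the example numbers are made up.
\begin{verbatim}
import numpy as np
from scipy.special import betainc

def student_difference_test(xbar, varx, M, ybar, vary, N):
    """q-value; varx, vary are the estimated variances of the means
    xbar and ybar, obtained from M and N measurements respectively."""
    D = xbar - ybar
    varD = (1.0/M + 1.0/N) * ((M-1)*varx + (N-1)*vary) / (M + N - 2)
    t = D / np.sqrt(varD)
    nu = M + N - 2
    z = nu / (nu + t**2)
    return betainc(0.5 * nu, 0.5, z)          # I(z, nu/2, 1/2)

print(student_difference_test(1.0, 0.05**2, 16, 1.2, 0.05**2, 16))
\end{verbatim}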
To estimate finite size corrections and carry out continuum limit
extrapolations, we need a way to fit data to curves.
Consider a sample of $\nsim$ Gaussian, independent data points $(X_i,Y_i)$,
where the $Y_i$ have standard deviations $\sigma_i$ and the $X_i$ have
no errors. For instance, if one is interested in a continuum limit
extrapolation, the $X_i$ are $\beta$ values while the $Y_i$ are
ratios of scales evaluated at that $\beta$.
We model these data with a fit that depends on some set
of $M$ parameters
\begin{equation}
y=y(x;a),
\end{equation}
where $a=(a_1,...,a_M)$ is the vector of these parameters. Our goal
is to estimate the $a_j$.
Assuming that $y(x;a)$ is the exact law for the data, the probability
distribution for the measurements $Y_i$ is
\begin{equation}
f(y_1,...,y_{\nsim})=\prod_{i=1}^{\nsim}\frac{1}{\sqrt{2\pi}\sigma_i}
\exp\left[\frac{-(y_i-y(x_i;a))^2}{2\sigma_i^2}\right].
\end{equation}
The probability that the data fall within a region near what was observed is
\begin{equation}
\text{P}=\prod_{i=1}^{\nsim}\frac{1}{\sqrt{2\pi}\sigma_i}
\exp\left[\frac{-(y_i-y(x_i;a))^2}{2\sigma_i^2}\right]dy_i.
\end{equation}
Our strategy for determining the correct fit will be to find the vector $a$
that maximizes the above probability. This happens when
\begin{equation}
\chi^2(a)\equiv\sum_{i=1}^{\nsim}\frac{(y_i-y(x_i;a))^2}{\sigma_i^2}
\end{equation}
is minimized. This strategy is an example of a {\it maximum likelihood method}.
We now describe an iterative method to search for the minimum of $\chi^2$.
Let $a_n$ be the vector of parameters for the $n\nth$ iteration.
As long as $a$ is in a small enough neighborhood of $a_n$, we can safely
approximate
\begin{equation}\label{eq:NRapprox}
\chi^2(a)\approx\chi^2(a_n)+(a-a_n)\cdot b
+\frac{1}{2}(a-a_n)\,A\,(a-a_n),
\end{equation}
where the coefficients of the vector $b$ and the $M\times M$ matrix $A$
are given by the first and second derivatives of $\chi^2$ evaluated at $a_n$.
In the {\it Newton-Raphson method}\index{Newton-Raphson}, the next
iteration $a_{n+1}$ is
determined from the condition $\nabla\chi^2(a)|_{a=a_{n+1}}=0$,
which yields
\begin{equation}\label{eq:NR}
a_{n+1}=a_n-A^{-1}b.
\end{equation}
If the approximation~\eqref{eq:NRapprox} is not good, one can instead move
a small step in the direction of the gradient by
\begin{equation}\label{eq:SD}
a_{n+1}=a_n-c\,b,
\end{equation}
where $c$ is a constant small enough not to overshoot the minimum along the
direction of steepest descent. This is an example of a {\it steepest descent method}.
The Levenberg-Marquardt\index{Levenberg-Marquardt}
method~\cite{levenberg_method_1944,marquardt_algorithm_1963}
varies smoothly between~\equatref{eq:NR} and
\eqref{eq:SD}. Steepest descent is used far from the minimum, and
then it switches to the Newton-Raphson method when the minimum is approached.
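For a model linear in its parameters, $\chi^2$ is exactly quadratic and the
Newton-Raphson iteration~\eqref{eq:NR} converges in a single step, as the
following sketch illustrates; the toy data and noise level are made up.
\begin{verbatim}
import numpy as np

# toy data for the linear model y = a0 + a1*x
x = np.linspace(0.0, 1.0, 10)
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(10)
sigma = np.full(10, 0.1)

def grad_hess(a):
    """Gradient b and Hessian A of chi^2, analytic for a linear model."""
    J = np.column_stack((np.ones_like(x), x)) / sigma[:, None]
    r = (y - (a[0] + a[1] * x)) / sigma
    return -2.0 * J.T @ r, 2.0 * J.T @ J

a = np.zeros(2)
for _ in range(3):
    b, A = grad_hess(a)
    a = a - np.linalg.solve(A, b)            # a_{n+1} = a_n - A^{-1} b
print(a)                                     # ~ (2, 3)
\end{verbatim}
For a nonlinear model one recomputes $b$ and $A$ at each iterate and, in the
Levenberg-Marquardt spirit, adds a damping term to the diagonal of $A$ far
from the minimum.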
\section{General code structure}
Now that we have introduced the general idea of MCMC, along with some specific
updating schemes and the complications of statistical analysis, we are ready
to discuss the computer implementation. In this section we will discuss the
general code structure. In the following sections, we will address various
implementation details important for improving code performance and readability.
As mentioned earlier, we design the simulation using {\it local updates}, which
means we update the links one at a time. This is done in a systematic order,
because there is some computational advantage compared to updating in a
random order~\cite{berg_markov_2004}.
An updating {\it sweep}\index{sweep} updates every link on the lattice once.
To maximize efficiency while maintaining ergodicity, it is common for pure
$\SU(2)$ algorithms to use
a combination of HB and OR updating.
An MCMC simulation of LGT broadly consists of four essential steps:
\begin{enumerate}
\item {\it Initialization}: The first thing to do is get everything ready for
the simulation. This includes initializing the random number generator,
and setting up an initial configuration.
\item {\it Equilibration} or {\it thermalization}:
\index{equilibration}\index{thermalization}
To avoid over-sampling rare configurations,
one must perform many sweeps to bring the system to its equilibrium
distribution. The structure of this section looks like
\begin{verbatim}
do from n=1 to n=nequi
call MCMC update
end do
\end{verbatim}
\item {\it Configuration generation}: Once we are in the equilibrium
distribution, we want to generate configurations on which we can
perform measurements. To help reduce correlations between
measurements, multiple updating sweeps are performed in between.
This section is structured as
\begin{verbatim}
do from n=1 to n=nconf
do from m=1 to m=ndiscarded
call MCMC update
end do
save configuration
end do
\end{verbatim}
\item {\it Measurements}: Now that we have a good sample of configuration
space, we are ready to perform measurements there. Some measurements
can be taken {\it in situ}, i.e. they are incredibly cheap and can be
calculated the moment the configuration is saved. Measurements of this
type include simple link products like the Polyakov loop. More expensive
observables may require inverting a Dirac matrix thousands of times,
gauge fixing, or smoothing. For such observables it is better to
have separate code that runs on the saved configurations. This
code is structured as
\begin{verbatim}