\documentclass{IIBproject}
\usepackage{setspace}
\usepackage{amssymb,amsmath,amsthm}
\usepackage{units}
\usepackage{cite}
\usepackage{subfigure}
\usepackage{minted}
\usepackage{url}
\usepackage[margin=2.5cm]{geometry}
\usepackage{multirow}
\usepackage{import}
\usepackage{mdwlist}
\usepackage[retainorgcmds]{IEEEtrantools}
% required to get proper length monospace underscores
\usepackage[T1]{fontenc}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\usepackage{graphicx,ctable,booktabs}
\graphicspath{{figures/}}
\pagestyle{empty}
\onehalfspacing
\begin{document}
% Title Page
\author{Rodrigo Queiro (DOW)}
\title{Machine Learning for Control}
\projectgroup{F}
\maketitle
\thispagestyle{empty}
\pagestyle{plain}
\begin{spacing}{1.1}
\tableofcontents
\end{spacing}
\newpage
\section*{Technical Abstract}
In 2004 and 2005, a robotic unicycle was built by Mellors\cite{ref:mellors}
and Lamb\cite{ref:lamb} as a Master's project, using a powered wheel and
horizontal flywheel designed to emulate the actuation used by a human rider,
but they lacked the time to design a controller. Several years later,
D'Souza-Mathew\cite{ref:dsm} and then Forster\cite{ref:forster} attempted to
stabilise it by adding training wheels, and using the powered wheel to balance
in pitch, but were thwarted by problems with the hardware.
The system of the unicycle with a horizontal flywheel actuator had previously
been studied by Vos\cite{ref:vos} and Naveh\cite{ref:naveh}. Vos built a
physical unicycle and designed a controller for it, but his results were
reportedly unreliable. Furthermore, his controller required constant forward
motion of the unicycle to correct for roll errors. Naveh built upon Vos's
design, coming up with a more capable (and complicated) controller, but did
not test it on a physical unicycle.
These projects all sought to build a manageable but accurate analytical model
of the dynamics of the unicycle. They then applied non-linear control
techniques to try to stabilise the unicycle. This is particularly difficult
for a unicycle with a horizontal flywheel, since when it rolls to the side it
must first rotate in the direction of the fall and then travel forwards to
catch itself. This brings gyroscopic effects into play, which are hard to
stabilise with analytical control techniques.
An alternative approach is to build a numerical model of the dynamics directly
from interaction with the system, and then to computationally design an
optimal controller for the numerical model---a reinforcement learning
problem. An approach to this problem due to Rasmussen and
Deisenroth\cite{ref:rasdei08}, Reinforced Model Learnt Control (RMLC), has
been successfully used to learn to swing up and balance an inverted pendulum,
with prior knowledge only of the states and control inputs of the system. It
has also been used to stabilise a computer simulation of the 3D unicycle. One
of the most important aspects of RMLC is that it uses Gaussian Processes to
learn the dynamics of the system, allowing it to make predictions while
keeping track of the certainty of its knowledge about the system.
Previously, McHutchon had successfully applied RMLC to the unicycle with
training wheels---this had required several changes to the
hardware of the unicycle\cite{ref:mchutchon}. His work gave a clear idea of necessary changes to
the hardware and software of the unicycle in order to enable it to
balance without training wheels. The first aim of the project was to install a
new MEMS gyroscope and an Arduino microcontroller, and then to repeat
McHutchon's stabilisation experiments. These changes solved a number of
problems with the original system, including noisy angle measurements and
voltage supply problems. We were also able to take advantage of the
flexibility of the new Arduino-based architecture and improve the safety and
ease of use of the system.
After making these changes, the RMLC system was able to successfully balance
the unicycle (with training wheels). Additionally, we found we could
significantly improve the performance of the RMLC-trained controllers by
reducing the timestep at which the system is discretised, and by modelling the
discretised system as a 2$^\textrm{nd}$ order Markov chain. This led to
confident and accurate predictions about the performance of the unicycle, and
these techniques would be critical in modelling the unicycle without training
wheels.
The next step was to prepare the hardware and software for balancing the
unicycle. This required the addition of an accelerometer for sensing the
initial angle of the unicycle and a new motor controller and speed sensor for
the flywheel. We also required new algorithms for keeping track of the
unicycle's orientation and position in 3D space, for sensing the flywheel
speed more accurately from an extremely unreliable sensor, and for applying a
quadratic controller to the system. Additionally, we designed and built a
wooden skirt to protect the unicycle from falling too far when testing
incapable controllers. When this turned out to have too high a moment of
inertia, we redesigned it to reduce the weight, and finally settled on testing
with a loose rope tied to a balcony above the unicycle.
At this point, we started training the unicycle to balance. During the first
training process, the system became capable of designing controllers that could
reliably keep the unicycle vertical until it reached the edge of its range and
was pulled over by the rope. Although the learned dynamics model could
accurately predict the motion of the unicycle, the system was unable to
produce a controller that could stay in one place. This turned out to be the
result of a bug in the position tracking---the controller did not have
sufficient information to control its position.
Once this problem had been fixed, a second training process was initiated, but
failed due to a sensor problem. Finally, a third training process was
undertaken, which followed the same initial process of learning to balance as
the first. To prevent it from reaching the limit of its restricted range, the
loss function was changed to penalise off-centre positions very harshly,
although after this point, the system appeared to sacrifice stability in
favour of controllers that would remain near the centre. This produced
controllers that could frequently balance for 2--3 seconds, but only rarely
longer. The training sequence could not be continued due to the computational
cost of controller training, and due to the time limitations of the project.
Despite the failure to produce a controller that could reliably, indefinitely
balance the unicycle, the RMLC system produced some promising results,
including controllers that could control pitch and roll angles reliably but
fell foul of the limited range of positions, and one trial in which the
unicycle remained upright for 5 seconds. It is hoped that future work on the
system will be able to resolve some of the various issues that may have caused
problems for the RMLC controllers, and finally achieve the task of balancing
the unicycle.
\newpage
\section{Introduction}
Balancing a unicycle is a very challenging task for a
human rider. Many attempts have been made to achieve this task, using a
variety of models for the action of the rider. Some represent the rider as a
flywheel or pendulum in the coronal plane, allowing direct compensation of
falling to the side\cite{ref:zenkov,ref:murata}, as in
Figure~\ref{fig:murata}. Others use a more realistic (and challenging) model of
a flywheel in the horizontal plane\cite{ref:vos,ref:naveh}, as in
Figure~\ref{fig:naveh_unicycle}, but none of these
have reliably balanced a real unicycle.
\begin{figure}[htbp]
\begin{center}
\subfigure[Murata Girl]{
\includegraphics[height=5cm]{figures/murata_girl.jpg}
\label{fig:murata}
}
\subfigure[Naveh's Model]{
\includegraphics[height=5cm]{figures/naveh_unicycle.png}
\label{fig:naveh_unicycle}
}
\subfigure[2D Problem]{
\includegraphics[height=5cm]{figures/forster_unicycle.png}
\label{fig:forster_unicycle}
}
\subfigure[3D Problem]{
\includegraphics[height=5cm]{figures/3d_unicycle.png}
\label{fig:3d_unicycle}
}
\end{center}
\caption{Different balance problems}
\label{fig:unicycles}
\end{figure}
In 2004/2005 Mellors and Lamb \cite{ref:mellors,ref:lamb} built a robotic
unicycle, shown in Figure~\ref{fig:3d_unicycle}, intending to design a controller to balance it. However, they were
only able to complete the construction of the unicycle. In 2007/2008,
D'Souza-Mathew resumed work, replacing a wheel sensor and attempting to design
a controller. He simplified the problem by removing the ability to fall to the
side, reducing the dynamics to two dimensions: the inverted pendulum (from
now on referred to as the \textbf{2D system}). This is shown in
Figure~\ref{fig:forster_unicycle}. He was unable to balance the unicycle due to
hardware problems.
Next, in 2008/2009 Forster analysed the dynamics of both the 2D problem and
the unrestricted 3D unicycle\cite{ref:forster}. Again, hardware problems
prevented him from balancing the 2D system, and although he designed a
controller for the 3D unicycle, it was not even tested in simulation. Given
the simplicity of his approach compared to those of Vos and Naveh
\cite{ref:vos,ref:naveh}, it appears unlikely to work.
One thing all these approaches have in common is that their first step is a
series of simplifying assumptions about the dynamic system. They ignore the
non-linearities like motor dead-zones and wheel friction that are present in
any real-world system, and attempt to design a controller to stabilise the
idealised system. In many cases, this approach is very successful. However,
D'Souza-Mathew and Forster found that their model was invalid since the
unicycle's motor drive did not react fast enough. Vos's approach neglected all
Coriolis and gyroscopic effects, and failed to balance in trials where these
were important. Naveh's more complex approach required multiple control laws
for different regions of state space, and was only tested in simulation.
An alternative ``intelligent'' approach to control involves learning the
dynamics of the system directly, instead of relying on assumptions and
mechanical analysis. Various methods for this have been used, but many require
prohibitively large amounts of data from the system. One method due to
Rasmussen and Deisenroth, known here as Reinforced Model Learnt Control
(RMLC), achieves unprecedented data efficiency, and has been successfully used
to stabilise a computer simulation of a 3D
unicycle\cite{ref:rasdei08,ref:rasdei11}.
In 2009/2010, McHutchon successfully applied RMLC to the 2D system
\cite{ref:mchutchon}. However, since he had to make significant changes to the
unicycle hardware and software to achieve this, he did not have time to
attempt to balance the 3D unicycle.
The principal objective of this project is to apply RMLC to the unrestricted
3D unicycle. This includes the solution of problems identified by McHutchon,
as well as other problems identified during the project.
\subsection{Division of Labour}
This project was conducted as a joint project between the author and Alan
Douglass. The vast majority of the work was done together, and as such it is
difficult to attribute ideas or components of the system to one or the other.
When only one was responsible for a particular component, such as a circuit or
module of the code, it is focussed upon in their report, and the other report
will simply refer to it.
Although this may raise difficulties for marking the projects, it was
essential for the actual work. We would never have been able to make such
progress if we had not been able to build on each other's ideas, to spot each
other's mistakes and to rely upon another point of view when stuck.
\section{Reinforced Model Learnt Control}
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=14cm]{figures/GPRMLC.pdf}
\end{center}
\caption{Reinforced Model Learnt Control}
\label{fig:rmlc_flowchart}
\end{figure}
The main technique in this project is Reinforced Model Learnt Control,
diagrammed in Figure~\ref{fig:rmlc_flowchart}. At its core, it assumes that
the system (in this case, the unicycle) can be modelled in discrete time as:
\[
\boldsymbol{q}[n+1] = \boldsymbol{f}(\boldsymbol{q}[n],
\boldsymbol{u}[n])
\]
In this equation, $\boldsymbol{q}[n]$ is the state of the system at time $n$,
and $\boldsymbol{u}[n]$ is the control input at time $n$. In the case of the
unicycle, $\boldsymbol{q}$ consists of the angles and angular velocities of
the components of the unicycle, and the position of the unicycle.
$\boldsymbol{u}$ consists of the commands sent to the wheel and flywheel
motors.
This function, $\boldsymbol{f}$, is modelled as a Gaussian Process (GP). By
using Gaussian Process Regression (GPR), we can estimate any continuous
function from sampled inputs and outputs. For a description of the mechanics
of GPR, refer to \cite{ref:gpml}. When the unicycle runs, we get
a series of states and control inputs that can be converted to samples of
$\boldsymbol{f}$, and this allows us to use GPR to estimate
$\boldsymbol{f}$ at any point. This estimated $\boldsymbol{f}$ is referred to
as the \textbf{dynamics model}.
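As a rough illustration of this conversion step (a Python sketch, not the project's own code, with hypothetical array names \texttt{states} and \texttt{controls}), each timestep of a logged trial yields one training pair for the regression:

```python
import numpy as np

def trajectory_to_samples(states, controls):
    """Convert a logged trial into (input, target) pairs for GPR.

    states:   (T, D) array of states q[0..T-1]
    controls: (T, U) array of control inputs u[0..T-1]
    Each training input is [q[n], u[n]]; the target is q[n+1].
    """
    X = np.hstack([states[:-1], controls[:-1]])  # (T-1, D+U) inputs
    Y = states[1:]                               # (T-1, D) targets
    return X, Y

# Tiny example: a 3-step log with 2 state dimensions and 1 control input
states = np.array([[0.0, 0.1], [0.1, 0.2], [0.3, 0.1]])
controls = np.array([[1.0], [-1.0], [0.5]])
X, Y = trajectory_to_samples(states, controls)
# X has shape (2, 3); Y has shape (2, 2)
```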
By successively applying $\boldsymbol{f}$ to an initial distribution of
possible starting states, we can estimate, with confidence bounds, a
distribution of states over some finite horizon. This is referred to as
\textbf{simulation} of the system. Then, a \textbf{loss function} is applied
to the state distributions---this might find, for example, the expected
distance between the top of the unicycle and the upright position. Summing
these losses over the horizon gives a numerical score that rates how well the
dynamics model believes a given controller will balance the unicycle. This
loss score will be lowest for trajectory distributions that are concentrated
near the equilibrium.
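The simulation-and-scoring step can be sketched as follows; \texttt{step} stands in for the GP's mapping from one Gaussian state distribution to the next, and \texttt{loss} for the expected loss under a Gaussian, both hypothetical placeholders:

```python
def simulate_loss(step, loss, mu, var, horizon):
    """Propagate a Gaussian state distribution through a dynamics model
    `step` and sum the expected losses over a finite horizon."""
    total = 0.0
    for _ in range(horizon):
        mu, var = step(mu, var)
        total += loss(mu, var)
    return total

# Demo with a scalar linear-Gaussian toy model: x' = 0.9 x + noise(0.01),
# scored by the expected squared distance from the equilibrium at zero.
step = lambda mu, var: (0.9 * mu, 0.81 * var + 0.01)
loss = lambda mu, var: mu ** 2 + var   # E[x^2] for x ~ N(mu, var)
total = simulate_loss(step, loss, 1.0, 0.0, horizon=2)
```

A trajectory distribution that stays concentrated near zero accumulates a small total, matching the loss score described above.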
The gradient of the loss with respect to the controller parameters can be
calculated, and this allows standard gradient descent optimisation methods to
be used to find a locally optimal controller (for the estimated dynamics
model). This process is shown as the right-hand box, ``Simulation \&
Controller Optimisation'', in Figure~\ref{fig:rmlc_flowchart}, and is referred
to as \textbf{training} a controller.
Once an optimal controller has been trained on the simulated system, a
\textbf{trial} is performed on the real system (``Real World Testing'' in
Figure~\ref{fig:rmlc_flowchart}). This generates a log of states and control
inputs, which can be converted into more samples of $\boldsymbol{f}$,
improving the quality of the dynamics model and allowing a better controller
to be trained. This process is repeated iteratively until the dynamics model
is sufficiently accurate that the trained controllers perform well on the real
system. This process of gathering initial data (see
Section~\ref{sec:random_trials}) and then iterating between training a new
controller and gathering data with it, continuing until steady-state is
reached, is referred to as a training \textbf{sequence}.
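The overall training sequence can be summarised in skeleton form; all five arguments are placeholders for the components described above (GPR model fitting, gradient-based controller optimisation, and a real-world trial), not real functions:

```python
def rmlc_sequence(random_trials, n_iterations, train_model,
                  optimise_controller, run_trial):
    """Skeleton of one RMLC training sequence."""
    data = list(random_trials)      # initial experience from random trials
    controllers = []
    for _ in range(n_iterations):
        model = train_model(data)                 # GPR dynamics model
        controller = optimise_controller(model)   # train on simulation
        data.append(run_trial(controller))        # gather more experience
        controllers.append(controller)
    return controllers

# Dummy stand-ins, purely to exercise the loop structure
train_model = lambda data: len(data)
optimise_controller = lambda model: model * 2
run_trial = lambda controller: controller
controllers = rmlc_sequence([0], 3, train_model,
                            optimise_controller, run_trial)
```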
\subsection{Prediction Plots}
\label{sec:prediction_plots}
In this approach, knowledge about the state of the system is represented by a
multivariate Gaussian distribution over the states. The dynamics model is used
to map the distribution at one timestep to the distribution at the next
timestep. Repeated application yields a series of distributions at different
times representing possible futures of the system. To represent this in a
useful fashion, we can marginalise the multivariate Gaussians to get the
distribution of a single variable at a given time, and then consider a 95\%
confidence interval:
\[
P(X \in [\mu-2\sigma, \mu+2\sigma]) \approx 95\%
\quad \textrm{where } X \sim \mathcal{N}(\mu, \sigma^2)
\]
We can then plot these confidence intervals as a function of time. This is
shown as a filled region in Figure~\ref{fig:pred_ex}. The mean of the
predicted distribution is shown as a dashed line. These predictions are for
the controller trained by the optimisation procedure described above. After
the optimal controller has been trained for the learned dynamics model, a
trial on the real system is conducted to gather more data and improve the
dynamics model. This trial is shown on both plots as a solid line.
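Computing the plotted band from the marginal means and variances is a direct application of the interval above; a sketch with made-up numbers:

```python
import numpy as np

def confidence_band(means, variances):
    """95% band [mu - 2 sigma, mu + 2 sigma] for each marginal Gaussian."""
    means = np.asarray(means, dtype=float)
    sd = np.sqrt(np.asarray(variances, dtype=float))
    return means - 2 * sd, means + 2 * sd

# Two timesteps with marginal means 0 and 1, variances 0.25 and 1
lo, hi = confidence_band([0.0, 1.0], [0.25, 1.0])
# lo = [-1, -1], hi = [1, 3]
```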
\begin{figure}[htpb]
\begin{center}
\subfigure[Prediction of a specific trajectory]{
\includegraphics[width=7cm]{pred_ex_specific.pdf}
\label{fig:pred_ex_specific}
}
\subfigure[Prediction of a generic trajectory]{
\includegraphics[width=7cm]{pred_ex_generic.pdf}
\label{fig:pred_ex_generic}
}
\end{center}
\caption{Predictions of the unicycle's pitch, and the following trial}
\label{fig:pred_ex}
\end{figure}
Note the difference between the two plots. Figure~\ref{fig:pred_ex_specific}
shows the prediction when given the initial state of the trial. The
uncertainty is limited to measurement error initially, but grows due to
uncertainty in the dynamics model. Figure~\ref{fig:pred_ex_generic} shows the
prediction starting with a distribution over possible initial states---it is
this distribution over trajectories that is optimised by the training process.
\subsection{Practical Concerns}
There are many different decisions to make when implementing the RMLC
strategy, which are detailed in this section. Previously, Rasmussen used
Forster's analytical model of the unicycle to simulate the real world trials,
and used RMLC to train a controller for this ideal unicycle. Thanks to this,
and to McHutchon's work on the real 2D system, we have a lot of information on
which choices can work well in these situations, and which are most important.
In many cases, these choices were made after noticing deficiencies in the
performance of RMLC on either the 2D system or the 3D unicycle, as explained
in Section~\ref{sec:learning_results}.
\subsubsection{GPR Implementation}
The GPR system has many configurable parameters, but fortunately the form used
previously had proven very robust. This project uses a zero-mean GP with a
squared exponential covariance function, with automatic relevance determination.
This has the form:
\[
k(\boldsymbol{x}, \boldsymbol{x}') = \alpha^2 \exp \left(\sum_{d=1}^D -
\frac{(x_d-x_d')^2}{2 l_d^2}\right)
\]
This expression contains hyperparameters for the signal variance $\alpha^2$
and the length scales $l_d$. An additional hyperparameter involved in the
regression is the noise variance, $\sigma_\varepsilon^2$. These parameters are
chosen to best fit the data without overfitting with the maximum likelihood
(ML) method\cite{ref:gpml}. The optimal values of these hyperparameters are
very useful for interpreting how accurate the dynamics model is: a high SNR
$\frac{\alpha}{\sigma_\varepsilon}$ suggests the model can predict very
well. Furthermore, automatic relevance determination is provided by the length
scales---if a variable is not useful for prediction, the ML length scale will
tend to $\infty$.
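A direct transcription of this covariance function (an illustrative Python sketch, not the project's implementation) makes the relevance-determination behaviour easy to check:

```python
import numpy as np

def se_ard(x, xp, alpha2, lengthscales):
    """Squared exponential covariance with ARD length scales:
    k(x, x') = alpha^2 exp(-sum_d (x_d - x'_d)^2 / (2 l_d^2)).
    A very long l_d makes dimension d irrelevant to the covariance."""
    r2 = ((np.asarray(x) - np.asarray(xp)) ** 2) \
        / (2.0 * np.asarray(lengthscales) ** 2)
    return alpha2 * np.exp(-np.sum(r2))

x = np.array([1.0, 0.0])
# k(x, x) equals the signal variance alpha^2
k_self = se_ard(x, x, 2.5, [1.0, 1.0])
# an enormous length scale makes the second dimension irrelevant
k_far = se_ard(x, np.array([1.0, 5.0]), 1.0, [1.0, 1e9])
```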
\subsubsection{Choice of State Vector}
\label{sec:state_vector}
The state vector $\boldsymbol{q}[n]$ should be chosen to ensure that the
future states are a function only of the current state, and current and future
control inputs. In other words, with a known controller the states should form
a Markov chain:
\[
P(\boldsymbol{q}[n+1] | \boldsymbol{q}[i]\textrm{ for } i = 1, \dots, n)
= P(\boldsymbol{q}[n+1] | \boldsymbol{q}[n])
\]
Forster's analysis suggested the following state to be suitable, which was
found by Rasmussen to be sufficient to model the ideal unicycle.
\[
\boldsymbol{q}[n] = \left[ \begin{array}{ll}
\dot{\theta} & \textrm{roll angular velocity} \\
\dot{\phi} & \textrm{yaw angular velocity} \\
\dot{\psi}_w & \textrm{wheel angular velocity} \\
\dot{\psi}_f & \textrm{pitch angular velocity} \\
\dot{\psi}_t & \textrm{flywheel (turntable) angular velocity} \\
x_c & \multirow{2}{*}{\textrm{position of target in unicycle's reference
frame}} \\
y_c & \\
\theta & \textrm{roll angle} \\
\psi_f & \textrm{pitch angle} \\
\end{array}\right]
\]
\begin{figure}[htbp]
\begin{center}
\includegraphics[height=8cm]{figures/angles.pdf}
\end{center}
\caption{Diagram showing Euler angles for the rotation of the unicycle
(from \cite{ref:forster})}
\label{fig:unicycle_angles}
\end{figure}
However, this ignores the presence of unobserved states in the system like
delays, backlash in the gears, etc. To help the dynamics model deal with these
problems, we tried giving it access to the previous state and control input,
in effect modelling it as a 2$^\textrm{nd}$ order Markov chain:
\[
\boldsymbol{q}_2[n] = \left[ \begin{array}{l}
\boldsymbol{q}[n-1] \\
\boldsymbol{u}[n-1] \\
\boldsymbol{q}[n] \\
\end{array} \right]
\]
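Constructing the augmented states $\boldsymbol{q}_2[n]$ from a logged trajectory is a simple stacking operation; a sketch with hypothetical array shapes:

```python
import numpy as np

def augment_states(states, controls):
    """Stack [q[n-1], u[n-1], q[n]] to form the 2nd-order state q2[n].

    states:   (T, D) array of states q[0..T-1]
    controls: (T, U) array of control inputs u[0..T-1]
    Returns a (T-1, 2D+U) array of augmented states.
    """
    return np.hstack([states[:-1], controls[:-1], states[1:]])

states = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
controls = np.array([[9.0], [8.0], [7.0]])
q2 = augment_states(states, controls)
# q2[0] = [q[0], u[0], q[1]] = [0, 1, 9, 2, 3]
```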
Figure~\ref{fig:order1_trained_single} shows a prediction for a single
trajectory using the 1$^\textrm{st}$ order model, and
Figure~\ref{fig:order2_untrained_single} shows a prediction for the same
trajectory and controller with the 2$^\textrm{nd}$ order model.\footnote{It
is puzzling that the predicted trajectories go in the opposite direction
initially. It is possible that noise in the measured angular velocity misleads
the 1$^\textrm{st}$ order model, whereas the 2$^\textrm{nd}$ order model gets
two consecutive states, making the initial motion of the unicycle more clear.
Also, the initial state(s) come from a rollout using a different controller.
As such, the second state provided to the 2$^\textrm{nd}$ order model could
not be predicted by the 1$^\textrm{st}$ order model since it doesn't know the
controller.} Initially, it is much more certain about the motion of the
unicycle, although this certainty quickly diverges. When the controller is
optimised for the more accurate 2$^\textrm{nd}$ order model, the rate of
uncertainty growth is much reduced, as Figure~\ref{fig:order2_trained_single}
shows.
\begin{figure}[htbp]
\begin{center}
\subfigure[1$^\textrm{st}$ order, optimal controller]{
\includegraphics[width=4.5cm]{order1_trained_single.pdf}
\label{fig:order1_trained_single}
}
\subfigure[2$^\textrm{nd}$ order, controller of
Figure~\ref{fig:order1_trained_single}]{
\includegraphics[width=4.5cm]{order2_untrained_single.pdf}
\label{fig:order2_untrained_single}
}
\subfigure[2$^\textrm{nd}$ order, optimal controller]{
\includegraphics[width=4.5cm]{order2_trained_single.pdf}
\label{fig:order2_trained_single}
}
\end{center}
\caption{Pitch angle predictions with 1$^\textrm{st}$ and 2$^\textrm{nd}$
order dynamics models}
\label{fig:order_1_and_2}
\end{figure}
\subsubsection{Initial State Distribution}
\label{sec:initial_state_distribution}
In order to simulate the unicycle, the dynamics model requires an initial
distribution, which it can then transform with the estimated transition
function $\boldsymbol{f}$. For the RMLC system used, this must be a sum of a
finite number of multivariate Gaussian distributions. Using a 1$^\textrm{st}$
order Markov model, a simple and effective choice is a single Gaussian with a
diagonal covariance matrix representing the variety of possible initial
states. This is represented in Figure~\ref{fig:initial_order1_diag}.
\begin{figure}[htpb]
\begin{center}
\subfigure[Independent variables]{
\includegraphics[width=4.5cm]{initial_order1_diag.pdf}
\label{fig:initial_order1_diag}
}
\subfigure[Independence between same variable at different times]{
\includegraphics[width=4.5cm]{initial_order2_diag.pdf}
\label{fig:initial_order2_diag}
}
\subfigure[Correlation between same variable at different times]{
\includegraphics[width=4.5cm]{initial_order2_corr.pdf}
\label{fig:initial_order2_corr}
}
\end{center}
\caption{Contours of various Gaussian initial state distributions}
\label{fig:initial_state}
\end{figure}
When using a 2$^\textrm{nd}$ order Markov model, the initial
augmented state \\ $\boldsymbol{q}_2[0] = [\boldsymbol{q}[-1]\
\boldsymbol{u}[-1]\ \boldsymbol{q}[0]]^\textrm{T}$ contains two successive
states. The simplest choice is to use the same independence assumption as
before, as illustrated in Figure~\ref{fig:initial_order2_diag}. However, this
is an extremely non-physical distribution---the illustration shows how this
implies the possibility of crossing the entire region of possible starting
positions in a single timestep.\footnote{With the 0.05s timestep used in our
experiment, this suggests a possible starting velocity of 6 m s$^{-1}$, which
was not the case.} This led to rapid growth in uncertainty in the position,
as shown in Figure~\ref{fig:pred_dist_orig}, and also meant that the behaviour
in the trial bore little resemblance to the highly uncertain prediction.
However, when given the exact initial state, the dynamics model's prediction
is very similar to the trial's trajectory, as demonstrated by
Figure~\ref{fig:pred_dist_single}.\footnote{The trial actually follows a
trajectory slightly outside the prediction's confidence bounds: this may be
due to measurement noise affecting the initial state estimate, as well as
inaccuracy in the dynamics model.}
\begin{figure}[htpb]
\begin{center}
\subfigure[Prediction with original initial distribution]{
\includegraphics[height=7cm]{pred_dist_orig.pdf}
\label{fig:pred_dist_orig}
}
\subfigure[Prediction with trial's initial state]{
\includegraphics[height=7cm]{pred_dist_single.pdf}
\label{fig:pred_dist_single}
}
\subfigure[Prediction with new initial distribution]{
\includegraphics[height=7cm]{pred_dist_new.pdf}
\label{fig:pred_dist_new}
}
\end{center}
\caption{Effect of initial distribution on predictions}
\label{fig:pred_dist}
\end{figure}
Upon realising this, we chose a new initial state distribution, based on a
model in which the two initial states are identical apart from a small
measurement error on each. The error term accounts both for the fact that the
states do, in reality, differ slightly, and for the fact that this model is
only approximate. This distribution is depicted in
Figure~\ref{fig:initial_order2_corr}.
\begin{IEEEeqnarray*}{rCl}
q_{i,\textrm{initial}} &\sim& \mathcal{N}(\mu_i, \sigma_i^2) \\
q_i[-1] &\sim& q_{i,\textrm{initial}} + \mathcal{N}(0,
\sigma_{\varepsilon,i}^2) \\
q_i[0] &\sim& q_{i,\textrm{initial}} + \mathcal{N}(0,
\sigma_{\varepsilon,i}^2) \\
\begin{bmatrix}
q_i[-1] \\
q_i[0]
\end{bmatrix} &\sim& \mathcal{N}\left(
\begin{bmatrix}
\mu_i \\ \mu_i
\end{bmatrix},
\begin{bmatrix}
\sigma_i^2 + \sigma_{\varepsilon,i}^2 & \sigma_i^2 \\
\sigma_i^2 & \sigma_i^2 + \sigma_{\varepsilon,i}^2
\end{bmatrix}\right)
\end{IEEEeqnarray*}
Since at this point several trials had already been performed, we were able
to estimate $\mu_i$, $\sigma_i$ and $\sigma_{\varepsilon,i}$ from the trial
data. Using this new initial distribution, the expected behaviour is much more
like the observed trial, as shown in Figure~\ref{fig:pred_dist_new}.
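For a single state variable, the resulting $2\times 2$ covariance block follows directly from the estimated variances; a sketch with illustrative numbers (not values from the project's trials):

```python
import numpy as np

def correlated_initial_cov(sigma2, sigma_eps2):
    """2x2 covariance for [q_i[-1], q_i[0]]: a shared initial value with
    variance sigma2, plus independent measurement noise of variance
    sigma_eps2 on each of the two observations."""
    s, e = sigma2, sigma_eps2
    return np.array([[s + e, s],
                     [s,     s + e]])

cov = correlated_initial_cov(0.04, 0.0001)
# with small measurement noise, the two timesteps are almost
# perfectly correlated
corr = cov[0, 1] / cov[0, 0]
```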
\subsubsection{Controller Form}
One of the most basic forms of controller is a linear controller,
$\boldsymbol{g}(\boldsymbol{q}[n]) = \boldsymbol{W} \boldsymbol{q}[n] +
\boldsymbol{p}$, where $\boldsymbol{W}$ is a matrix of weights and
$\boldsymbol{p}$ is a vector of offsets. This form is capable of stabilising
the ideal inverted pendulum, and indeed proved sufficient to stabilise the 2D
system.
However, the linear controller cannot generate the correct turning command as
shown in Figure~\ref{fig:xor}---this is equivalent to the XOR problem, and can
be solved by using a quadratic controller. This takes the form:
\[
g_i(\boldsymbol{q}[n]) = p_i + \sum_{j=1}^D w_{i,j} q_j[n] +
\sum_{j=1}^D \sum_{k=j}^D h_{i,j,k} q_j[n] q_k[n]
\]
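Evaluating this controller is a direct transcription of the expression above; the weights in this sketch are arbitrary illustrative values, chosen to show a cross term of the kind a linear law cannot produce:

```python
import numpy as np

def quadratic_controller(q, p, w, h):
    """u_i = p_i + sum_j w[i,j] q_j + sum_{j<=k} h[i,j,k] q_j q_k"""
    D = len(q)
    u = p + w @ q
    for j in range(D):
        for k in range(j, D):
            u = u + h[:, j, k] * q[j] * q[k]
    return u

q = np.array([1.0, 2.0])
p = np.array([0.5])
w = np.array([[1.0, -1.0]])
h = np.zeros((1, 2, 2))
h[0, 0, 1] = 3.0   # cross term q_0 * q_1
u = quadratic_controller(q, p, w, h)
# u = 0.5 + (1 - 2) + 3*1*2 = 5.5
```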
\begin{figure}[htbp]
\begin{center}
\includegraphics[width=8cm]{figures/xor.pdf}
\end{center}
\caption{Correct turning command in unicycle-centred coordinates}
\label{fig:xor}
\end{figure}
The quadratic controller was sufficient to balance the ideal unicycle in
Rasmussen's simulations.
When switching to the 2$^\textrm{nd}$ order Markov model, it was found that
the controllers performed significantly better when using
$\boldsymbol{q}_2[n]$ as input, instead of $\boldsymbol{q}[n]$. To understand
this, consider the effect with a linear policy: the controller for the
augmented state is equivalent to a combination of a two-tap FIR filter and a
first-order IIR filter on the original state:
\begin{IEEEeqnarray*}{rCl}
\boldsymbol{u}[n] &=&
\boldsymbol{g}(\boldsymbol{q}_2[n]) = \boldsymbol{W} \boldsymbol{q}_2[n] \\
&=& \boldsymbol{W}_1 \boldsymbol{q}[n-1] + \boldsymbol{W}_2
\boldsymbol{u}[n-1] + \boldsymbol{W}_3 \boldsymbol{q}[n]
\end{IEEEeqnarray*}
This allows the RMLC system to create basic low- or high-pass filters in the
controller, and this additional freedom improved both the loss function and
the subjective quality of the trained controllers.
Figure~\ref{fig:con1_and_con2} shows the effect of this change on the optimal
controllers.
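The filtering interpretation is easiest to see in a scalar sketch (the weights here are hypothetical; `w2`, the weight on the previous command, is the recursive IIR tap):

```python
# u[n] = w1*q[n-1] + w2*u[n-1] + w3*q[n]: w1 and w3 form a two-tap FIR
# filter on the state, while the w2*u[n-1] term is a first-order IIR filter.
w1, w2, w3 = 0.2, 0.5, 0.3

def run_policy(q_seq):
    u, q_prev, out = 0.0, 0.0, []
    for q in q_seq:
        u = w1 * q_prev + w2 * u + w3 * q
        out.append(u)
        q_prev = q
    return out

# For a unit step in q, u[n] rises gradually towards
# (w1 + w3) / (1 - w2): the policy low-pass filters the state.
resp = run_policy([1.0] * 50)
```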
\begin{figure}[htpb]
\begin{center}
\subfigure[Control input: $\boldsymbol{q}{[n]}$]{
\includegraphics[width=7cm]{order2_trained_con1_generic.pdf}
\label{fig:order2_trained_con1}
}
\subfigure[Control input: $\boldsymbol{q}_2{[n]}$]{
\includegraphics[width=7cm]{order2_trained_con2_generic.pdf}
\label{fig:order2_trained_con2}
}
\end{center}
\caption{Effect of providing augmented state to controller}
\label{fig:con1_and_con2}
\end{figure}
\subsubsection{Gathering Initial Data}
\label{sec:random_trials}
In order to train a controller, some experience of the system is required. The
way this has been done before is with \textbf{random trials}: the system
inputs are chosen randomly, and the response is recorded.
Rasmussen chose to use independent and identically distributed (IID) motor
commands when stabilising the ideal unicycle. McHutchon chose to use a
controller with randomly chosen weights.
We initially chose to use IID motor commands, chosen uniformly from the full
range. However, the dynamics model could not model the resulting data well. We
suspect that the same problems that make the 2$^\textrm{nd}$ order Markov
model (described in Section~\ref{sec:state_vector}) so advantageous mean that
if the motor command changes radically between cycles, the behaviour is very
unpredictable. Since the dynamics model cannot model signal-dependent noise
well, these datapoints cause a very uncertain model to be learnt.
We solved this problem by low-pass filtering the random motor commands: this prevents
sudden changes in the motor command, while providing a wider survey of
the space than a random controller and avoiding the risk of large positive
feedback.
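A minimal sketch of this trial-input generation, assuming a first-order low-pass filter and a motor command range normalised to $[-1, 1]$ (the smoothing constant is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# IID commands drawn uniformly from the full (normalised) range.
raw = rng.uniform(-1.0, 1.0, size=500)

# First-order low-pass filter: suppresses sudden command changes while
# still surveying the command space more widely than a random controller.
alpha = 0.2   # illustrative smoothing constant
smooth = np.empty_like(raw)
acc = 0.0
for n, r in enumerate(raw):
    acc += alpha * (r - acc)
    smooth[n] = acc
```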
\subsubsection{Loss Function}
\label{sec:loss_function}
The main concerns when choosing a loss function are to accurately represent
what is desired of the controller, and to ensure that different desires are
appropriately weighted. When stabilising the ideal unicycle, Rasmussen was
successful using the following form of loss function:
\[
  1 - \mathcal{N}(\boldsymbol{q}[n]; \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = 1 - \exp\left(-\sum_{d=1}^D \frac{(q_d[n] - \mu_d)^2}{w_d^2}\right)
\]
This is configured using $w_d$, the characteristic widths for each variable.
To ignore a variable, take $w_d^{-2} = 0$. While the RMLC system is not too
sensitive to these values, it is important to set them appropriately. During a
test run on the 2D system, we accidentally set the characteristic pitch angle
to $10^\circ$ and the characteristic distance (from the origin) to around
10cm. This meant that the preferred policy was to fall over immediately, to
prevent travelling away from the origin. Loosening the characteristic distance
to 30cm led to a controller that balanced the system.
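The saturating loss can be written compactly (a numpy sketch; setting a width to infinity realises $w_d^{-2} = 0$ and so ignores that variable):

```python
import numpy as np

def saturating_loss(q, mu, widths):
    """1 - exp(-sum_d ((q_d - mu_d)/w_d)^2): zero at the target,
    saturating towards 1 far away, so distant states are penalised
    almost equally however they fail."""
    z = (np.asarray(q) - np.asarray(mu)) / np.asarray(widths)
    return 1.0 - np.exp(-np.sum(z ** 2))
```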
For the ideal unicycle, Rasmussen chose to penalise the pitch and roll angles,
to encourage the system to keep the robot vertical. In addition, he chose to
penalise the yaw rate $\dot{\phi}$ and flywheel rate $\dot{\psi}_t$ to prevent
the system from using a ``spinning-top'' like approach to balance, and to
penalise the distances from the origin, $x_c$ and $y_c$, to prevent the system
from driving the robot very fast to enhance stability.
We used a similar loss function structure, with characteristic widths of
$9^\circ$ on the angles, and $\frac{1}{2}$ metres on the
distances.\footnote{We initially used a distance loss width of 1 metre, but
upon realising how little room the unicycle had to manoeuvre we reduced the
width to $\frac{1}{2}$ metres to represent the importance of staying within
the rope's range. This did not appear to have a dramatic effect on the
resulting controllers.} Unlike
the ideal unicycle, the real unicycle has an upper limit on the flywheel
speed, so the ``spinning-top'' strategy is not viable, and it is not important
to penalise the flywheel speed.
\subsubsection{Timestep}
\label{sec:timestep}
The RMLC approach assumes a discrete-time system---to apply it to a
continuous-time system, it must be discretised. We used a zero-order hold
(ZOH) on the control signal: $u(t)~=~u[\lfloor\frac{t}{T}\rfloor]$.
Training for the ideal unicycle, Rasmussen found it desirable to choose the
largest timestep $T$ with which the system can be stabilised, to reduce the
computational cost of the optimisation, and to avoid problems with noise
buildup from many repeated applications of the transition function
$\boldsymbol{f}$. He chose a timestep of $\frac{1}{7}$ seconds.
However, for the real unicycle, both McHutchon and we found that the predicted
trajectories are more confident and reliable, and controllers perform better,
when using a timestep of $\frac{1}{20}$ seconds. It was hoped that this was
because the sensors were noisier in real life, and so a shorter time in the
ZOH led to several successive noisy control settings being averaged by the
low-pass effect of the system, reducing the effect of noise. However,
upgrading the sensors did not seem to change this, so there is possibly another
cause.
It is hard to provide a clear demonstration of the advantage of the faster
timestep, since trials must be re-conducted and so the datasets are different.
In addition, since the training sequence with a 50ms timestep developed better
controllers, trials lasted longer and more data was gathered. As such,
to compare dynamics models based on similar numbers of datapoints, we must
provide fewer trial logs for the 50ms dynamics model. This means that it has
had less of a chance to try a variety of strategies, and necessarily is less
certain about the dynamics of the system. That said,
Figure~\ref{fig:timestep_effect} shows the effect of reducing the timestep,
comparing dynamics models with similar numbers of points
(Figures \ref{fig:angle_100ms} and \ref{fig:angle_50ms}) and after the same
number of trials (Figures \ref{fig:angle_100ms} and \ref{fig:angle_50ms_8th}).
\begin{figure}[htpb]
\begin{center}
\subfigure[100ms, 8 trials, 542 points]{
\includegraphics[width=4.5cm]{angle_100ms.pdf}
\label{fig:angle_100ms}
}
\subfigure[50ms, 5 trials, 556 points]{
\includegraphics[width=4.5cm]{angle_50ms.pdf}
\label{fig:angle_50ms}
}
\subfigure[50ms, 8 trials, 824 points]{
\includegraphics[width=4.5cm]{angle_50ms_8th.pdf}
\label{fig:angle_50ms_8th}
}
\end{center}
\caption{Predictions with datasets at 100ms timesteps and 50ms timesteps}
\label{fig:timestep_effect}
\end{figure}
\section{Hardware and Software}
When the project started, some of the required tasks were already clear, as a
result of the previous work on the unicycle. These included extensive changes
to the hardware and a complete rewrite of the controller software, both
required to fix issues that had been turned up by the previous year's work
and to prepare it for 3D balance. A brief summary of the changes is below,
followed by greater detail where necessary.
\begin{itemize*}
\item The cumbersome FPGA-based controller used previously was replaced with
an Arduino microcontroller, allowing changes to be made faster and more
easily.
\item Older gyroscopes and very noisy angle sensors were replaced by modern
MEMS gyroscopes and accelerometers.
\item Algorithms for keeping track of angle and position in 3D were written.
\item A new motor controller was built for the flywheel, as well as a
quadrature encoder for speed sensing. Additionally, a different speed
sensing algorithm was implemented.
\item A sensor for the battery voltage was added, to ensure it could supply
sufficient power to the motors.
\item We designed and tested a number of methods of protecting the unicycle
from falling, while allowing 3D movement.
\item Some faults were identified and could not be fixed, so tests and
error-checking were added to ensure these did not corrupt the data.
\item To improve safety, the motor is disabled when the operator presses a
switch on the chassis or when the unicycle nears the ground.
\end{itemize*}
\begin{figure}[htpb]
\begin{center}
\includegraphics[width=15cm]{figures/hardware.pdf}
\end{center}
\caption{Summary of changes to unicycle hardware}
\label{fig:hardware}
\end{figure}
\subsection{Sensing}
\subsubsection{Angle Sensing}
The unicycle had previously used a vibrating crystal rate gyroscope and a pair
of infra-red distance sensors to keep track of its angle. However, these were
reported to be extremely noisy, and were blamed for a large part of the poor
performance of the previous system. Our supervisor had already purchased the
ITG-3200, a 3-axis MEMS rate gyroscope with in-built temperature compensation
and low-pass filtering. We later purchased an ADXL345, a 3-axis MEMS
accelerometer, to determine the absolute angle of the unicycle.
The ITG-3200 rate gyro determines the angular velocity of the chip, and thus
the unicycle, around each of its 3 axes. These values are offset by some
unknown bias---this is determined before the trial by averaging the output
when stationary, and is assumed to be approximately constant during a
trial.\footnote{The bias drift over 10 seconds was measured to be negligible.
For longer trials, this would have to be taken into account.} After removing
the bias, the readings can be converted to radians per second with a
calibration factor from the datasheet \cite{ref:itg3200}.
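A sketch of this calibration step, assuming raw integer samples are collected while the robot is held stationary (the helper names are illustrative; the 14.375 LSB per deg\,s$^{-1}$ sensitivity is the scale factor given in the ITG-3200 datasheet):

```python
import numpy as np

# Estimate the per-axis zero-offset (bias) by averaging readings taken
# while the unicycle is held stationary before the trial.
def estimate_bias(stationary_samples):
    return np.mean(stationary_samples, axis=0)

# ITG-3200 sensitivity from the datasheet: 14.375 LSB per deg/s.
LSB_PER_DEG_S = 14.375

# Remove the bias, then convert raw counts to radians per second.
def to_rad_per_s(raw, bias):
    return np.deg2rad((raw - bias) / LSB_PER_DEG_S)
```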
To keep track of the unicycle's orientation, these angular velocities must be
integrated. There are 3 commonly used ways to keep track of rotations in 3D,
of which unit quaternions were chosen as best suited to the project.
\begin{description}
\item[Euler angles]
These are the yaw, roll and pitch angles shown in
Figure~\ref{fig:unicycle_angles}. They are required for the controller,
and so other forms must be converted to them. However, they suffer from
gimbal lock---in certain positions, it is impossible to represent a
small rotation with a small change in the Euler angles. They also require
trigonometric functions in the integration loop: a problem for fast
integration on embedded platforms.
\item[Direction Cosine Matrices (DCMs)]
A DCM is an orthonormal rotation matrix, representing the rotation from
the global coordinate system to the body's coordinate system. They require
only basic linear algebra to understand. However, when errors in
integration accumulate, the matrix will no longer be orthonormal, and
there is no clear way to fix this.
\item[Unit quaternions]
    Quaternions extend the complex numbers to four dimensions. Just as a
    complex number of unit magnitude can represent a rotation in the 2D plane, a quaternion
of unit magnitude can represent a rotation in 3D. Although they may appear
confusing, they have the fastest integration step, do not suffer from
gimbal lock and can be normalised by simply dividing by the magnitude.
\end{description}
Having chosen to use quaternions to track the orientation, various formulae
had to be derived to convert to and from this form. These mathematical
techniques are summarised here---they are reasonably simple to derive by
considering the DCM as a composition of Euler rotations.
\paragraph{Quaternion Integration}
Given the unit quaternion representing a rotation from the global coordinate
system to that of the unicycle, $\boldsymbol{q}[n] = q_w + q_x
\boldsymbol{i} + q_y \boldsymbol{j} + q_z \boldsymbol{k}$, and rate gyro
outputs $\omega_x$, $\omega_y$ and $\omega_z$, we can integrate
the angular velocities with $\boldsymbol{q}[n+1] = \boldsymbol{q}[n] \left(1 +
\frac{\omega_x \Delta t}{2} \boldsymbol{i} +
\frac{\omega_y \Delta t}{2} \boldsymbol{j} +
\frac{\omega_z \Delta t}{2} \boldsymbol{k}\right)$. For more details on this,
see \cite{ref:quaternions}.
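In code, one integration step is a quaternion product followed by renormalisation to counter the accumulation of first-order error (a numpy sketch using the $(w, x, y, z)$ component order):

```python
import numpy as np

def quat_mult(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def integrate_gyro(q, omega, dt):
    """One step of q[n+1] = q[n] * (1 + (dt/2)(wx i + wy j + wz k)),
    renormalised so q stays a unit quaternion."""
    wx, wy, wz = omega
    dq = np.array([1.0, 0.5*dt*wx, 0.5*dt*wy, 0.5*dt*wz])
    q_new = quat_mult(q, dq)
    return q_new / np.linalg.norm(q_new)
```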
\paragraph{Euler Angles of a Quaternion}
Unfortunately, the mechanical analysis of Forster, and thus the simulation of
the ideal unicycle, uses a different angle convention to all other sources we
located. We chose to continue with this convention, and so the expressions for
converting a quaternion to Euler angles had to be rederived. Using a
convention of yaw around the vertical $y$-axis, roll around the forward
$x$-axis and pitch around the sideways $z$-axis (the axes of the mounted
gyroscope) we get:
\begin{IEEEeqnarray*}{rCl}
  \phi &=& \tan^{-1}\left(\frac{2q_xq_z +
  2q_yq_w}{-q_x^2-q_y^2+q_z^2+q_w^2}\right) \\
  \theta &=& \sin^{-1}\left(2q_wq_x - 2q_yq_z\right) \\
  \psi_f &=& \tan^{-1}\left(\frac{2q_xq_y + 2q_zq_w}{-q_x^2+q_y^2-q_z^2+q_w^2}
  \right)
\end{IEEEeqnarray*}
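A numpy transcription of this conversion, using `arctan2` to retain the full angle range and the unit-norm identity $q_w^2+q_x^2+q_y^2+q_z^2=1$ to simplify the denominators (e.g. $-q_x^2-q_y^2+q_z^2+q_w^2 = 1-2q_x^2-2q_y^2$):

```python
import numpy as np

def quat_to_euler(q):
    """Yaw (phi), roll (theta) and pitch (psi_f) for the convention used
    here: yaw about the vertical y-axis, roll about the forward x-axis,
    pitch about the sideways z-axis. q is given as (w, x, y, z)."""
    qw, qx, qy, qz = q
    phi = np.arctan2(2*qx*qz + 2*qy*qw, 1 - 2*qx**2 - 2*qy**2)
    theta = np.arcsin(2*qw*qx - 2*qy*qz)
    psi_f = np.arctan2(2*qx*qy + 2*qz*qw, 1 - 2*qx**2 - 2*qz**2)
    return phi, theta, psi_f
```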
\paragraph{Angular Velocities}
To calculate the angular rates, $\dot{\phi}$, $\dot{\theta}$ and
$\dot{\psi}_f$, we convert the quaternion to a DCM, and then apply expressions
for the derivatives of the Euler angles of a DCM. This could cause a division
by zero in a gimbal lock situation, but fortunately the unicycle never reaches
such positions.
\begin{IEEEeqnarray*}{rCl}
\left[\begin{array}{lll}
d_{11} & d_{12} & d_{13} \\
d_{21} & d_{22} & d_{23} \\
d_{31} & d_{32} & d_{33} \\
\end{array}\right] &=&
\left[\begin{array}{lll}
1-2q_y^2-2q_z^2 & 2q_xq_y-2q_zq_w & 2q_xq_z+2q_yq_w \\
2q_xq_y+2q_zq_w & 1-2q_x^2-2q_z^2 & 2q_yq_z-2q_xq_w \\
2q_xq_z-2q_yq_w & 2q_yq_z+2q_xq_w & 1-2q_x^2-2q_y^2 \\
\end{array}\right] \\
  \dot{\phi} &=& \frac{(d_{13}d_{32}-d_{12}d_{33})\omega_x +
  (d_{11}d_{33}-d_{13}d_{31})\omega_y}{d_{13}^2 + d_{33}^2} \\
\dot{\theta} &=& \frac{d_{22} \omega_x - d_{21}
\omega_y}{\sqrt{1-d_{23}^2}} \\
\dot{\psi}_f &=& -\frac{d_{23}(d_{21}\omega_x + d_{22}\omega_y)}{d_{21}^2 +
d_{22}^2} + \omega_z
\end{IEEEeqnarray*}
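A transcription of these expressions (0-based numpy indices stand in for the 1-based $d_{ij}$). As a check, at the identity orientation they reduce to $(\dot{\phi}, \dot{\theta}, \dot{\psi}_f) = (\omega_y, \omega_x, \omega_z)$, as expected for yaw about $y$, roll about $x$ and pitch about $z$:

```python
import numpy as np

def quat_to_dcm(q):
    """DCM for a quaternion (w, x, y, z), matching the matrix above."""
    qw, qx, qy, qz = q
    return np.array([
        [1-2*qy**2-2*qz**2, 2*qx*qy-2*qz*qw,   2*qx*qz+2*qy*qw],
        [2*qx*qy+2*qz*qw,   1-2*qx**2-2*qz**2, 2*qy*qz-2*qx*qw],
        [2*qx*qz-2*qy*qw,   2*qy*qz+2*qx*qw,   1-2*qx**2-2*qy**2],
    ])

def euler_rates(q, wx, wy, wz):
    """Euler angle rates from body angular velocities (wx, wy, wz).
    Divides by d13^2 + d33^2 etc., so it fails at gimbal lock---positions
    the unicycle never reaches in practice."""
    d = quat_to_dcm(q)
    phi_dot = ((d[0, 2]*d[2, 1] - d[0, 1]*d[2, 2])*wx
               + (d[0, 0]*d[2, 2] - d[0, 2]*d[2, 0])*wy) \
              / (d[0, 2]**2 + d[2, 2]**2)
    theta_dot = (d[1, 1]*wx - d[1, 0]*wy) / np.sqrt(1 - d[1, 2]**2)
    psi_dot = -d[1, 2]*(d[1, 0]*wx + d[1, 1]*wy) \
              / (d[1, 0]**2 + d[1, 1]**2) + wz
    return phi_dot, theta_dot, psi_dot
```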
\paragraph{Initial Angle}
To determine the initial angle of the unicycle, we take an average
accelerometer reading while the unicycle is initially stable. This is a 3D
vector, representing the direction of gravity in the reference frame of the
rotated unicycle. By considering the DCM resulting from rotations in pitch and
roll, we can determine the pitch and roll angles, and use these to construct
an initial rotation quaternion.
\begin{IEEEeqnarray*}{rCl}
\boldsymbol{D} &=& \left[\begin{array}{lll}
\cos \psi_f & -\sin\psi_f & 0 \\
\cos \theta \sin \psi_f & \cos \theta \cos \psi_f & -\sin \theta \\
\sin \theta \sin \psi_f & \sin \theta \cos \psi_f & \cos \theta
\end{array}\right] \\
\left[\begin{array}{l}
a_x \\ a_y \\ a_z
\end{array}\right] &=&
\left[\begin{array}{l}
\cos \theta \sin \psi_f \\ \cos \theta \cos \psi_f \\ -\sin \theta
\end{array}\right]\\
\theta &=& -\sin^{-1}(a_z) \\
\psi_f &=& \tan^{-1}\left(\frac{a_x}{a_y}\right) \\
\boldsymbol{q} &=& \left(\cos\left(\frac{\theta}{2}\right) +
\boldsymbol{i}\sin\left(\frac{\theta}{2}\right)\right)
\left(\cos\left(\frac{\psi_f}{2}\right) +
\boldsymbol{k}\sin\left(\frac{\psi_f}{2}\right)\right)
\end{IEEEeqnarray*}
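The same construction in code, with the product of the two elementary quaternions expanded by hand using $\boldsymbol{i}\boldsymbol{k} = -\boldsymbol{j}$ (the averaged accelerometer reading is normalised first, so any consistent units will do):

```python
import numpy as np

def initial_quaternion(acc):
    """Initial orientation quaternion (w, x, y, z) from an averaged
    accelerometer reading, i.e. the vertical direction in the body frame."""
    ax, ay, az = np.asarray(acc, dtype=float) / np.linalg.norm(acc)
    theta = -np.arcsin(az)          # roll about the forward x-axis
    psi_f = np.arctan2(ax, ay)      # pitch about the sideways z-axis
    ct, st = np.cos(theta/2), np.sin(theta/2)
    cp, sp = np.cos(psi_f/2), np.sin(psi_f/2)
    # (cos(t/2) + i sin(t/2)) * (cos(p/2) + k sin(p/2)), with i*k = -j:
    return np.array([ct*cp, st*cp, -st*sp, ct*sp])
```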
\paragraph{Gyro Noise and Drift}
The approach described above is vulnerable to drift: accumulating errors in
the integration of the gyroscope, especially when the zero-offset of the
gyroscope changes. To evaluate the effect of this, the gyro was sampled for 10
seconds when stationary. The rate readings are shown in
Figure~\ref{fig:gyro_stat}. The noise and drift are of the same order of
magnitude: over a 10-second trial, the zero-offset changed by about
0.1 deg s$^{-1}$, leading to a drift of below 1$^\circ$. This was judged
acceptable. If we wished to conduct longer trials, we could use one of a
variety of fusion methods such as state observers, Kalman filters or one of
many highly tuned implementations developed by UAV hobbyists
\cite{ref:gluonpilot}.
\begin{figure}[htpb]
\begin{center}
\includegraphics[width=15cm]{figures/gyro_stat.pdf}
\end{center}
\caption{Gyro readings when stationary}
\label{fig:gyro_stat}
\end{figure}
\paragraph{Accelerometer Noise}
Figure~\ref{fig:acc_stat} shows the accelerometer noise when stationary (the
mean reading has been subtracted).\footnote{Note that the noise characteristic
of one axis is very different from that of the other two---this axis is
perpendicular to the chip, and as such is built differently.} These
noises correspond to noise standard deviations for pitch and roll of about
0.5$^\circ$, but this can be reduced by averaging many readings. More detail
on the use of the accelerometer can be found in Douglass's report.
\begin{figure}[htpb]
\begin{center}
\includegraphics[height=8cm]{figures/acc_stat.pdf}
\end{center}
\caption{Accelerometer readings when stationary}
\label{fig:acc_stat}
\end{figure}
\subsubsection{Position Sensing}
\label{sec:position_sensing}
In order to reduce the number of state variables necessary, Rasmussen used
self-centred coordinates to keep track of the position of the target. In these
coordinates, the unicycle is at (0, 0) and faces along the positive $x$-axis,
as shown in Figures~\ref{fig:xor}~and~\ref{fig:scc}. At each timestep, we must
use the change in yaw angle $\Delta\phi$ and the change in wheel angle
$\Delta\psi_w$ (along with wheel radius $r_w$) to calculate the new target
position. From Figure~\ref{fig:scc} we can see that $x_c$ and $y_c$ must be
modified as follows:
\[
\begin{bmatrix}
x_c[n+1] \\
y_c[n+1]
\end{bmatrix} =
\begin{bmatrix}
\cos(-\Delta\phi) & -\sin(-\Delta\phi) \\
\sin(-\Delta\phi) & \cos(-\Delta\phi)
\end{bmatrix}
\begin{bmatrix}
x_c[n] \\
y_c[n]
\end{bmatrix} +
\begin{bmatrix}
-r_w\Delta \psi_w \\
0
\end{bmatrix}
\]
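One timestep of this update in code (a minimal sketch; the argument names follow the symbols above):

```python
import numpy as np

def update_target(x_c, y_c, d_phi, d_psi_w, r_w):
    """Re-express the target in the new unicycle-centred frame after the
    unicycle yaws by d_phi and its wheel rolls by d_psi_w (radius r_w):
    rotate by -d_phi, then subtract the forward travel r_w * d_psi_w."""
    c, s = np.cos(-d_phi), np.sin(-d_phi)
    x_new = c * x_c - s * y_c - r_w * d_psi_w
    y_new = s * x_c + c * y_c
    return x_new, y_new
```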
\begin{figure}[htpb]