\documentclass[adraft,copyright,creativecommons]{eptcs}
\providecommand{\event}{THedu'11}
%\usepackage{breakurl} % Not needed if you use pdflatex only.
\usepackage{listings}
\usepackage{color}
\definecolor{light-gray}{gray}{0.95}
\title{The GF Mathematics Library}
\author{Jordi Saludes
\institute{Universitat Polit\`ecnica de Catalunya}
\institute{Sistemes Avan\c cats de Control}
%Edifici GAIA, Terrassa (Spain)
\email{[email protected]}
\and
Sebastian Xamb\'o
\institute{Universitat Polit\`ecnica de Catalunya}
\institute{MA2, Edifici OMEGA, Barcelona (Spain)}
\email{\quad [email protected]}
}
\def\titlerunning{GF Math Library}
\def\authorrunning{J. Saludes \& S. Xamb\'o}
\begin{document}
\lstset{backgroundcolor=\color{light-gray},
breaklines=true,
basicstyle=\ttfamily,
tabsize=2} % language=GF
\maketitle
\newcommand{\molto}{\textsc{mOlto}}
\newcommand{\webalt}{\textsc{WebALT}}
\newcommand{\openmath}{\textsc{OpenMath}}
\newcommand{\CD}{\textsc{CD}}
\newcommand{\OM}{\textsc{OM}}
\newcommand{\MGL}{\textsc{MGL}}
\newcommand{\MMA}{\textsc{MMA}}
\newcommand{\CAS}{\textsc{CAS}}
\newcommand{\GF}{\textsc{GF}}
\newcommand{\Nat}{\texttt{Nat}}
\newcommand{\Prop}{\texttt{Prop}}
\begin{abstract}
The aim of this paper is to present
the \emph{Mathematics Grammar Library}.
The main points are
the context in which it originated,
its current design and functionality
and the present development goals.
We also comment on possible future applications
in the area of artificial mathematics assistants.
\end{abstract}
\section{Introduction}
An archetypal meeting point for natural language processing and mathematics
education is the realm of \emph{word problems}
(\cite{wikipedia-wordproblem, Verschaffel-Greer-DeCorte-2000}), a realm in
which mechanised mathematics assistants (\MMA) are expected to play an ever
more prominent role in the years to come. The Mathematics Grammar Library
(\MGL) presented in this paper is not an \MMA, but our working hypothesis
is that it is a good basis on which to build useful \MMA's for learning and
teaching (cf. \cite{E-LearningMathematics, AutonomousLearners} for some
general clues on e-learning technologies).
In fact, we regard \MGL\ as an enabling technology for \emph{multilingual}
dialog systems capable of helping students in learning how to solve word
problems. This assessment is based on the potential of \MGL\
for dealing effectively with a mixture of text and mathematical
expressions, and also for managing interactions with ancillary
computer algebra systems (\CAS).
To be more specific, the current general aim for \MGL\ is to provide natural
language services for mathematical constructs at the level of high school
and college freshmen linear algebra and calculus. At the present stage,
the concrete goal is to provide rendering of simple mathematical exercises
in multiple languages (see \cite{MathBar} for a demo).
\section{Origin}
For a closer view of \MGL, let us look briefly at its origins. The
idea behind \MGL\ was born, to a good extent, from reflecting on one of
the key results of the \webalt{} project (see
\cite{Caprotti_multilingualdelivery, Caprotti06webalt!deliver}). In
summary, the unfolding of this reflection went as follows.
One of the aims of \webalt{} was to produce a proof-of-concept
platform for the creation of a multilingual repository of \emph{simple}
mathematical problems with guaranteed quality of the (machine)
translations, in both linguistic and mathematical terms. The languages
envisioned were Catalan,
English,
Finnish,
French,
Italian and
Spanish.
Of these, Finnish, with its great complexities, could not be raised to
the same level of functionality as the others.
The \webalt{} prototype was successful and, as far as we know, that endeavour
brought about the first application of the \GF\ system (\cite{GF,Ranta11})
to the multilingual translation of simple mathematical questions. The
powerful \GF\ scheme, based on the perfect interlocking of abstract and
concrete grammars, was found to be a very sound choice, but the solution
had several shortcomings that could not be addressed in that project.
For the present purposes, the three that worried us the most are the
following:
\begin{itemize}
\item
The grammars did not work for later versions of \GF\ ($>$2.9).
\item
The library was not modular with respect to
semantic processing, and hence not easy to maintain.
\item
It included too few languages, especially as seen from a European
perspective.
\end{itemize}
The springboard for the present library was the need to properly solve
these problems, inasmuch as this was regarded as one of the most promising
prerequisites for all further advanced developments in machine processing
of mathematical texts. Thus the main tasks were:
\begin{itemize}
\item
To design a \emph{modular} mathematics library structured according to
the semantic standards (content dictionaries) of \openmath{}
(\cite{OpenMath}).
\item
To code it in the much more expressive \GF~3.1 for the few languages
mentioned above, and
\item
To write new code for a few additional languages (Bulgarian, Finnish,
German, Romanian and Swedish).
\end{itemize}
The first two points amount to a tidying of the original \webalt{}
programming methods. The third point represents not merely an addition
of a few more languages, but a thorough testing of the methods and
procedures enforced in the preceding steps. This testing is important in
order to secure the rules for the inclusion of further languages and for
a controlled uniform extension of the available grammars.
\section{Some basic \GF\ notions}
For convenience, we include a few notions about the \GF\
system that will ease the considerations about \MGL\ in the next section.
For a thorough reference, see \cite{Ranta11}.
Any \GF\ application begins by specifying its \emph{abstract syntax}. This
syntax contains declarations of \emph{categories} (the \GF\ name for
types) and \emph{functions} (the \GF\ name for constructor signatures) and
has to capture the \emph{semantic structure} of the application domain.
For example, to let \Nat{} stand for the type of natural numbers and
\Prop{} for propositions about natural numbers, the \GF\ syntax is
{\small
\begin{verbatim}
cat Nat ; Prop ;
\end{verbatim}
}
That `zero is a natural number' and that `the successor of
any natural number is a natural number' can be expressed as follows:
{\small
\begin{verbatim}
fun
Zero : Nat ;
Succ : Nat -> Nat ;
\end{verbatim}
}
The signatures for `even number' and `prime number' can be captured with
{\small
\begin{verbatim}
fun
Even, Prime : Nat -> Prop ;
\end{verbatim}
}
Finally, we can abstract the logical `not', `and' and `or' as follows:
{\small
\begin{verbatim}
fun
Not : Prop -> Prop ;
And, Or : Prop -> Prop -> Prop ;
\end{verbatim}
}
In practical terms, these declarations would form the \emph{body} of an
\emph{abstract module} that would have the form
{\small
\begin{verbatim}
abstract Arith = {<body>}
\end{verbatim}
}
where \texttt{Arith} is the name of the module.
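
To make the interplay between abstract and concrete syntax more tangible,
the declarations above can be assembled into a complete module and paired
with a toy English concrete syntax. This is only a minimal sketch: the
module name \texttt{ArithEng} and all the linearization strings are
assumptions made for illustration and are not taken from \MGL.
{\small
\begin{verbatim}
-- File Arith.gf: the abstract syntax assembled from the fragments above.
abstract Arith = {
  cat Nat ; Prop ;
  fun
    Zero : Nat ;
    Succ : Nat -> Nat ;
    Even, Prime : Nat -> Prop ;
    Not : Prop -> Prop ;
    And, Or : Prop -> Prop -> Prop ;
}

-- File ArithEng.gf: a hypothetical English concrete syntax.
-- Every category is linearized as a record holding a single string.
concrete ArithEng of Arith = {
  lincat Nat, Prop = {s : Str} ;
  lin
    Zero    = {s = "zero"} ;
    Succ n  = {s = "the successor of" ++ n.s} ;
    Even n  = {s = n.s ++ "is even"} ;
    Prime n = {s = n.s ++ "is prime"} ;
    Not p   = {s = "it is not the case that" ++ p.s} ;
    And p q = {s = p.s ++ "and" ++ q.s} ;
    Or p q  = {s = p.s ++ "or" ++ q.s} ;
}
\end{verbatim}
}
Under these assumed rules, the tree
\texttt{And (Even Zero) (Not (Prime (Succ Zero)))} linearizes to
``zero is even and it is not the case that the successor of zero is prime''.
A concrete syntax for another language would supply its own
\texttt{lincat} and \texttt{lin} rules while leaving \texttt{Arith}
untouched.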
\section{The library} % (fold)
\label{sec:the_library}
\MGL\ is a collection of \GF\ modules. The categories used in these modules
are in correspondence with all possible combinations of \textbf{Variable}
and \textbf{Value} with the mathematical types
\textbf{Number}, \textbf{Set}, \textbf{Tensor} and \textbf{Function}.
Thus the category \texttt{VarNum} denotes a numeric
variable like $x$, while \texttt{ValSet} denotes an actual set like ``the
domain of the natural logarithm''. The distinction between variables and
values allows us to type-check productions like lambda abstractions that
require a variable as the first argument. Obviously variables can be
promoted to values when needed.
Other categories stand for propositions, geometric constructions and
indices.
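
The effect of the distinction between variables and values can be sketched
in a few lines of abstract syntax. The names \texttt{VarNum} and
\texttt{Var2Num} appear in \MGL\ trees shown in this paper; the category
\texttt{ValFun}, the constant \texttt{x}, and the signatures given to
\texttt{lambda} and \texttt{abs} are assumptions chosen for illustration,
and the actual declarations in the library may differ.
{\small
\begin{verbatim}
abstract VarValSketch = {
  cat
    VarNum ;  -- numeric variables such as x
    ValNum ;  -- numeric values
    ValFun ;  -- function values (name assumed by analogy with ValSet)
  fun
    x       : VarNum ;                      -- a sample variable (assumed)
    Var2Num : VarNum -> ValNum ;            -- promotes a variable to a value
    abs     : ValNum -> ValNum ;            -- signature assumed
    lambda  : VarNum -> ValNum -> ValFun ;  -- hypothetical: binder is a variable
}
\end{verbatim}
}
With these declarations, \texttt{lambda x (abs (Var2Num x))} is a
well-typed tree of category \texttt{ValFun}, whereas
\texttt{lambda (abs (Var2Num x)) (Var2Num x)} is rejected by the type
checker, because its first argument is a \texttt{ValNum} and not a
\texttt{VarNum}.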
The library is organised in a matrix-like form, with a horizontal axis
ranging over the targeted natural languages. At the moment these are:
Bulgarian,
Catalan, English, Finnish, French, German, Italian, Romanian, Spanish and
Swedish.
The vertical axis is for complexity and contains, from bottom to top, three
layers:
\begin{enumerate}
\item\emph{Ground}. It deals with literals, indices and variables.
\item\emph{OpenMath}. It is modelled after the \OM{} \emph{Content
Dictionaries} (\CD's). Each \CD{} defines a collection of
mathematical objects. This is a \emph{de facto} standard for mathematical
semantics and for each \emph{Content Dictionary}
there is a corresponding module in this layer.
The \CD's are collected by the
\emph{OpenMath consortium} \cite{OpenMath}.
\item\emph{Operations}.
This layer takes care of simple mathematical exercises. These appear
in drilling materials and usually begin with directives such as
`Compute', `Find', `Prove', `Give an example of', \ldots
\end{enumerate}
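
One way to picture the three layers is through the \GF\ module extension
operator \texttt{**}, by which a module inherits everything declared in
the module it extends. The following is a sketch under stated assumptions:
the module names, the category \texttt{Exercise} and the function
\texttt{DoCompute} are hypothetical and only mirror the layering described
above, not the actual file layout of \MGL. The name \texttt{Arith1}
alludes to the \openmath{} \CD{} \texttt{arith1}, which defines
\texttt{abs}.
{\small
\begin{verbatim}
-- File Ground.gf (ground layer): variables and their promotion.
abstract Ground = {
  cat VarNum ; ValNum ;
  fun Var2Num : VarNum -> ValNum ;
}

-- File Arith1.gf (OpenMath layer): one module per Content Dictionary.
abstract Arith1 = Ground ** {
  fun abs : ValNum -> ValNum ;
}

-- File Operations.gf (operations layer): exercise directives.
abstract Operations = Arith1 ** {
  cat Exercise ;                        -- hypothetical category
  fun DoCompute : ValNum -> Exercise ;  -- hypothetical directive
}
\end{verbatim}
}
In such an arrangement each layer only adds declarations, so a tree built
in a lower layer remains valid in every layer above it.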
The following tree is an example belonging to the \emph{OpenMath} layer:
\begin{lstlisting}
mkProp
(lt_num
(abs (plus (BaseValNum (Var2Num x) (Var2Num y))))
(plus (BaseValNum (abs (Var2Num x)) (abs (Var2Num y)))))
\end{lstlisting}
When linearized with the Spanish concrete grammar, it yields
\begin{quote}
El valor absoluto de la suma de x y de y es menor que la suma del valor
absoluto de x y del valor absoluto de y
\end{quote}
Similarly, the tree
\begin{lstlisting}
DoSelectFromN
(Var2Num y)
(domain (inverse tanh))
(mkProp
(gt_num
(At cosh (Var2Num y))
pi))
\end{lstlisting}
gives, when linearized with the English concrete grammar:
\begin{quote}
Select y from the domain of the inverse of the hyperbolic tangent such that
the hyperbolic cosine of y is greater than pi.
\end{quote}
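
The constructor \texttt{BaseValNum} in the first tree above follows the
naming scheme that \GF\ uses for list categories. As an illustration, the
left-hand operand of that comparison can be typed with the declarations
below. The signatures are inferred from the shape of the tree alone, and
the constants \texttt{x} and \texttt{y} are introduced only for this
example, so the actual \MGL\ declarations may differ.
{\small
\begin{verbatim}
abstract SumSketch = {
  cat
    VarNum ; ValNum ;
    [ValNum]{2} ;  -- list category with at least two elements;
                   -- GF derives ListValNum together with
                   --   BaseValNum : ValNum -> ValNum -> ListValNum
                   --   ConsValNum : ValNum -> ListValNum -> ListValNum
  fun
    x, y    : VarNum ;              -- sample variables (assumed)
    Var2Num : VarNum -> ValNum ;
    abs     : ValNum -> ValNum ;    -- signature assumed
    plus    : [ValNum] -> ValNum ;  -- sum over a list of values (assumed)
}
\end{verbatim}
}
Under these assumptions,
\texttt{abs (plus (BaseValNum (Var2Num x) (Var2Num y)))} has category
\texttt{ValNum}, and a sum of three or more terms is obtained by wrapping
the base list with \texttt{ConsValNum}.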
% section the_library (end)
%%%%%%%%%%
% an example from Operations?
\section{Conclusions and further work}
After a first step in which the main concern was tidying and modularizing
the \webalt{} prototype for simple mathematics exercises for the languages
Catalan,
English,
%Finnish,
French,
Italian and
Spanish,
we have extended it, in a second step, with the languages
Bulgarian, Finnish,
German, Romanian and Swedish
(Finnish was considered in the first step, but it had to be worked out from
scratch).
In this note we have described the \GF\ library produced so far for
multilingual mathematical text processing, which we call
\MGL\ (Mathematics Grammar Library). We have also indicated how it
originated in the \webalt\ project, its relation to \GF, and its
present functionality.
Further work has three main lines:
\begin{itemize}
\item
Addition of new languages,
like Danish, Dutch, Norwegian, Polish, Portuguese, Russian, \ldots
This is a continuation of the first two steps referred to above and
our assessment is that it can be done reliably with the methods and
procedures established so far. To some extent, the library modules for a
new language can be generated automatically, up to the point at which the
remaining work must be done by native speakers of that language.
\item
Describing a controlled procedure for the uniform and reliable extension
of the grammars according to new semantic needs. This is an important step
that is being researched from several angles. One important point is to
ascertain when a piece of mathematical text requires functionalities
(categories, constructors, operations) not covered yet.
\item
Advancing in the use of \MGL\ for the production of ever more
sophisticated artificial mathematics assistants. This is also the focus of
current research, which includes a combination with statistical machine
translation methods, as in principle these can suggest grammatical structures
from a corpus of mathematical sentences.
\end{itemize}
\nocite{*}
\bibliographystyle{eptcsstyle/eptcs}
%\bibliographystyle{eptcs}
\bibliography{webalt}
\end{document}