-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpeg.1
933 lines (850 loc) · 27.9 KB
/
peg.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
.\" Copyright (c) 2007 by Ian Piumarta
.\" All rights reserved.
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the 'Software'),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, and/or sell
.\" copies of the Software, and to permit persons to whom the Software is
.\" furnished to do so, provided that the above copyright notice(s) and this
.\" permission notice appear in all copies of the Software. Acknowledgement
.\" of the use of this Software in supporting documentation would be
.\" appreciated but is not required.
.\"
.\" THE SOFTWARE IS PROVIDED 'AS IS'. USE ENTIRELY AT YOUR OWN RISK.
.\"
.\" Last edited: 2012-04-29 16:58:44 by piumarta on emilia
.\"
.TH PEG 1 "April 2012" "Version 0.1"
.SH NAME
peg, leg \- parser generators
.SH SYNOPSIS
.B peg
.B [\-hvV \-ooutput]
.I [filename ...]
.sp 0
.B leg
.B [\-hvV \-ooutput]
.I [filename ...]
.SH DESCRIPTION
.I peg
and
.I leg
are tools for generating recursive-descent parsers: programs that
perform pattern matching on text. They process a Parsing Expression
Grammar (PEG) [Ford 2004] to produce a program that recognises legal
sentences of that grammar.
.I peg
processes PEGs written using the original syntax described by Ford;
.I leg
processes PEGs written using slightly different syntax and conventions
that are intended to make it an attractive replacement for parsers
built with
.IR lex (1)
and
.IR yacc (1).
Unlike
.I lex
and
.IR yacc ,
.I peg
and
.I leg
support unlimited backtracking, provide ordered choice as a means for
disambiguation, and can combine scanning (lexical analysis) and
parsing (syntactic analysis) into a single activity.
.PP
.I peg
reads the specified
.IR filename s,
or standard input if no
.IR filename s
are given, for a grammar describing the parser to generate.
.I peg
then generates a C source file that defines a function
.IR yyparse().
This C source file can be included in, or compiled and then linked
with, a client program. Each time the client program calls
.IR yyparse ()
the parser consumes input text according to the parsing rules,
starting from the first rule in the grammar.
.IR yyparse ()
returns non-zero if the input could be parsed according to the
grammar; it returns zero if the input could not be parsed.
.PP
The prefix 'yy' or 'YY' is prepended to all externally-visible symbols
in the generated parser. This is intended to reduce the risk of
namespace pollution in client programs. (The choice of 'yy' is
historical; see
.IR lex (1)
and
.IR yacc (1),
for example.)
.SH OPTIONS
.I peg
and
.I leg
provide the following options:
.TP
.B \-h
prints a summary of available options and then exits.
.TP
.B \-ooutput
writes the generated parser to the file
.B output
instead of the standard output.
.TP
.B \-v
writes verbose information to standard error while working.
.TP
.B \-V
writes version information to standard error then exits.
.SH A SIMPLE EXAMPLE
The following
.I peg
input specifies a grammar with a single rule (called 'start') that is
satisfied when the input contains the string "username".
.nf
start <- "username"
.fi
(The quotation marks are
.I not
part of the matched text; they serve to indicate a literal string to
be matched.) In other words,
.IR yyparse ()
in the generated C source will return non-zero only if the next eight
characters read from the input spell the word "username". If the
input contains anything else,
.IR yyparse ()
returns zero and no input will have been consumed. (Subsequent calls
to
.IR yyparse ()
will also return zero, since the parser is effectively blocked looking
for the string "username".) To ensure progress we can add an
alternative clause to the 'start' rule that will match any single
character if "username" is not found.
.nf
start <- "username"
/ .
.fi
.IR yyparse ()
now always returns non-zero (except at the very end of the input). To
do something useful we can add actions to the rules. These actions
are performed after a complete match is found (starting from the first
rule) and are chosen according to the 'path' taken through the grammar
to match the input. (Linguists would call this path a 'phrase
marker'.)
.nf
start <- "username" { printf("%s\\n", getlogin()); }
/ < . > { putchar(yytext[0]); }
.fi
The first line instructs the parser to print the user's login name
whenever it sees "username" in the input. If that match fails, the
second line tells the parser to echo the next character on the input
the standard output. Our parser is now performing useful work: it
will copy the input to the output, replacing all occurrences of
"username" with the user's account name.
.PP
Note the angle brackets ('<' and '>') that were added to the second
alternative. These have no effect on the meaning of the rule, but
serve to delimit the text made available to the following action in
the variable
.IR yytext .
.PP
If the above grammar is placed in the file
.BR username.peg ,
running the command
.nf
peg -o username.c username.peg
.fi
will save the corresponding parser in the file
.BR username.c .
To create a complete program this parser could be included by a C
program as follows.
.nf
#include <stdio.h> /* printf(), putchar() */
#include <unistd.h> /* getlogin() */
#include "username.c" /* yyparse() */
int main()
{
while (yyparse()) /* repeat until EOF */
;
return 0;
}
.fi
.SH PEG GRAMMARS
A grammar consists of a set of named rules.
.nf
name <- pattern
.fi
The
.B pattern
contains one or more of the following elements.
.TP
.B name
The element stands for the entire pattern in the rule with the given
.BR name .
.TP
.BR \(dq characters \(dq
A character or string enclosed in double quotes is matched literally.
The ANSI C esacpe sequences are recognised within the
.IR characters .
.TP
.BR ' characters '
A character or string enclosed in single quotes is matched literally, as above.
.TP
.BR [ characters ]
A set of characters enclosed in square brackets matches any single
character from the set, with escape characters recognised as above.
If the set begins with an uparrow (^) then the set is negated (the
element matches any character
.I not
in the set). Any pair of characters separated with a dash (-)
represents the range of characters from the first to the second,
inclusive. A single alphabetic character or underscore is matched by
the following set.
.nf
[a-zA-Z_]
.fi
Similarly, the following matches any single non-digit character.
.nf
[^0-9]
.fi
.TP
.B .
A dot matches any character. Note that the only time this fails is at
the end of file, where there is no character to match.
.TP
.BR ( \ pattern\ )
Parentheses are used for grouping (modifying the precendence of the
operators described below).
.TP
.BR { \ action\ }
Curly braces surround actions. The action is arbitray C source code
to be executed at the end of matching. Any braces within the action
must be properly nested. Any input text that was matched before the
action and delimited by angle brackets (see below) is made available
within the action as the contents of the character array
.IR yytext .
The length of (number of characters in)
.I yytext
is available in the variable
.IR yyleng .
(These variable names are historical; see
.IR lex (1).)
.TP
.B <
An opening angle bracket always matches (consuming no input) and
causes the parser to begin accumulating matched text. This text will
be made available to actions in the variable
.IR yytext .
.TP
.B >
A closing angle bracket always matches (consuming no input) and causes
the parser to stop accumulating text for
.IR yytext .
.PP
The above
.IR element s
can be made optional and/or repeatable with the following suffixes:
.TP
.RB element\ ?
The element is optional. If present on the input, it is consumed and
the match succeeds. If not present on the input, no text is consumed
and the match succeeds anyway.
.TP
.RB element\ +
The element is repeatable. If present on the input, one or more
occurrences of
.I element
are consumed and the match succeeds. If no occurrences of
.I element
are present on the input, the match fails.
.TP
.RB element\ *
The element is optional and repeatable. If present on the input, one or more
occurrences of
.I element
are consumed and the match succeeds. If no occurrences of
.I element
are present on the input, the match succeeds anyway.
.PP
The above elements and suffixes can be converted into predicates (that
match arbitray input text and subsequently succeed or fail
.I without
consuming that input) with the following prefixes:
.TP
.BR & \ element
The predicate succeeds only if
.I element
can be matched. Input text scanned while matching
.I element
is not consumed from the input and remains available for subsequent
matching.
.TP
.BR ! \ element
The predicate succeeds only if
.I element
cannot be matched. Input text scanned while matching
.I element
is not consumed from the input and remains available for subsequent
matching. A popular idiom is
.nf
!.
.fi
which matches the end of file, after the last character of the input
has already been consumed.
.PP
A special form of the '&' predicate is provided:
.TP
.BR & {\ expression\ }
In this predicate the simple C
.I expression
.RB ( not
statement) is evaluated immediately when the parser reaches the
predicate. If the
.I expression
yields non-zero (true) the 'match' succeeds and the parser continues
with the next element in the pattern. If the
.I expression
yields zero (false) the 'match' fails and the parser backs up to look
for an alternative parse of the input.
.PP
Several elements (with or without prefixes and suffixes) can be
combined into a
.I sequence
by writing them one after the other. The entire sequence matches only
if each individual element within it matches, from left to right.
.PP
Sequences can be separated into disjoint alternatives by the
alternation operator '/'.
.TP
.RB sequence-1\ / \ sequence-2\ / \ ...\ / \ sequence-N
Each sequence is tried in turn until one of them matches, at which
time matching for the overall pattern succeeds. If none of the
sequences matches then the match of the overall pattern fails.
.PP
Finally, the pound sign (#) introduces a comment (discarded) that
continues until the end of the line.
.PP
To summarise the above, the parser tries to match the input text
against a pattern containing literals, names (representing other
rules), and various operators (written as prefixes, suffixes,
juxtaposition for sequencing and and infix alternation operator) that
modify how the elements within the pattern are matched. Matches are
made from left to right, 'descending' into named sub-rules as they are
encountered. If the matching process fails, the parser 'back tracks'
('rewinding' the input appropriately in the process) to find the
nearest alternative 'path' through the grammar. In other words the
parser performs a depth-first, left-to-right search for the first
successfully-matching path through the rules. If found, the actions
along the successful path are executed (in the order they were
encountered).
.PP
Note that predicates are evaluated
.I immediately
during the search for a successful match, since they contribute to the
success or failure of the search. Actions, however, are evaluated
only after a successful match has been found.
.SH PEG GRAMMAR FOR PEG GRAMMARS
The grammar for
.I peg
grammars is shown below. This will both illustrate and formalise
the above description.
.nf
Grammar <- Spacing Definition+ EndOfFile
Definition <- Identifier LEFTARROW Expression
Expression <- Sequence ( SLASH Sequence )*
Sequence <- Prefix*
Prefix <- AND Action
/ ( AND | NOT )? Suffix
Suffix <- Primary ( QUERY / STAR / PLUS )?
Primary <- Identifier !LEFTARROW
/ OPEN Expression CLOSE
/ Literal
/ Class
/ DOT
/ Action
/ BEGIN
/ END
Identifier <- < IdentStart IdentCont* > Spacing
IdentStart <- [a-zA-Z_]
IdentCont <- IdentStart / [0-9]
Literal <- ['] < ( !['] Char )* > ['] Spacing
/ ["] < ( !["] Char )* > ["] Spacing
Class <- '[' < ( !']' Range )* > ']' Spacing
Range <- Char '-' Char / Char
Char <- '\\\\' [abefnrtv'"\\[\\]\\\\]
/ '\\\\' [0-3][0-7][0-7]
/ '\\\\' [0-7][0-7]?
/ '\\\\' '-'
/ !'\\\\' .
LEFTARROW <- '<-' Spacing
SLASH <- '/' Spacing
AND <- '&' Spacing
NOT <- '!' Spacing
QUERY <- '?' Spacing
STAR <- '*' Spacing
PLUS <- '+' Spacing
OPEN <- '(' Spacing
CLOSE <- ')' Spacing
DOT <- '.' Spacing
Spacing <- ( Space / Comment )*
Comment <- '#' ( !EndOfLine . )* EndOfLine
Space <- ' ' / '\\t' / EndOfLine
EndOfLine <- '\\r\\n' / '\\n' / '\\r'
EndOfFile <- !.
Action <- '{' < [^}]* > '}' Spacing
BEGIN <- '<' Spacing
END <- '>' Spacing
.fi
.SH LEG GRAMMARS
.I leg
is a variant of
.I peg
that adds some features of
.IR lex (1)
and
.IR yacc (1).
It differs from
.I peg
in the following ways.
.TP
.BI %{\ text... \ %}
A declaration section can appear anywhere that a rule definition is
expected. The
.I text
between the delimiters '%{' and '%}' is copied verbatim to the
generated C parser code
.I before
the code that implements the parser itself.
.TP
.IB name\ = \ pattern
The 'assignment' operator replaces the left arrow operator '<-'.
.TP
.B rule-name
Hyphens can appear as letters in the names of rules. Each hyphen is
converted into an underscore in the generated C source code. A single
single hyphen '-' is a legal rule name.
.nf
- = [ \\t\\n\\r]*
number = [0-9]+ -
name = [a-zA-Z_][a-zA_Z_0-9]* -
l-paren = '(' -
r-paren = ')' -
.fi
This example shows how ignored whitespace can be obvious when reading
the grammar and yet unobtrusive when placed liberally at the end of
every rule associated with a lexical element.
.TP
.IB seq-1\ | \ seq-2
The alternation operator is vertical bar '|' rather than forward
slash '/'. The
.I peg
rule
.nf
name <- sequence-1
/ sequence-2
/ sequence-3
.fi
is therefore written
.nf
name = sequence-1
| sequence-2
| sequence-3
;
.fi
in
.I leg
(with the final semicolon being optional, as described next).
.TP
.IB pattern\ ;
A semicolon punctuator can optionally terminate a
.IR pattern .
.TP
.BI %% \ text...
A double percent '%%' terminates the rules (and declarations) section of
the grammar. All
.I text
following '%%' is copied verbatim to the generated C parser code
.I after
the parser implementation code.
.TP
.BI $$\ = \ value
A sub-rule can return a semantic
.I value
from an action by assigning it to the pseudo-variable '$$'. All
semantic values must have the same type (which defaults to 'int').
This type can be changed by defining YYSTYPE in a declaration section.
.TP
.IB identifier : name
The semantic value returned (by assigning to '$$') from the sub-rule
.I name
is associated with the
.I identifier
and can be referred to in subsequent actions.
.PP
The desk calclator example below illustrates the use of '$$' and ':'.
.SH LEG EXAMPLE: A DESK CALCULATOR
The extensions in
.I leg
described above allow useful parsers and evaluators (including
declarations, grammar rules, and supporting C functions such
as 'main') to be kept within a single source file. To illustrate this
we show a simple desk calculator supporting the four common arithmetic
operators and named variables. The intermediate results of arithmetic
evaluation will be accumulated on an implicit stack by returning them
as semantic values from sub-rules.
.nf
%{
#include <stdio.h> /* printf() */
#include <stdlib.h> /* atoi() */
int vars[26];
%}
Stmt = - e:Expr EOL { printf("%d\\n", e); }
| ( !EOL . )* EOL { printf("error\\n"); }
Expr = i:ID ASSIGN s:Sum { $$ = vars[i] = s; }
| s:Sum { $$ = s; }
Sum = l:Product
( PLUS r:Product { l += r; }
| MINUS r:Product { l -= r; }
)* { $$ = l; }
Product = l:Value
( TIMES r:Value { l *= r; }
| DIVIDE r:Value { l /= r; }
)* { $$ = l; }
Value = i:NUMBER { $$ = atoi(yytext); }
| i:ID !ASSIGN { $$ = vars[i]; }
| OPEN i:Expr CLOSE { $$ = i; }
NUMBER = < [0-9]+ > - { $$ = atoi(yytext); }
ID = < [a-z] > - { $$ = yytext[0] - 'a'; }
ASSIGN = '=' -
PLUS = '+' -
MINUS = '-' -
TIMES = '*' -
DIVIDE = '/' -
OPEN = '(' -
CLOSE = ')' -
- = [ \\t]*
EOL = '\\n' | '\\r\\n' | '\\r' | ';'
%%
int main()
{
while (yyparse())
;
return 0;
}
.fi
.SH LEG GRAMMAR FOR LEG GRAMMARS
The grammar for
.I leg
grammars is shown below. This will both illustrate and formalise the
above description.
.nf
grammar = -
( declaration | definition )+
trailer? end-of-file
declaration = '%{' < ( !'%}' . )* > RPERCENT
trailer = '%%' < .* >
definition = identifier EQUAL expression SEMICOLON?
expression = sequence ( BAR sequence )*
sequence = prefix+
prefix = AND action
| ( AND | NOT )? suffix
suffix = primary ( QUERY | STAR | PLUS )?
primary = identifier COLON identifier !EQUAL
| identifier !EQUAL
| OPEN expression CLOSE
| literal
| class
| DOT
| action
| BEGIN
| END
identifier = < [-a-zA-Z_][-a-zA-Z_0-9]* > -
literal = ['] < ( !['] char )* > ['] -
| ["] < ( !["] char )* > ["] -
class = '[' < ( !']' range )* > ']' -
range = char '-' char | char
char = '\\\\' [abefnrtv'"\\[\\]\\\\]
| '\\\\' [0-3][0-7][0-7]
| '\\\\' [0-7][0-7]?
| !'\\\\' .
action = '{' < [^}]* > '}' -
EQUAL = '=' -
COLON = ':' -
SEMICOLON = ';' -
BAR = '|' -
AND = '&' -
NOT = '!' -
QUERY = '?' -
STAR = '*' -
PLUS = '+' -
OPEN = '(' -
CLOSE = ')' -
DOT = '.' -
BEGIN = '<' -
END = '>' -
RPERCENT = '%}' -
- = ( space | comment )*
space = ' ' | '\\t' | end-of-line
comment = '#' ( !end-of-line . )* end-of-line
end-of-line = '\\r\\n' | '\\n' | '\\r'
end-of-file = !.
.fi
.SH CUSTOMISING THE PARSER
The following symbols can be redefined in declaration sections to
modify the generated parser code.
.TP
.B YYSTYPE
The semantic value type. The pseudo-variable '$$' and the
identifiers 'bound' to rule results with the colon operator ':' should
all be considered as being declared to have this type. The default
value is 'int'.
.TP
.B YYPARSE
The name of the main entry point to the parser. The default value
is 'yyparse'.
.TP
.B YYPARSEFROM
The name of an alternative entry point to the parser. This function
expects one argument: the function corresponding to the rule from
which the search for a match should begin. The default
is 'yyparsefrom'. Note that yyparse() is defined as
.nf
int yyparse() { return yyparsefrom(yy_foo); }
.fi
where 'foo' is the name of the first rule in the grammar.
.TP
.BI YY_INPUT( buf , \ result , \ max_size )
This macro is invoked by the parser to obtain more input text.
.I buf
points to an area of memory that can hold at most
.I max_size
characters. The macro should copy input text to
.I buf
and then assign the integer variable
.I result
to indicate the number of characters copied. If no more input is available,
the macro should assign 0 to
.IR result .
By default, the YY_INPUT macro is defined as follows.
.nf
#define YY_INPUT(buf, result, max_size) \\
{ \\
int yyc= getchar(); \\
result= (EOF == yyc) ? 0 : (*(buf)= yyc, 1); \\
}
.fi
.TP
.B YY_DEBUG
If this symbols is defined then additional code will be included in
the parser that prints vast quantities of arcane information to the
standard error while the parser is running.
.TP
.B YY_BEGIN
This macro is invoked to mark the start of input text that will be
made available in actions as 'yytext'. This corresponds to
occurrences of '<' in the grammar. These are converted into
predicates that are expected to succeed. The default definition
.nf
#define YY_BEGIN (yybegin= yypos, 1)
.fi
therefore saves the current input position and returns 1 ('true') as
the result of the predicate.
.TP
.B YY_END
This macros corresponds to '>' in the grammar. Again, it is a
predicate so the default definition saves the input position
before 'succeeding'.
.nf
#define YY_END (yyend= yypos, 1)
.fi
.TP
.BI YY_PARSE( T )
This macro declares the parser entry points (yyparse and yyparsefrom)
to be of type
.IR T .
The default definition
.nf
#define YY_PARSE(T) T
.fi
leaves yyparse() and yyparsefrom() with global visibility. If they
should not be externally visible in other source files, this macro can
be redefined to declare them 'static'.
.nf
#define YY_PARSE(T) static T
.fi
.TP
.BI YY_CTX_LOCAL
If this symbol is defined during compilation of a generated parser
then global parser state will be kept in a structure of
type 'yycontext' which can be declared as a local variable. This
allows multiple instances of parsers to coexist and to be thread-safe.
The parsing function
.IR yyparse ()
will be declared to expect a first argument of type 'yycontext *', an
instance of the structure holding the global state for the parser.
This instance must be allocated and initialised to zero by the client.
A trivial but complete example is as follows.
.nf
#include <stdio.h>
#define YY_CTX_LOCAL
#include "the-generated-parser.peg.c"
int main()
{
yycontext ctx;
memset(&ctx, 0, sizeof(yycontext));
while (yyparse(&ctx));
return 0;
}
.fi
Note that if this symbol is undefined then the compiled parser will
statically allocate its global state and will be neither reentrant nor
thread-safe.
.TP
.BI YY_CTX_MEMBERS
If YY_CTX_LOCAL is defined (see above) then the macro YY_CTX_MEMBERS
can be defined to expand to any additional member field declarations
that the client would like included in the declaration of
the 'yycontext' structure type. These additional members are
otherwise ignored by the generated parser. The instance
of 'yycontext' associated with the currently-active parser is
available in actions through the pointer variable
.IR yyctx .
.PP
The following variables can be reffered to within actions.
.TP
.B char *yybuf
This variable points to the parser's input buffer used to store input
text that has not yet been matched.
.TP
.B int yypos
This is the offset (in yybuf) of the next character to be matched and
consumed.
.TP
.B char *yytext
The most recent matched text delimited by '<' and '>' is stored in this variable.
.TP
.B int yyleng
This variable indicates the number of characters in 'yytext'.
.TP
.B yycontext *yyctx
This variable points to the instance of 'yycontext' associated with
the currently-active parser.
.SH DIAGNOSTICS
.I peg
and
.I leg
warn about the following conditions while converting a grammar into a parser.
.TP
.B syntax error
The input grammar was malformed in some way. The error message will
include the text about to be matched (often backed up a huge amount
from the actual location of the error) and the line number of the most
recently considered character (which is often the real location of the
problem).
.TP
.B rule 'foo' used but not defined
The grammar referred to a rule named 'foo' but no definition for it
was given. Attempting to use the generated parser will likely result
in errors from the linker due to undefined symbols associated with the
missing rule.
.TP
.B rule 'foo' defined but not used
The grammar defined a rule named 'foo' and then ignored it. The code
associated with the rule is included in the generated parser which
will in all other respects be healthy.
.TP
.B possible infinite left recursion in rule 'foo'
There exists at least one path through the grammar that leads from the
rule 'foo' back to (a recursive invocation of) the same rule without
consuming any input.
.PP
Left recursion, especially that found in standards documents, is
often 'direct' and implies trivial repetition.
.nf
# (6.7.6)
direct-abstract-declarator =
LPAREN abstract-declarator RPAREN
| direct-abstract-declarator? LBRACKET assign-expr? RBRACKET
| direct-abstract-declarator? LBRACKET STAR RBRACKET
| direct-abstract-declarator? LPAREN param-type-list? RPAREN
.fi
The recursion can easily be eliminated by converting the parts of the
pattern following the recursion into a repeatable suffix.
.nf
# (6.7.6)
direct-abstract-declarator =
direct-abstract-declarator-head?
direct-abstract-declarator-tail*
direct-abstract-declarator-head =
LPAREN abstract-declarator RPAREN
direct-abstract-declarator-tail =
LBRACKET assign-expr? RBRACKET
| LBRACKET STAR RBRACKET
| LPAREN param-type-list? RPAREN
.fi
.SH BUGS
The 'yy' and 'YY' prefixes cannot be changed.
.PP
Left recursion is detected in the input grammar but is not handled
correctly in the generated parser.
.PP
Diagnostics for errors in the input grammar are obscure and not
particularly helpful.
.PP
Several commonly-used
.IR lex (1)
features (yywrap(), yyin, etc.) are completely absent.
.PP
The generated parser foes not contain '#line' directives to direct C
compiler errors back to the grammar description when appropriate.
.IR lex (1)
features (yywrap(), yyin, etc.) are completely absent.
.SH SEE ALSO
D. Val Schorre,
.I META II, a syntax-oriented compiler writing language,
19th ACM National Conference, 1964, pp.\ 41.301--41.311. Describes a
self-implementing parser generator for analytic grammars with no
backtracking.
.PP
Alexander Birman,
.I The TMG Recognition Schema,
Ph.D. dissertation, Princeton, 1970. A mathematical treatment of the
power and complexity of recursive-descent parsing with backtracking.
.PP
Bryan Ford,
.I Parsing Expression Grammars: A Recognition-Based Syntactic Foundation,
ACM SIGPLAN Symposium on Principles of Programming Languages, 2004.
Defines PEGs and analyses them in relation to context-free and regular
grammars. Introduces the syntax adopted in
.IR peg .
.PP
The standard Unix utilies
.IR lex (1)
and
.IR yacc (1)
which influenced the syntax and features of
.IR leg .
.PP
The source code for
.I peg
and
.I leg
whose grammar parsers are written using themselves.
.PP
The latest version of this software and documentation:
.nf
http://piumarta.com/software/peg
.fi
.SH AUTHOR
.IR peg ,
.I leg
and this manual page were written by Ian Piumarta (first-name at
last-name dot com) while investigating the viablility of regular- and
parsing-expression grammars for efficiently extracting type and
signature information from C header files.
.PP
Please send bug reports and suggestions for improvements to the author
at the above address.