-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathparse_tree_representation
59 lines (43 loc) · 2.91 KB
/
parse_tree_representation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
Updated: 2010-03-29 for version 0.0.5
A successful parse produces an array of integers representing a stream of events.
This format is documented here.
This format can be processed directly, or it can be used to construct a parse tree which can be displayed or manipulated as desired.
Most users will pass the parse result to an API such as showResult or traverseResult, and should never need to know the details documented here.
The format supports streaming output from streaming parsers, in which case a parse tree node may be opened in one chunk of output and closed in a later chunk.
The only difference in the streaming parser case is that the parser will return the output in chunks as a series of arrays, rather than as a single array representing the entire parse tree.
The sequence of integers making up the parse tree format represent three different kinds of events: opening a named node, closing a named node, and anonymous nodes, which are both opened and closed at the same time.
A named node corresponds to a match of one of the rules appearing in the PEG.
Each rule is identified by the 1-based index of the position in the PEG where the rule appears.
The parser generator also produces an array which can be used to correlate indices with rule names.
A rule is opened when the corresponding index appears in the array.
A rule is closed by a -1 appearing in the array, followed by the length of the match.
Anonymous nodes are represented as a -2 followed by a length.
Example:
Here is a complete grammar, sample input, the array returned by a parser, and the pretty-printed parse tree:
Expr ← Add
Add ← Mult ( S? "+" S? Mult )*
Mult ← Num ( S? "*" S? Num )*
Num ← [0-9]+
S ← " "+
"4 * 3 + 2"
[1,2,3,4,-2,1,5,-2,1,-1,1,5,-2,1,4,-2,1,-2,5,5,-2,1,-1,1,5,-2,1,3,4,-2,1,-2,1,-2,9,-2,9]
Expr 0-9 "4 * 3 + 2"
Add 0-9 "4 * 3 + 2"
Mult 0-5 "4 * 3"
Num 0-1 "4"
S 1-2 " "
anonymous 2-3 "*"
S 3-4 " "
Num 4-5 "3"
S 5-6 " "
anonymous 6-7 "+"
S 7-8 " "
Mult 8-9 "2"
Num 8-9 "2"
Looking at the beginning of the array, we have [1,2,3,4,-2,1], corresponding to the first five events in the parse: first the 1, 2, 3, and 4 serve to open the Expr, the Add, the Mult, and the Num nodes (which are rules 1-4 respectively in the grammar) and then -2,1 closes the Num node with length 1.
After the S node is both opened and closed (by 5,-2,1), there is an anonymous node of length one (-1,1) corresponding to the asterisk in the Mult rule.
Anonymous Nodes
---------------
The reason for anonymous nodes in the output format is that node positions are relative, determined by the length of sibling nodes.
When interpreting the results it is necessary to keep track of the position in the input while reading the events.
As an example, in the grammar above, without the anonymous node matching the "*" in the Mult rule, there would be no way to tell that the S match which follows it begins at position 3 rather than position 2.