-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OpenCypher Antlr - strange parse tree (c#) #515
Comments
Hi Piotr, Thank you for your question and for engaging with the openCypher artifacts. Unfortunately, I cannot really get a handle on your question. Could be please be more precise? What is precisely the mess? In which respect you find things messy? What is the picture suppose to show? Which parser generator framework are you using Antlr, I guess? Which query are you trying to parse? Can you try to reduce the problem to a very small instance (very short and simple query), where you start to get "the mess". Sorry for all the questions, but your post does not provide me enough context to do anything helpful with it. Cheers |
Hi Hannes,
I am starting to get "the mess" in the Expression Evaluation.
For convenience, I attached a graphical representation of all elements of the parse tree diagram As you can see, current interpretation of a parse tree is incomplete and incorrect. Especially this part seems to defy all logic, how to determine if this is any of these expression? The only way I see this parsed is to write Template-like structure to see at the children of every context and somehow take into account every edge case
Here's a sourcecode of ExpressionAnalyzer - It analyzes the expression : (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Cypher/Analyzers/ExpressionAnalyzer.cs) Here's a method that I used to generate text parse tree: (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Cypher/Utility/ParseTreeHelper.cs) Here's a code that I used to generate the Graphical Representation of a parse tree: (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Explorer/Utility/DebugParseTreeVisual.xaml.cs) Here's a working implementation of PatternPart that is working correctly (notice the similar way that I try to parse the oC_PatternElement: (https://github.com/pmikstacki/SliccDB/blob/master/SliccDB.Cypher/Analyzers/PatternPartAnalyzer.cs) I hope that the details above are sufficient. |
Hi Piotr, thanks for your detailed reply. I understand you are confused why a parser gives you for The answer to this is that two-fold. Firstly, the railroad diagrams are not the ground truth of the grammar, but merely a more developer-manual-like representation of the grammar that gets generator out of the grammar xml files. The g4 antlr grammar also gets generated from the same xml files. These xml files are the ground truth of the grammar. For (subjectively) better readability, the railroad diagrams inline certain grammar productions into a single picture. This is again encoded in the grammar xml files. Productions that have the attribute Secondly, let's look at the "true" grammar as encoded in the xml files. The production <production name="Expression">
<non-terminal ref="OrExpression"/>
</production>
<production name="OrExpression" rr:inline="true">
<non-terminal ref="XorExpression"/>
<repeat>&SP; OR &SP;<non-terminal ref="XorExpression"/></repeat>
</production>
<production name="XorExpression" rr:inline="true">
<non-terminal ref="AndExpression"/>
<repeat>&SP; XOR &SP;<non-terminal ref="AndExpression"/></repeat>
</production>
<production name="AndExpression" rr:inline="true">
<non-terminal ref="NotExpression"/>
<repeat>&SP; AND &SP;<non-terminal ref="NotExpression"/></repeat>
</production>
<production name="NotExpression" rr:inline="true">
<repeat>NOT &WS;</repeat>
<non-terminal ref="ComparisonExpression"/>
</production> They all follow the same pattern. Every production allows the syntax to repeat the same operation zero or more time, while the respective operands are expression of operand that is next in order of increasing precedence. This allows to nest operations of higher precedence into operation of lower precedence, so that you can write Note also that each production allows zero repetitions of the operation it encodes, where zero repetition means the operation does not appear at all. The effect is that an expression can "drop through" the operations it does not need. Or more specifically, it allows that an This pattern continues through the whole stack of expressions and ends in the production called This recursive pattern of "drop through" production rules is the usual way expression grammar looks like if they encode the precedence. Some grammars have recursion in each production itself (instead of repetition), to encode operator associativity in the grammar. The oC grammar does not do that. Some grammars have multiple such stacks of production with the grammar also encode value type rules (e.g. you can not When consuming the parsing result, the effect of this grammar design is that you see also expressions in the parse tree. because the parse put a node in the parse tree for each production it uses. This is totally normal and expected. Many implementations have some kind of normalization and simplification step that turn the parse tree into an intermediate query representation (IR) which is usually also a tree and is feed into the query processor. This IR would usually have all "drop through" productions be removed, because they are a pure parsing artifacts. It is easy to detect, for instance, if an Hope that helps. Best |
Hi!
I have generated cypher lexer and parser using grammar on the openCypher Resources page. However when I enter Expression Context it parses into this mess
is it supposed to be that way? When I look at railroad diagram it looks like they're not in parent child relationship, they are in sibling-sibling relationship. Or am I getting it wrong?
Best Regards,
Piotr Mikstacki
The text was updated successfully, but these errors were encountered: