diff --git a/drools-drl/drools-drl10-parser/Developer_Notes.md b/drools-drl/drools-drl10-parser/Developer_Notes.md new file mode 100644 index 00000000000..c521ee6d14f --- /dev/null +++ b/drools-drl/drools-drl10-parser/Developer_Notes.md @@ -0,0 +1,38 @@ +## drools-parser + +This module is a reimplementation of the DRL (Drools Rule Language) parser based on ANTLR4. + +The current [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java) is based on ANTLR3 and contains a lot of custom modifications, which is hard to maintain. This new module should keep the separation between the parser syntax (`DRLParser.g4`) and the Descr generation (`DRLVisitorImpl.java`). + +This module started with a part of LSP to develop DRL editors, but it is not limited to that. This module will also replace DRL6Parser in the drools code base. + +### How is this developed? + +1. The starting point is [DRL6Parser](https://github.com/apache/incubator-kie-drools/blob/main/drools-drl/drools-drl-parser/src/main/java/org/drools/drl/parser/lang/DRL6Parser.java). While it contains lots of customizations, we can map its javadoc (e.g. `packageStatement := PACKAGE qualifiedIdentifier SEMICOLON?`) to `DRLParser.g4` (e.g. `packagedef : PACKAGE name=drlQualifiedName SEMI? ;`). +2. `DRLLexer.g4` is written to define tokens for DRL. +3. `DRLLexer.g4` imports `JavaLexer.g4` to reuse Java tokens. `DRLParser.g4` imports `JavaParser.g4` to reuse Java grammar. These Java parser files are distributed by ANTLR4 under BSD license. +4. In `DRLLexer.g4`, basically define tokens with a prefix "DRL_" to clarify they are DRL keywords. +5. In `DRLParser.g4`, define parser rules with a prefix "drl" if the rule name conflicts with `JavaParser.g4`. Sometimes we need to do that, because such a rule may contain DRL keywords. +6. (As of 2023/10/31) this parser doesn't deeply parse rule RHS (just multiple `RHS_CHUNK`s), because Drools passes RHS text to drools-compiler as-is. In case of developing DRL editors, we may need to integrate another Java LSP to support RHS code completion, etc. +7. LHS constraint (e.g. `age > 30`) is also handled as text. Further processing will be done in the later compiler phase. +8. `DRLParser` processes a DRL text and produces an AST(abstract syntax tree). Then apply `DRLVisitorImpl` to generate PackageDescr following the visitor pattern. So the main work would be implementing `DRLParser.g4` and `DRLVisitorImpl`. +9. Errors are handled by `DRLErrorListener` +10. (As of 2023/10/31) We have 2 test classes. `DRLParserTest` is a very basic test to check if the parser can parse DRL. `MiscDRLParserTest` contains various DRL syntax to check if the parser generates correct Descr objects. `MiscDRLParserTest` was ported from [RuleParserTest](https://github.com/apache/incubator-kie-drools/blob/main/drools-test-coverage/test-compiler-integration/src/test/java/org/drools/mvel/compiler/lang/RuleParserTest.java) so that we can ensure the compatibility of generated Descr objects between the current implementation and the new one. +11. As `DRL6Parser` contains hard-coded customizations, sometimes we need to read and understand the `DRL6Parser` source codes to meet the compatibility. +12. (As of 2023/10/31) `MiscDRLParserTest` still has several test cases with `@Disabled` which are relatively lower priority or edge cases. They need to be resolved at some point in the future. To fix the issues, file a JIRA, remove the `@Disabled` annotation, and fix the implementation to pass the test case. + +### Next steps + +1. Create a feature branch in drools repo and replace `DRL6Parser` with this new parser. +2. We will detect issues in the new parser by running the existing tests in drools repo. If we find any issues, we will fix them in the new parser and add new tests to cover them. Such tests would be more or less Descr comparison tests, so we would add a new test class which is similar to `MiscDRLParserTest`. + +### Refactoring candidates +- `DRLParserHelper` and `DRLParserWrapper` have some duplicated code and purpose. We can merge them into one class. +- `MiscDRLParserTest` can be cleaner and fixed to align with SonarLint suggestions. +- Constraint related parser rules after `conditionalOrExpression` are written in antlr3 style. They could be refactored to antlr4 style (like `lhsExpression`). + +### Development tips +- IntelliJ IDEA has an ANTLR4 plugin, which "ANTLR Preview" window displays a parse tree. It is very useful to debug the parser rules. + +### Resources +[The Definitive ANTLR 4 Reference](https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-reference/) \ No newline at end of file diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java index a15d12adddb..f7c6504df4c 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLErrorListener.java @@ -7,6 +7,9 @@ import org.antlr.v4.runtime.RecognitionException; import org.antlr.v4.runtime.Recognizer; +/** + * Collect errors while parsing DRL + */ public class DRLErrorListener extends BaseErrorListener { private final List errors = new ArrayList<>(); diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java index 6b9fe116943..ec3e41747df 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserError.java @@ -1,5 +1,8 @@ package org.drools.parser; +/** + * Error information while parsing DRL + */ public class DRLParserError { private int lineNumber; diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java index 3bd15b3f4fa..bee7a6e96d4 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserHelper.java @@ -10,11 +10,18 @@ import org.antlr.v4.runtime.tree.TerminalNode; import org.drools.drl.ast.descr.PackageDescr; +/** + * Collection of static helper methods for DRLParser + */ public class DRLParserHelper { private DRLParserHelper() { } + /** + * Entry point for parsing DRL. + * Unlike DRLParserWrapper.parse(), this method does not collect errors. + */ public static PackageDescr parse(String drl) { DRLParser drlParser = createDrlParser(drl); return compilationUnitContext2PackageDescr(drlParser.compilationUnit(), drlParser.getTokenStream()); @@ -27,6 +34,9 @@ public static DRLParser createDrlParser(String drl) { return new DRLParser(commonTokenStream); } + /** + * DRLVisitorImpl visits a parse tree and creates a PackageDescr + */ public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.CompilationUnitContext ctx, TokenStream tokenStream) { DRLVisitorImpl visitor = new DRLVisitorImpl(tokenStream); Object descr = visitor.visit(ctx); @@ -37,6 +47,9 @@ public static PackageDescr compilationUnitContext2PackageDescr(DRLParser.Compila } } + /** + * Given a row and column of the input DRL, return the index of the matched token + */ public static Integer computeTokenIndex(DRLParser parser, int row, int col) { for (int i = 0; i < parser.getInputStream().size(); i++) { Token token = parser.getInputStream().get(i); diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java index 06992f0440e..8f24bf1d3f4 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLParserWrapper.java @@ -10,12 +10,18 @@ import static org.drools.parser.DRLParserHelper.compilationUnitContext2PackageDescr; +/** + * Wrapper for DRLParser. Somewhat duplicated from DRLParserHelper, but this class is instantiated and holds errors. + */ public class DRLParserWrapper { private static final Logger LOGGER = LoggerFactory.getLogger(DRLParserWrapper.class); private final List errors = new ArrayList<>(); + /** + * Main entry point for parsing DRL + */ public PackageDescr parse(String drl) { DRLParser drlParser = DRLParserHelper.createDrlParser(drl); DRLErrorListener errorListener = new DRLErrorListener(); diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java index 8212ac32bee..3f920cc13a5 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/DRLVisitorImpl.java @@ -48,6 +48,12 @@ import static org.drools.parser.ParserStringUtils.trimThen; import static org.drools.util.StringUtils.unescapeJava; +/** + * Visitor implementation for DRLParser. + * Basically, each visit method creates and returns a Descr object traversing the parse tree. + * Finally, visitCompilationUnit() returns a PackageDescr object. + * Try not to depend on DRLVisitorImpl's internal state for clean maintainability + */ public class DRLVisitorImpl extends DRLParserBaseVisitor { private final TokenStream tokenStream; @@ -56,6 +62,9 @@ public DRLVisitorImpl(TokenStream tokenStream) { this.tokenStream = tokenStream; } + /** + * Main entry point for creating PackageDescr from a parser tree. + */ @Override public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) { PackageDescr packageDescr = new PackageDescr(); @@ -67,7 +76,11 @@ public PackageDescr visitCompilationUnit(DRLParser.CompilationUnitContext ctx) { return packageDescr; } + /** + * Add all children Descr to PackageDescr + */ private void applyChildrenDescrs(PackageDescr packageDescr, List descrList) { + // This bunch of if-blocks will be refactored by DROOLS-7564 descrList.forEach(descr -> { if (descr instanceof UnitDescr) { packageDescr.setUnit((UnitDescr) descr); @@ -147,6 +160,8 @@ public FunctionDescr visitFunctiondef(DRLParser.FunctiondefContext ctx) { functionDescr.setReturnType("void"); } functionDescr.setName(ctx.IDENTIFIER().getText()); + + // add function parameters DRLParser.FormalParametersContext formalParametersContext = ctx.formalParameters(); DRLParser.FormalParameterListContext formalParameterListContext = formalParametersContext.formalParameterList(); if (formalParameterListContext != null) { @@ -163,7 +178,7 @@ public FunctionDescr visitFunctiondef(DRLParser.FunctiondefContext ctx) { @Override public BaseDescr visitDeclaredef(DRLParser.DeclaredefContext ctx) { - return visitDescrChildren(ctx).get(0); + return visitDescrChildren(ctx).get(0); // only one child } @Override @@ -202,6 +217,9 @@ public WindowDeclarationDescr visitWindowDeclaration(DRLParser.WindowDeclaration return windowDeclarationDescr; } + /** + * entry point for one rule + */ @Override public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) { RuleDescr ruleDescr = new RuleDescr(safeStripStringDelimiters(ctx.name.getText())); @@ -228,16 +246,27 @@ public RuleDescr visitRuledef(DRLParser.RuledefContext ctx) { if (ctx.rhs() != null) { ruleDescr.setConsequenceLocation(ctx.rhs().getStart().getLine(), ctx.rhs().getStart().getCharPositionInLine()); // location of "then" - ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs()))); + ruleDescr.setConsequence(trimThen(getTextPreservingWhitespace(ctx.rhs()))); // RHS is just a text } return ruleDescr; } private void slimLhsRootDescr(AndDescr root) { + // Root Descr is always AndDescr. + // For example, if there are nested AndDescr like + // AndDescr + // /\ + // P AndDescr + // /\ + // P P + // is slimmed down to + // AndDescr + // / | \ + // P P P List descrList = new ArrayList<>(root.getDescrs()); root.getDescrs().clear(); - descrList.forEach(root::addOrMerge); // This slims down nested AndDescr + descrList.forEach(root::addOrMerge); } @Override @@ -373,6 +402,9 @@ public AttributeDescr visitDurationAttribute(DRLParser.DurationAttributeContext return attributeDescr; } + /** + * entry point for LHS + */ @Override public List visitLhs(DRLParser.LhsContext ctx) { if (ctx.lhsExpression() != null) { @@ -394,10 +426,12 @@ public BaseDescr visitLhsPatternBind(DRLParser.LhsPatternBindContext ctx) { } private PatternDescr getSinglePatternDescr(DRLParser.LhsPatternBindContext ctx) { - Optional optPatternDescr = visitFirstDescrChild(ctx); - PatternDescr patternDescr = optPatternDescr.filter(PatternDescr.class::isInstance) - .map(PatternDescr.class::cast) - .orElseThrow(() -> new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText())); + List patternDescrList = visitDescrChildren(ctx); + if (patternDescrList.isEmpty() || !(patternDescrList.get(0) instanceof PatternDescr)) { + throw new IllegalStateException("lhsPatternBind must have at least one lhsPattern : " + ctx.getText()); + } + PatternDescr patternDescr = (PatternDescr)patternDescrList.get(0); + if (ctx.label() != null) { patternDescr.setIdentifier(ctx.label().IDENTIFIER().getText()); } else if (ctx.unif() != null) { @@ -423,6 +457,9 @@ private OrDescr getOrDescrWithMultiplePatternDescr(DRLParser.LhsPatternBindConte return orDescr; } + /** + * entry point for a Pattern + */ @Override public PatternDescr visitLhsPattern(DRLParser.LhsPatternContext ctx) { PatternDescr patternDescr = new PatternDescr(ctx.objectType.getText()); @@ -528,6 +565,9 @@ public EntryPointDescr visitFromEntryPoint(DRLParser.FromEntryPointContext ctx) return new EntryPointDescr(safeStripStringDelimiters(ctx.stringId().getText())); } + /** + * Collect constraints in a Pattern + */ @Override public List visitConstraints(DRLParser.ConstraintsContext ctx) { List exprConstraintDescrList = new ArrayList<>(); @@ -535,6 +575,9 @@ public List visitConstraints(DRLParser.ConstraintsContext c return exprConstraintDescrList; } + /** + * Collect constraints in a Pattern. Positional constraints comes first with semicolon. + */ private List visitConstraints(DRLParser.PositionalConstraintsContext positionalCtx, DRLParser.ConstraintsContext ctx) { List exprConstraintDescrList = new ArrayList<>(); populateExprConstraintDescrList(positionalCtx, exprConstraintDescrList); @@ -557,6 +600,9 @@ private void populateExprConstraintDescrList(ParserRuleContext ctx, List visitDescrChildren(RuleNode node) { List aggregator = new ArrayList<>(); int n = node.getChildCount(); @@ -768,21 +819,10 @@ public ParseTree getNode() { } } - // leaves of constraint concatenate return Strings + /** + * Return the text of constraint as-is + */ private String visitConstraintChildren(ParserRuleContext ctx) { return getTokenTextPreservingWhitespace(ctx, tokenStream); } - - private Optional visitFirstDescrChild(RuleNode node) { - int n = node.getChildCount(); - - for (int i = 0; i < n && this.shouldVisitNextChild(node, null); ++i) { - ParseTree c = node.getChild(i); - Object childResult = c.accept(this); - if (childResult instanceof BaseDescr) { - return Optional.of((BaseDescr) childResult); - } - } - return Optional.empty(); - } -} +} \ No newline at end of file diff --git a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java index 1f84eb6f9c8..a294bf2a923 100644 --- a/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java +++ b/drools-drl/drools-drl10-parser/src/main/java/org/drools/parser/ParserStringUtils.java @@ -5,13 +5,17 @@ import org.antlr.v4.runtime.misc.Interval; /** - * will be merged in drools-util + * Collection of String utilities used by DRLParser. + * This may be merged in drools-util */ public class ParserStringUtils { private ParserStringUtils() { } + /** + * Strip string delimiters (e.g. "foo" -> foo) + */ public static String safeStripStringDelimiters(String value) { if (value != null) { value = value.trim(); @@ -22,6 +26,9 @@ public static String safeStripStringDelimiters(String value) { return value; } + /** + * Get text from ParserRuleContext's CharStream without trimming whitespace + */ public static String getTextPreservingWhitespace(ParserRuleContext ctx) { // Using raw CharStream int startIndex = ctx.start.getStartIndex(); @@ -34,11 +41,18 @@ public static String getTextPreservingWhitespace(ParserRuleContext ctx) { return ctx.start.getTokenSource().getInputStream().getText(interval); } + /** + * Get text from ParserRuleContext's CharStream without trimming whitespace + * tokenStream is required to get hidden channel token (e.g. whitespace). + * Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString + */ public static String getTokenTextPreservingWhitespace(ParserRuleContext ctx, TokenStream tokenStream) { - // tokenStream is required to get hidden channel token (e.g. whitespace). Unlike getTextPreservingWhitespace, this method reflects Lexer normalizeString return tokenStream.getText(ctx.start, ctx.stop); } + /** + * Just remove leading "then" + */ public static String trimThen(String rhs) { if (rhs.startsWith("then")) { return rhs.substring("then".length());