This repository has been archived by the owner on Dec 20, 2024. It is now read-only.


CONTRIBUTING.md

Introduction

These guidelines are intended to help developers contribute to the grammar and preserve consistency across the project while doing so.

Contributions needed

The main objective of this project is to support all versions of the Java language. This means addressing issues with the "todo" and "known error" labels. Assign yourself to any issue you're working on so that work isn't duplicated. We're also open to documentation improvements and feature enhancements, and recommend filing an issue for these.

Java Grammar Development Guide

General grammar structure

Language constructs are grouped into top-level categories denoting declarations, statements within methods, and expressions. All granular constructs feed into those through the defined grammar hierarchy.

rules: {
  program: $ => repeat($._statement),

  _statement: $ => prec(1, choice(
    $._expression_statement,
    $._declaration,
    $._method_statement
  )),

  // ... remaining rules
}

Deviating from the language spec

The grammar.js file follows the BNF grammar outlined in the Java Language Specification.

There are situations where we've deviated from the spec:

  • Preferred naming: where common developer parlance favors a name other than the spec's, we tend to follow the common name. An example is generic_type as the outer wrapper for type_arguments, since generics are a familiar Java programming concept.
  • Simplicity: the spec's structure is convoluted and not conducive to compact, readable code. In these cases, we've preferred structuring rules in a way that is more reusable throughout the grammar and reads clearly. An example is our use of binary_ and unary_ expressions to model relationships between operators, as opposed to reproducing the spec's ConditionalExpression hierarchy.
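
As a rough illustration of the binary_ approach, a single rule can be generated from a table of operators and precedence levels instead of mirroring the spec's nested hierarchy. The sketch below is not the project's actual rule. The seq, choice, and prec.left helpers are simplified stand-ins for the tree-sitter DSL functions of the same names so that the example runs under plain Node, and the operator table and precedence numbers are hypothetical.

```javascript
// Simplified stand-ins for the tree-sitter DSL (assumption: in grammar.js
// these are provided by tree-sitter itself and need no definition).
const seq = (...members) => ({ type: 'SEQ', members });
const choice = (...members) => ({ type: 'CHOICE', members });
const prec = {
  left: (value, content) => ({ type: 'PREC_LEFT', value, content }),
};

// Hypothetical operator table: higher numbers bind tighter.
const BINARY_OPERATORS = [
  ['||', 1],
  ['&&', 2],
  ['==', 3],
  ['+', 4],
  ['*', 5],
];

// One flat rule covers every binary operator; precedence is attached to each
// alternative rather than encoded as a chain of nested rules.
const binaryExpression = $ => choice(
  ...BINARY_OPERATORS.map(([operator, level]) =>
    prec.left(level, seq($._expression, operator, $._expression))
  )
);

// Hypothetical symbol table standing in for the grammar's `$` argument.
const rule = binaryExpression({
  _expression: { type: 'SYMBOL', name: '_expression' },
});
console.log(rule.members.length); // one alternative per operator
```

Adding an operator then means adding one table row, which is the kind of reuse and readability the simplicity guideline is after.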

When it's okay to parse invalid Java

There are situations in which we parse invalid code to support end-user experiences. For example, it's important to ensure syntax-highlighting doesn't break down for a snippet of Java code in a markdown file. For this reason, we currently allow expressions to be parsed outside of methods, even though that is not valid Java.

To know what is "valid enough", consider what good documentation would look like:

  • int x = (1 + 2); : This is invalid since it is not within a method, but it is still comprehensible. Parse this.
  • int x = (1 + ) =; : This is not only invalid Java, but also invalid logic. It wouldn't make sense in documentation. Don't parse this.

Running your code using something like JavaRepl is also a good way to verify the correctness of the input program.

Adding unit tests

The recommendation is to be comprehensive in adding tests. If it's a visible node, add it to a /corpus/ test file. It's typically a good idea to test as many permutations of a particular language construct as possible. This not only increases test coverage, but also gives readers a way to examine expected outputs and understand the "edges" of the language.

Testing on external repos

Three of the "most popular" Java repositories have been cloned into the project under the /examples directory (where popularity is defined by the repositories that are most starred and have the highest number of active contributors within the last month). Parsing these repos allows us to gauge how well our grammar performs at parsing "real world" Java.

To test:

  • ./script/parse-examples runs the tests and writes the results to known-errors.txt, which lists the files that have any errors or MISSING ; flags.
  • The goal is to drive down the errors in known-errors.txt to 0.
  • known-errors.txt allows you to find erroring files and parse them individually to diagnose and debug errors.
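
To decide where to start, it can help to see which example repository contributes the most failing files. The sketch below assumes known-errors.txt contains one file path per line, each beginning with examples/<repo>/. That layout is an assumption, and countErrorsByRepo is a hypothetical helper, not part of the project's scripts.

```javascript
const fs = require('fs');

// Count failing files per example repository.
// Assumption: each non-empty line is a path like "examples/<repo>/...".
function countErrorsByRepo(text) {
  const counts = {};
  for (const line of text.split('\n')) {
    const path = line.trim();
    if (path === '') continue; // skip blank lines
    const parts = path.split('/');
    // Use the directory directly under "examples" as the repo name.
    const repo =
      parts[0] === 'examples' && parts.length > 1 ? parts[1] : '(unknown)';
    counts[repo] = (counts[repo] || 0) + 1;
  }
  return counts;
}

// Read the file only if it exists, so the sketch also runs outside the repo.
// Assumption: the file sits in the current working directory.
const file = 'known-errors.txt';
if (fs.existsSync(file)) {
  console.log(countErrorsByRepo(fs.readFileSync(file, 'utf8')));
}
```

Lines that do not match the assumed layout are grouped under "(unknown)" rather than dropped, so a different file format shows up immediately in the output.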

Testing with other parsers

It's worth consulting other Java parsers (such as JavaParser) to guide your own grammar development. Comparing tree structure and naming can provide valuable insight into what is usable.

References