
Outstanding Grammar Issues #3924

Open
17 tasks
vmg opened this issue Nov 30, 2017 · 40 comments

@vmg
Contributor

vmg commented Nov 30, 2017

The following is a detailed list of all the outstanding issues in the grammars that GitHub.com uses for syntax highlighting code on our website.

These issues are detected by our grammar compiler (#3915) and are probably causing minor rendering bugs on the website.

Help is very much welcome! If you're seeing bugs or rendering issues in your source code on GitHub, please start by checking this list to make sure we're not detecting any issues in your language's grammar.

Feel free to ask any questions about a given issue and how best to fix it. I'll keep this issue up to date as I work through grammar fixes myself.

cc @github/linguist @pchaigno @Alhadis


Last updated: 28 Nov 2024

  • repository vendor/grammars/MATLAB-Language-grammar (from https://github.com/mathworks/MATLAB-Language-grammar) (16 errors)

    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=([,;])(?![^(]*...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=[,;](?![^(]*\)...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=([,;])(?![^(]*...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=([,;])(?![^(]*...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=([,;])(?![^(]*...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=[,;](?![^(]*\)...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=([,;])(?![^(]*...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<!\.{3}.*)(?:(?=[,;](?![^(]*\)...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(\}|(?<!\.{3}.*)\n)": lookbehind assertion is not fixed length (at offset 15))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(\)|(?<!\.{3}.*)\n)": lookbehind assertion is not fixed length (at offset 15))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?=;|,|(?<!(?:\.{3}.*))\n|%)": lookbehind assertion is not fixed length (at offset 22))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(\))?[^\S\n]*(?=;|,|(?<!(?:\.{3}...": lookbehind assertion is not fixed length (at offset 35))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(\)|(?<!\.{3}.*)\n)": lookbehind assertion is not fixed length (at offset 15))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(\)|(?<!\.{3}.*)\n)": lookbehind assertion is not fixed length (at offset 15))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(\)|(?<!\.{3}.*)\n)": lookbehind assertion is not fixed length (at offset 15))
    • Invalid regex in grammar: source.matlab (in Matlab.tmbundle/Syntaxes/MATLAB.tmLanguage) contains a malformed regex (regex "(?<=\))|(?>(?<!\.{3}.*)\n)": lookbehind assertion is not fixed length (at offset 22))
  • repository vendor/grammars/TypeScript-TmLanguage (from https://github.com/Microsoft/TypeScript-TmLanguage) (2 errors)

    • Invalid regex in grammar: source.ts (in TypeScript.tmLanguage) contains a malformed regex (regex "(?!(?<![_$[:alnum:]])(?:(?<=\.\....": lookbehind assertion is not fixed length (at offset 731))
    • Invalid regex in grammar: source.tsx (in TypeScriptReact.tmLanguage) contains a malformed regex (regex "(?!(?<![_$[:alnum:]])(?:(?<=\.\....": lookbehind assertion is not fixed length (at offset 731))
  • repository vendor/grammars/abl-tmlanguage (from https://github.com/chriscamicas/abl-tmlanguage) (2 errors)

    • Invalid regex in grammar: source.abl (in abl.tmLanguage.json) contains a malformed regex (regex "(?i)(?<=\s+|^)(for|preselect)[\s...": lookbehind assertion is not fixed length (at offset 11))
    • Invalid regex in grammar: source.abl (in abl.tmLanguage.json) contains a malformed regex (regex "(?i)(?<=^|\s*)(today|now)(?!\w|-...": lookbehind assertion is not fixed length (at offset 13))
  • repository vendor/grammars/atom-language-julia (from https://github.com/JuliaEditorSupport/atom-language-julia) (1 error)

    • Invalid regex in grammar: source.julia (in grammars/julia.cson) contains a malformed regex (regex "(?<=\S\s+)\b(as)\b(?=\s+\S)": lookbehind assertion is not fixed length (at offset 9))
  • repository vendor/grammars/c.tmbundle (from https://github.com/textmate/c.tmbundle) (4 errors)

    • Invalid regex in grammar: source.c.platform (in Syntaxes/Platform.tmLanguage) contains a malformed regex (regex "\b(?:A(?:APNot(?:CreatedErr|Foun...": definition too long (282084 bytes))
    • Invalid regex in grammar: source.c.platform (in Syntaxes/Platform.tmLanguage) contains a malformed regex (regex "\b(?:A(?:E(?:A(?:ddressDesc|rray...": definition too long (52248 bytes))
    • Invalid regex in grammar: source.c.platform (in Syntaxes/Platform.tmLanguage) contains a malformed regex (regex "\b(?:CATransform3DIdentity|KERNE...": definition too long (33340 bytes))
    • Invalid regex in grammar: source.c.platform (in Syntaxes/Platform.tmLanguage) contains a malformed regex (regex "(\s*)(\b(?:A(?:E(?:Build(?:Apple...": definition too long (58589 bytes))
  • repository vendor/grammars/csharp-tmLanguage (from https://github.com/dotnet/csharp-tmLanguage) (4 errors)

    • Invalid regex in grammar: source.cs (in grammars/csharp.tmLanguage) contains a malformed regex (regex "(?<!\.\s*)\b(await)\b": lookbehind assertion is not fixed length (at offset 9))
    • Invalid regex in grammar: source.cs (in grammars/csharp.tmLanguage) contains a malformed regex (regex "\G(?=(?~\*/)$)": unrecognized character after (? or (?- (at offset 7))
    • Invalid regex in grammar: source.cs (in grammars/csharp.tmLanguage) contains a malformed regex (regex "^(\s*+)(\*(?!/))?(?=(?~\*/)$)": unrecognized character after (? or (?- (at offset 22))
    • Invalid regex in grammar: source.cs (in grammars/csharp.tmLanguage) contains a malformed regex (regex "(?<!\.\s*)\b(await)\b": lookbehind assertion is not fixed length (at offset 9))
  • repository vendor/grammars/gap-tmbundle (from https://github.com/dhowden/gap-tmbundle) (3 errors)

    • Invalid regex in grammar: source.gap (in Syntaxes/GAP.tmLanguage) contains a malformed regex (regex "\b(16Bits_AssocWord|16Bits_Depth...": definition too long (65523 bytes))
    • Invalid regex in grammar: source.gap (in Syntaxes/GAP.tmLanguage) contains a malformed regex (regex "\b(IndicesChiefNormalSteps|Indic...": definition too long (65529 bytes))
    • Invalid regex in grammar: source.gap (in Syntaxes/GAP.tmLanguage) contains a malformed regex (regex "\b(SMTX_GoodElementGModule|SMTX_...": definition too long (42470 bytes))
  • repository vendor/grammars/godot-vscode-plugin (from https://github.com/godotengine/godot-vscode-plugin) (1 error)

    • Invalid regex in grammar: source.gdscript (in syntaxes/GDScript.tmLanguage.json) contains a malformed regex (regex "(?<!/\s*)(\$\s*|%|\$%\s*)(/\s*)?...": lookbehind assertion is not fixed length (at offset 8))
  • repository vendor/grammars/linter-lilypond (from https://github.com/nwhetsell/linter-lilypond) (1 error)

    • Invalid regex in grammar: source.lilypond (in grammars/lilypond.cson) contains a malformed regex (regex "(?<!-)\b(!=|\*(?:(?:location|par...": definition too long (35491 bytes))
  • repository vendor/grammars/mathematica-tmbundle (from https://github.com/shadanan/mathematica-tmbundle) (1 error)

    • Invalid regex in grammar: source.mathematica (in Syntaxes/Mathematica.tmLanguage) contains a malformed regex (regex "(\b|(?<=_))(Abort|AbortKernels|A...": definition too long (54020 bytes))
  • repository vendor/grammars/nu-grammar (from https://github.com/hustcer/nu-grammar.git) (1 error)

    • Invalid regex in grammar: source.nushell (in grammars/tmLanguage.json) contains a malformed regex (regex "(?<=]\s*)(:)\s+(\[)": lookbehind assertion is not fixed length (at offset 8))
  • repository vendor/grammars/objective-c.tmbundle (from https://github.com/textmate/objective-c.tmbundle) (2 errors)

    • Invalid regex in grammar: source.objc.platform (in Syntaxes/Platform.tmLanguage) contains a malformed regex (regex "\b(?:AB(?:AddRecordsError|Multip...": definition too long (32854 bytes))
    • Invalid regex in grammar: source.objc.platform (in Syntaxes/Platform.tmLanguage) contains a malformed regex (regex "\b(?:A(?:M(?:Action(?:A(?:pplica...": definition too long (44404 bytes))
  • repository vendor/grammars/sublime-autoit (from https://github.com/AutoIt/SublimeAutoItScript) (2 errors)

    • Invalid regex in grammar: source.autoit (in AutoIt.tmLanguage) contains a malformed regex (regex "\b(?i:_array1dtohistogram|_array...": definition too long (39591 bytes))
    • Invalid regex in grammar: source.autoit (in AutoIt.tmLanguage) contains a malformed regex (regex "\b(?i:_guictrltoolbar_getbuttoni...": definition too long (39600 bytes))
  • repository vendor/grammars/turtle.tmbundle (from https://github.com/peta/turtle.tmbundle) (5 errors)

    • Invalid regex in grammar: source.turtle (in Syntaxes/Turtle.tmLanguage) contains a malformed regex (regex "(?x)( (?: [\p{L}\p{M}] | [:0...": PCRE does not support \L, \l, \N{name}, \U, or \u (at offset 121))
    • Invalid regex in grammar: source.turtle (in Syntaxes/Turtle.tmLanguage) contains a malformed regex (regex "(?x)((?<=\s|^|_)(?:[\p{L}\p{M}] ...": PCRE does not support \L, \l, \N{name}, \U, or \u (at offset 57))
    • Invalid regex in grammar: source.turtle (in Syntaxes/Turtle.tmLanguage) contains a malformed regex (regex "(?x) (?<PN_CHARS_U>[\p{L}\p{M...": PCRE does not support \L, \l, \N{name}, \U, or \u (at offset 73))
    • Invalid regex in grammar: source.turtle (in Syntaxes/Turtle.tmLanguage) contains a malformed regex (regex "\[[\u20\u9\uD\uA]*\]": PCRE does not support \L, \l, \N{name}, \U, or \u (at offset 4))
    • Invalid regex in grammar: source.turtle (in Syntaxes/Turtle.tmLanguage) contains a malformed regex (regex "(?x) (?<PNAME_NS> (?: (?: [\...": PCRE does not support \L, \l, \N{name}, \U, or \u (at offset 68))
  • repository vendor/grammars/vscode-bitbake (from https://github.com/yoctoproject/vscode-bitbake.git) (1 error)

    • Invalid regex in grammar: source.bb (in client/syntaxes/bitbake.tmLanguage.json) contains a malformed regex (regex "(?<=^|^fakeroot +)\b(python|def)...": lookbehind assertion is not fixed length (at offset 17))
  • repository vendor/grammars/vscode-jest (from https://github.com/jest-community/vscode-jest) (1 error)

    • Grammar conversion failed. File syntaxes/ExtSettingsSchema.json failed to parse: Undeclared scope in grammar: syntaxes/ExtSettingsSchema.json has no scope name
  • repository vendor/grammars/vscode-yara (from https://github.com/infosec-intern/vscode-yara.git) (1 error)

    • Invalid regex in grammar: source.yara (in yara/syntaxes/yara.tmLanguage.json) contains a malformed regex (regex "(?<=(^|[\)]|\b(?:them)\b))(?:\s*...": lookbehind assertion is not fixed length (at offset 25))

The grammar library contains 48 errors
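
Most of the entries above are "lookbehind assertion is not fixed length" errors: the compiler's PCRE requires each top-level branch inside `(?<=...)` or `(?<!...)` to match a fixed number of characters, so quantifiers such as `\s+` or `.*` there are rejected. The sketch below is only an illustration, not the fix adopted by any upstream grammar. It uses Python's `re` module, which enforces a similar (in fact stricter) fixed-width rule, to show that the abl-tmlanguage pattern `(?<=\s+|^)` can be rewritten as the fixed-width `(?<!\S)` without changing what it matches. The constant and helper names are made up for this example.

```python
import re

# Pattern from the abl-tmlanguage entry above: the first lookbehind branch
# uses \s+, which has no fixed width.
ORIGINAL = r"(?i)(?<=\s+|^)(for|preselect)"

# Fixed-width rewrite: "preceded by whitespace or at the start" is the same
# condition as "not preceded by a non-whitespace character".
REWRITTEN = r"(?i)(?<!\S)(for|preselect)"

# The same condition with the branches moved outside the lookbehind,
# used here only as a reference to compare against.
REFERENCE = r"(?i)(?:^|(?<=\s))(for|preselect)"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def spans(pattern: str, text: str) -> list:
    """Start/end offsets of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    print(compiles(ORIGINAL))   # False: variable-width lookbehind is rejected
    print(compiles(REWRITTEN))  # True
    for sample in ["for each x", "  PRESELECT y", "xfor", "a\tfor", "before"]:
        assert spans(REWRITTEN, sample) == spans(REFERENCE, sample)
```

PCRE itself would also accept `(?<=\s|^)`, because its branches may differ in length as long as each one is fixed; Python does not allow that, which is why the sketch uses `(?<!\S)`. Patterns such as `(?<=\S\s+)` or `(?<!\.{3}.*)` have no equally simple equivalent and usually need the rule itself restructured.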


Other

  • vendor/grammars/Sublime-QML - skozlovf/Sublime-QML - the project has been restructured and rewritten, and now only provides a Sublime Text 3-compatible grammar, which is not supported.
    This grammar will need to be replaced and will no longer be updated.
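
The turtle.tmbundle errors listed above come from `\u` escapes such as `\u20`, which PCRE rejects; PCRE writes a code point as `\x{...}` instead. A minimal sketch of that mechanical rewrite, assuming every `\u` escape is followed by one to four hex digits and is not immediately followed by a literal hex digit that belongs to the surrounding text. `u_escapes_to_pcre` is a hypothetical helper, not part of Linguist or the Turtle bundle.

```python
import re

# Either an escaped backslash (kept as-is, so a literal "\\u20" is left alone)
# or a \u escape with 1-4 hex digits (captured in group 1).
_ESCAPE = re.compile(r"\\\\|\\u([0-9A-Fa-f]{1,4})")


def u_escapes_to_pcre(pattern: str) -> str:
    """Rewrite \\uHHHH-style escapes into PCRE's \\x{HHHH} form."""
    def replace(match: re.Match) -> str:
        if match.group(1) is None:
            return match.group(0)  # an escaped backslash: leave untouched
        return "\\x{" + match.group(1) + "}"
    return _ESCAPE.sub(replace, pattern)


if __name__ == "__main__":
    # The pattern from the fourth Turtle entry above.
    print(u_escapes_to_pcre(r"\[[\u20\u9\uD\uA]*\]"))  # \[[\x{20}\x{9}\x{D}\x{A}]*\]
```

Matching the escaped-backslash alternative first is what keeps `\\u20` (a backslash followed by the text `u20`) from being rewritten.
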
@lildude
Member

lildude commented Nov 30, 2017

Thanks for the useful and actionable list @vmg. One question...

> repository vendor/grammars/language-babel (from https://github.com/gandm/language-babel) (14 errors)

Is this the actual upstream grammar as it stands now or the old pinned copy we're shipping with Linguist at the moment?

I'm assuming the latter, but thought I'd check to be sure.

@vmg
Contributor Author

vmg commented Nov 30, 2017

> I'm assuming the latter, but thought I'd check to be sure.

Correct! I fucked up my submodules. Sorry about that, I'll update the list with the proper URL.

@Alhadis
Collaborator

Alhadis commented Dec 5, 2017

Hey, sorry about the (really) slow response. My MacBook died last week, which means I've been painfully limited in what I'm able to do on GitHub (for the time being I'm using my work computer, when time permits).

The issues reported by my grammars are easily fixed, but the LilyPond grammar should be fixed with a PR to the upstream repository. Basically, scope.AtLilyPond should be replaced with just scope.lilypond so it's consistent with other LilyPond grammars (and I can then replace the offending rule in text.roff with a single inclusion: {include: "source.lilypond"}).

@Alhadis
Collaborator

Alhadis commented Jan 1, 2018

What's the maximum token length permitted by the new compiler? I was about to start fixing the issues with Emacs Lisp's grammar, but realised I don't have the actual limit to go by.

Admittedly, I'm not really fond of the size limit, because the "fix" here is to simply break the pattern down into multiple rules grouped under the same name. It feels terribly hacky, and because the rules in question were compiled from an external source, updating the list in future might become more complicated...

@vmg
Contributor Author

vmg commented Jan 8, 2018

@Alhadis: Sorry, you caught me over the holidays. The maximum size is enforced by PCRE, not by our parser, and it's 64 KB for a single regex. I'm aware it's a bummer, but that's how PCRE was designed.
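
The workaround described earlier in the thread (breaking one oversized pattern into several rules with the same name) is mechanical enough to script. A minimal sketch, assuming the pattern is a plain `\b(?:word|word|...)\b` keyword list. `split_alternation` and the 60,000-byte budget are illustrative choices, not Linguist tooling; the budget is measured on the pattern source, which is only a rough proxy for what PCRE actually measures, so it leaves some headroom below 64 KB.

```python
import re

# Illustrative budget, kept below the ~64 KB per-regex limit mentioned above.
MAX_PATTERN_BYTES = 60_000
PREFIX, SUFFIX = r"\b(?:", r")\b"


def _rule(parts, scope):
    """Build one TextMate-style match rule from already-escaped words."""
    return {"match": PREFIX + "|".join(parts) + SUFFIX, "name": scope}


def split_alternation(words, scope, max_bytes=MAX_PATTERN_BYTES):
    """Split a keyword list into several rules sharing one scope name.

    Each rule's regex source stays under max_bytes, except that a single
    word longer than the budget still gets a rule of its own.
    """
    overhead = len(PREFIX) + len(SUFFIX)
    rules, chunk, size = [], [], overhead
    for word in words:
        piece = re.escape(word)
        # Cost of appending this word, plus a "|" separator if the chunk is non-empty.
        cost = len(piece.encode("utf-8")) + (1 if chunk else 0)
        if chunk and size + cost > max_bytes:
            rules.append(_rule(chunk, scope))
            chunk, size = [], overhead
            cost = len(piece.encode("utf-8"))
        chunk.append(piece)
        size += cost
    if chunk:
        rules.append(_rule(chunk, scope))
    return rules


if __name__ == "__main__":
    keywords = [f"Func{i}" for i in range(20000)]
    rules = split_alternation(keywords, "support.function.example")
    print(len(rules), max(len(r["match"]) for r in rules))
```

The generated rules would replace the single oversized rule in the grammar's `patterns` list. Word order is preserved across the chunks, so regenerating them from an updated external list stays a one-step operation.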

@Alhadis
Collaborator

Alhadis commented Jan 8, 2018

That's understandable. How is whitespace treated inside expressions that use "extended" (/x) notation...?

m/
	abc
	(?:
		xyz
	)
	(?=\w+)
/x;

Because there are two different ways to represent that in CSON. One is with an ordinary quoted-string, which includes embedded newlines as part of the pattern...

pattern: "(?x)
	abc
	(?:
		xyz
	)
	(?=\\w+)
"

... and the other is to use triple-quoted strings ("heredocs"):

pattern: """(?x)
	abc
	(?:
		xyz
	)
	(?=\\w+)
"""

The latter will strip as much indentation as it can, leaving some (but not all) horizontal whitespace after the CSON-to-JSON conversion:

(?x)
	abc
	(?:
	        xyz
	)
	(?=\\w+)

Now this won't make any difference to the regex engine, but it will to my subdivision efforts... 😀
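
One way around the subdivision problem is to measure and split patterns only after removing the whitespace that extended mode ignores anyway, so both CSON spellings produce the same result. A rough sketch under stated assumptions: no nested or POSIX bracket classes, no `]` as the first character of a class, and no `(?#...)` comment groups or `\Q...\E` quoting. `strip_extended_whitespace` is a hypothetical helper, and Python's `re` (with an inline `(?x)`) stands in for PCRE's `x` flag, which likewise ignores unescaped whitespace outside character classes and treats `#` as the start of a comment.

```python
import re


def strip_extended_whitespace(pattern: str) -> str:
    """Drop the whitespace and #-comments that (?x) mode ignores.

    Assumes no nested/POSIX bracket classes, no ']' as the first class
    character, no (?#...) groups and no \\Q...\\E quoting.
    """
    out = []
    i, in_class = 0, False
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # Escaped character (including an escaped space): keep both chars.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            # Whitespace is significant inside a character class.
            in_class = c != "]"
            out.append(c)
        elif c == "[":
            in_class = True
            out.append(c)
        elif c == "#":
            # Comment runs to the end of the line; the newline is skipped next.
            while i < len(pattern) and pattern[i] != "\n":
                i += 1
            continue
        elif not c.isspace():
            out.append(c)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    quoted = "(?x)\n\tabc\n\t(?:\n\t\txyz\n\t)\n\t(?=\\w+)\n"
    heredoc = "(?x)\n\tabc\n\t(?:\n\t        xyz\n\t)\n\t(?=\\w+)\n"
    assert strip_extended_whitespace(quoted) == strip_extended_whitespace(heredoc)
    # Both the original and the stripped form match the same text.
    print(re.match(strip_extended_whitespace(heredoc), "abcxyzq").group())  # abcxyz
```
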

@vmg
Contributor Author

vmg commented Jan 8, 2018

@Alhadis I'm honestly not sure exactly how PCRE implements this; you should be able to test it by downloading libpcre and trying to compile the regexes. Our parser has no custom behavior here.

Alhadis added a commit to Alhadis/language-roff that referenced this issue Jan 10, 2018
Alhadis added a commit to Alhadis/language-pcb that referenced this issue Jan 10, 2018
Alhadis added a commit to Alhadis/language-maxscript that referenced this issue Jan 10, 2018
Alhadis added a commit to Alhadis/language-emacs-lisp that referenced this issue Jan 10, 2018
@Alhadis
Collaborator

Alhadis commented Jan 10, 2018

Okay, that's the last of my grammars fixed. 😉

@Alhadis Alhadis closed this as completed Jan 10, 2018
@pchaigno
Contributor

@Alhadis You closed this by mistake, right? 😸

@Alhadis Alhadis reopened this Jan 11, 2018
@Alhadis
Collaborator

Alhadis commented Jan 11, 2018

Yeah, sorry. I didn't even notice I'd pressed the wrong button to comment. My mistake. 😓

@vmg
Contributor Author

vmg commented Jan 11, 2018

@Alhadis 🙇🙇🙇

@lildude
Member

lildude commented Jan 29, 2018

I've updated the sublime-mask output in the OP, as the latest grammar compiler now prefers the "compiled" .tmLanguage file over the YAML file, and sublime-mask has only updated the YAML file. I have pinged the author in tenbits/sublime-mask#1, asking them to update the mask.tmLanguage file too.

@tmillr

tmillr commented Sep 19, 2022

I'm not sure where to post this, so I'm just gonna post it here. I came across a TypeScript rendering/syntax highlighting issue the other day here on GitHub. I sent in a support ticket and they redirected me to this repo.

The issue can be viewed here. Thanks.

@lildude
Member

lildude commented Sep 19, 2022

@tmillr grammar problems should be reported to the upstream maintainers. In the case of TypeScript, that's this repo. As this is a tree-sitter grammar, Linguist has no control over the updates, so it's possible this issue has been fixed but not yet pulled into GitHub.

@DaelonSuzuka

@lildude The Godot Engine grammars should have been fixed in godotengine/godot-vscode-plugin#416.

I attempted to validate the fixes using the grammar compiler instructions you gave us, but I don't know how to actually run linguist to make 100% sure that they're working now.

@lildude
Member

lildude commented Nov 30, 2022

> I don't know how to actually run linguist to make 100% sure that they're working now.

Running Linguist wouldn't help you, as it doesn't actually do the highlighting. That is done by an internal service, so the validator's output is all you can go by.

@dragoncoder047

dragoncoder047 commented Jan 31, 2023

The Python grammar referenced by this repo (MagicStack/MagicPython@7d0f2b2) includes support for the new Python match/case keywords, but I'm not seeing them highlighted (see here). Is this just because the latest Linguist hasn't been deployed to the GitHub backend servers yet (why would it take three months to do that??), or is there some other bug?

@Alhadis
Collaborator

Alhadis commented Jan 31, 2023

@dragoncoder047 Unfortunately, GitHub uses a specialised Tree-Sitter parser for Python; the MagicPython grammar is only used to highlight code-blocks in comments.

@dragoncoder047

> @dragoncoder047 Unfortunately, GitHub uses a specialised Tree-Sitter parser for Python; the MagicPython grammar is only used to highlight code-blocks in comments.

I guess it's a bug in the grammar then. It's already been reported (tree-sitter/tree-sitter-python#141) so I won't bother re-filing it.

@2colours

2colours commented Aug 3, 2023

Hello hello,

I don't know how this issue works, but we supposedly eliminated the \p problems from the Raku repository. I'm curious about the result...

@lildude
Member

lildude commented Aug 4, 2023

@2colours things look better, but it doesn't appear all have been resolved yet:

git submodule update --remote vendor/grammars/atom-language-perl6
Submodule path 'vendor/grammars/atom-language-perl6': checked out '190e4b38d53548b23263f9c399cd5172421aa057'
script/grammar-compiler update -f
latest: Pulling from linguist/grammar-compiler
[...]
Status: Downloaded newer image for linguist/grammar-compiler:latest
docker.io/linguist/grammar-compiler:latest
 442 / 442  100.00% 8s
done! processed 442 grammars

- [ ] repository `vendor/grammars/atom-language-perl6` (from https://github.com/perl6/atom-language-perl6) (4 errors)
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.tmLanguage.json`) contains a malformed regex (regex "`(?x) ( [\p{Digit}\pL\pM'\-_]+ ) `...": unknown property name after \P or \p (at offset 16))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.tmLanguage.json`) contains a malformed regex (regex "`[\p{Digit}\pL\pM'\-_]+`": unknown property name after \P or \p (at offset 9))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.tmLanguage.json`) contains a malformed regex (regex "`(?x)(?<!\\)(\$|@|%|&)(?!\$)(`...": unknown property name after \P or \p (at offset 141))
  - Invalid regex in grammar: `source.raku` (in `grammars/raku.tmLanguage.json`) contains a malformed regex (regex "`(?x)(\$|@|%|&)(\.|\*|:|!|\^|~|`...": unknown property name after \P or \p (at offset 131))
[...]

@2colours

2colours commented Aug 4, 2023

Not gonna lie, I'm perfectly clueless how Digit could remain when I even noted that it can/should be replaced with Nd... anyway, soon to be addressed.

EDIT: here goes nothing... should be good now 🤞

@2colours

@lildude Ping?

@lildude
Member

lildude commented Aug 14, 2023

@2colours Pong? 🤣

All looks good now. You'll see the benefit (if there's anything noticeable) when the next release is made. Thanks for addressing these issues 🙇

@AdamRaichu

AdamRaichu commented Jan 9, 2024

Should I be mentioning new issues I find in this thread?

@lildude
Member

lildude commented Jan 9, 2024

@AdamRaichu No. This is only for issues picked up by the grammar compiler.

@DecimalTurn
Contributor

The error for vscode-vba was fixed with the latest release.
