Better built-in lexers #668

silversquirl · 2018-02-10T00:34:35Z

The syntax highlighting lexers are currently missing support for various things (eg. labels in C, proper string highlighting for shell scripts, etc.)
I'm wondering whether it would be worth creating a wiki page with a list, so that people who want to contribute can look there.

martanne · 2018-02-11T09:35:00Z

If you feel it would help, after all it is a Wiki. But I'm not sure how many people actually read these pages. For discussion an issue (like this one) is probably more helpful? Maybe we could add a general page regarding syntax highlighting (similar to the README in lua/lexers). It could then also feature/reference a list of known issues.

There are some known problems/limitations of the current approach to syntax highlighting (e.g. #110, #603). Eventually I would like to address these, but performance and accuracy needs to be weighted against the introduced complexity.

silversquirl · 2018-02-11T17:44:52Z

If you think an issue is better, then I'm happy with that. I'll try to get acquainted with how LPeg works when I have some time and start working on the issues I mentioned initially.

p-e-w · 2018-02-18T09:38:56Z

Have you considered tree-sitter?

While there is a lot of infrastructure around them, the parsers themselves are pure C, and designed for integration in text editors. Tree-sitter parsers are probably the highest quality syntax highlighting parsers available today, and are only going to get better because they are maintained by the creators of Atom.

I haven't seen any performance benchmarks yet, but I fully expect tree-sitter parsers to outperform all other incremental parsers of comparable quality eventually. Whether they are faster than vis' current lexers is hard to predict, but then those lexers are quite primitive and don't even come close to the accuracy that tree-sitter provides.

Having parsing of such quality in vis would also make new editing commands possible through accurate scope detection. See the examples in this Atom PR.

Thus far tree-sitter parsers exist only for the most popular languages, but integrating those seems like a potentially huge win for vis: High-quality parsers that are maintained by someone else.

lanodan · 2018-02-19T05:45:51Z

Disclaimer/Warning: All of this is my opinion and is As Far As I Know. I’m also quite bad at phrasing this kind of stuff but I don’t want to ignore things and I don’t want things to escalate.

Have you considered tree-sitter?

The ~advertised output is DOM-style, while we need something for Terminals
Another language (C++, JSON, …)
I’m not sure efficiency is that much needed
There seems to be no release or tags !
This is apparently coming from the webapp universe which is pretty remote from where vis is now. (which I would describe as a traditional but modern Unix/Plan9 editor)

new editing commands possible through accurate scope detection.

The current keystrokes and regexes are already making it possible, I don’t really see something new here…

maintained by someone else.

LPeg lexers are coming from/shared with Scintillua.

p-e-w · 2018-02-19T06:48:50Z

A few misunderstandings there:

The ~advertised output is DOM-style, while we need something for Terminals

Tree-sitter offers a DOM-style interface, not DOM-style output. That is purely an API convention, and it is very convenient to use. The interface is specifically designed for the needs of syntax highlighting in editors and has nothing to do with the "web DOM" (or with terminals, for that matter).

Another language (C++, JSON, …)

Tree-sitter uses a compiler written in C++ that processes JSON input to generate a pure C parser without any non-C dependencies. The compiler and the JSON file are not needed for parsing, and don't have to be shipped with the editor.

There seems to be no release or tags !

Every single tree-sitter grammar is versioned, tagged and unit tested. The compiler currently doesn't have versioned releases, but language bindings that depend on the compiler again are versioned, and pin a specific commit for their tree-sitter submodule.

This is apparently coming from the webapp universe which is pretty remote from where vis is now. (which I would describe as a traditional but modern Unix/Plan9 editor)

Tree-sitter has nothing whatsoever to do with the "webapp universe". It is used by the Atom editor (which some people might consider a "webapp") as a dependency via Node.js bindings that are independent of the core parser and compiler. There are also bindings for Haskell and Ruby, and of course the native C API.

Using tree-sitter in vis doesn't bring vis any closer to the webapp universe than hosting its code on GitHub does.

The current keystrokes and regexes are already making it possible, I don’t really see something new here…

The whole reason this issue even exists is because vis/Scintillua parsers are of poor quality and do not even recognize lots of standard syntax. If structured text editing is to be reliable, quality syntax parsing is required first, and I don't ever see this happening with the kind of grammars that vis uses now.

silversquirl · 2018-02-19T09:27:15Z

I'm skeptical of whether tree-sitter would work for vis. To use parsers written in C would either require recompilation every time you want to add a new language, or use of dynamic loading, neither of which seem appealing to me.

If you want the benefits provided by tree-sitter, perhaps writing a Lua plugin to add support for it would be better. That way, it's optional and people who don't want to use it don't have to.

silversquirl · 2018-02-19T15:28:56Z

Finally had a chance to work on this. I've patched the Bash lexer to support variable expansion inside heredocs and double-quoted strings. Next I'll add highlighting for $() and correct the highlighting for ``, which is currently highlighted like a single-quoted string.

I'll submit a PR when I'm happy with it.

mqudsi · 2018-04-26T22:38:27Z

I would strongly caution against working around n issues rather than addressing the root problem. You don't want to end up in a situation where the amount of effort invested in moving from a 20% solution to a 40% or 60% solution leads a project to stick to the former instead of moving to a 90% alternative.

The problem with C-based parsers is, however, a proper concern. One option is to offer "core" lexers powered by tree-sitter and compiled into vis, with fallback / user-defined lua-based lexers so a recompile wouldn't be necessary, but that's not ideal. What would be cool is if it were possible to use llvm to compile that parser to lua, but some googling does not reveal any attempts at making such an endeavour generally possible (and it would significantly complicate the build toolchain, unless a 3rd party project maintaining a lua-compiled version of all tree-sitter parsers were used, which would definitively address that issue).

Given that lexers aren't "core" functionality, I don't think ruling out dynamic loading without some hefty consideration is the way to go, but still, users having to compile C code each time they want to add a lexer is a burden...

mcepl · 2020-02-29T19:50:33Z

tree-sitter is being added to neovim ([WIP] Tree-sitter integration neovim/neovim#9219 and many other issues/PRs)
I wonder whether @bfredl actually wrote Lua bindings for tree-sitter and whether it couldn’t be used by vis.

bfredl · 2020-02-29T20:13:41Z

That PR is pretty old, neovim/neovim#11724 is a better reference. But yes https://github.com/neovim/neovim/blob/master/src/nvim/lua/treesitter.c is relatively self-contained, except for directly reading neovim buffers in one function.

mcepl · 2022-11-29T23:39:50Z

@ninewise, this is probably irrelevant now when we switched to the Scintillua lexers, right?

mcepl · 2023-08-10T20:12:04Z

@rnpnr ??? This too should probably be closed, right?

rnpnr · 2023-08-10T21:15:27Z

Yes I think this covered now that we use Scintillua.

erf · 2023-09-05T13:37:58Z

I know this issue was closed as we use Scintillua , but if there are limitations to our current solution (i saw a few issues mentioned in the 0.9 release), would it be interesting to open up this discussion again?

It seem performance and syntax highlighting could be improved using tree-sitter, although not sure as many languages are supported or how difficult it would be to add this support to vis.

Here is an intro video to tree-sitter and tree-sitter for dummies video

mqudsi · 2023-09-05T16:58:13Z

@erf tree-sitter is (already) heavily discussed above.

mcepl · 2023-09-05T17:06:59Z

I don’t think so. I have been following neovim development for years, and although treesitter originally promised to be just one C-file, it is certainly much more than that these days. Change to vis requiring this would be enormous and results are questionable (i.e., scintillua works just fine for most of the time). I also like handing over part of our code upstream.

Problems with the current merging of the latest development was that we were using a way too old version of scintillua in vis and we are now struggling with differences. If you want to help, the working code is at the devel branch of https://git.sr.ht/~mcepl/vis and we apparently need to work on themes, any patches would be more than welcome.

If the speed is really a problem, and I have never saw any evidence for that, it might be a fun project to make vis work with LuaJIT (it actually shows some signs of life again).

erf · 2023-09-05T18:31:03Z

@mqudsi I can see it has been discussed, but i can't really see a conclusion, other than "we are currently using this other solution".

What i find most appealing with tree-sitter is that it is more intelligent about the types it gathers from the language and therefor gives better syntax hightlighting, for example if you make a c struct now, it's type is not recognized as a type like an int or bool, but tree-sitter would recognize this. Also if you have long sentences, it is difficult for regex based solutions, but tree-sitter has no problem with this and performance is good. Also tree-sitter ads new capabilities like code folding, and could also help with multiple selections.

I'm not necessarily pushing for changes on this in the near future, but thought i'll chip in on this topic.

Here's the difference in syntax highlighting between vis and helix based on the example in this video

mcepl · 2023-09-05T20:16:51Z

What i find most appealing with tree-sitter is that it is more intelligent about the types it gathers from the language and therefor gives better syntax hightlighting, for example if you make a c struct now, it's type is not recognized as a type like an int or bool, but tree-sitter would recognize this. Also if you have long sentences, it is difficult for regex based solutions, but tree-sitter has no problem with this and performance is good. Also tree-sitter ads new capabilities like code folding, and could also help with multiple selections.

You are wasting your breath, as I said, I have been using neovim built from master for years, so I know perfectly well what it brings above regexp-based highlighting. If you feel strongly about it, make a fork.

mcepl · 2023-09-05T21:56:15Z

@mqudsi I can see it has been discussed, but i can't really see a conclusion, other than "we are currently using this other solution".

That’s all what was said. What you mentioned, and also “the benefits of this change don’t balance costs of huge change to the vis codebase.” My comment about fork was not a flippant rejection of your idea, but I don’t think that such big change could be done with somebody who proves her ability to do it. To quote the classic, “Show us the code!”.

mcepl · 2023-09-11T16:18:20Z

@mqudsi BTW, if you are interested in the speed of vis, it may be worthy to try to make vis work with LuaJIT (#291), I have unfinished (and not compiling ATM) and very unpolished patch for that in the luajit branch of https://git.sr.ht/~mcepl/vis/ . If you want to give it a stab, I would appreciate it a lot.

mqudsi · 2023-09-13T01:26:24Z

@mcepl I've read the issue and looked at your branch: interesting work - and it looks like you've gotten very far already! I'm somewhat maxed out on my spare time right now, but I can put it on the backburner -- assuming you don't get around to it before I do.

mcepl · 2023-09-13T05:45:09Z

I'm somewhat maxed out on my spare time right now, … assuming you don't get around to it before I do.

Unlikely, it is that time after holidays when you find out how much you put on the back burner and now you find out how much is there.

martanne mentioned this issue Jan 28, 2020

Double quotes are incorrectly matched sometimes #711

Closed

martanne added the lua:lexer label Jan 30, 2020

erf mentioned this issue Apr 28, 2022

vis future? #1001

Closed

rnpnr closed this as completed Aug 10, 2023

mcepl mentioned this issue Sep 10, 2023

Support old option names and don't deprecate them #1127

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better built-in lexers #668

Better built-in lexers #668

silversquirl commented Feb 10, 2018

martanne commented Feb 11, 2018

silversquirl commented Feb 11, 2018

p-e-w commented Feb 18, 2018

lanodan commented Feb 19, 2018 •

edited

Loading

p-e-w commented Feb 19, 2018

silversquirl commented Feb 19, 2018

silversquirl commented Feb 19, 2018

mqudsi commented Apr 26, 2018

mcepl commented Feb 29, 2020

bfredl commented Feb 29, 2020

mcepl commented Nov 29, 2022

mcepl commented Aug 10, 2023

rnpnr commented Aug 10, 2023

erf commented Sep 5, 2023

mqudsi commented Sep 5, 2023

mcepl commented Sep 5, 2023 •

edited

Loading

erf commented Sep 5, 2023 •

edited

Loading

mcepl commented Sep 5, 2023

mcepl commented Sep 5, 2023

mcepl commented Sep 11, 2023

mqudsi commented Sep 13, 2023

mcepl commented Sep 13, 2023 •

edited

Loading

Better built-in lexers #668

Better built-in lexers #668

Comments

silversquirl commented Feb 10, 2018

martanne commented Feb 11, 2018

silversquirl commented Feb 11, 2018

p-e-w commented Feb 18, 2018

lanodan commented Feb 19, 2018 • edited Loading

p-e-w commented Feb 19, 2018

silversquirl commented Feb 19, 2018

silversquirl commented Feb 19, 2018

mqudsi commented Apr 26, 2018

mcepl commented Feb 29, 2020

bfredl commented Feb 29, 2020

mcepl commented Nov 29, 2022

mcepl commented Aug 10, 2023

rnpnr commented Aug 10, 2023

erf commented Sep 5, 2023

mqudsi commented Sep 5, 2023

mcepl commented Sep 5, 2023 • edited Loading

erf commented Sep 5, 2023 • edited Loading

mcepl commented Sep 5, 2023

mcepl commented Sep 5, 2023

mcepl commented Sep 11, 2023

mqudsi commented Sep 13, 2023

mcepl commented Sep 13, 2023 • edited Loading

lanodan commented Feb 19, 2018 •

edited

Loading

mcepl commented Sep 5, 2023 •

edited

Loading

erf commented Sep 5, 2023 •

edited

Loading

mcepl commented Sep 13, 2023 •

edited

Loading