-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better built-in lexers #668
Comments
If you feel it would help, after all it is a Wiki. But I'm not sure how many people actually read these pages. For discussion an issue (like this one) is probably more helpful? Maybe we could add a general page regarding syntax highlighting (similar to the README in There are some known problems/limitations of the current approach to syntax highlighting (e.g. #110, #603). Eventually I would like to address these, but performance and accuracy needs to be weighted against the introduced complexity. |
If you think an issue is better, then I'm happy with that. I'll try to get acquainted with how LPeg works when I have some time and start working on the issues I mentioned initially. |
Have you considered tree-sitter? While there is a lot of infrastructure around them, the parsers themselves are pure C, and designed for integration in text editors. Tree-sitter parsers are probably the highest quality syntax highlighting parsers available today, and are only going to get better because they are maintained by the creators of Atom. I haven't seen any performance benchmarks yet, but I fully expect tree-sitter parsers to outperform all other incremental parsers of comparable quality eventually. Whether they are faster than vis' current lexers is hard to predict, but then those lexers are quite primitive and don't even come close to the accuracy that tree-sitter provides. Having parsing of such quality in vis would also make new editing commands possible through accurate scope detection. See the examples in this Atom PR. Thus far tree-sitter parsers exist only for the most popular languages, but integrating those seems like a potentially huge win for vis: High-quality parsers that are maintained by someone else. |
Disclaimer/Warning: All of this is my opinion and is As Far As I Know. I’m also quite bad at phrasing this kind of stuff but I don’t want to ignore things and I don’t want things to escalate.
The current keystrokes and regexes are already making it possible, I don’t really see something new here…
LPeg lexers are coming from/shared with Scintillua. |
A few misunderstandings there:
Tree-sitter offers a DOM-style interface, not DOM-style output. That is purely an API convention, and it is very convenient to use. The interface is specifically designed for the needs of syntax highlighting in editors and has nothing to do with the "web DOM" (or with terminals, for that matter).
Tree-sitter uses a compiler written in C++ that processes JSON input to generate a pure C parser without any non-C dependencies. The compiler and the JSON file are not needed for parsing, and don't have to be shipped with the editor.
Every single tree-sitter grammar is versioned, tagged and unit tested. The compiler currently doesn't have versioned releases, but language bindings that depend on the compiler again are versioned, and pin a specific commit for their tree-sitter submodule.
Tree-sitter has nothing whatsoever to do with the "webapp universe". It is used by the Atom editor (which some people might consider a "webapp") as a dependency via Node.js bindings that are independent of the core parser and compiler. There are also bindings for Haskell and Ruby, and of course the native C API. Using tree-sitter in vis doesn't bring vis any closer to the webapp universe than hosting its code on GitHub does.
The whole reason this issue even exists is because vis/Scintillua parsers are of poor quality and do not even recognize lots of standard syntax. If structured text editing is to be reliable, quality syntax parsing is required first, and I don't ever see this happening with the kind of grammars that vis uses now. |
I'm skeptical of whether tree-sitter would work for vis. To use parsers written in C would either require recompilation every time you want to add a new language, or use of dynamic loading, neither of which seem appealing to me. If you want the benefits provided by tree-sitter, perhaps writing a Lua plugin to add support for it would be better. That way, it's optional and people who don't want to use it don't have to. |
Finally had a chance to work on this. I've patched the Bash lexer to support variable expansion inside heredocs and double-quoted strings. Next I'll add highlighting for I'll submit a PR when I'm happy with it. |
I would strongly caution against working around n issues rather than addressing the root problem. You don't want to end up in a situation where the amount of effort invested in moving from a 20% solution to a 40% or 60% solution leads a project to stick to the former instead of moving to a 90% alternative. The problem with C-based parsers is, however, a proper concern. One option is to offer "core" lexers powered by tree-sitter and compiled into vis, with fallback / user-defined lua-based lexers so a recompile wouldn't be necessary, but that's not ideal. What would be cool is if it were possible to use llvm to compile that parser to lua, but some googling does not reveal any attempts at making such an endeavour generally possible (and it would significantly complicate the build toolchain, unless a 3rd party project maintaining a lua-compiled version of all tree-sitter parsers were used, which would definitively address that issue). Given that lexers aren't "core" functionality, I don't think ruling out dynamic loading without some hefty consideration is the way to go, but still, users having to compile C code each time they want to add a lexer is a burden... |
|
That PR is pretty old, neovim/neovim#11724 is a better reference. But yes https://github.com/neovim/neovim/blob/master/src/nvim/lua/treesitter.c is relatively self-contained, except for directly reading neovim buffers in one function. |
@ninewise, this is probably irrelevant now when we switched to the Scintillua lexers, right? |
@rnpnr ??? This too should probably be closed, right? |
Yes I think this covered now that we use Scintillua. |
I know this issue was closed as we use Scintillua , but if there are limitations to our current solution (i saw a few issues mentioned in the 0.9 release), would it be interesting to open up this discussion again? It seem performance and syntax highlighting could be improved using tree-sitter, although not sure as many languages are supported or how difficult it would be to add this support to Here is an intro video to tree-sitter and tree-sitter for dummies video |
@erf tree-sitter is (already) heavily discussed above. |
I don’t think so. I have been following neovim development for years, and although treesitter originally promised to be just one C-file, it is certainly much more than that these days. Change to vis requiring this would be enormous and results are questionable (i.e., scintillua works just fine for most of the time). I also like handing over part of our code upstream. Problems with the current merging of the latest development was that we were using a way too old version of scintillua in vis and we are now struggling with differences. If you want to help, the working code is at the devel branch of https://git.sr.ht/~mcepl/vis and we apparently need to work on themes, any patches would be more than welcome. If the speed is really a problem, and I have never saw any evidence for that, it might be a fun project to make vis work with LuaJIT (it actually shows some signs of life again). |
@mqudsi I can see it has been discussed, but i can't really see a conclusion, other than "we are currently using this other solution". What i find most appealing with tree-sitter is that it is more intelligent about the types it gathers from the language and therefor gives better syntax hightlighting, for example if you make a c struct now, it's type is not recognized as a type like an int or bool, but tree-sitter would recognize this. Also if you have long sentences, it is difficult for regex based solutions, but tree-sitter has no problem with this and performance is good. Also tree-sitter ads new capabilities like code folding, and could also help with multiple selections. I'm not necessarily pushing for changes on this in the near future, but thought i'll chip in on this topic. Here's the difference in syntax highlighting between vis and helix based on the example in this video |
You are wasting your breath, as I said, I have been using neovim built from master for years, so I know perfectly well what it brings above regexp-based highlighting. If you feel strongly about it, make a fork. |
That’s all what was said. What you mentioned, and also “the benefits of this change don’t balance costs of huge change to the vis codebase.” My comment about fork was not a flippant rejection of your idea, but I don’t think that such big change could be done with somebody who proves her ability to do it. To quote the classic, “Show us the code!”. |
@mqudsi BTW, if you are interested in the speed of vis, it may be worthy to try to make vis work with LuaJIT (#291), I have unfinished (and not compiling ATM) and very unpolished patch for that in the |
@mcepl I've read the issue and looked at your branch: interesting work - and it looks like you've gotten very far already! I'm somewhat maxed out on my spare time right now, but I can put it on the backburner -- assuming you don't get around to it before I do. |
Unlikely, it is that time after holidays when you find out how much you put on the back burner and now you find out how much is there. |
The syntax highlighting lexers are currently missing support for various things (eg. labels in C, proper string highlighting for shell scripts, etc.)
I'm wondering whether it would be worth creating a wiki page with a list, so that people who want to contribute can look there.
The text was updated successfully, but these errors were encountered: