Replies: 26 comments
-
Great list 👍 Just to expand a little, from my experience:
|
-
Another one:
|
-
Updated the initial message, and added your use-case @Xophmeister |
-
Added another one. I'm thinking about committing this file to the repository itself, what do you think? |
-
I don't see why not. But on the other hand, what purpose would a file that gathers this information serve? |
-
The same as this issue, I think:
Except that it would stay visible, and its modifications would be |
-
I suppose another way to achieve that would be to add a |
-
This is more of a meta-issue, but I figured it's worth mentioning here as it's somewhat relevant: Our language query files are declarative formatting directives, where each query targets some syntactic structure attested by the grammar. Targets can overlap and, in general (AFAIK), there is no limit to the number of queries that can target a specific node. For non-trivial languages, this quickly flouts the usual good practices of writing code. For example, language query files can become hundreds-to-thousands of lines long and it's up to whoever wrote them to organise them coherently and remember the state, such that any new queries don't conflict or cause unintended interactions. By and large, this is doable rather than intractable, because programming languages' syntactic structures are (anecdotally) relatively well siloed. However, edge cases certainly exist where it's not immediately obvious which queries are being applied. |
-
@nbacquey I think it depends on the objective (or maybe we need both). If the goal is to discuss how they can be addressed, to create workarounds in our workflows or upstream contributions, then one-issue-per-grievance is the way to go. If the goal is to document gotchas when writing query files, then this should be part of our documentation. Maybe we need a bit of both. @Xophmeister I don't think that there is a fundamental solution to this. In that there probably isn't an ideal formatting DSL that doesn't exhibit the problem. Maybe what we could think about is to have debugging tools that make clear which queries have matched a particular piece of code? |
-
Indeed; some kind of tooling to aid development is what I had in mind 👍 |
-
I was thinking about a debugging tool, or a debugging mode, as well. |
-
One more point: There is no way to define constants to reduce duplication. For example, the Bash grammar uses this particular construct 9 times: [(command) (list) (pipeline) (compound_statement) (subshell) (redirected_statement) (variable_assignment)] |
-
A metalanguage that compiles down to Tree-sitter queries sounds quite attractive. (Leveraging Nickel to do the job would be the icing on the cake!) |
-
Good point. And actually this particular problem (reusable constants) could be solved with the most basic templating solution imaginable. |
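To make the templating idea concrete, here is a minimal sketch of what such a preprocessing step might look like. Everything in it is an assumption for illustration: the `{{name}}` placeholder syntax, the `CONSTANTS` table, the `expand` helper and the `@example` capture are all hypothetical, and none of this is an existing Tree-sitter or Topiary feature.

```python
import re

# Hypothetical constant table. The name "statement" and the {{...}} placeholder
# syntax are assumptions for this sketch, not part of Tree-sitter or Topiary.
CONSTANTS = {
    "statement": (
        "[(command) (list) (pipeline) (compound_statement) "
        "(subshell) (redirected_statement) (variable_assignment)]"
    ),
}

# Matches {{name}}, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand(template, constants):
    """Replace every {{name}} with its definition; fail loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in constants:
            raise KeyError("undefined query constant: " + name)
        return constants[name]

    return PLACEHOLDER.sub(substitute, template)


# The capture name @example is illustrative only.
query = expand("{{statement}} @example", CONSTANTS)
print(query)
```

The expanded text would then be handed to the query compiler as usual, so the query language itself stays untouched. Failing on an unknown name, rather than leaving the placeholder in place, is a deliberate choice here: a typo in a constant name would otherwise surface later as a confusing query syntax error.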
-
@nbacquey I was reading some documentation, and, for completeness, there is a limited form of negation in the query language
|
-
The documentation explicitly calls out |
-
I haven't seen an example of unnamed nodes not being ignored here. Is this grievance's title misleading? I think that it's clear from the Your example does document a pretty important pitfall though: it's typically a bad idea to use |
-
I'm not sure that your example demonstrates this. It looks like it may be an occurrence of the somewhat counterintuitive behaviour of Maybe we should try a more constrained query
And run it on lists of various lengths and see which ones match (if the |
-
Can't remember the exact details now, but I do remember not being able to use built-in predicates, and finding that they were indeed not implemented in the Rust binding. Worth another look. |
-
AFAIK, the need for this kind of negation has never come up. It arises in a more general sense, when there's a pattern that applies to a strict subset of what the grammar affords (i.e., you end up with "exceptional nodes") and |
-
Added a paragraph on tokens captured by regular expressions in the grammar |
-
I never realised the difference between strings and regular expressions in a grammar until tree-sitter/tree-sitter-ocaml#63. I created tree-sitter/tree-sitter-ocaml#77 to make the behaviour for operators more consistent. @nbacquey do you think that's a good idea, or will it make it even more complicated for topiary? |
-
Hi @314eter, thanks for taking an interest in this issue! Basically, it would help us a great deal if all the nodes from this rule could appear in the syntax tree:
infix_operator: $ => choice(
$._pow_operator,
$._mult_operator,
$._add_operator,
$._concat_operator,
$._rel_operator,
$._and_operator,
$._or_operator,
$._assign_operator
), |
-
Done in v0.20.2. |
-
Also see #537 (comment) |
-
Another one I learned the hard way in a different context: After three captures on any given node, further captures are silently ignored. E.g. given
Right now, there aren't enough orthogonal captures in topiary to hit this...but as it grows, this might crop up, and boy does it hurt. |
-
This issue will only serve to discuss weaknesses of the tree-sitter query language. We will be able to address them separately if needed.
No negation predicate
We often want to express something like "match any node that is not of a particular type". However, the query language has no negation predicate yet, so the only way to do that is to enumerate all node types that can appear in the given context, and remove the one we want to exclude.
It makes for unnecessarily cumbersome code.
This is a known issue in the tree-sitter repo, which has been dormant for 2+ years.
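As an illustration of the enumeration workaround described above, here is a small sketch that generates the alternation instead of writing it by hand. The helper `any_node_except` and the list of node types are hypothetical, chosen only for the example; a real list would have to come from the grammar of the language at hand.

```python
def any_node_except(node_types, excluded):
    """Build an alternation of every listed named node type except the excluded ones."""
    kept = [t for t in node_types if t not in excluded]
    if not kept:
        raise ValueError("nothing left to match after exclusion")
    return "[" + " ".join("(" + t + ")" for t in kept) + "]"


# Hypothetical node types that may appear in some context (illustration only).
CONTEXT_TYPES = ["number", "string", "comment", "identifier"]

print(any_node_except(CONTEXT_TYPES, {"comment"}))
# [(number) (string) (identifier)]
```

This only hides the verbosity; the generated query is still the full enumeration, and the list has to be kept in sync with the grammar.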
The anchor `.` is useful in practice, but useless in theory
The `.` anchor operator is very useful for performance and correctness of queries, but it is ultimately impossible to use correctly.
This is because most language grammars allow some special types of nodes that can pop up anywhere in the syntax tree (e.g. comments). The anchor operator doesn't ignore those special nodes.
Consider this query for OCaml:
( (number) . ";" @tag )
It will match all three semicolons in the following code:
But only two in this code:
The same holds for anchors marking the beginning or end of a node's children.
Worse: even if we decide to bite the bullet and replace every `.` by `.(comment)*.`, it won't work as expected either (see next point).
The anchor `.` ignores non-named nodes, except when it doesn't
Consider this OCaml query:
Run on this code:
Which has the following syntax tree:
The result is:
Which means that both `(number)` and `";"` were matched by the query (i.e. were considered "just before" `"]"`). I understand that we may want to ignore non-named nodes when checking adjacency, but having more than one node be the immediate left neighbor of another node severely breaks some invariants you want to have when writing queries.
The Kleene star `*` can match non-consecutive sequences of nodes
The behavior of the `*` operator is quite surprising. Consider this OCaml query:
Run on this code:
Which has the following syntax tree:
The result is:
This seems to indicate that the `(number)*` predicate matches:
`(number)` (expected)
`(number).";"` (unexpected)
`(number).";".(number)` (very unexpected)
I don't know if the exact behavior for the Kleene star is documented anywhere, but it seems very unusual to me.
We can't use the (apparent?) full set of tree-sitter query features
The documentation states that we can use predicates of the form
( (identifier) @constant (#match? @constant "^[A-Z][A-Z_]+") )
However, we can't currently compile them in the Rust library.
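If the predicate really cannot be used, one possible workaround is to drop it from the query and filter the captures in host code afterwards. The sketch below only emulates that filtering step on plain strings, assuming that `#match?` keeps a capture when the regular expression matches somewhere in the captured node's text. The helper `match_predicate` and the sample identifiers are hypothetical, and the sketch uses Python's `re` module rather than the Rust library.

```python
import re


def match_predicate(captured_texts, pattern):
    """Keep only the captured texts in which the regular expression finds a match."""
    regex = re.compile(pattern)
    return [text for text in captured_texts if regex.search(text)]


# Stand-ins for the source text of nodes captured as @constant (illustration only).
identifiers = ["MAX_SIZE", "maxSize", "X", "HTTP_PORT"]

print(match_predicate(identifiers, r"^[A-Z][A-Z_]+"))
# ['MAX_SIZE', 'HTTP_PORT']
```

Note that `X` is rejected because the pattern requires at least two characters.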
Inability to specify disjunctions in encompassing node types
We sometimes need to write queries of the form:
(X Y ; etc. )
Where `X` ranges over a few nodes (say, `a`, `b` and `c`) and `Y` is constant. It would be nice if it were possible to alternate over head nodes (e.g., `([a b c] Y)`), rather than rewriting the rule several times and/or using scopes, or being less precise and using a wildcard in the head position.
Tokens that are captured with regexes never appear in the CST
In tree-sitter grammars, tokens can either be hardcoded, or captured through regular expressions.
For instance, here is the definition of the additive infix operators in the OCaml grammar:
`+`, `+.`, `-`, and `-.` are hardcoded, while another additive operator like `-%` isn't. This impacts what appears in the CST:
This is the CST of `1 + 2`:
This is the CST of `1 -% 2`:
Note that there is no `{Node -% (0, 2) - (0, 4)} - Named: false`.
This means that tree-sitter queries cannot match infix operators that are defined in regular expressions.
For instance, the following tree-sitter query may have matches:
(infix_operator "+" @do_something )
While the following will never match anything, and isn't even a correct query:
(infix_operator "-%" @do_something )
See #418 and #462 for further reference