Enhance lexer functionality and improve token reporting #226

Merged
merged 3 commits into master from lexer-improvements
Sep 23, 2024

jow-
Owner

@jow- jow- commented Sep 23, 2024

This PR introduces the following changes to the lexer:

  1. Make lexer API functions public for use in loadable extensions
  2. Emit comment and template statement block tokens for improved parsing
  3. Improve token position reporting:
    • Add end position for emitted tokens
    • Fix start offset for continued template literal string tokens
    • Report proper start offset for TK_LEXP tokens

These changes enhance the lexer's usefulness for non-compilation downstream parse processes, such as code intelligence gathering in language server implementations.

 - Report end position for emitted tokens. This is required to reliably
   determine the token length, e.g. for downstream code intelligence
   use cases

 - Fix start offset of continued template literal string tokens.
   Previously the start offset of a literal string following a `${...}`
   placeholder expression was shifted by one byte

 - Report proper start offset of `TK_LEXP` tokens.

Signed-off-by: Jo-Philipp Wich <[email protected]>
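
To illustrate why an end position matters for downstream consumers, here is a minimal sketch of how a tool might derive a token's length and test whether a cursor offset falls inside it. The struct and its field names are hypothetical and do not mirror the actual ucode token type; the sketch also assumes the end offset is exclusive, which the commit message does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token record; names are illustrative only and are
 * not taken from the ucode headers. */
typedef struct {
	int type;
	size_t pos;  /* byte offset of the first byte of the token */
	size_t end;  /* ASSUMPTION: byte offset one past the last byte */
} example_token_t;

/* Length in bytes, valid only under the exclusive-end assumption. */
static size_t
example_token_length(const example_token_t *tok)
{
	assert(tok->end >= tok->pos);

	return tok->end - tok->pos;
}

/* Whether a byte offset lies inside the token, e.g. for hover lookups
 * in a language server. */
static int
example_token_contains(const example_token_t *tok, size_t offset)
{
	return offset >= tok->pos && offset < tok->end;
}
```

If the reported end offset turns out to be inclusive, the length would be `end - pos + 1` and the containment check would use `<=` instead.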
Tweak the token stream reported by the lexer in order to make it more useful
for alternative, non-compilation downstream parse processes such as code
intelligence gathering within a language server implementation.

 - Instead of silently discarding source code comments in the lexing phase,
   emit TK_COMMENT tokens, which are useful e.g. for parsing type annotations
   and other structured information.

 - Do not silently discard TK_LSTM tokens but report them to downstream
   parsers instead.

 - Do not silently emit TK_RSTM tokens as TK_SCOL but report them as-is to
   downstream parsers.

 - Adjust the byte code compiler to properly deal with the changed token
   reporting by discarding incoming TK_COMMENT and TK_LSTM tokens and by
   remapping read TK_RSTM tokens to the TK_SCOL type.

Signed-off-by: Jo-Philipp Wich <[email protected]>
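
The compiler-side adjustment described above can be pictured as a thin filter in front of the raw token stream. The following is only a sketch under stated assumptions: the token type names come from the commit message, but the enum values, the array-backed stream and all `example_` functions are made up for illustration and are not the actual ucode compiler code.

```c
#include <assert.h>
#include <stddef.h>

/* Type names follow the commit message; the values and the extra
 * TK_LABEL/TK_EOF members are placeholders for this sketch. */
typedef enum {
	TK_EOF, TK_SCOL, TK_COMMENT, TK_LSTM, TK_RSTM, TK_LABEL
} example_type_t;

/* Hypothetical stand-in for the lexer: a fixed array of token types. */
typedef struct {
	const example_type_t *tokens;
	size_t count;
	size_t index;
} example_stream_t;

/* Raw view: returns every token as-is, then TK_EOF once exhausted. */
static example_type_t
example_raw_next(example_stream_t *s)
{
	assert(s != NULL);

	return (s->index < s->count) ? s->tokens[s->index++] : TK_EOF;
}

/* Compiler view: drop TK_COMMENT and TK_LSTM, remap TK_RSTM to TK_SCOL. */
static example_type_t
example_compiler_next(example_stream_t *s)
{
	for (;;) {
		example_type_t t = example_raw_next(s);

		switch (t) {
		case TK_COMMENT:
		case TK_LSTM:
			continue;        /* skipped; fetch the next token */
		case TK_RSTM:
			return TK_SCOL;  /* what the lexer used to report itself */
		default:
			return t;
		}
	}
}
```

A language server or similar tool would read from the raw view and see comments and statement block delimiters, while the compiler keeps seeing the same token sequence as before the change.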
Make the lexer API functions `uc_lexer_init()`, `uc_lexer_free()` and
`uc_lexer_next_token()` public for use in loadable extensions.

Signed-off-by: Jo-Philipp Wich <[email protected]>
@jow- jow- merged commit 9cf53dd into master Sep 23, 2024
7 checks passed
@jow- jow- deleted the lexer-improvements branch September 23, 2024 22:07