Enhance lexer functionality and improve token reporting #226

jow- · 2024-09-23T21:56:08Z

This PR introduces the following changes to the lexer:

- Make lexer API functions public for use in loadable extensions
- Emit comment and template statement block tokens for improved parsing
- Improve token position reporting:
  - Add end position for emitted tokens
  - Fix start offset for continued template literal string tokens
  - Report proper start offset for `TK_LEXP` tokens

These changes enhance the lexer's usefulness for non-compilation downstream parse processes, such as code intelligence gathering in language server implementations.

- Report end position for emitted tokens. This is required to reliably determine the token length, e.g. for downstream code intelligence use cases.
- Fix start offset of continued template literal string tokens. Previously, the start offset of a literal string following a `${...}` placeholder expression was shifted by one byte.
- Report proper start offset of `TK_LEXP` tokens.

Signed-off-by: Jo-Philipp Wich <[email protected]>
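To illustrate why a reported end position matters for tooling, here is a minimal sketch of how a downstream consumer might derive a token's length and test whether a cursor offset falls inside it. The struct, its field names, and the `sketch_` helpers are hypothetical and are not taken from the ucode sources. The sketch also assumes the end offset is exclusive (one past the last byte), which the PR text does not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token record; field names are illustrative only. */
typedef struct {
    int type;    /* token type id */
    size_t pos;  /* byte offset of the first byte of the token */
    size_t end;  /* byte offset one past the last byte (assumed exclusive) */
} sketch_token_t;

/* Length in bytes, derived purely from the reported positions. */
static size_t sketch_token_length(const sketch_token_t *tok)
{
    assert(tok != NULL && tok->end >= tok->pos);
    return tok->end - tok->pos;
}

/* Does a byte offset (e.g. an editor cursor) fall inside the token? */
static int sketch_token_contains(const sketch_token_t *tok, size_t offset)
{
    assert(tok != NULL);
    return offset >= tok->pos && offset < tok->end;
}
```

Without an end position, a consumer would have to re-scan the source text to find where each token stops, which is the kind of guesswork the first bullet above is meant to remove.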

Tweak the token stream reported by the lexer to make it more useful for alternative, non-compilation downstream parse processes, such as code intelligence gathering within a language server implementation.

- Instead of silently discarding source code comments in the lexing phase, emit `TK_COMMENT` tokens, which are useful e.g. for parsing type annotations and other structured information.
- Do not silently discard `TK_LSTM` tokens but report them to downstream parsers instead.
- Do not silently emit `TK_RSTM` tokens as `TK_SCOL` but report them as-is to downstream parsers.
- Adjust the byte code compiler to properly deal with the changed token reporting by discarding incoming `TK_COMMENT` and `TK_LSTM` tokens and by remapping read `TK_RSTM` tokens to the `TK_SCOL` type.

Signed-off-by: Jo-Philipp Wich <[email protected]>
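The compiler-side adjustment described in the last bullet can be pictured as a thin filter over the raw token stream. The following is a self-contained sketch, not the actual ucode compiler code. The enum only borrows the token names from the text above, `TK_OTHER` and `TK_EOF` are placeholder members, and the `sk_` types and functions are hypothetical stand-ins for the real lexer interface.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the token types named in the PR; values are arbitrary. */
typedef enum {
    TK_COMMENT, TK_LSTM, TK_RSTM, TK_SCOL, TK_OTHER, TK_EOF
} sk_type_t;

/* Hypothetical raw token source: a plain array with a read cursor. */
typedef struct {
    const sk_type_t *toks;
    size_t len;
    size_t idx;
} sk_stream_t;

/* Return the next raw token, or TK_EOF once the array is exhausted. */
static sk_type_t sk_raw_next(sk_stream_t *s)
{
    return s->idx < s->len ? s->toks[s->idx++] : TK_EOF;
}

/* Compiler view of the stream: skip comments and statement block openers,
 * and present statement block closers as semicolons. */
static sk_type_t sk_compiler_next(sk_stream_t *s)
{
    assert(s != NULL);

    for (;;) {
        sk_type_t t = sk_raw_next(s);

        if (t == TK_COMMENT || t == TK_LSTM)
            continue;        /* discard, the compiler has no use for these */

        if (t == TK_RSTM)
            return TK_SCOL;  /* remap to the type the compiler expects */

        return t;
    }
}
```

Keeping this filtering in the consumer rather than in the lexer means other consumers, such as a language server, can read the unfiltered stream from the same lexer.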

Make the lexer API functions `uc_lexer_init()`, `uc_lexer_free()` and `uc_lexer_next_token()` public for use in loadable extensions.

Signed-off-by: Jo-Philipp Wich <[email protected]>

jow- added 3 commits September 23, 2024 23:29

lexer: make api functions public

2b2e732

jow- merged commit 9cf53dd into master Sep 23, 2024
7 checks passed

jow- deleted the lexer-improvements branch September 23, 2024 22:07
