TASK: Implement alternative approach to lexical analysis #34
base: main
Wait a second, isn't this similar to the philosophy of the new Fusion parser? ❤️ Edit: it seems more sophisticated ^^ Would love to talk about this ^^
$source,
TokenTypes::from(
    TokenType::TEMPLATE_LITERAL_DELIMITER,
    TokenType::SPACE,
Interesting. Because we allow the space token here, it's not a TEMPLATE_LITERAL_CONTENT
(but it probably should be in the real world ^^)
Yep, the `TemplateLiteralParser` will read this differently :)
use PackageFactory\ComponentEngine\Parser\Source\Position;
use PackageFactory\ComponentEngine\Parser\Source\Range;

final class Buffer
Hmm, I'm just wondering what the pros and cons of making this thing mutable are ...
On the one hand, the lexer can expose it as a public readonly member, but methods like `override` and `reset` might always be smelly. Then again, this mutable buffer might be a performance optimization, as we don't need a new object every time.
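To make the trade-off concrete, here is a minimal sketch, assuming PHP 8.1+ for `readonly`. `SketchBuffer`, `SketchLexer`, `append`, `getContents` and `capture` are hypothetical names for illustration, not the PR's actual `Buffer` or `Lexer` API. It shows that `readonly` only pins the reference: the lexer keeps reusing one object, while anyone holding that reference sees its contents change.

```php
<?php

declare(strict_types=1);

// Hypothetical names throughout; this is not the PR's actual Buffer or Lexer.
final class SketchBuffer
{
    private string $contents = '';

    public function append(string $text): void
    {
        $this->contents .= $text;
    }

    public function reset(): void
    {
        $this->contents = '';
    }

    public function getContents(): string
    {
        return $this->contents;
    }
}

final class SketchLexer
{
    // readonly fixes the reference only; the buffer object itself stays mutable.
    public readonly SketchBuffer $buffer;

    public function __construct()
    {
        $this->buffer = new SketchBuffer();
    }

    public function capture(string $text): void
    {
        // Reuse the same object instead of allocating a new buffer per token.
        $this->buffer->reset();
        $this->buffer->append($text);
    }
}

$lexer = new SketchLexer();
$before = $lexer->buffer;

$lexer->capture('foo');
$lexer->capture('bar');

// Same object identity, different contents than after the first capture.
$sameObject = $lexer->buffer === $before;
$contents = $lexer->buffer->getContents();
```

The smell is visible here as well: any caller that keeps `$before` around observes every later `reset()`, which an immutable buffer would rule out at the cost of one allocation per token.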
...and move the (rule -> matcher) cache concern over to the Scanner class.
solves: #3, #33
This PR introduces a `Lexer` class that implements a very different approach to lexical analysis than the existing `Tokenizer`:

- `Sequence` that uses state to switch between multiple subsequent matchers
- `mb_*` functions

I focused somewhat on memory efficiency and expect that the implementation will be more economical on memory use than the `Tokenizer`.

Token types have also changed to a more rigid set. This will simplify some of the parser implementations later on.
As of right now, I'm not too sure of this approach and expect things to break when I turn to the parser implementations. I'm also not quite sure if I haven't missed anything on the multi-byte character handling (seems too easy to me 😅). It'll require more tests further down the line to be on the safe side with this. For now, all of this is just an experiment.
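For reference when writing those tests, here is a rough sketch of one way to step through UTF-8 input character by character using only byte-oriented functions. `sketchCharacters` is a hypothetical helper, not code from this PR, and it assumes the source is valid UTF-8 (it does not validate continuation bytes).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper, not part of the PR: yields one UTF-8 character at a
 * time by inspecting the lead byte. Assumes $source is valid UTF-8.
 *
 * @return \Generator<int, string>
 */
function sketchCharacters(string $source): \Generator
{
    $length = strlen($source); // byte length, not character count
    $offset = 0;

    while ($offset < $length) {
        $lead = ord($source[$offset]);

        // The high bits of the lead byte encode the length of the sequence.
        if ($lead < 0x80) {
            $width = 1; // 0xxxxxxx: ASCII
        } elseif (($lead & 0xE0) === 0xC0) {
            $width = 2; // 110xxxxx
        } elseif (($lead & 0xF0) === 0xE0) {
            $width = 3; // 1110xxxx
        } elseif (($lead & 0xF8) === 0xF0) {
            $width = 4; // 11110xxx
        } else {
            throw new \InvalidArgumentException(
                "Invalid UTF-8 lead byte at offset $offset"
            );
        }

        yield substr($source, $offset, $width);
        $offset += $width;
    }
}

// One 1-byte, one 3-byte and one 4-byte character.
$characters = iterator_to_array(sketchCharacters('a€😅'), false);
```

Test cases mixing 1-, 2-, 3- and 4-byte characters, plus input that ends in the middle of a sequence, would probably cover most of the cases worth worrying about.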
Remaining TODOs