Dart lexical analysis tool
This tool is meant for lexical analysis of an input string. In order to do so,
you need to subclass various classes: For each token you want to recognize, you
have to subclass the Token
class, for each State the lexer can go through you
have to subclass the LexerState
class and finally for the lexer itself you
have to subclass the Lexer
class.
In order to correctly subclass the Token
class, you have to override at least
two methods: get regexProto
and Type get nextState
. Additionaly, you can
override the onMatch
method to do some calculations on a match. By default,
this method returns a List with a single element, the token itself. So, if you've
got a token, which you want to match but which shouldn't appear in the list of
tokens later (like for example, a whitespace token) then you could simply override
the onMatch method to return an empty list. For convenience, this has already
been done for you in the IgnoreToken
class.
This method can be overriden in two ways: You can either directly write a regex or you can define a RegExp object. Examples:
...
get regexProto => r"."; // Raw string
//or
get regexProto => new RegExp("."); // RegExp object
Through this method you specify the next state after this token was matched.
For example, you might have a token which matches on /*
and want to switch
into the comment state after this. Then simply write:
Type get nextState => CmtState;
This method defines the result of a matched token. As said before, if you
want to match but ignore the token, subclass the IgnoreToken
class.
If you want to, for example, calculate the value of a token, you could proceed
as follows:
List<Token> get onMatch {
this.value = int.parse(content);
return super.onMatch;
}
The LexerState
class has one method you have to override and two methods
you can override. You have to override the List<Type> get tokens
method,
which specifies the tokens that are being matched in this LexerState.
The two remaining overridable methods are void onEnd()
and
int get errorHandling
. The onEnd
method is called on a state, when
the end of the input string is reached. If you are, for example, in a comment
state, you could throw an exception like Unclosed commend
. Per default, this
method does nothing.
The int get errorHandling
method is used to specify the action a LexerState
should do if no token could be matched. Per default, this is
LexerState.ON_ERROR_ERROR_MSG
, which uses the logger to log an error message
containing more detailed information. Other constants controlling this behaviour
are LexerState.ON_ERROR_NOTHING
and LexerState.ON_ERROR_THROW_ERROR
, which
should be quite self-explanatory.
Last but not least you need to subclass the Lexer class to build an own Lexer.
The Lexer
class has one method you have to override and one method you could
override: The method you have to override is the List<Type> get states
method, which specifies all states of the Lexer
. The method you could
override is the Type get startState
method, but by default, the start state
of a Lexer
is the first element of the get states
method.
If you're done doing all the subclassing, you can finally use the lexer. To do
this, simply instantiate your lexer and call the run
method on it with
your input string as arguments. This will, if it can, produce an output
List of Token
s. Example:
var lexer = new MyLexer();
var tokens = lexer.run("1 + 2 + 3 + 4");
print(tokens); // Or do something more intelligent with the tokens
That's it, for further examples look into the test
directory. If you've got
enhancements, feel free to create a pull request. Have fun tokenizing!