Using multiple delimiters when string splitting to an array #11167
FriendlyGuardian started this conversation in Scripting
This is usually implemented with a tokenizer/parser. That approach also pays off if, for example, you later decide to add support for string literals, which are difficult to implement using a …

Example 1. Tokenizer

```gdscript
@tool
extends EditorScript


class Tokenizer extends RefCounted:
	# Unicode code points of the characters the tokenizer recognizes.
	const _TAB: int = 0x0009 # "\t"
	const _NEWLINE: int = 0x000A # "\n"
	const _SPACE: int = 0x0020 # " "
	const _PAREN_OPEN: int = 0x0028 # "("
	const _PAREN_CLOSE: int = 0x0029 # ")"
	const _COMMA: int = 0x002C # ","
	const _PERIOD: int = 0x002E # "."
	const _ZERO: int = 0x0030 # "0"
	const _NINE: int = 0x0039 # "9"
	const _CAPITAL_A: int = 0x0041 # "A"
	const _CAPITAL_Z: int = 0x005A # "Z"
	const _UNDERSCORE: int = 0x005F # "_"
	const _SMALL_A: int = 0x0061 # "a"
	const _SMALL_Z: int = 0x007A # "z"

	var _source: String
	var _pos: int # Index of the next character to read.
	var _len: int

	func _init(source: String) -> void:
		_source = source
		_len = source.length()

	# Returns the next token, an "#Error|..." string for an invalid
	# character, or "#EOF" once the whole source has been consumed.
	func scan() -> String:
		while _pos < _len:
			match _source.unicode_at(_pos):
				_TAB, _SPACE, _NEWLINE:
					# Whitespace separates tokens but is not returned.
					_pos += 1
				_PAREN_OPEN, _PAREN_CLOSE, _COMMA, _PERIOD:
					# Single-character punctuation tokens.
					_pos += 1
					return _source[_pos - 1]
				_ when _is_identifier_start(_source.unicode_at(_pos)):
					# Identifier: a letter or underscore, then letters, digits, or underscores.
					var from := _pos
					while _pos < _len and _is_identifier_continue(_source.unicode_at(_pos)):
						_pos += 1
					return _source.substr(from, _pos - from)
				_ when _is_digit(_source.unicode_at(_pos)):
					# Integer literal: a run of decimal digits.
					var from := _pos
					while _pos < _len and _is_digit(_source.unicode_at(_pos)):
						_pos += 1
					return _source.substr(from, _pos - from)
				_:
					# Any other character is reported as an error token.
					_pos += 1
					return '#Error|Invalid character "%s".' % _source[_pos - 1]
		return "#EOF"

	func _is_digit(c: int) -> bool:
		return _ZERO <= c and c <= _NINE

	func _is_identifier_start(c: int) -> bool:
		return c == _UNDERSCORE \
				or (_CAPITAL_A <= c and c <= _CAPITAL_Z) \
				or (_SMALL_A <= c and c <= _SMALL_Z)

	func _is_identifier_continue(c: int) -> bool:
		return c == _UNDERSCORE \
				or (_CAPITAL_A <= c and c <= _CAPITAL_Z) \
				or (_SMALL_A <= c and c <= _SMALL_Z) \
				or (_ZERO <= c and c <= _NINE)


func _run() -> void:
	var tokenizer := Tokenizer.new("box.move(10,0)")
	while true:
		var token := tokenizer.scan()
		print(token)
		if token == "#EOF":
			break
```

Example 2. Regular expression

```gdscript
@tool
extends EditorScript


# Splits `source` into tokens. Group 1, `(.|\n)`, is a catch-all that matches
# any single character none of the other alternatives accepted.
func tokenize(source: String) -> Array[String]:
	var token_regex := RegEx.create_from_string(r"[A-Za-z_]\w*|\d+|\(|\)|\.|,|(.|\n)")
	var result: Array[String]
	for m in token_regex.search_all(source):
		var fallback := m.get_string(1)
		if not fallback.is_empty():
			# Whitespace is skipped silently; anything else becomes an error token.
			if not fallback.strip_edges().is_empty():
				result.append('#Error|Invalid character "%s".' % fallback)
			continue
		result.append(m.get_string())
	result.append("#EOF")
	return result


func _run() -> void:
	for token in tokenize("box.move(10, 0)"):
		print(token)
```
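If only the array from the question (["box", "move", "10", "0"]) is needed rather than a full token stream, a shorter variant of Example 2 is to match the text between the delimiters instead of the delimiters themselves. This is a minimal sketch under stated assumptions: the helper name `split_words` is hypothetical, and the delimiter set (".", "(", ")", ",", and whitespace) is chosen just for this example.

```gdscript
@tool
extends EditorScript


# Assumed delimiter set for this example: ".", "(", ")", "," and whitespace.
# The negated character class matches one or more characters that are
# none of these, i.e. each run of text between delimiters.
const WORD_PATTERN := "[^.(),\\s]+"


# Hypothetical helper: returns every run of non-delimiter characters.
func split_words(source: String) -> PackedStringArray:
	var regex := RegEx.create_from_string(WORD_PATTERN)
	var words := PackedStringArray()
	for m in regex.search_all(source):
		words.append(m.get_string())
	return words


func _run() -> void:
	# Intended result for this input: box, move, 10, 0
	print(split_words("box.move(10,0)"))
```

Any character not listed inside the brackets (a quotation mark, for example) simply becomes part of a word, so extending the delimiter set means adding characters to the class. Like a plain split, this cannot tell a delimiter inside a string literal from a real one.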
Today I was trying to program an algorithm that reads code typed by the players of my game. For example, a user might type "box.move(10,0)"; the goal is to let users program inside the game world.

However, I ran into a big problem: the String.split function only accepts one delimiter per call. Since my desired result was ["box", "move", "10", "0"], I struggled with what felt like it should be a simple problem. I ended up writing it myself from scratch, but I wonder whether it would help other users working with strings to have this functionality built into GDScript. This is something you can do in Unity using C#, so as a semi-new user who switched from Unity, I was very sad. The online forums I found led me on a wild goose chase, for example into using regex.

I was using Godot v4.3 with GDScript. I couldn't find an existing discussion or issue about this, so I figured I should open one myself! I don't know enough about the inner workings of Godot to implement this myself, but here is the code I wrote that solved the problem for me (yes, I know it is kind of messy and inefficient; I was just sick of this problem after 4 hours).
Thanks and have a great day!
Edit by the production team: added syntax highlighting
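For comparison, the result described in the question can also be obtained without a regular expression: replace every delimiter with a single one, then call String.split() once. This is a minimal sketch, not the code from the post: the helper name `split_multi` is hypothetical, delimiters are assumed to be plain substrings, and empty pieces are assumed to be unwanted.

```gdscript
@tool
extends EditorScript


# Hypothetical helper: splits `text` on any of `delimiters` by first replacing
# every delimiter with the first one, then calling String.split() once.
func split_multi(text: String, delimiters: PackedStringArray) -> PackedStringArray:
	if delimiters.is_empty():
		return PackedStringArray([text])
	var primary: String = delimiters[0]
	var normalized := text
	for i in range(1, delimiters.size()):
		normalized = normalized.replace(delimiters[i], primary)
	# allow_empty = false drops empty pieces, such as the one left after
	# the trailing ")" in "box.move(10,0)".
	return normalized.split(primary, false)


func _run() -> void:
	# Intended result for this input: box, move, 10, 0
	print(split_multi("box.move(10,0)", PackedStringArray([".", "(", ",", ")"])))
```

Passing false as the second argument of split() (allow_empty) is what removes the empty pieces between adjacent delimiters. Because every delimiter is treated as plain text, this cannot distinguish a delimiter inside a quoted string from a real one; that is the situation the tokenizer-based approach in the reply is suggested for.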