Using multiple delimiters when string splitting to an array #11167

FriendlyGuardian · 2024-11-15T19:18:21Z


Today I was trying to program an algorithm to read code that game users type.
For example, a user could type "box.move(10,0)"; the goal is to let users program inside the game world.
I ran into a large issue, however: String.split() only accepts one delimiter per call. Since my desired result was ["box", "move", "10", "0"], I struggled with what felt like it should be a simple problem. I ended up writing it myself from scratch, but I wonder whether it would help other users working with strings to have this functionality built into GDScript. You can do this in Unity with C# (string.Split accepts several separator characters), so as a semi-new user who switched from Unity, I was very sad. I found some online forums that led me on a wild goose chase, such as using regex. I was using Godot v4.3 with GDScript.

I was unable to find an existing discussion or issue about this, so I figured I should open one myself! I don't know enough about the inner workings of Godot to implement this myself, but here is the code I wrote that solved the problem for me (yes, I know it is kind of messy and inefficient; I was just sick of this problem after 4 hours):

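# Assumes this script extends a node with a `text` property (e.g. TextEdit or CodeEdit).
func _read_code():
	var lines = text.split("\n")
	var split_characters = [".", "(", ",", ")", ";"]
	for l in lines:
		# Skip empty and whitespace-only lines.
		if l.strip_edges() == "": continue
		var words = []
		# First pass: record the index of every delimiter in the line.
		var splits = []
		var i = 0
		for c in l:
			if c in split_characters:
				splits.append(i)
			i += 1
		# Second pass: collect the characters between consecutive delimiters.
		var h = 0
		for s in splits:
			var word = ""
			while h < s:
				word += l[h]
				h += 1
			if word != "":
				words.append(word)
			h += 1 # Skip over the delimiter itself.
		# Keep any text after the last delimiter (e.g. "box.move" -> "move").
		if h < l.length():
			words.append(l.substr(h))
		print(words)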
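For single-character delimiters, the same result can probably be reached with String methods that already exist: rewrite every delimiter as one chosen delimiter with replace(), then call split() once with allow_empty set to false so that empty pieces (such as the one after the closing ")") are dropped. This is a minimal sketch; split_multi is a hypothetical helper name, not a built-in Godot method.

```gdscript
## Hypothetical helper, not a built-in Godot method: splits `text` on any of
## the given single-character delimiters and drops empty pieces by default.
func split_multi(text: String, delimiters: Array[String], allow_empty := false) -> PackedStringArray:
	if delimiters.is_empty():
		# Nothing to split on: return the whole string as the only piece.
		return PackedStringArray([text])
	var first: String = delimiters[0]
	# Rewrite every delimiter as the first one...
	for d in delimiters:
		text = text.replace(d, first)
	# ...so that a single built-in split() call handles all of them.
	return text.split(first, allow_empty)
```

Calling split_multi("box.move(10,0)", [".", "(", ",", ")", ";"]) returns a PackedStringArray containing "box", "move", "10" and "0". If users may type a space after the comma, adding " " to the delimiter list keeps it out of the results.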

Thanks and have a great day!

Edit by the production team: added syntax highlight

dalexeev · 2024-11-15T23:14:20Z

Collaborator

Today I was trying to program an algorithm to read code that game users type.

This is usually implemented using a tokenizer/parser. This is useful if, for example, you decide to add support for string literals in the future, as this is difficult to implement using a split()-based approach or regular expressions.

Example 1. Tokenizer

@tool
extends EditorScript

class Tokenizer extends RefCounted:
    const _TAB:         int = 0x0009 # "\t"
    const _NEWLINE:     int = 0x000A # "\n"
    const _SPACE:       int = 0x0020 # " "
    const _PAREN_OPEN:  int = 0x0028 # "("
    const _PAREN_CLOSE: int = 0x0029 # ")"
    const _COMMA:       int = 0x002C # ","
    const _PERIOD:      int = 0x002E # "."
    const _ZERO:        int = 0x0030 # "0"
    const _NINE:        int = 0x0039 # "9"
    const _CAPITAL_A:   int = 0x0041 # "A"
    const _CAPITAL_Z:   int = 0x005A # "Z"
    const _UNDERSCORE:  int = 0x005F # "_"
    const _SMALL_A:     int = 0x0061 # "a"
    const _SMALL_Z:     int = 0x007A # "z"

    var _source: String
    var _pos: int
    var _len: int

    func _init(source: String) -> void:
        _source = source
        _len = source.length()

    func scan() -> String:
        while _pos < _len:
            match _source.unicode_at(_pos):
                _TAB, _SPACE, _NEWLINE:
                    _pos += 1
                _PAREN_OPEN, _PAREN_CLOSE, _COMMA, _PERIOD:
                    _pos += 1
                    return _source[_pos - 1]
                _ when _is_identifier_start(_source.unicode_at(_pos)):
                    var from := _pos
                    while _pos < _len and _is_identifier_continue(_source.unicode_at(_pos)):
                        _pos += 1
                    return _source.substr(from, _pos - from)
                _ when _is_digit(_source.unicode_at(_pos)):
                    var from := _pos
                    while _pos < _len and _is_digit(_source.unicode_at(_pos)):
                        _pos += 1
                    return _source.substr(from, _pos - from)
                _:
                    _pos += 1
                    return '#Error|Invalid character "%s".' % _source[_pos - 1]
        return "#EOF"

    func _is_digit(c: int) -> bool:
        return _ZERO <= c and c <= _NINE

    func _is_identifier_start(c: int) -> bool:
        return c == _UNDERSCORE \
                or (_CAPITAL_A <= c and c <= _CAPITAL_Z) \
                or (_SMALL_A <= c and c <= _SMALL_Z)

    func _is_identifier_continue(c: int) -> bool:
        return c == _UNDERSCORE \
                or (_CAPITAL_A <= c and c <= _CAPITAL_Z) \
                or (_SMALL_A <= c and c <= _SMALL_Z) \
                or (_ZERO <= c and c <= _NINE)

func _run() -> void:
    var tokenizer := Tokenizer.new("box.move(10,0)")
    while true:
        var token := tokenizer.scan()
        print(token)
        if token == "#EOF":
            break

Example 2. Regular expression

@tool
extends EditorScript

func tokenize(source: String) -> Array[String]:
    var token_regex := RegEx.create_from_string(r"[A-Za-z_]\w*|\d+|\(|\)|\.|,|(.|\n)")
    var result: Array[String]
    for m in token_regex.search_all(source):
        var fallback := m.get_string(1)
        if not fallback.is_empty():
            if not fallback.strip_edges().is_empty():
                result.append('#Error|Invalid character "%s".' % fallback)
            continue
        result.append(m.get_string())
    result.append("#EOF")
    return result

func _run() -> void:
    for token in tokenize("box.move(10, 0)"):
        print(token)

Output

box
.
move
(
10
,
0
)
#EOF
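Both examples stop at the tokenizer half of a "tokenizer/parser". As a rough sketch of the parser half, assuming the input is limited to calls like box.move(10,0) with integer arguments, a token list like the output above could be turned into a call description. CallParser, its methods and the dictionary keys are hypothetical names; this is a tiny hand-written recursive-descent step, not an existing Godot API.

```gdscript
# Hypothetical sketch: parses tokens for the assumed grammar
#   IDENT "." IDENT "(" [INT {"," INT}] ")" "#EOF"
class CallParser extends RefCounted:
	var _tokens: Array[String]
	var _pos := 0

	func _init(tokens: Array[String]) -> void:
		_tokens = tokens

	# Returns the current token, or "#EOF" once past the end of the list.
	func _peek() -> String:
		return _tokens[_pos] if _pos < _tokens.size() else "#EOF"

	# Consumes the current token only if it equals `expected`.
	func _expect(expected: String) -> bool:
		if _peek() != expected:
			return false
		_pos += 1
		return true

	# Returns {"object": ..., "method": ..., "args": [...]}, or {} on a syntax error.
	func parse() -> Dictionary:
		var object_name := _peek()
		_pos += 1
		if not object_name.is_valid_identifier() or not _expect("."):
			return {}
		var method_name := _peek()
		_pos += 1
		if not method_name.is_valid_identifier() or not _expect("("):
			return {}
		var args: Array[int] = []
		if not _expect(")"):
			while true:
				var arg := _peek()
				_pos += 1
				if not arg.is_valid_int():
					return {}
				args.append(arg.to_int())
				if _expect(")"):
					break
				if not _expect(","):
					return {}
		# Anything left over (other than the end marker) is an error.
		if not _expect("#EOF"):
			return {}
		return {"object": object_name, "method": method_name, "args": args}
```

Placed in the same EditorScript as Example 2, CallParser.new(tokenize("box.move(10, 0)")).parse() should yield "box" as the object, "move" as the method and [10, 0] as the arguments, while error tokens from the tokenizer make parse() return an empty Dictionary.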


FriendlyGuardian Nov 17, 2024
Author

Thanks for taking the time to reply!
I see where I went wrong with regex from your example, so thanks for that! I still believe that a built-in way to split on several delimiters would simplify the problem significantly, and these examples show that the problem is currently a little more complex than it needs to be. That said, there is a chance I will want to add string literals in the near future, and that would complicate a split-based approach!
Thanks again for taking the time to reply and give examples and I hope you have a fantastic day!
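For the narrower goal from the original post (just the words, with the delimiters discarded), a single negated character class may be enough: instead of describing the delimiters to split on, the pattern matches runs of characters that are not delimiters or whitespace. A minimal sketch; extract_words is a hypothetical helper name, and the delimiter set . ( ) , ; is taken from the original code.

```gdscript
## Hypothetical helper: returns every run of characters that is neither one of
## the delimiters . ( ) , ; nor whitespace.
func extract_words(source: String) -> PackedStringArray:
	# Negated character class: "one or more characters that are NOT delimiters".
	var word_regex := RegEx.create_from_string("[^.(),;\\s]+")
	var words := PackedStringArray()
	for m in word_regex.search_all(source):
		words.append(m.get_string())
	return words
```

Like any split-style approach, this cannot tell a delimiter inside a string literal from a real one, which is where the tokenizer from Example 1 becomes the better fit.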
