Refactor command-line handling #335

Yoric · 2022-07-19T16:32:27Z

The objective here is to progressively make command-line more extensible, more orthogonal and less hacky.

This refactoring introduces a new class Lexer designed to aid with tokenizing/validating the command-line without losing data. Further down the line, this should help replace the big if ... else and hardcoded help in CommandHandler.ts with something closer to e.g. argparse.

Note: It's entirely possible that I've been heading the wrong direction and that I should have headed straight for e.g. https://www.npmjs.com/package/argparse .

Gnuxie · 2022-07-20T09:09:32Z

package.json

@@ -14,6 +14,7 @@
    "start:dev": "yarn build && node --async-stack-traces lib/index.js",
    "test": "ts-mocha --project ./tsconfig.json test/commands/**/*.ts",
    "test:integration": "NODE_ENV=harness ts-mocha --async-stack-traces --require test/integration/fixtures.ts --timeout 300000 --project ./tsconfig.json \"test/integration/**/*Test.ts\"",
+    "test:YORIC": "NODE_ENV=harness ts-mocha --async-stack-traces --require test/integration/fixtures.ts --timeout 300000 --project ./tsconfig.json \"test/integration/roomMembersTest.ts\"",


Gnuxie · 2022-07-20T09:18:53Z

src/commands/Lexer.ts

+const WHITESPACE = /\s+/;
+const COMMAND = /![a-zA-Z_]+/;
+const IDENTIFIER = /[a-zA-Z_]+/;
+const USER_ID = /@[a-zA-Z0-9_.=\-/]+:\S+/;
+const GLOB_USER_ID = /@[a-zA-Z0-9_.=\-?*/]+:\S+/;
+const ROOM_ID = /![a-zA-Z0-9_.=\-/]+:\S+/;
+const ROOM_ALIAS = /#[a-zA-Z0-9_.=\-/]+:\S+/;
+const ROOM_ALIAS_OR_ID = /[#!][a-zA-Z0-9_.=\-/]+:\S+/;
+const INT = /[+-]?[0-9]+/;
+const STRING = /"((?:\\"|[^\r\n])*)"/;
+const DATE_OR_DURATION = /(?:"([^"]+)")|([^"]\S+)/;
+const STAR = /\*/;
+const ETC = /.*$/;


Why keep these separate from the lexer rules?

I kinda expect that we're going to need some of these regexps in other places, but I may be wrong.

src/commands/Lexer.ts

Gnuxie · 2022-07-20T10:21:17Z

src/commands/CommandHandler.ts

-    // We do **not** want to parse `#` as a comment start, though.
-    const tokens = tokenize(cmd.replace("#", "\\#")).slice(/* get rid of ["!mjolnir", command] */ 2);
+    const lexer = new Lexer(line);
+    lexer.token("command"); // Consume `!mjolnir`.


what happens when the lexer fails then? how different is it from what happens now?

Gnuxie · 2022-07-20T10:23:10Z

src/commands/CommandHandler.ts

            return await execBanCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'unban' && parts.length > 2) {
+        } else if (cmd === 'unban' && parts.length > 2) {
            return await execUnbanCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'rules' && parts.length === 4 && parts[2] === 'matching') {
+        } else if (cmd === 'rules' && parts.length === 4 && parts[2] === 'matching') {
            return await execRulesMatchingCommand(roomId, event, mjolnir, parts[3])
-        } else if (parts[1] === 'rules') {
+        } else if (cmd === 'rules') {
            return await execDumpRulesCommand(roomId, event, mjolnir);
-        } else if (parts[1] === 'sync') {
+        } else if (cmd === 'sync') {
            return await execSyncCommand(roomId, event, mjolnir);
-        } else if (parts[1] === 'verify') {
+        } else if (cmd === 'verify') {
            return await execPermissionCheckCommand(roomId, event, mjolnir);
-        } else if (parts.length >= 5 && parts[1] === 'list' && parts[2] === 'create') {
+        } else if (parts.length >= 5 && cmd === 'list' && parts[2] === 'create') {
            return await execCreateListCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'watch' && parts.length > 1) {
+        } else if (cmd === 'watch' && parts.length > 1) {
            return await execWatchCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'unwatch' && parts.length > 1) {
+        } else if (cmd === 'unwatch' && parts.length > 1) {
            return await execUnwatchCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'redact' && parts.length > 1) {
+        } else if (cmd === 'redact' && parts.length > 1) {
            return await execRedactCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'import' && parts.length > 2) {
+        } else if (cmd === 'import' && parts.length > 2) {
            return await execImportCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'default' && parts.length > 2) {
+        } else if (cmd === 'default' && parts.length > 2) {
            return await execSetDefaultListCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'deactivate' && parts.length > 2) {
+        } else if (cmd === 'deactivate' && parts.length > 2) {
            return await execDeactivateCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'protections') {
+        } else if (cmd === 'protections') {
            return await execListProtections(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'enable' && parts.length > 1) {
+        } else if (cmd === 'enable' && parts.length > 1) {
            return await execEnableProtection(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'disable' && parts.length > 1) {
+        } else if (cmd === 'disable' && parts.length > 1) {
            return await execDisableProtection(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'config' && parts[2] === 'set' && parts.length > 3) {
+        } else if (cmd === 'config' && parts[2] === 'set' && parts.length > 3) {
            return await execConfigSetProtection(roomId, event, mjolnir, parts.slice(3))
-        } else if (parts[1] === 'config' && parts[2] === 'add' && parts.length > 3) {
+        } else if (cmd === 'config' && parts[2] === 'add' && parts.length > 3) {
            return await execConfigAddProtection(roomId, event, mjolnir, parts.slice(3))
-        } else if (parts[1] === 'config' && parts[2] === 'remove' && parts.length > 3) {
+        } else if (cmd === 'config' && parts[2] === 'remove' && parts.length > 3) {
            return await execConfigRemoveProtection(roomId, event, mjolnir, parts.slice(3))
-        } else if (parts[1] === 'config' && parts[2] === 'get') {
+        } else if (cmd === 'config' && parts[2] === 'get') {
            return await execConfigGetProtection(roomId, event, mjolnir, parts.slice(3))
-        } else if (parts[1] === 'rooms' && parts.length > 3 && parts[2] === 'add') {
+        } else if (cmd === 'rooms' && parts.length > 3 && parts[2] === 'add') {
            return await execAddProtectedRoom(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'rooms' && parts.length > 3 && parts[2] === 'remove') {
+        } else if (cmd === 'rooms' && parts.length > 3 && parts[2] === 'remove') {
            return await execRemoveProtectedRoom(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'rooms' && parts.length === 2) {
+        } else if (cmd === 'rooms' && parts.length === 2) {
            return await execListProtectedRooms(roomId, event, mjolnir);
-        } else if (parts[1] === 'move' && parts.length > 3) {
+        } else if (cmd === 'move' && parts.length > 3) {
            return await execMoveAliasCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'directory' && parts.length > 3 && parts[2] === 'add') {
+        } else if (cmd === 'directory' && parts.length > 3 && parts[2] === 'add') {
            return await execAddRoomToDirectoryCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'directory' && parts.length > 3 && parts[2] === 'remove') {
+        } else if (cmd === 'directory' && parts.length > 3 && parts[2] === 'remove') {
            return await execRemoveRoomFromDirectoryCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'alias' && parts.length > 4 && parts[2] === 'add') {
+        } else if (cmd === 'alias' && parts.length > 4 && parts[2] === 'add') {
            return await execAddAliasCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'alias' && parts.length > 3 && parts[2] === 'remove') {
+        } else if (cmd === 'alias' && parts.length > 3 && parts[2] === 'remove') {
            return await execRemoveAliasCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'resolve' && parts.length > 2) {
+        } else if (cmd === 'resolve' && parts.length > 2) {
            return await execResolveCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'powerlevel' && parts.length > 3) {
+        } else if (cmd === 'powerlevel' && parts.length > 3) {
            return await execSetPowerLevelCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'shutdown' && parts[2] === 'room' && parts.length > 3) {
+        } else if (cmd === 'shutdown' && parts[2] === 'room' && parts.length > 3) {
            return await execShutdownRoomCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'since') {
-            return await execSinceCommand(roomId, event, mjolnir, tokens);
-        } else if (parts[1] === 'kick' && parts.length > 2) {
+        } else if (cmd === 'since') {
+            return await execSinceCommand(roomId, event, mjolnir, lexer);
+        } else if (cmd === 'kick' && parts.length > 2) {
            return await execKickCommand(roomId, event, mjolnir, parts);
-        } else if (parts[1] === 'make' && parts[2] === 'admin' && parts.length > 3) {
+        } else if (cmd === 'make' && parts[2] === 'admin' && parts.length > 3) {


Well, we should certainly figure out how to get rid of this giant switch statement and use a table to dispatch commands instead. We'd also be able to isolate commands and stop editing this file each time we need to add/remove one too by doing this.

Absolutely, that's the objective of the refactoring.

Gnuxie · 2022-07-20T10:24:53Z

src/commands/SinceCommand.ts

-    });
-    if ("error" in maxEntriesResult) {
-        return maxEntriesResult;
+async function execSinceCommandAux(destinationRoomId: string, event: any, mjolnir: Mjolnir, lexer: Lexer): Promise<Result<undefined>> {


rather than passing the lexer to commands, wouldn't it be better if commands declared what arguments they are expecting (can be part of the same api to register with the handler)? This means that they don't need to know implementation detail of the lexer

I would like to move towards what you suggest. I'm not sure I'll succeed in a single pass, though.

Gnuxie · 2022-07-25T10:39:48Z

src/commands/Command.ts

+            }
+
+            const body = content['body'];
+            const lowercaseBody = body.toLowerCase();


Are we sure we want to do that? Did we do it before?

Yes, that's for backwards compatibility.

Gnuxie · 2022-07-25T10:49:05Z

src/commands/Command.ts

+                let longestLength = -1;
+                for (let command of this.commands) {
+                    if (command.command.length > longestLength
+                        && line.startsWith(command.command)) {
+                        if (command.accept) {
+                            let lexer = new Lexer(line.substring(command.command.length));
+                            if (!command.accept(lexer)) {
+                                continue;
+                            }
+                        }
+                        longestLength = command.command.length;
+                        cmd = command;
+                    }
+                }


Why do we do this though? isn't this what commandsPerCommand is for?

Ah, I should remove commandsPerCommand. Turns out that we cannot do anything so nice because we have distinct commands with the same name (but different help / behavior), so a command is not a single word but rather a prefix.

Gnuxie · 2022-07-25T10:53:17Z

src/commands/SinceCommand.ts

+    let reason = "";
+    do {
+
+        let token = lexer.alternatives(
+            // Room
+            () => lexer.token("STAR"),
+            () => lexer.token("roomAliasOrID"),
+            // Reason
+            () => lexer.token("string"),
+            () => lexer.token("ETC")
+        );
+        if (!token || token.type === "EOF") {
+            // We have reached the end of rooms, no reason.
+            break;
+        } else if (token.type === "STAR") {
+            for (let roomId of Object.keys(mjolnir.protectedRooms)) {
                rooms.add(roomId);
-                continue;
            }
-            // If we reach this step, it's not a room, so it must be a reason.
-            // All further arguments are now part of `reason`.
-            reasonParts = [];
+            continue;
+        } else if (token.type === "roomAliasOrID") {
+            const roomId = await mjolnir.client.resolveRoom(token.text);
+            if (!(roomId in mjolnir.protectedRooms)) {
+                return mjolnir.logMessage(LogLevel.WARN, "SinceCommand", `This room is not protected: ${htmlEscape(roomId)}.`);
+            }
+            rooms.add(roomId);
+            continue;
+        } else if (token.type === "string" || token.type === "ETC") {
+            // We have reached the end of rooms with a reason.
+            reason = token.text;
+            break;
        }
-        reasonParts.push(maybeRoom);
-    }
-
+    } while(true);


It doesn't seem right to do have to do this for each command

In the longer term, I'd like to be able to specify a grammar for each command, but I'm not even sure of the shape of the DSL we'd need to specify such a grammar, so I'm planning to postpone this to a followup PR.

The attempt is good and yes there is progress in that we have a table instead of a giant switch (supposedly).
There is also some reduced parsing overhead by having the defined tokens.

However this is still incredibly hacky and maybe even less intuitive than before, so i don't think it's right to merge and follow up.

describeCommand( { nameParts: string[], // e.g. ["rules", "matching"] arguments: Descriptions[], // Something we can generate the parsing from or just accept a callback to parse an argument instead cb: (commandRoomId: string, mjolnir: Mjolnir, event, ...arguments) => Promise<void> }

The DSL doesn't have to be complicated I was imagining something like this but i haven't played around with it. I don't think it matters if we break some backwards incompatibilities since it probably is going to be worth the cost. Though maybe it is worth looking at some other bridges or bots to see what they do?

Unfortunately, it gets complicated very quick. But I'll see if I can make something higher-level.

In this case, the grammar really is something along the lines of

(roomAliasOrID || STAR)+ (string || ETC)

so I guess we could write a DSL to express this as

{ // Note: In JavaScript, field order is meaningful. room: { grammar: optional(or([Token.roomAliasOrID, Token.STAR])), description: "The room in which the operation takes place, or `*` for all protected rooms." } reason: { grammar: or([Token.string, Token.ETC]), description: "A human-readable reason for the operation, for audit purposes" } }

Would you find this readable?

Note that this is not sufficient to express subcommands such as config get/config set that have entirely different help messages, so I'd need to come up with something to handle these.

I don't like breaking compatibility, in case some other bots have been designed to communicate with Mjölnir with commands.

Bots should not be communicating with Mjolnir via commands, not just because it's not meant for that but just because it'd be a really inefficient indirect way of achieving something (that is probably simple).

Note that this is not sufficient to express subcommands such as config get/config set that have entirely different help messages, so I'd need to come up with something to handle these.

Well your example only shows arguments and not the whole command, but nameParts in the example above was meant for this situation e.g. ["config", "get"] & ["config", "set"]. Though that might not work.

I don't think the arguments should be expressed in terms of lexer tokens but what they actually should be e.g. we could say we want a user id but we probably won't want the raw string token but an instance of matrix-bot-sdk's UserID class (not sure this example is realistic enough, but you get the point that there is a difference).
We will also probably will need to account for a situation where we want parse an argument ourselves using a callback (though with the lexer library this probably means parsing a token? rather than composing something from a token stream)

Yoric added 5 commits July 19, 2022 14:28

Reworking CLI tokenization

5395a66

WIP

7b094f3

WIP

fb54799

WIP

3cb4ffc

WIP

da4edb8

Yoric requested a review from Gnuxie July 19, 2022 16:32

Gnuxie reviewed Jul 20, 2022

View reviewed changes

Yoric added 2 commits July 22, 2022 11:33

WIP

0cc6862

WIP

a9d3c2b

Yoric marked this pull request as draft July 22, 2022 11:49

Gnuxie reviewed Jul 25, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor command-line handling #335

Refactor command-line handling #335

Yoric commented Jul 19, 2022

Gnuxie Jul 20, 2022

Gnuxie Jul 20, 2022

Yoric Jul 22, 2022

Gnuxie Jul 20, 2022

Gnuxie Jul 20, 2022

Yoric Jul 22, 2022

Gnuxie Jul 20, 2022

Yoric Jul 22, 2022

Gnuxie Jul 25, 2022

Yoric Jul 25, 2022

Gnuxie Jul 25, 2022

Yoric Jul 25, 2022

Gnuxie Jul 25, 2022

Yoric Jul 25, 2022

Gnuxie Jul 25, 2022

Yoric Jul 26, 2022 •

edited

Loading

Gnuxie Jul 26, 2022

Refactor command-line handling #335

Are you sure you want to change the base?

Refactor command-line handling #335

Conversation

Yoric commented Jul 19, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Yoric Jul 26, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Yoric Jul 26, 2022 •

edited

Loading