Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor command-line handling #335

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Refactor command-line handling #335

wants to merge 7 commits into from

Conversation

Yoric
Copy link
Contributor

@Yoric Yoric commented Jul 19, 2022

The objective here is to progressively make command-line more extensible, more orthogonal and less hacky.

This refactoring introduces a new class Lexer designed to aid with tokenizing/validating the command-line without losing data. Further down the line, this should help replace the big if ... else and hardcoded help in CommandHandler.ts with something closer to e.g. argparse.

Note: It's entirely possible that I've been heading the wrong direction and that I should have headed straight for e.g. https://www.npmjs.com/package/argparse .

@Yoric Yoric requested a review from Gnuxie July 19, 2022 16:32
@@ -14,6 +14,7 @@
"start:dev": "yarn build && node --async-stack-traces lib/index.js",
"test": "ts-mocha --project ./tsconfig.json test/commands/**/*.ts",
"test:integration": "NODE_ENV=harness ts-mocha --async-stack-traces --require test/integration/fixtures.ts --timeout 300000 --project ./tsconfig.json \"test/integration/**/*Test.ts\"",
"test:YORIC": "NODE_ENV=harness ts-mocha --async-stack-traces --require test/integration/fixtures.ts --timeout 300000 --project ./tsconfig.json \"test/integration/roomMembersTest.ts\"",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oopsie

Comment on lines 9 to 21
const WHITESPACE = /\s+/;
const COMMAND = /![a-zA-Z_]+/;
const IDENTIFIER = /[a-zA-Z_]+/;
const USER_ID = /@[a-zA-Z0-9_.=\-/]+:\S+/;
const GLOB_USER_ID = /@[a-zA-Z0-9_.=\-?*/]+:\S+/;
const ROOM_ID = /![a-zA-Z0-9_.=\-/]+:\S+/;
const ROOM_ALIAS = /#[a-zA-Z0-9_.=\-/]+:\S+/;
const ROOM_ALIAS_OR_ID = /[#!][a-zA-Z0-9_.=\-/]+:\S+/;
const INT = /[+-]?[0-9]+/;
const STRING = /"((?:\\"|[^\r\n])*)"/;
const DATE_OR_DURATION = /(?:"([^"]+)")|([^"]\S+)/;
const STAR = /\*/;
const ETC = /.*$/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why keep these separate from the lexer rules?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda expect that we're going to need some of these regexps in other places, but I may be wrong.

src/commands/Lexer.ts Outdated Show resolved Hide resolved
src/commands/Lexer.ts Outdated Show resolved Hide resolved
// We do **not** want to parse `#` as a comment start, though.
const tokens = tokenize(cmd.replace("#", "\\#")).slice(/* get rid of ["!mjolnir", command] */ 2);
const lexer = new Lexer(line);
lexer.token("command"); // Consume `!mjolnir`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when the lexer fails then? how different is it from what happens now?

Comment on lines 56 to 129
return await execBanCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'unban' && parts.length > 2) {
} else if (cmd === 'unban' && parts.length > 2) {
return await execUnbanCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'rules' && parts.length === 4 && parts[2] === 'matching') {
} else if (cmd === 'rules' && parts.length === 4 && parts[2] === 'matching') {
return await execRulesMatchingCommand(roomId, event, mjolnir, parts[3])
} else if (parts[1] === 'rules') {
} else if (cmd === 'rules') {
return await execDumpRulesCommand(roomId, event, mjolnir);
} else if (parts[1] === 'sync') {
} else if (cmd === 'sync') {
return await execSyncCommand(roomId, event, mjolnir);
} else if (parts[1] === 'verify') {
} else if (cmd === 'verify') {
return await execPermissionCheckCommand(roomId, event, mjolnir);
} else if (parts.length >= 5 && parts[1] === 'list' && parts[2] === 'create') {
} else if (parts.length >= 5 && cmd === 'list' && parts[2] === 'create') {
return await execCreateListCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'watch' && parts.length > 1) {
} else if (cmd === 'watch' && parts.length > 1) {
return await execWatchCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'unwatch' && parts.length > 1) {
} else if (cmd === 'unwatch' && parts.length > 1) {
return await execUnwatchCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'redact' && parts.length > 1) {
} else if (cmd === 'redact' && parts.length > 1) {
return await execRedactCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'import' && parts.length > 2) {
} else if (cmd === 'import' && parts.length > 2) {
return await execImportCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'default' && parts.length > 2) {
} else if (cmd === 'default' && parts.length > 2) {
return await execSetDefaultListCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'deactivate' && parts.length > 2) {
} else if (cmd === 'deactivate' && parts.length > 2) {
return await execDeactivateCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'protections') {
} else if (cmd === 'protections') {
return await execListProtections(roomId, event, mjolnir, parts);
} else if (parts[1] === 'enable' && parts.length > 1) {
} else if (cmd === 'enable' && parts.length > 1) {
return await execEnableProtection(roomId, event, mjolnir, parts);
} else if (parts[1] === 'disable' && parts.length > 1) {
} else if (cmd === 'disable' && parts.length > 1) {
return await execDisableProtection(roomId, event, mjolnir, parts);
} else if (parts[1] === 'config' && parts[2] === 'set' && parts.length > 3) {
} else if (cmd === 'config' && parts[2] === 'set' && parts.length > 3) {
return await execConfigSetProtection(roomId, event, mjolnir, parts.slice(3))
} else if (parts[1] === 'config' && parts[2] === 'add' && parts.length > 3) {
} else if (cmd === 'config' && parts[2] === 'add' && parts.length > 3) {
return await execConfigAddProtection(roomId, event, mjolnir, parts.slice(3))
} else if (parts[1] === 'config' && parts[2] === 'remove' && parts.length > 3) {
} else if (cmd === 'config' && parts[2] === 'remove' && parts.length > 3) {
return await execConfigRemoveProtection(roomId, event, mjolnir, parts.slice(3))
} else if (parts[1] === 'config' && parts[2] === 'get') {
} else if (cmd === 'config' && parts[2] === 'get') {
return await execConfigGetProtection(roomId, event, mjolnir, parts.slice(3))
} else if (parts[1] === 'rooms' && parts.length > 3 && parts[2] === 'add') {
} else if (cmd === 'rooms' && parts.length > 3 && parts[2] === 'add') {
return await execAddProtectedRoom(roomId, event, mjolnir, parts);
} else if (parts[1] === 'rooms' && parts.length > 3 && parts[2] === 'remove') {
} else if (cmd === 'rooms' && parts.length > 3 && parts[2] === 'remove') {
return await execRemoveProtectedRoom(roomId, event, mjolnir, parts);
} else if (parts[1] === 'rooms' && parts.length === 2) {
} else if (cmd === 'rooms' && parts.length === 2) {
return await execListProtectedRooms(roomId, event, mjolnir);
} else if (parts[1] === 'move' && parts.length > 3) {
} else if (cmd === 'move' && parts.length > 3) {
return await execMoveAliasCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'directory' && parts.length > 3 && parts[2] === 'add') {
} else if (cmd === 'directory' && parts.length > 3 && parts[2] === 'add') {
return await execAddRoomToDirectoryCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'directory' && parts.length > 3 && parts[2] === 'remove') {
} else if (cmd === 'directory' && parts.length > 3 && parts[2] === 'remove') {
return await execRemoveRoomFromDirectoryCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'alias' && parts.length > 4 && parts[2] === 'add') {
} else if (cmd === 'alias' && parts.length > 4 && parts[2] === 'add') {
return await execAddAliasCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'alias' && parts.length > 3 && parts[2] === 'remove') {
} else if (cmd === 'alias' && parts.length > 3 && parts[2] === 'remove') {
return await execRemoveAliasCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'resolve' && parts.length > 2) {
} else if (cmd === 'resolve' && parts.length > 2) {
return await execResolveCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'powerlevel' && parts.length > 3) {
} else if (cmd === 'powerlevel' && parts.length > 3) {
return await execSetPowerLevelCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'shutdown' && parts[2] === 'room' && parts.length > 3) {
} else if (cmd === 'shutdown' && parts[2] === 'room' && parts.length > 3) {
return await execShutdownRoomCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'since') {
return await execSinceCommand(roomId, event, mjolnir, tokens);
} else if (parts[1] === 'kick' && parts.length > 2) {
} else if (cmd === 'since') {
return await execSinceCommand(roomId, event, mjolnir, lexer);
} else if (cmd === 'kick' && parts.length > 2) {
return await execKickCommand(roomId, event, mjolnir, parts);
} else if (parts[1] === 'make' && parts[2] === 'admin' && parts.length > 3) {
} else if (cmd === 'make' && parts[2] === 'admin' && parts.length > 3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we should certainly figure out how to get rid of this giant switch statement and use a table to dispatch commands instead. We'd also be able to isolate commands and stop editing this file each time we need to add/remove one too by doing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, that's the objective of the refactoring.

});
if ("error" in maxEntriesResult) {
return maxEntriesResult;
async function execSinceCommandAux(destinationRoomId: string, event: any, mjolnir: Mjolnir, lexer: Lexer): Promise<Result<undefined>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than passing the lexer to commands, wouldn't it be better if commands declared what arguments they are expecting (can be part of the same api to register with the handler)? This means that they don't need to know implementation detail of the lexer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to move towards what you suggest. I'm not sure I'll succeed in a single pass, though.

@Yoric Yoric marked this pull request as draft July 22, 2022 11:49
}

const body = content['body'];
const lowercaseBody = body.toLowerCase();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want to do that? Did we do it before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's for backwards compatibility.

Comment on lines +342 to +355
let longestLength = -1;
for (let command of this.commands) {
if (command.command.length > longestLength
&& line.startsWith(command.command)) {
if (command.accept) {
let lexer = new Lexer(line.substring(command.command.length));
if (!command.accept(lexer)) {
continue;
}
}
longestLength = command.command.length;
cmd = command;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this though? isn't this what commandsPerCommand is for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I should remove commandsPerCommand. Turns out that we cannot do anything so nice because we have distinct commands with the same name (but different help / behavior), so a command is not a single word but rather a prefix.

Comment on lines +110 to +141
let reason = "";
do {

let token = lexer.alternatives(
// Room
() => lexer.token("STAR"),
() => lexer.token("roomAliasOrID"),
// Reason
() => lexer.token("string"),
() => lexer.token("ETC")
);
if (!token || token.type === "EOF") {
// We have reached the end of rooms, no reason.
break;
} else if (token.type === "STAR") {
for (let roomId of Object.keys(mjolnir.protectedRooms)) {
rooms.add(roomId);
continue;
}
// If we reach this step, it's not a room, so it must be a reason.
// All further arguments are now part of `reason`.
reasonParts = [];
continue;
} else if (token.type === "roomAliasOrID") {
const roomId = await mjolnir.client.resolveRoom(token.text);
if (!(roomId in mjolnir.protectedRooms)) {
return mjolnir.logMessage(LogLevel.WARN, "SinceCommand", `This room is not protected: ${htmlEscape(roomId)}.`);
}
rooms.add(roomId);
continue;
} else if (token.type === "string" || token.type === "ETC") {
// We have reached the end of rooms with a reason.
reason = token.text;
break;
}
reasonParts.push(maybeRoom);
}

} while(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem right to do have to do this for each command

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the longer term, I'd like to be able to specify a grammar for each command, but I'm not even sure of the shape of the DSL we'd need to specify such a grammar, so I'm planning to postpone this to a followup PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The attempt is good and yes there is progress in that we have a table instead of a giant switch (supposedly).
There is also some reduced parsing overhead by having the defined tokens.

However this is still incredibly hacky and maybe even less intuitive than before, so i don't think it's right to merge and follow up.

describeCommand( {
    nameParts: string[], // e.g. ["rules", "matching"]
    arguments: Descriptions[], // Something we can generate the parsing from or just accept a callback to parse an argument instead
    cb: (commandRoomId: string, mjolnir: Mjolnir, event, ...arguments) => Promise<void>
}

The DSL doesn't have to be complicated I was imagining something like this but i haven't played around with it. I don't think it matters if we break some backwards incompatibilities since it probably is going to be worth the cost. Though maybe it is worth looking at some other bridges or bots to see what they do?

Copy link
Contributor Author

@Yoric Yoric Jul 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it gets complicated very quick. But I'll see if I can make something higher-level.

In this case, the grammar really is something along the lines of

(roomAliasOrID || STAR)+ (string || ETC)

so I guess we could write a DSL to express this as

{ // Note: In JavaScript, field order is meaningful.
  room: {
      grammar: optional(or([Token.roomAliasOrID, Token.STAR])),
      description: "The room in which the operation takes place, or `*` for all protected rooms."
  }
  reason: {
      grammar: or([Token.string, Token.ETC]),
      description: "A human-readable reason for the operation, for audit purposes"
  }
}

Would you find this readable?

Note that this is not sufficient to express subcommands such as config get/config set that have entirely different help messages, so I'd need to come up with something to handle these.

I don't like breaking compatibility, in case some other bots have been designed to communicate with Mjölnir with commands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bots should not be communicating with Mjolnir via commands, not just because it's not meant for that but just because it'd be a really inefficient indirect way of achieving something (that is probably simple).

Note that this is not sufficient to express subcommands such as config get/config set that have entirely different help messages, so I'd need to come up with something to handle these.

Well your example only shows arguments and not the whole command, but nameParts in the example above was meant for this situation e.g. ["config", "get"] & ["config", "set"]. Though that might not work.

I don't think the arguments should be expressed in terms of lexer tokens but what they actually should be e.g. we could say we want a user id but we probably won't want the raw string token but an instance of matrix-bot-sdk's UserID class (not sure this example is realistic enough, but you get the point that there is a difference).
We will also probably will need to account for a situation where we want parse an argument ourselves using a callback (though with the lexer library this probably means parsing a token? rather than composing something from a token stream)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants