Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP arguments protocol is lacking #11

Open
resolritter opened this issue Apr 16, 2021 · 8 comments
Open

TCP arguments protocol is lacking #11

resolritter opened this issue Apr 16, 2021 · 8 comments

Comments

@resolritter
Copy link

The implemented TCP protocol is

$TOKEN $PWD [...args]

The most glaring problem there is that $PWD can have spaces and that is not taken into account.

Further, I saw that the protocol accepts JSON although it's not documented.

core_d.js/lib/server.js

Lines 25 to 26 in bfe1f7c

if (data.substring(0, 1) === '{') {
return JSON.parse(data);

Yet another problem there: text is supposed to be included in the JSON payload, but it shouldn't; encoding thousands of lines to JSON is costly and there's no reason for it.

My suggestion is the following:

  • First line should have all the parameters
  • Second line and below is the text

i.e. parseData should be like

function parseData(data) {
  const newlineIndex = data.indexOf("\n")
  const payload = JSON.parse(data.slice(0, newlineIndex))
  const text = data.slice(newlineIndex + 1)
  return {
    payload,
    text,
  }
}

for the format

<payload>
<text>

as an example:

{ "cwd": "foo", "args": ["bar"] }
function sayHello() { console.log("hello") }
sayHello()

Summary of the proposal

  • Deprecate space-delimited protocol
  • Implement new protocol
    • All options should be as JSON object in the first line
    • Second line and below is reserved for text

Even in Bash it's pretty easy to escape the cwd for JSON, which would get rid of the spacing problem. Here's how I'm doing it in a script:

# escape a character of choice with '\'; outputs the result to $out
escape_string() {
  v="$1"
  c="$2"
  len=${#v}
  for ((i=0; i<$len; i++)); do
    if [ "${v:$i:1}" = "$c" ]; then
      v="${v:0:$i}\\${c}${v:$(( $i + 1 ))}"
      len=$(( $len + 1 ))
      i=$(( $i + 1 ))
    fi
  done
  out="$v"
}

cwd="$(dirname "$1")"
# escape all double quotes since they're used as delimiters in JSON
escape_string "$cwd" '"'
encoded_cwd="$out"

msg="{ \"cwd\": \"$encoded_cwd\", \"args\": [\"--stdin\"] }"

Upsides for JSON:

  • You can support any other options in the future without messing with the encoding
  • It's ubiquituous
  • Reasonably human-readable for a single line
@mantoni
Copy link
Owner

mantoni commented Apr 22, 2021

Thank you for raising this. You're right that it's unnecessary and inefficient to encode the text in JSON. Since this would be a breaking change, I'd like to take a step back and think about these alternatives:

1️⃣ We could get away with newline delimited plain text, like this:

<cwd>
<args>
<text>

The question is whether we'd encode args as JSON or leave that to clients using this library and pass the string as is.

2️⃣ We could skip parsing the payload entirely and just pass it on to service.invoke. This library doesn't need to understand the data at all.

What do you think?

@resolritter
Copy link
Author

resolritter commented Apr 23, 2021

1️⃣ Not encoding args as JSON has the same whitespace problem, unless you want to separate each in a line. Simplest thing I can think of is

<cwd>
<number of args>
<arg1>
<arg2>
...
<text>

e.g. 3 lines of arguments:

/home/user/project
3
--stdin
--lint
--configurationPath=/home/directory 1/directory 2/file
text

or for no arguments

/home/project
0
text

Looks somewhat janky, but you're right that it doesn't require encoding it as JSON, so that's a big plus.

2️⃣ seems like a good idea, although I wonder what's the point of the TCP connection then? Simply skipping file creation in the disk? Yeah it makes sense, have core_d only act as a "forwarder" to the service.

@mantoni
Copy link
Owner

mantoni commented Apr 23, 2021

Option 2️⃣ is becoming my preference. I think of core_d as a convenience utility that allows me to safely start and stop a daemon process for something like eslint_d. It doesn't need to specify a wire protocol really. Transporting the cwd, "some args" and "some body" should be good enough.

Would you be interested in contributing a change for this?

@resolritter
Copy link
Author

resolritter commented Apr 23, 2021

I am perhaps sensing some confusion here, so let's confirm the understanding first. My understanding is as follows:

Option 1

core_d adjusts the parsing

Protocol

<cwd>
<number of args>
<arg1>
<arg2>
...
<text>

e.g. 3 lines of arguments:

/home/user/project
3
--stdin
--lint
--configurationPath=/home/user/directory 2/file
text

or for no arguments

/home/project
0
text

Service

Stays the same since core_d is still doing the parsing.

function (cwd, args, text, mtime, cb) {
  // ...
}

Option 2

core_d gets rid of the parsing and simply sends the raw data into the service

Protocol

<raw_data>

Service

Dictates the protocol and parses it however it wants

function (raw_data, mtime, cb) {
  const { cwd, args, text, ...other } = parse(raw_data)
  // ...
}

Is that what you're thinking as well?

Also let's remember why JSON was proposed in the first place: it gets rid of whitespace ambiguity for separating the components. Separating them into newlines achieves the same.

Transporting the cwd, "some args" and "some body" should be good enough.

If you want any upfront parsing to remain before sending data to the service, then that's not the "Option 2" I am thinking of.

@mantoni
Copy link
Owner

mantoni commented Apr 24, 2021

I see. My proposal is a solution that is a mix of both. I'd change the protocol to be:

<cwd>
<args>
<body>

The first line is the current working directory, as you stated, we would get rid of the whitespace problem.

The second line would be arguments, can be blank, and it's up to the service to parse it. It could be JSON, but core_d would not try to parse it.

The remaining text is the body, also passed on without looking at it.

Does that make sense? Are am I missing something?

@resolritter
Copy link
Author

I understand your idea now. It's similar to the one originally proposed in #11, laid out again for comparison:

Protocol

<payload>
<text>

example:

{ "cwd": "foo", "args": ["bar"] }
function sayHello() { console.log("hello") }
sayHello()

I don't get why cwd gets its own line in your idea, though. Why not encode it in the payload line too like shown above? Well, either way is fine, I suppose.


This did give me an idea for an Option 3 which is a simplification of Option 1 from #11.

Option 3

First line tells to expect N next argument lines, followed by N argument lines, then the text comes next. It would get rid of the <cwd> line from Option 1 because it can also be put in the arguments.

Protocol

<number of args>
<arg1>
<arg2>
...
<text>

example (note --cwd=)

2
--cwd=/path/to/directory
--lint
console.log("foo")

or for no arguments

0
console.log("foo")

Service

function (argsLines, mtime, cb) {
  const { cwd, lint, ...other } = parse(argsLines)
  // ...
}

I like Option 3 better since not every service is obligated to use a cwd. It's not limiting because "argument lines" can have whatever the service wants - core_d would not look at it, only split the lines before sending it to the service. If the service wants, It could even be:

2
/path/to/directory
--lint --check --fast
console.log("hi")

i.e. it would allow for your idea as well.

I like this Option 3 the most, although I can also accept Option 2 from #11. The other ones are not as "clean" as those IMO.

@mantoni
Copy link
Owner

mantoni commented Apr 25, 2021

I like where this collaborative idea development is going 👍

I'd like to keep the cwd an explicit part of the protocol because eslint_d sets it itself.

I like the idea of placing the additional arguments on separate lines. That seems cleaner to me too. However, I'm not a fan of specifying the count of argument lines upfront. Could we separate the arguments from the body with a blank line maybe?

<cwd>
<arg1>
<arg2>

<body>

@resolritter
Copy link
Author

Yeah, separating them with a blank line looks better than specifying the line count. I'd agree with that format as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants