Switch to a `string.replace` approach #50

swansontec · 2021-11-02T22:30:47Z

This approach is vastly simpler and faster, since it leverages the regular expression engine to to most of the work.

To handle comments inside of strings, simply search for strings as well, but then leave the strings as-is in the replacer function.

The benchmarks do run quite a bit faster:

strip JSON comments is 80% faster
strip JSON comments without whitespace is 180% faster
strip Big JSON comments is 50% faster

This approach is vastly simpler and faster, since it leverages the regular expression engine to to most of the work. To handle comments inside of strings, simply search for strings as well, but then leave the strings as-is in the replacer function. The benchmarks do run quite a bit faster: - strip JSON comments is 80% faster - strip JSON comments without whitespace is 180% faster - strip Big JSON comments is 50% faster

sindresorhus · 2021-11-03T07:34:06Z

While I appreciate the pull request, I am unfortunately not going to accept this for multiple reasons:

This package is already more than fast enough for what it's used for.
The parser already works well and is more maintainable.
Every change is a risk. There might be untested cases that the parser handles that the regex misses.
But most importantly, regexes have a lot of problems. I have gotten an endless amount of ReDoS vulnerability reports in the past year for seemingly simple regexes and I don't want to risk that here. I am actually often doing the opposite of what's done here, moving an existing regex to a simple parser.

swansontec · 2021-11-04T01:53:18Z

Thanks, I can appreciate this reasoning, and respect your decision.

However, regarding point 3, the current implementation still has at least two existing bugs not covered by tests:

The first bug is that malformed input like [] /* will be stripped to just [] , which is now valid JSON. This library should leave the invalid comment as-is, so users get a proper error.
The second bug is that a plain \r should terminate line comments (//), as it does in the ECMA-262 specification. Few modern editors use the \r line ending (it was an old Mac thing), but this is still a bug.

However, the new regular expression version covers these cases without any additional effort, because regular expressions are basically just a transliteration of comment grammar and sting grammar in those respective specs.

So, besides being shorter & simpler, this regular expression is actually safer and less bug-prone that what's already there.

Unfortunately, regular expressions can sometimes be slow. It's unfortunate that we live in a world where "bad performance" can now mean "security flaw", but here we are... Nevertheless, I did design the regular expression with performance in mind, so it doesn't contain any exponential cases. However, after digging deeper, it now seems that even quadratic slowdowns are getting the ReDoS label these days. I do see one case where that can happen, so I will go ahead and address that, even if you don't want to accept this PR.

swansontec · 2021-11-04T02:32:39Z

I have pushed a fixup commit, to document the needed changes.

This regular expression now executes in linear time regardless of input text.

Why? Well, once the engine picks one of the three paths (" | // | /*), it enters a state every possible input character (including end-of-string) is a valid path forward towards a match. Therefore, there will be no situation where we start down a path, realize it doesn't match at the end, and then back-track.

In the case of strings, I just made the final quote optional. That way, if it's missing, or if the last character in the file is a dangling \, we can still treat that as a string match. The three choices inside the string body are mutually exclusive (quote, backslash, or anything else), so the path forward is unambiguous with no need for internal back-tracking.

Line comments were already safe, since the final newline was never part of the match.

Block comments are basically the same as strings. As with strings, my only change was to make the final */ optional, so an unterminated block quote will just match to the end of the file. The two options inside the block quote are mutually exclusive (any non-star character, or a star followed by a non-slash character), so there is no internal back-tracking.

Since the regular expression now matches broken block quotes, I have to filter those out too, just like quoted strings. Icky, but necessary.

Finally, I added tests & benchmarks for these cases.

I know you don't plan to merge this, but I sill want it to be correct & of high quality.

sindresorhus · 2021-11-24T18:46:20Z

The first bug is that malformed input like [] /* will be stripped to just [] , which is now valid JSON. This library should leave the invalid comment as-is, so users get a proper error.

Thanks for catching that: #51

The second bug is that a plain \r should terminate line comments (//), as it does in the ECMA-262 specification. Few modern editors use the \r line ending (it was an old Mac thing), but this is still a bug.

\r is dead. I don't plan to add any fix/workaround for it.

sindresorhus · 2021-11-24T18:52:04Z

I'm happy to link to your version if you decide to decide to publish it as separate package :)

swansontec force-pushed the string-replace branch from 2e6e291 to 0835e97 Compare November 4, 2021 02:02

fixup! Switch to a string.replace approach

6e1aa5a

swansontec force-pushed the string-replace branch from 0835e97 to 6e1aa5a Compare November 4, 2021 02:14

sindresorhus closed this Nov 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to a `string.replace` approach #50

Switch to a `string.replace` approach #50

swansontec commented Nov 2, 2021

sindresorhus commented Nov 3, 2021

swansontec commented Nov 4, 2021 •

edited

Loading

swansontec commented Nov 4, 2021 •

edited

Loading

sindresorhus commented Nov 24, 2021

sindresorhus commented Nov 24, 2021

Switch to a string.replace approach #50

Switch to a string.replace approach #50

Conversation

swansontec commented Nov 2, 2021

sindresorhus commented Nov 3, 2021

swansontec commented Nov 4, 2021 • edited Loading

swansontec commented Nov 4, 2021 • edited Loading

sindresorhus commented Nov 24, 2021

sindresorhus commented Nov 24, 2021

Switch to a `string.replace` approach #50

Switch to a `string.replace` approach #50

swansontec commented Nov 4, 2021 •

edited

Loading

swansontec commented Nov 4, 2021 •

edited

Loading