Switch to a string.replace approach #50
This approach is vastly simpler and faster, since it leverages the regular expression engine to do most of the work. To handle comments inside of strings, simply search for strings as well, but then leave the strings as-is in the replacer function.

The benchmarks do run quite a bit faster:

- strip JSON comments is 80% faster
- strip JSON comments without whitespace is 180% faster
- strip Big JSON comments is 50% faster
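For readers who want to see the shape of the idea, here is a minimal sketch of the replacer-based approach described above. It is not the code from this PR: the pattern, the `TOKEN` constant, and the `stripComments` name are all hypothetical, and it ignores options such as whitespace replacement.

```javascript
// Hypothetical sketch, not the PR's actual pattern.
// One alternation matches, in order: a double-quoted string,
// a line comment, or a terminated block comment.
const TOKEN = /"(?:[^"\\]|\\.)*"|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

function stripComments(text) {
  return text.replace(TOKEN, (match) =>
    // A match that starts with a quote is a string: return it untouched,
    // so comment-like text inside strings survives. Anything else is a
    // comment and is replaced with nothing.
    match[0] === '"' ? match : ''
  );
}
```

Because the string alternative consumes the whole string before the comment alternatives get a chance to match inside it, `stripComments('{"a": "// kept"} // dropped')` leaves the string value alone and removes only the trailing comment.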
While I appreciate the pull request, I am unfortunately not going to accept this for multiple reasons:
Thanks, I can appreciate this reasoning, and respect your decision. However, regarding point 3, the current implementation still has at least two existing bugs not covered by tests:
However, the new regular expression version covers these cases without any additional effort, because regular expressions are basically just a transliteration of the comment grammar and string grammar in those respective specs. So, besides being shorter & simpler, this regular expression is actually safer and less bug-prone than what's already there.

Unfortunately, regular expressions can sometimes be slow. It's unfortunate that we live in a world where "bad performance" can now mean "security flaw", but here we are... Nevertheless, I did design the regular expression with performance in mind, so it doesn't contain any exponential cases. However, after digging deeper, it now seems that even quadratic slowdowns are getting the ReDoS label these days. I do see one case where that can happen, so I will go ahead and address that, even if you don't want to accept this PR.
2e6e291 to 0835e97
0835e97 to 6e1aa5a
I have pushed a fixup commit to document the needed changes. This regular expression now executes in linear time regardless of input text. Why? Well, once the engine picks one of the three paths (

In the case of strings, I just made the final quote optional. That way, if it's missing, or if the last character in the file is a dangling

Line comments were already safe, since the final newline was never part of the match.

Block comments are basically the same as strings. As with strings, my only change was to make the final

Since the regular expression now matches broken block quotes, I have to filter those out too, just like quoted strings. Icky, but necessary.

Finally, I added tests & benchmarks for these cases. I know you don't plan to merge this, but I still want it to be correct & of high quality.
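A rough sketch of what that fixup could look like is below. It is an illustration under assumptions, not the commit's actual code: the pattern, `LINEAR_TOKEN`, and `stripCommentsLinear` are hypothetical names, and it assumes that "filter those out" means leaving unterminated block comments in the output unchanged, the same way strings are left alone.

```javascript
// Hypothetical sketch, not the fixup commit's actual pattern.
// The closing quote is optional, and a block comment may end at
// end-of-input, so a match never has to be abandoned and rescanned.
const LINEAR_TOKEN =
  /"(?:[^"\\]|\\[\s\S])*"?|\/\/[^\r\n]*|\/\*[\s\S]*?(?:\*\/|$)/g;

function stripCommentsLinear(text) {
  return text.replace(LINEAR_TOKEN, (match) => {
    // Strings, terminated or not, are returned untouched.
    if (match[0] === '"') return match;
    // Line comments are always removed; the newline is not part of the match.
    if (match[1] === '/') return '';
    // Block comments are removed only when properly closed. The length
    // check stops the three-character input "/*/" from counting as closed.
    const closed = match.length >= 4 && match.endsWith('*/');
    return closed ? '' : match;
  });
}
```

With this shape, an input such as `/* /* /* ...` with no closing delimiter should be consumed by a single match that runs to the end of the text, rather than triggering a fresh scan to the end from every `/*`.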
Thanks for catching that: #51
I'm happy to link to your version if you decide to publish it as a separate package :)