diff --git a/README.md b/README.md
index 5436c82..7aae76e 100644
--- a/README.md
+++ b/README.md
@@ -194,9 +194,9 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
 ## ✅ Supported features

-Following are the supported features by target. Targets `ES2024` and `ESNext` have the same emulation capabilities, although resulting regexes might differ (though not in the strings they match).
+Following are the supported features by target. Targets `ES2024` and `ESNext` have the same emulation capabilities (resulting regexes might differ, but not in the strings they match).

-Notice that nearly every feature has at least subtle differences from JavaScript. Some features and sub-features listed as unsupported can be added in future versions, but some are not emulatable with native JavaScript regexes. Unsupported features throw an error.
+Notice that nearly every feature has at least subtle differences from JavaScript. Some features and subfeatures listed as unsupported can be added in future versions, but some are not emulatable with native JavaScript regexes. Unsupported features throw an error.
@@ -264,7 +264,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
- +
@@ -281,11 +281,20 @@ Notice that nearly every feature has at least subtle differences from JavaScript
- + + + + + + + +
@@ -299,9 +308,9 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -310,7 +319,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -321,7 +330,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -332,7 +341,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -342,7 +351,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -452,7 +461,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -466,12 +475,11 @@ Notice that nearly every feature has at least subtle differences from JavaScript
- +
@@ -558,7 +566,7 @@ Notice that nearly every feature has at least subtle differences from JavaScript
@@ -671,12 +679,32 @@ Notice that nearly every feature has at least subtle differences from JavaScript
+ + + + + + + + + + + + + + + + + +
@@ -745,13 +773,9 @@ Notice that nearly every feature has at least subtle differences from JavaScript
✔ Error; not passed through
- - - -
CharactersCharacters Literal E, ! ✔ Different allowed set than JS
- ✔ Invalid for multibyte chars
+ ✔ Error for multibyte chars
Char escapesEscaped metachar\\, \. + ✔ Same as JS
+
Shorthand \t - ✔ 1 hex digit \xF
- ✔ 2 hex digits with max value \x7F (unlike JS)
- ✔ Incomplete \x is invalid (like JS with flag u, v)
+ ✔ Allows 1 hex digit
+ ✔ Error for 2 hex digits above 7F
+ ✔ Error for incomplete \x (like JS with flag u, v)
- ✔ Incomplete \u is invalid (like JS with flag u, v)
+ ✔ Error for incomplete \u (like JS with flag u, v)
✔ Allows whitespace padding
✔ Allows leading 0s up to 6 total hex digits (JS allows unlimited)
- ✔ Incomplete \u{ is invalid (like JS with flag u, v)
+ ✔ Error for incomplete \u{ (like JS with flag u, v)
✔ Can be backref, error, null, octal, identity escape, or any of these combined with literal digits, based on complex rules that differ from JS
✔ Always handles escaped single digit 1-9 outside char class as backref
- ✔ Allows null with 1-3 0s (unlike JS in any mode)
+ ✔ Allows null with 1-3 0s (unlike JS)
✔ With A-Za-z (JS: only \c form)
- ✔ Incomplete \c is invalid (like JS with flag u, v)
+ ✔ Error for incomplete \c (like JS with flag u, v)
- ✔ Invalid (unlike JS)
+ ✔ Error (unlike JS)
POSIX classes[[:word:]][[:word:]],
[[:^word:]]
☑️[3] ✔ All use Unicode interpretations
- ✔ Negate with [:^…:]
- ✔ Variable-length quantifiers within lookbehind invalid (unlike JS)
+ ✔ Error for variable-length quantifiers within lookbehind (unlike JS)
✔ Allows variable-length top-level alternatives
✔ Allows following quantifier (unlike JS in any mode)
✔ Values captured within min-0 quantified lookbehind remain referenceable
@@ -628,8 +636,8 @@ Notice that nearly every feature has at least subtle differences from JavaScript
- ✔ Allows duplicate names
- ✔ Error for group names invalid in Oniguruma or JS
+ ✔ Duplicate names allowed (no restrictions)
+ ✔ Error for names invalid in Oniguruma or JS
☑️ ☑️ - ✔ Error if named backref and group defined to the right
- ● Error if numbered backref and group defined to the right[5]
- ✔ Fail to match when referencing a containing group
- ✔ Fail to match (or don't include as a multiplex option) if group defined in a preceding alternation path
- ✔ Groups to the right not included as multiplex options
- ❌ Some rare cases are indeterminable through static analysis, and use JS behavior of matching the empty string
+ ✔ Error if group defined to the right[5]
+ ✔ Duplicate names/subroutines to the right not included in multiplex
+ ✔ Fail to match (or don't include in multiplex) ancestor groups and groups in preceding alternation paths
+ ❌ Some rare cases are indeterminable at compile time, so use JS behavior (match empty string)
+
SubroutinesTODO: Add me + ✔
+
RecursionTODO: Add me☑️[6]☑️[6] + ✔
Not yet complete…
-Despite all the details in the table above, it doesn't include all aspects that Oniguruma-To-ES emulates (e.g., some error handling, most aspects that work the same as in JavaScript, and many aspects of non-JavaScript features that work the same in other regex flavors that support them).
+Despite all the details in the table above, it doesn't include all aspects that Oniguruma-To-ES emulates (including error handling, most aspects that work the same as in JavaScript, and many aspects of non-JavaScript features that work the same in other regex flavors that support them).

 ### Footnotes

@@ -759,7 +783,8 @@ Despite all the details in the table above, it doesn't include all aspects that
 2. Unicode blocks are easily emulatable but their character data would significantly increase library weight, and they're a flawed, arguably-unuseful feature (use Unicode scripts and other properties instead).
 3. With target `ES2018`, the specific POSIX classes `[:graph:]` and `[:print:]` use ASCII-based versions rather than the Unicode versions available for target `ES2024` and later, and they result in an error if option `allowBestEffort` is disabled.
 4. Target `ES2018` doesn't allow nested negated character classes.
-5. It's not an error for *numbered* backreferences to come before their referenced group in Oniguruma, but an error is the best path for Oniguruma-To-ES because (1) almost all placements are mistakes and can never match (based on the Oniguruma behavior for backreferences to nonparticipating groups), and (2) the edge cases where it's matchable rely on rules for backreference resetting within quantified groups that are different in JS and are not emulatable. Note that it's not a backreference in the first place if `\10`+ and not as many capturing groups defined to the left (it's an octal or identity escape).
+5. It's not an error for *numbered* backreferences to come before their referenced group in Oniguruma, but an error is the best path for Oniguruma-To-ES because (1) almost all placements are mistakes and can never match (based on the Oniguruma behavior for backreferences to nonparticipating groups), (2) it matches the behavior of named groups, and (3) the edge cases where it's matchable rely on rules for backreference resetting within quantified groups that are different in JS and are not emulatable. Note that it's not a backreference in the first place if `\10`+ and not as many capturing groups defined to the left (it's an octal or identity escape).
+6. Recursion depth is limited; specified by option `maxRecursionDepth`.

 ## ㊗️ Unicode / mixed case-sensitivity