From a1296022a519d7b1952d7faa307f96fae67650f2 Mon Sep 17 00:00:00 2001
From: Steven Levithan
Date: Thu, 31 Oct 2024 14:39:34 +0100
Subject: [PATCH] Readme: Feature table progress

---
 README.md | 235 ++++++++++++++++++++++++++++++------------------------
 1 file changed, 131 insertions(+), 104 deletions(-)

diff --git a/README.md b/README.md
index 2f5ef2f..6aea019 100644
--- a/README.md
+++ b/README.md
@@ -194,22 +194,20 @@ Sets the JavaScript language version for generated patterns and flags. Later tar

 ## ✅ Supported features

-
+
@@ -217,9 +215,8 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
@@ -227,9 +224,8 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
@@ -254,9 +250,8 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
@@ -264,9 +259,8 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
@@ -309,143 +300,176 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
@@ -453,7 +477,10 @@ Sets the JavaScript language version for generated patterns and flags. Later tar

 ### Footnotes

-1. Target ES2018 doesn't allow Unicode properties added after ES2018.
+1. Emulation capabilities are the same for targets ES2024 and ESNext, although resulting regex patterns and flags might differ.
+2. Target ES2018 doesn't allow Unicode property names added after ES2018.
+3. With target ES2018, the specific POSIX classes `[:graph:]` and `[:print:]` use ASCII versions rather than the Unicode versions available for target ES2024 and later. They are an error if option `allowBestEffort` is disabled.
+4. Target ES2018 doesn't allow nested negated character classes.

 ## ㊗️ Unicode / mixed case-sensitivity

@@ -468,7 +495,7 @@ Oniguruma-To-ES focuses on being lightweight to make it better for use in browse

 ## 👀 Similar projects

-[JsRegex](https://github.com/jaynetics/js_regex) transpiles [Onigmo](https://github.com/k-takata/Onigmo) regexes to JavaScript (Onigmo is a fork of Oniguruma that has slightly different syntax/behavior). JsRegex is written in Ruby and relies on the [Regexp::Parser](https://github.com/ammar/regexp_parser) Onigmo parser, which means regexes must be pre-transpiled on the server to use them in JavaScript. In contrast, Oniguruma-To-ES is written in JavaScript, so it can be used at runtime. JsRegex also produces regexes with more edge cases that don't perfectly follow Oniguruma's behavior, in addition to the Oniguruma/Onigmo differences.
+[JsRegex](https://github.com/jaynetics/js_regex) transpiles [Onigmo](https://github.com/k-takata/Onigmo) regexes to JavaScript (Onigmo is a fork of Oniguruma that has slightly different syntax/behavior). JsRegex is written in Ruby and relies on the Ruby [Regexp::Parser](https://github.com/ammar/regexp_parser) Onigmo parser, which means regexes must be pre-transpiled on the server to use them in JavaScript. In contrast, Oniguruma-To-ES is written in JavaScript, so it can be used at runtime. JsRegex also produces regexes with more edge cases that don't perfectly follow Oniguruma's behavior, in addition to the Oniguruma/Onigmo differences.

 ## 🏷️ About
Description Example ES2018 ES2024 ESNext ES2024+[1] Comments
Flags i i ✔ Unicode case folding
m m ✔ Equivalent to JS flag s (dotAll)
x x ✔ Unicode whitespace ignored
✔ Line comments with #
@@ -243,10 +239,10 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
Flag modifiers Groups (?im-x:…) + ✔ Unicode case folding for i
✔ Allows enabling and disabling the same flag (priority: disable)
✔ Allows lone or multiple -
Directives (?im-x) ✔ Continues until end of pattern or group (spanning alternatives)
Comment groups (?#…) ✔ Allows escaping \), \\
✔ Comments allowed between a token and its quantifier
@@ -277,9 +271,8 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
Characters Literal E! ✔ Code point based matching
✔ Standalone ], {, } don't require escaping
@@ -288,20 +281,18 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
Identity escape \E\! ✔ Different set than JS
✔ Invalid for multibyte chars
Metachar Char escapes \t ✔ JS set plus \a, \e
\x \xA0 ✔ 1-digit hex \xA
✔ 2-digit hex \xA0
- ✔ Incomplete \x invalid
\u \uFFFF - ✔ Incomplete \u invalid
+ ✔ Same as JS
\u{…} \u{A} - ✔ Incomplete \u{…} invalid
- ✔ Allows whitespace
- ✔ Allows leading 0s up to 6 total hex digits
+ ✔ Allows whitespace padding
+ ✔ Allows leading 0s up to 6 total hex digits (JS allows unlimited)
Escaped num \20 - ✔ Can be null, backref, error, octal, identity escape, literal, or multiple of these, based on complex context
- ✔ Always treat escaped single digit 1-9 outside char class as backref
+ ✔ Can be backref, error, null, octal, identity escape, or one of these combined with literal digits, based on complex context
+ ✔ Always treats escaped single digit 1-9 outside char class as backref
Control \C-A \cA, \C-A - ✔ \cx with A-Za-z
- ✔ \C-x with A-Za-z
- ✔ Incomplete \c/\C invalid
+ ✔ With A-Za-z (JS: only \c)
Other \M-\1 ✖️ ✖️ ✖️ ✖️ ✖️ - Not yet supported; all extremely rare
+ Not yet supported; very rare
\cx, \C-x with non-A-Za-z
✘ Meta-code \M-x, \M-\C-x
Character sets Dot . Character sets Digit, word \d, \w, etc. - ✔ Excludes only \n (unlike JS)
-
Digit \d \D - ✔ ASCII
+ ✔ Same as JS (ASCII)
Hex digit \h \H \h, \H ✔ ASCII
Word \w \W Whitespace \s, \S - ✔ ASCII
+ ✔ ASCII (unlike JS)
Whitespace \s \S Dot . - ✔ ASCII (unlike JS)
+ ✔ Excludes only \n (unlike JS)
Unicode property \p/\P \p{greek} [1] Unicode property \p{L}, \P{L} [2] ✔ Categories
✔ Binary properties
✔ Scripts
- ✘ Blocks (wontfix)
✔ Aliases
+ ✔ POSIX
+ ✘ Blocks (wontfix)
✔ Negate with \p{^…}, \P{^…}
✔ Insignificant spaces, underscores, and casing in names
- ✔ \p/\P is identity escape
- ✔ Incomplete \p{/\P{ invalid
+ ✔ \p, \P without { is identity escape
✔ JS prefixes (ex: Script=) invalid
✔ JS properties of strings invalid
Character classes Base [ab], [^a] + ✔ Literal unescaped - in some contexts (different than any JS mode)
+ ✔ Fewer chars require escaping than JS
+ ✔ No subtraction operator (from JS flag v)
+
Empty [], [^] + ✔ Invalid (unlike JS)
+
Ranges [a-z] + ✔ Same as JS with flag u, v
+
POSIX classes [[:word:]] ☑️[3] + ✔ Unicode interpretations
+ ✔ Negate with [:^…:]
+
Nested classes [a[b]] ☑️[4] + ✔ Same as JS with flag v
+
Intersection [a-z&&\h] + ✔ Doesn't require nested classes for union and ranges (unlike JS)
+
Work in progress…
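
A couple of the character set rows above can be checked with plain JavaScript regexes. The following is a minimal sketch that uses hand-written equivalents; the names `onigDot`, `jsDot`, and `onigHex` are hypothetical, and the patterns are assumptions chosen for illustration, not output generated by Oniguruma-To-ES.

```javascript
// Hand-written JS equivalents for two rows of the feature table.
// Illustrative assumptions only; not what the library actually emits.

// Oniguruma `.` excludes only \n, so `[^\n]` behaves the same way in JS.
const onigDot = /[^\n]/;
// JS `.` (without flag s) also excludes \r, \u2028, and \u2029.
const jsDot = /./;

// Oniguruma `\h` matches an ASCII hex digit.
const onigHex = /[0-9A-Fa-f]/;

console.log(onigDot.test('\r')); // true
console.log(jsDot.test('\r')); // false
console.log(onigHex.test('f'), onigHex.test('g')); // true false
```

The difference in dot behavior is why a pattern containing `.` can't always be copied into JavaScript unchanged.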