Readme: Feature table progress
slevithan committed Oct 31, 2024
1 parent e5919b2 commit a129602
Showing 1 changed file with 131 additions and 104 deletions.
235 changes: 131 additions & 104 deletions README.md
@@ -194,42 +194,38 @@ Sets the JavaScript language version for generated patterns and flags. Later tar

## ✅ Supported features

<table >
<table>
<tr>
<th colspan="2">Description</th>
<th>Example</th>
<th>ES2018</th>
<th>ES2024</th>
<th>ESNext</th>
<th>ES2024+<sup>[1]</sup></th>
<th>Comments</th>
</tr>
<tr valign="top">
<th align="left" rowspan="3"><b>Flags</b></th>
<td><code>i</code></td>
<td><code>i</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Unicode case folding<br>
</td>
</tr>
<tr valign="top">
<td><code>m</code></td>
<td><code>m</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Equivalent to JS flag <code>s</code> (<code>dotAll</code>)<br>
</td>
</tr>
<tr valign="top">
<td><code>x</code></td>
<td><code>x</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Unicode whitespace ignored<br>
✔ Line comments with <code>#</code><br>
@@ -243,30 +239,28 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
<th align="left" rowspan="2" valign="top"><b>Flag modifiers</b></th>
<td>Groups</td>
<td><code>(?im-x:…)</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Unicode case folding for <code>i</code><br>
✔ Allows enabling and disabling the same flag (priority: disable)<br>
✔ Allows lone or multiple <code>-</code><br>
</td>
</tr>
<tr valign="top">
<td>Directives</td>
<td><code>(?im-x)</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Continues until end of pattern or group (spanning alternatives)<br>
</td>
</tr>
<tr valign="top">
<th align="left" colspan="2"><b>Comment groups</b></th>
<td><code>(?#…)</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Allows escaping <code>\)</code>, <code>\\</code><br>
✔ Comments allowed between a token and its quantifier<br>
@@ -277,9 +271,8 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
<th align="left" rowspan="9"><b>Characters</b></th>
<td>Literal</td>
<td><code>E!</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Code point based matching<br>
✔ Standalone <code>]</code>, <code>{</code>, <code>}</code> don't require escaping<br>
@@ -288,172 +281,206 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
<tr valign="top">
<td>Identity escape</td>
<td><code>\E\!</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Different set than JS<br>
✔ Invalid for multibyte chars<br>
</td>
</tr>
<tr valign="top">
<td>Metachar</td>
<td>Char escapes</td>
<td><code>\t</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ JS set plus <code>\a</code>, <code>\e</code><br>
</td>
</tr>
<tr valign="top">
<td><code>\x</code></td>
<td><code>\xA0</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ 1-digit hex <code>\xA</code><br>
✔ 2-digit hex <code>\xA0</code><br>
✔ Incomplete <code>\x</code> invalid<br>
</td>
</tr>
<tr valign="top">
<td><code>\u</code></td>
<td><code>\uFFFF</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
Incomplete <code>\u</code> invalid<br>
Same as JS<br>
</td>
</tr>
<tr valign="top">
<td><code>\u{…}</code></td>
<td><code>\u{A}</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Incomplete <code>\u{…}</code> invalid<br>
✔ Allows whitespace<br>
✔ Allows leading 0s up to 6 total hex digits<br>
✔ Allows whitespace padding<br>
✔ Allows leading 0s up to 6 total hex digits (JS allows unlimited)<br>
</td>
</tr>
<tr valign="top">
<td>Escaped num</td>
<td><code>\20</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Can be null, backref, error, octal, identity escape, literal, or multiple of these, based on complex context<br>
✔ Always treat escaped single digit 1-9 outside char class as backref<br>
✔ Can be backref, error, null, octal, identity escape, or one of these combined with literal digits, based on complex context<br>
✔ Always treats escaped single digit 1-9 outside char class as backref<br>
</td>
</tr>
<tr valign="top">
<td>Control</td>
<td><code>\C-A</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td><code>\cA</code>, <code>\C-A</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ <code>\cx</code> with A-Za-z<br>
✔ <code>\C-x</code> with A-Za-z<br>
✔ Incomplete <code>\c</code>/<code>\C</code> invalid<br>
✔ With A-Za-z (JS: only <code>\c</code>)<br>
</td>
</tr>
<tr valign="top">
<td>Other</td>
<td><code>\M-\1</code></td>
<td>✖️</td>
<td>✖️</td>
<td>✖️</td>
<td align="middle">✖️</td>
<td align="middle">✖️</td>
<td>
Not yet supported; all extremely rare<br>
Not yet supported; very rare<br>
✘ <code>\cx</code>, <code>\C-x</code> with non-A-Za-z<br>
✘ Meta-code <code>\M-x</code>, <code>\M-\C-x</code><br>
</td>
</tr>
<tr valign="top">
<th align="left" rowspan="6"><b>Character sets</b></th>
<td>Dot</td>
<td><code>.</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<th align="left" rowspan="5"><b>Character sets</b></th>
<td>Digit, word</td>
<td><code>\d</code>, <code>\w</code>, etc.</td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Excludes only <code>\n</code> (unlike JS)<br>
</td>
</tr>
<tr valign="top">
<td>Digit</td>
<td><code>\d\D</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>
✔ ASCII<br>
✔ Same as JS (ASCII)<br>
</td>
</tr>
<tr valign="top">
<td>Hex digit</td>
<td><code>\h\H</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td><code>\h</code>, <code>\H</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ ASCII<br>
</td>
</tr>
<tr valign="top">
<td>Word</td>
<td><code>\w\W</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>Whitespace</td>
<td><code>\s</code>, <code>\S</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ ASCII<br>
✔ ASCII (unlike JS)<br>
</td>
</tr>
<tr valign="top">
<td>Whitespace</td>
<td><code>\s\S</code></td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>Dot</td>
<td><code>.</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
ASCII (unlike JS)<br>
Excludes only <code>\n</code> (unlike JS)<br>
</td>
</tr>
<tr valign="top">
<td>Unicode property <code>\p</code>/<code>\P</code></td>
<td><code>\p{greek}</code></td>
<td>✅ <sup>[1]</sup></td>
<td>✅</td>
<td>✅</td>
<td>Unicode property</td>
<td><code>\p{L}</code>, <code>\P{L}</code></td>
<td align="middle">✅<sup>[2]</sup></td>
<td align="middle">✅</td>
<td>
✔ Categories<br>
✔ Binary properties<br>
✔ Scripts<br>
✘ Blocks (wontfix)<br>
✔ Aliases<br>
✔ POSIX<br>
✘ Blocks (wontfix)<br>
✔ Negate with <code>\p{^…}</code>, <code>\P{^…}</code><br>
✔ Insignificant spaces, underscores, and casing in names<br>
✔ <code>\p</code>/<code>\P</code> is identity escape<br>
✔ Incomplete <code>\p{</code>/<code>\P{</code> invalid<br>
✔ <code>\p</code>, <code>\P</code> without <code>{</code> is identity escape<br>
✔ JS prefixes (ex: <code>Script=</code>) invalid<br>
✔ JS properties of strings invalid<br>
</td>
</tr>
<tr valign="top">
<th align="left" rowspan="6"><b>Character classes</b></th>
<td>Base</td>
<td><code>[ab]</code>, <code>[^a]</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Literal unescaped <code>-</code> in some contexts (different than any JS mode)<br>
✔ Fewer chars require escaping than JS<br>
✔ No subtraction operator (from JS flag <code>v</code>)<br>
</td>
</tr>
<tr valign="top">
<td>Empty</td>
<td><code>[]</code>, <code>[^]</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Invalid (unlike JS)<br>
</td>
</tr>
<tr valign="top">
<td>Ranges</td>
<td><code>[a-z]</code></td>
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ Same as JS with flag <code>u</code>, <code>v</code><br>
</td>
</tr>
<tr valign="top">
<td>POSIX classes</td>
<td><code>[[:word:]]</code></td>
<td align="middle">☑️<sup>[3]</sup></td>
<td align="middle">✅</td>
<td>
✔ Unicode interpretations<br>
✔ Negate with <code>[:^…:]</code><br>
</td>
</tr>
<tr valign="top">
<td>Nested classes</td>
<td><code>[a[b]]</code></td>
<td align="middle">☑️<sup>[4]</sup></td>
<td align="middle">✅</td>
<td>
✔ Same as JS with flag <code>v</code><br>
</td>
</tr>
<tr valign="top">
<td>Intersection</td>
<td><code>[a-z&&\h]</code></td>
<td align="middle">❌</td>
<td align="middle">✅</td>
<td>
✔ Doesn't require nested classes for union and ranges (unlike JS)<br>
</td>
</tr>
<tr valign="top">
<td colspan="7"><b>Work in progress…</b></td>
</tr>
</table>
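
Several rows above note where emulated Oniguruma behavior differs from native JavaScript ("unlike JS"). As a rough illustration, the sketch below shows only the native JavaScript side of a few of those differences. It does not call Oniguruma-To-ES, and the ASCII whitespace set used for comparison is an assumption made for this example, not something taken from the library.

```javascript
// Native JavaScript behavior only; no Oniguruma-To-ES API is used here.

// Dot: without flag s, JS excludes \r, \u2028, and \u2029 in addition to \n.
const jsDotMatchesCR = /^.$/.test('\r');
// A class that excludes only \n, as the "Dot" row describes.
const onlyNewlineExcludedMatchesCR = /^[^\n]$/.test('\r');

// \s: JS matches Unicode whitespace such as U+00A0 (no-break space).
const jsSpaceMatchesNbsp = /\s/.test('\u00A0');
// Assumed ASCII-only whitespace set, written out for comparison.
const asciiSpaceMatchesNbsp = /[\t\n\v\f\r ]/.test('\u00A0');

// Empty classes: valid in JS. [] never matches; [^] matches any character.
const jsEmptyClassMatches = /[]/.test('a');
const jsNegatedEmptyClassMatches = /[^]/.test('\n');

// \u{…}: with flag u, JS accepts any number of leading zeros.
const jsAllowsManyLeadingZeros = /^\u{0000000041}$/u.test('A');

console.log({
  jsDotMatchesCR,
  onlyNewlineExcludedMatchesCR,
  jsSpaceMatchesNbsp,
  asciiSpaceMatchesNbsp,
  jsEmptyClassMatches,
  jsNegatedEmptyClassMatches,
  jsAllowsManyLeadingZeros,
});
```

Each pair contrasts what a plain JavaScript regex does with a hand-written pattern that approximates the behavior described in the table's comments column.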

### Footnotes

1. Target ES2018 doesn't allow Unicode properties added after ES2018.
1. Emulation capabilities are the same for targets ES2024 and ESNext, although resulting regex patterns and flags might differ.
2. Target ES2018 doesn't allow Unicode property names added after ES2018.
3. With target ES2018, the specific POSIX classes `[:graph:]` and `[:print:]` use ASCII versions rather than the Unicode versions available for target ES2024 and later. They are an error if option `allowBestEffort` is disabled.
4. Target ES2018 doesn't allow nested negated character classes.

## ㊗️ Unicode / mixed case-sensitivity

@@ -468,7 +495,7 @@ Oniguruma-To-ES focuses on being lightweight to make it better for use in browse

## 👀 Similar projects

[JsRegex](https://github.com/jaynetics/js_regex) transpiles [Onigmo](https://github.com/k-takata/Onigmo) regexes to JavaScript (Onigmo is a fork of Oniguruma that has slightly different syntax/behavior). JsRegex is written in Ruby and relies on the [Regexp::Parser](https://github.com/ammar/regexp_parser) Onigmo parser, which means regexes must be pre-transpiled on the server to use them in JavaScript. In contrast, Oniguruma-To-ES is written in JavaScript, so it can be used at runtime. JsRegex also produces regexes with more edge cases that don't perfectly follow Oniguruma's behavior, in addition to the Oniguruma/Onigmo differences.
[JsRegex](https://github.com/jaynetics/js_regex) transpiles [Onigmo](https://github.com/k-takata/Onigmo) regexes to JavaScript (Onigmo is a fork of Oniguruma that has slightly different syntax/behavior). JsRegex is written in Ruby and relies on the Ruby [Regexp::Parser](https://github.com/ammar/regexp_parser) Onigmo parser, which means regexes must be pre-transpiled on the server to use them in JavaScript. In contrast, Oniguruma-To-ES is written in JavaScript, so it can be used at runtime. JsRegex also produces regexes with more edge cases that don't perfectly follow Oniguruma's behavior, in addition to the Oniguruma/Onigmo differences.

## 🏷️ About
