Move JS flags to options
slevithan committed Nov 1, 2024
1 parent e331b16 commit 1e6e572
Showing 9 changed files with 86 additions and 76 deletions.
24 changes: 18 additions & 6 deletions README.md
@@ -81,6 +81,8 @@ A string with `i`, `m`, and `x` in any order (all optional).
```ts
type CompileOptions = {
allowBestEffort?: boolean;
global?: boolean;
hasIndices?: boolean;
maxRecursionDepth?: number | null;
optimize?: boolean;
target?: 'ES2018' | 'ES2024' | 'ESNext';
@@ -96,13 +98,11 @@ Transpiles an Oniguruma regex pattern and flags and returns a native JavaScript
```ts
function toRegExp(
pattern: string,
flags?: string,
flags?: OnigurumaFlags,
options?: CompileOptions
): RegExp;
```

The `flags` string can be any combination of Oniguruma flags `i`, `m`, and `x`, plus JavaScript flags `d` and `g`. Oniguruma's flag `m` is equivalent to JavaScript's flag `s`. See [Options](#-options) for more details.

> [!TIP]
> Try it in the [demo REPL](https://slevithan.github.io/oniguruma-to-es/demo/).
@@ -128,7 +128,7 @@ function toRegexAst(
): RegexAst;
```

`regex`'s syntax and behavior is a strict superset of native JavaScript, so the AST is very close to representing native ESNext JavaScript `RegExp` but with some added features (atomic groups, possessive quantifiers, recursion). The `regex` AST doesn't use some of `regex`'s extended features like flag `x` or subroutines because they follow PCRE behavior and work somewhat differently than in Oniguruma. The AST represents what's needed to precisely reproduce the Oniguruma behavior using `regex`.
`regex`'s syntax and behavior is a strict superset of native JavaScript, so the AST is very close to representing native ESNext `RegExp` but with some added features (atomic groups, possessive quantifiers, recursion). The `regex` AST doesn't use some of `regex`'s extended features like flag `x` or subroutines because they follow PCRE behavior and work somewhat differently than in Oniguruma. The AST represents what's needed to precisely reproduce the Oniguruma behavior using `regex`.

## 🔩 Options

@@ -154,6 +154,18 @@ Specifically, this option enables the following additional features, depending o
- Enables use of POSIX classes `[:graph:]` and `[:print:]` using ASCII-based versions rather than the Unicode versions available for `ES2024` and later. Other POSIX classes always use Unicode.
</details>

### `global`

Include JavaScript flag `g` (`global`) in results.

*Default: `false`.*

### `hasIndices`

Include JavaScript flag `d` (`hasIndices`) in results.

*Default: `false`.*

### `maxRecursionDepth`

If `null`, any use of recursion throws. If an integer between `2` and `100` (and `allowBestEffort` is `true`), common recursion forms are supported and recurse up to the specified max depth.
@@ -745,7 +757,7 @@ Notice that nearly every feature below has at least subtle differences from Java
<td align="middle">✅</td>
<td>
● Same behavior as numbered<br>
✔ Error if refs a duplicate name<br>
✔ Error if reffed group uses duplicate name<br>
</td>
</tr>

@@ -834,7 +846,7 @@ Notice that nearly every feature below has at least subtle differences from Java
<td align="middle">✅</td>
<td align="middle">✅</td>
<td>
✔ <code>[\q{…}]</code> matches literal <code>q</code>, etc.<br>
✔ <code>[\q{…}]</code> matches one of literal <code>q</code>, <code>{</code>, etc.<br>
✔ <code>[a--b]</code> includes the invalid reversed range <code>a</code> to <code>-</code><br>
</td>
</tr>
67 changes: 27 additions & 40 deletions demo/demo.js
@@ -1,34 +1,35 @@
let useFlagI = getValue('flag-i');
let useFlagM = getValue('flag-m');
let useFlagX = getValue('flag-x');
let optionAllowBestEffortValue = getValue('option-allow-best-effort');
let optionMaxRecursionDepthValue = getValue('option-max-recursion-depth');
let optionOptimizeValue = getValue('option-optimize');
let optionTargetValue = getValue('option-target');

function getValue(id) {
const el = document.getElementById(id);
return el.type === 'checkbox' ? el.checked : el.value;
}
const state = {
flags: {
i: getValue('flag-i'),
m: getValue('flag-m'),
x: getValue('flag-x'),
},
opts: {
allowBestEffort: getValue('option-allow-best-effort'),
global: getValue('option-global'),
hasIndices: getValue('option-has-indices'),
maxRecursionDepth: getValue('option-max-recursion-depth'),
optimize: getValue('option-optimize'),
target: getValue('option-target'),
},
};

const inputEl = document.getElementById('input');
autoGrow(inputEl);
showOutput(inputEl);

function showOutput(el) {
const input = el.value;
const flags = `${useFlagI ? 'i' : ''}${useFlagM ? 'm' : ''}${useFlagX ? 'x' : ''}`;
const flags = `${state.flags.i ? 'i' : ''}${state.flags.m ? 'm' : ''}${state.flags.x ? 'x' : ''}`;
const outputEl = document.getElementById('output');
outputEl.classList.remove('error');
let output = '';
try {
// Use `compile` but display output as if `toRegExp` was called. This avoids erroring when the
// selected `target` includes features that don't work in the user's browser
const re = OnigurumaToES.compile(input, flags, {
allowBestEffort: optionAllowBestEffortValue,
maxRecursionDepth: optionMaxRecursionDepthValue === '' ? null : +optionMaxRecursionDepthValue,
optimize: optionOptimizeValue,
target: optionTargetValue,
...state.opts,
maxRecursionDepth: state.opts.maxRecursionDepth === '' ? null : +state.opts.maxRecursionDepth,
});
output = `/${getRegExpLiteralPattern(re.pattern)}/${re.flags}`;
} catch (e) {
@@ -51,31 +52,17 @@ function getRegExpLiteralPattern(str) {
return str ? str.replace(/\\?./gsu, m => m === '/' ? '\\/' : m) : '(?:)';
}

function setFlagI(checked) {
useFlagI = checked;
showOutput(inputEl);
}
function setFlagM(checked) {
useFlagM = checked;
showOutput(inputEl);
}
function setFlagX(checked) {
useFlagX = checked;
showOutput(inputEl);
}
function setOptionAllowBestEffort(checked) {
optionAllowBestEffortValue = checked;
showOutput(inputEl);
}
function setOptionMaxRecursionDepth(value) {
optionMaxRecursionDepthValue = value;
showOutput(inputEl);
function getValue(id) {
const el = document.getElementById(id);
return el.type === 'checkbox' ? el.checked : el.value;
}
function setOptionOptimize(checked) {
optionOptimizeValue = checked;

function setFlag(flag, value) {
state.flags[flag] = value;
showOutput(inputEl);
}
function setOptionTarget(value) {
optionTargetValue = value;

function setOption(option, value) {
state.opts[option] = value;
showOutput(inputEl);
}
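
The refactored demo keeps raw form values in `state.opts` and normalizes `maxRecursionDepth` only when calling `compile`. The sketch below isolates that spread-then-override step so it can run without a DOM. The `toCompileOptions` name and the sample `state` values are assumptions made for illustration and do not appear in the demo.

```javascript
// Hypothetical stand-in for the demo's `state.opts`. Number inputs yield strings,
// so `maxRecursionDepth` starts out as '6' rather than 6.
const state = {
  opts: {
    allowBestEffort: true,
    global: false,
    hasIndices: false,
    maxRecursionDepth: '6',
    optimize: true,
    target: 'ES2024',
  },
};

// Hypothetical helper: copies all options, then overrides `maxRecursionDepth`.
// A key listed after the spread wins over the spread value.
function toCompileOptions(opts) {
  return {
    ...opts,
    // An empty input means "no recursion allowed" (`null`); otherwise coerce to a number
    maxRecursionDepth: opts.maxRecursionDepth === '' ? null : +opts.maxRecursionDepth,
  };
}

console.log(toCompileOptions(state.opts));
```

Because the override comes after the spread, newly added options such as `global` and `hasIndices` pass through unchanged without the call site having to list them.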
24 changes: 16 additions & 8 deletions demo/index.html
@@ -17,25 +17,25 @@ <h1>
<p>This is a basic REPL for testing the output of <a href="https://github.com/slevithan/oniguruma-to-es">Oniguruma-To-ES</a>, an Oniguruma to JavaScript RegExp transpiler. See <a href="https://github.com/kkos/oniguruma/blob/master/doc/RE">Oniguruma syntax</a> for an overview, but there are many subtleties to Oniguruma's differences from JavaScript that aren't shown in the docs.</p>

<h2>Try it</h2>
<p><textarea id="input" spellcheck="false" oninput="showOutput(this); autoGrow(this)"></textarea></p>
<p><textarea id="input" spellcheck="false" oninput="autoGrow(this); showOutput(this)"></textarea></p>
<p>
<b class="label">Flags:</b>
<label>
<input type="checkbox" id="flag-i" onchange="setFlagI(this.checked)">
<input type="checkbox" id="flag-i" onchange="setFlag('i', this.checked)">
<code>i</code>
</label>
<label>
<input type="checkbox" id="flag-m" onchange="setFlagM(this.checked)">
<input type="checkbox" id="flag-m" onchange="setFlag('m', this.checked)">
<code>m</code> <small>(JS flag <code>s</code>)</small>
</label>
<label>
<input type="checkbox" id="flag-x" onchange="setFlagX(this.checked)">
<input type="checkbox" id="flag-x" onchange="setFlag('x', this.checked)">
<code>x</code>
</label>
</p>
<p>
<b class="label"><code>target</code>:</b>
<select id="option-target" onchange="setOptionTarget(this.value)">
<select id="option-target" onchange="setOption('target', this.value)">
<option value="ES2018">ES2018</option>
<option value="ES2024" selected>ES2024</option>
<option value="ESNext">ESNext</option>
@@ -45,17 +45,25 @@ <h2>Try it</h2>
<summary>More options</summary>
<p>
<label>
<input type="checkbox" id="option-allow-best-effort" checked onchange="setOptionAllowBestEffort(this.checked)">
<input type="checkbox" id="option-allow-best-effort" checked onchange="setOption('allowBestEffort', this.checked)">
<code>allowBestEffort</code>
</label>
<label>
<input type="checkbox" id="option-optimize" checked onchange="setOptionOptimize(this.checked)">
<input type="checkbox" id="option-optimize" checked onchange="setOption('optimize', this.checked)">
<code>optimize</code>
</label>
<label>
<input type="number" id="option-max-recursion-depth" value="6" min="2" max="100" onchange="setOptionMaxRecursionDepth(this.value)" onkeyup="setOptionMaxRecursionDepth(this.value)">
<input type="number" id="option-max-recursion-depth" value="6" min="2" max="100" onchange="setOption('maxRecursionDepth', this.value)" onkeyup="setOption('maxRecursionDepth', this.value)">
<code>maxRecursionDepth</code>
</label>
<label>
<input type="checkbox" id="option-global" onchange="setOption('global', this.checked)">
<code>global</code>
</label>
<label>
<input type="checkbox" id="option-has-indices" onchange="setOption('hasIndices', this.checked)">
<code>hasIndices</code>
</label>
</p>
</details>
<pre id="output"></pre>
8 changes: 4 additions & 4 deletions dist/index.min.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion spec/match-assertion.spec.js
@@ -88,7 +88,7 @@ describe('Assertion', () => {
});

it('should match only at the start of the search when applied repeatedly', () => {
expect('abbcbb'.match(toRegExp(r`\G[ab]`, 'g'))).toEqual(['a', 'b', 'b']);
expect('abbcbb'.match(toRegExp(r`\G[ab]`, '', {global: true}))).toEqual(['a', 'b', 'b']);
});

it('should apply with positive min quantification', () => {
2 changes: 1 addition & 1 deletion spec/match-recursion.spec.js
@@ -36,7 +36,7 @@ describe('Recursion', () => {
'<>', '<<>>', '<a<b<c>d>e>', '<<<<<<a>>>bc>>>',
]).toExactlyMatch(pattern);
expect(
'test > <balanced <<brackets>>> <> <<a>> < <b>'.match(toRegExp(pattern, 'g'))
'test > <balanced <<brackets>>> <> <<a>> < <b>'.match(toRegExp(pattern, '', {global: true}))
).toEqual(['<balanced <<brackets>>>', '<>', '<<a>>', '<b>']);
});

18 changes: 12 additions & 6 deletions src/compile.js
@@ -9,6 +9,8 @@ import {recursion} from 'regex-recursion';
/**
@typedef {{
allowBestEffort?: boolean;
global?: boolean;
hasIndices?: boolean;
maxRecursionDepth?: number | null;
optimize?: boolean;
target?: keyof Target;
@@ -17,7 +19,7 @@ import {recursion} from 'regex-recursion';
/**
Transpiles an Oniguruma regex pattern and flags to native JS.
@param {string} pattern Oniguruma regex pattern.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS's flag `s`.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS flag `s`.
@param {CompileOptions} [options]
@returns {{
pattern: string;
@@ -37,7 +39,7 @@ function compile(pattern, flags, options) {
const generated = generate(regexAst, opts);
return {
pattern: atomic(possessive(recursion(generated.pattern))),
flags: `${generated.flags}${generated.options.disable.v ? 'u' : 'v'}`,
flags: `${opts.hasIndices ? 'd' : ''}${opts.global ? 'g' : ''}${generated.flags}${generated.options.disable.v ? 'u' : 'v'}`,
};
}
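
The new `flags` line in `compile` prepends `d` and `g` from the options before the generated flags and the trailing `u` or `v`. A minimal sketch of that ordering is shown below, assuming a hypothetical standalone `buildFlags` function. Its parameters `generatedFlags` and `disableV` stand in for `generated.flags` and `generated.options.disable.v`, and it is not part of the library.

```javascript
// Hypothetical stand-in for the flag assembly in `compile`; not the library's API.
// `generatedFlags` mirrors `generated.flags`, and `disableV` mirrors
// `generated.options.disable.v` from the diff above.
function buildFlags({hasIndices = false, global = false} = {}, generatedFlags = '', disableV = false) {
  return `${hasIndices ? 'd' : ''}${global ? 'g' : ''}${generatedFlags}${disableV ? 'u' : 'v'}`;
}

console.log(buildFlags({global: true}, 'i')); // "giv"
console.log(buildFlags({hasIndices: true, global: true}, 's', true)); // "dgsu"
console.log(buildFlags()); // "v"
```

Exactly one of `u` or `v` always ends the string, so `d` and `g` only ever add to the result and never replace a generated flag.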

@@ -53,16 +55,20 @@ function getOptions(options) {
// Set default values
return {
// Allows results that differ from Oniguruma in rare cases. If `false`, throws if the pattern
// can't be emulated with identical behavior.
// can't be emulated with identical behavior
allowBestEffort: true,
// Include JS flag `g` in results
global: false,
// Include JS flag `d` in results
hasIndices: false,
// If `null`, any use of recursion throws. If an integer between `2` and `100` (and
// `allowBestEffort` is on), common recursion forms are supported and recurse up to the
// specified max depth.
// specified max depth
maxRecursionDepth: 6,
// Simplify the generated pattern when it doesn't change the meaning.
// Simplify the generated pattern when it doesn't change the meaning
optimize: true,
// Sets the JavaScript language version for generated patterns and flags. Later targets allow
// faster processing, simpler generated source, and support for additional features.
// faster processing, simpler generated source, and support for additional features
target: 'ES2024',
...options,
};
13 changes: 5 additions & 8 deletions src/index.js
@@ -19,7 +19,7 @@ import {tokenize} from './tokenize.js';
/**
Generates an Oniguruma AST from an Oniguruma pattern and flags.
@param {string} pattern Oniguruma regex pattern.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS's flag `s`.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS flag `s`.
@returns {import('./parse.js').OnigurumaAst}
*/
function toOnigurumaAst(pattern, flags) {
@@ -29,7 +29,7 @@ function toOnigurumaAst(pattern, flags) {
/**
Generates a `regex` AST from an Oniguruma pattern and flags.
@param {string} pattern Oniguruma regex pattern.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS's flag `s`.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS flag `s`.
@returns {import('./transform.js').RegexAst}
*/
function toRegexAst(pattern, flags) {
@@ -39,16 +39,13 @@ function toRegexAst(pattern, flags) {
/**
Transpiles an Oniguruma regex pattern and flags and returns a native JS RegExp.
@param {string} pattern Oniguruma regex pattern.
@param {string} [flags] Any combination of Oniguruma flags `imx` and JS flags `dg`. Flag `m` is
equivalent to JS's flag `s`.
@param {import('./tokenize.js').OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS flag `s`.
@param {import('./compile.js').CompileOptions} [options]
@returns {RegExp}
*/
function toRegExp(pattern, flags = '', options) {
const allowedJsFlags = flags.replace(/[^dg]+/g, '');
flags = flags.replace(/[dg]+/g, '');
function toRegExp(pattern, flags, options) {
const result = compile(pattern, flags, options);
return new RegExp(result.pattern, `${allowedJsFlags}${result.flags}`);
return new RegExp(result.pattern, result.flags);
}
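
Since `toRegExp` no longer strips `d` and `g` out of the flags string, call sites that passed them there need to move them into the options object, as the updated specs do with `{global: true}`. The sketch below shows one way to convert a legacy flags string. It assumes a hypothetical `splitLegacyFlags` helper that is not part of the library and reuses the `/[dg]+/g` pattern from the removed code.

```javascript
// Hypothetical migration helper (not part of oniguruma-to-es): splits a legacy flags
// string such as 'gix' into Oniguruma flags plus the new `global`/`hasIndices` options.
function splitLegacyFlags(legacyFlags = '') {
  return {
    // Drop the JS-only flags `d` and `g`, leaving the Oniguruma flags
    flags: legacyFlags.replace(/[dg]+/g, ''),
    options: {
      global: legacyFlags.includes('g'),
      hasIndices: legacyFlags.includes('d'),
    },
  };
}

const legacy = splitLegacyFlags('gix');
console.log(legacy.flags); // "ix"
console.log(legacy.options); // { global: true, hasIndices: false }
```

A former call such as `toRegExp(pattern, 'gix')` would then become `toRegExp(pattern, legacy.flags, legacy.options)`, merged with any other options the caller already passes.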

export {
4 changes: 2 additions & 2 deletions src/tokenize.js
@@ -16,7 +16,7 @@ const TokenTypes = /** @type {const} */ ({
GroupOpen: 'GroupOpen',
Subroutine: 'Subroutine',
Quantifier: 'Quantifier',
// These aren't allowed in char classes, so they aren't equivalent to JS's `[\q{}]`
// These aren't allowed in char classes, so they aren't equivalent to JS `[\q{}]`
VariableLengthCharacterSet: 'VariableLengthCharacterSet',
// Intermediate representation not included in results
EscapedNumber: 'EscapedNumber',
@@ -122,7 +122,7 @@ const charClassTokenRe = new RegExp(r`
*/
/**
@param {string} pattern
@param {OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS's flag `s`.
@param {OnigurumaFlags} [flags] Oniguruma flags. Flag `m` is equivalent to JS flag `s`.
@returns {TokenizerResult}
*/
function tokenize(pattern, flags = '') {
