Readme/tests
slevithan committed Jan 9, 2025
1 parent 78a420b commit c68bcb6
Showing 6 changed files with 145 additions and 134 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -9,6 +9,7 @@ An **[Oniguruma](https://github.com/kkos/oniguruma) to JavaScript regex translat
- Take advantage of Oniguruma's many extended regex features in JavaScript.
- Run regexes written for Oniguruma from JavaScript, such as those used in TextMate grammars (used by VS Code, [Shiki](https://shiki.style/) syntax highlighter, etc.).
- Share regexes across your Ruby and JavaScript code.<sup>✳︎</sup>
- Evaluate Oniguruma regexes for validity, and traverse their ASTs.

Compared to running the Oniguruma C library via WASM bindings using [vscode-oniguruma](https://github.com/microsoft/vscode-oniguruma), this library is **less than 4% of the size** and its regexes often run much faster since they run as native JavaScript.

@@ -129,6 +130,8 @@ function toOnigurumaAst(
): OnigurumaAst;
```

An error is thrown if the pattern isn't valid in Oniguruma. But unlike `toRegExp` and `toDetails`, this won't evaluate whether the regex can be emulated in JavaScript.

### `EmulatedRegExp`

Works the same as JavaScript's native `RegExp` constructor in all contexts, but can be given results from `toDetails` to produce the same result as `toRegExp`.
@@ -214,6 +217,7 @@ Advanced options that override standard behavior, error checking, and flags when
- Oniguruma option `ONIG_OPTION_CAPTURE_GROUP`; on by default in `vscode-oniguruma`.
- `ignoreUnsupportedGAnchors`: Remove unsupported uses of `\G`, rather than erroring.
- Oniguruma-To-ES uses a variety of strategies to accurately emulate many common uses of `\G`. When using this option, if a `\G` is found that doesn't have a known emulation strategy, the `\G` is simply removed. This might lead to some false positive matches, but is useful for non-critical matching (like syntax highlighting) when having some mismatches is better than not working.
- Validation of the regex doesn't ignore unsupported `\G`s, so e.g. a quantifier after `\G` will still error.
- `recursionLimit`: Change the recursion depth limit from Oniguruma's `20` to an integer `2`–`20`.
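
The false positives mentioned for `ignoreUnsupportedGAnchors` can be illustrated with native JavaScript alone. The following is a minimal sketch, not the library's emulation strategy: it assumes that JavaScript's sticky flag (`y`) is a reasonable stand-in for a leading `\G`, and the helper name `matchIndexes` is hypothetical. It compares an anchored search with the same search after the anchor is dropped.

```javascript
// Hypothetical helper (not part of any library): collect the start index of
// every match that `re` finds in `str`. Requires the `g` or `y` flag, since
// both make `exec` advance `lastIndex`.
function matchIndexes(re, str) {
  const indexes = [];
  re.lastIndex = 0;
  let match;
  while ((match = re.exec(str)) !== null) {
    indexes.push(match.index);
    // Avoid an infinite loop on zero-length matches
    if (match[0] === '') re.lastIndex++;
  }
  return indexes;
}

const str = 'aaabaa';

// Assumption: sticky matching stands in for `\Ga`, so each match must start
// exactly where the previous one ended
const anchored = matchIndexes(/a/y, str);

// Same pattern with the anchor removed: matches may start anywhere
const unanchored = matchIndexes(/a/g, str);

console.log(anchored); // [0, 1, 2]
console.log(unanchored); // [0, 1, 2, 4, 5]
```

The two extra indexes in the unanchored result are the kind of mismatch that may be tolerable for non-critical matching such as syntax highlighting.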

### `target`
1 change: 0 additions & 1 deletion spec/match-directive.spec.js
@@ -1,6 +1,5 @@
// TODO: Add me

// import {toDetails} from '../dist/index.mjs';
// import {r} from '../src/utils.js';
// import {matchers} from './helpers/matchers.js';

13 changes: 7 additions & 6 deletions spec/match-flags.spec.js
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
import {r} from '../src/utils.js';
import {matchers} from './helpers/matchers.js';
// TODO: Add me

beforeEach(() => {
jasmine.addMatchers(matchers);
});
// import {r} from '../src/utils.js';
// import {matchers} from './helpers/matchers.js';

// beforeEach(() => {
// jasmine.addMatchers(matchers);
// });

// TODO: Add me
// describe('Flags', () => {
// it('should', () => {
// expect('').toExactlyMatch(r``);
214 changes: 106 additions & 108 deletions spec/match-lookaround.spec.js
@@ -5,129 +5,127 @@ beforeEach(() => {
jasmine.addMatchers(matchers);
});

describe('Assertion: Lookaround', () => {
describe('lookahead', () => {
it('should match fixed-length positive lookahead', () => {
expect('ab').toFindMatch('a(?=b)');
expect([
'ac', 'a',
]).not.toFindMatch('a(?=b)');
});
describe('Assertion: lookahead', () => {
it('should match fixed-length positive lookahead', () => {
expect('ab').toFindMatch('a(?=b)');
expect([
'ac', 'a',
]).not.toFindMatch('a(?=b)');
});

it('should match fixed-length negative lookahead', () => {
expect('ab').not.toFindMatch('a(?!b)');
expect([
'ac', 'a',
]).toFindMatch('a(?!b)');
});
it('should match fixed-length negative lookahead', () => {
expect('ab').not.toFindMatch('a(?!b)');
expect([
'ac', 'a',
]).toFindMatch('a(?!b)');
});

it('should match fixed-length repetition in lookahead', () => {
expect('abb').toFindMatch('a(?=b{2})');
expect('abb').toFindMatch('a(?=b{2,2})');
});
it('should match fixed-length repetition in lookahead', () => {
expect('abb').toFindMatch('a(?=b{2})');
expect('abb').toFindMatch('a(?=b{2,2})');
});

it('should match variable-length repetition in lookahead', () => {
expect('a').toFindMatch('a(?=b?)');
expect('a').toFindMatch('a(?=b*)');
expect('ab').toFindMatch('a(?=b+)');
expect('a').toFindMatch('a(?=b{0,2})');
expect('a').toFindMatch('a(?=b{0,})');
});
it('should match variable-length repetition in lookahead', () => {
expect('a').toFindMatch('a(?=b?)');
expect('a').toFindMatch('a(?=b*)');
expect('ab').toFindMatch('a(?=b+)');
expect('a').toFindMatch('a(?=b{0,2})');
expect('a').toFindMatch('a(?=b{0,})');
});

it('should match top-level variable-length alternatives in lookahead', () => {
expect([
'ab', 'acc',
]).toFindMatch('a(?=b|cc)');
expect([
'ac', 'a',
]).not.toFindMatch('a(?=b|cc)');
});
it('should match top-level variable-length alternatives in lookahead', () => {
expect([
'ab', 'acc',
]).toFindMatch('a(?=b|cc)');
expect([
'ac', 'a',
]).not.toFindMatch('a(?=b|cc)');
});

it('should match non-top-level variable-length alternatives in lookahead', () => {
expect([
'abc', 'abdd',
]).toFindMatch('a(?=b(?:c|dd))');
});
it('should match non-top-level variable-length alternatives in lookahead', () => {
expect([
'abc', 'abdd',
]).toFindMatch('a(?=b(?:c|dd))');
});
});

describe('lookbehind', () => {
it('should match fixed-length positive lookbehind', () => {
expect('ba').toFindMatch('(?<=b)a');
expect([
'ca', 'a',
]).not.toFindMatch('(?<=b)a');
});
describe('Assertion: lookbehind', () => {
it('should match fixed-length positive lookbehind', () => {
expect('ba').toFindMatch('(?<=b)a');
expect([
'ca', 'a',
]).not.toFindMatch('(?<=b)a');
});

it('should match fixed-length negative lookbehind', () => {
expect('ba').not.toFindMatch('(?<!b)a');
expect([
'ca', 'a',
]).toFindMatch('(?<!b)a');
});
it('should match fixed-length negative lookbehind', () => {
expect('ba').not.toFindMatch('(?<!b)a');
expect([
'ca', 'a',
]).toFindMatch('(?<!b)a');
});

it('should match fixed-length repetition in lookbehind', () => {
expect('bba').toFindMatch('(?<=b{2})a');
expect('bba').toFindMatch('(?<=b{2,2})a');
});
it('should match fixed-length repetition in lookbehind', () => {
expect('bba').toFindMatch('(?<=b{2})a');
expect('bba').toFindMatch('(?<=b{2,2})a');
});

it('should match variable-length repetition in lookbehind', () => {
expect('a').toFindMatch('(?<=b?)a');
expect('a').toFindMatch('(?<=b*)a');
expect('ba').toFindMatch('(?<=b+)a');
expect('a').toFindMatch('(?<=b{0,2})a');
expect('a').toFindMatch('(?<=b{0,})a');
});
it('should match variable-length repetition in lookbehind', () => {
expect('a').toFindMatch('(?<=b?)a');
expect('a').toFindMatch('(?<=b*)a');
expect('ba').toFindMatch('(?<=b+)a');
expect('a').toFindMatch('(?<=b{0,2})a');
expect('a').toFindMatch('(?<=b{0,})a');
});

it('should match top-level variable-length alternatives in lookbehind', () => {
expect([
'ba', 'cca',
]).toFindMatch('(?<=b|cc)a');
expect([
'ca', 'a',
]).not.toFindMatch('(?<=b|cc)a');
});
it('should match top-level variable-length alternatives in lookbehind', () => {
expect([
'ba', 'cca',
]).toFindMatch('(?<=b|cc)a');
expect([
'ca', 'a',
]).not.toFindMatch('(?<=b|cc)a');
});

it('should match non-top-level variable-length alternatives in lookbehind', () => {
expect([
'bca', 'bdda',
]).toFindMatch('(?<=b(?:c|dd))a');
});
it('should match non-top-level variable-length alternatives in lookbehind', () => {
expect([
'bca', 'bdda',
]).toFindMatch('(?<=b(?:c|dd))a');
});

describe('contents', () => {
it('should throw for invalid contents in positive lookbehind', () => {
// Invalid
[ '(?<=(?=))', // positive lookahead
'(?<=(?:a(?=)))', // positive lookahead; not direct child
'(?<=(?!))', // negative lookahead
'(?<=(?<!))', // negative lookbehind
].forEach(p => {
expect(() => toDetails(p)).toThrow();
});
// Valid
[ '(?<=(?<=))', // positive lookbehind
'(?<=())', // capturing group (unnamed)
'(?<=(?<n>))', // capturing group (named)
].forEach(p => {
expect(() => toDetails(p)).not.toThrow();
});
describe('contents', () => {
it('should throw for invalid contents in positive lookbehind', () => {
// Invalid
[ '(?<=(?=))', // positive lookahead
'(?<=(?:a(?=)))', // positive lookahead; not direct child
'(?<=(?!))', // negative lookahead
'(?<=(?<!))', // negative lookbehind
].forEach(p => {
expect(() => toDetails(p)).toThrow();
});
// Valid
[ '(?<=(?<=))', // positive lookbehind
'(?<=())', // capturing group (unnamed)
'(?<=(?<n>))', // capturing group (named)
].forEach(p => {
expect(() => toDetails(p)).not.toThrow();
});
});

it('should throw for invalid contents in negative lookbehind', () => {
// Invalid
[ '(?<!(?=))', // positive lookahead
'(?<!(?:a(?=)))', // positive lookahead; not direct child
'(?<!(?!))', // negative lookahead
'(?<!())', // capturing group (unnamed)
'(?<!(?<n>))', // capturing group (named)
].forEach(p => {
expect(() => toDetails(p)).toThrow();
});
// Valid
[ '(?<!(?<=))', // positive lookbehind
'(?<!(?<!))', // negative lookbehind
].forEach(p => {
expect(() => toDetails(p)).not.toThrow();
});
it('should throw for invalid contents in negative lookbehind', () => {
// Invalid
[ '(?<!(?=))', // positive lookahead
'(?<!(?:a(?=)))', // positive lookahead; not direct child
'(?<!(?!))', // negative lookahead
'(?<!())', // capturing group (unnamed)
'(?<!(?<n>))', // capturing group (named)
].forEach(p => {
expect(() => toDetails(p)).toThrow();
});
// Valid
[ '(?<!(?<=))', // positive lookbehind
'(?<!(?<!))', // negative lookbehind
].forEach(p => {
expect(() => toDetails(p)).not.toThrow();
});
});
});
20 changes: 5 additions & 15 deletions spec/match-search-start.spec.js
@@ -7,7 +7,9 @@ beforeEach(() => {
jasmine.addMatchers(matchers);
});

describe('Assertion: Search start', () => {
describe('Assertion: search_start', () => {
// Note: See specs for option `rules.ignoreUnsupportedGAnchors` in `options.spec.js`

it('should be identity escape within a char class', () => {
expect('G').toExactlyMatch(r`[\G]`);
expect('\\').not.toFindMatch(r`[\G]`);
@@ -33,7 +35,6 @@ describe('Assertion: Search start', () => {
});

it('should throw if not used at the start of every top-level alternative', () => {
expect(() => toDetails(r`a\G`)).toThrow();
expect(() => toDetails(r`\Ga|b`)).toThrow();
expect(() => toDetails(r`a|\Gb`)).toThrow();
});
@@ -138,31 +139,20 @@ describe('Assertion: Search start', () => {

// Note: Could support by replacing `\G` with `(?!)`, but these forms aren't useful
it('should throw at unmatchable positions', () => {
expect(() => toDetails(r`a\G`)).toThrow();
expect(() => toDetails(r`a\Gb`)).toThrow();
expect(() => toDetails(r`(?<=a\Gb)`)).toThrow();
expect(() => toDetails(r`(?=a\Gb)`)).toThrow();
expect(() => toDetails(r`(?=ab\G)`)).toThrow();
});

it('should allow unsupported forms if allowing all search start anchors', () => {
const patterns = [
r`a\G`,
r`\Ga|b`,
r`(\G|a)b`,
];
patterns.forEach(pattern => {
expect(() => toDetails(pattern)).toThrow();
expect(() => toDetails(pattern, {rules: {ignoreUnsupportedGAnchors: true}})).not.toThrow();
});
});
});

describe('subclass strategies', () => {
// Leading `(^|\G)` and similar
it('should apply line_or_search_start', () => {
// Matches with `^` since not global
expect(toRegExp(r`(^|\G)a`).exec('b\na')?.index).toBe(2);
// Match the first 3 and last 1
// Matched `a`s are the first three and last one
expect('aaabaaacaa\na'.match(toRegExp(r`(^|\G)a`, {global: true}))).toEqual(['a', 'a', 'a', 'a']);
expect(toRegExp(r`(?:^|\G)a`).exec('b\na')?.index).toBe(2);
expect(toRegExp(r`(\G|^)a`).exec('b\na')?.index).toBe(2);