
Generate unique strings from regular expressions.

k15z/IntXeger

IntXeger

License: MIT

IntXeger (pronounced "integer") is a Python library for generating strings from regular expressions. Some of its core features include:

  • Support for most common regular expression operations.
  • Array-like indexing for mapping integers to matching strings.
  • Generator interface for sequentially sampling matching strings.
  • Sampling-without-replacement for generating a set of unique strings.

Compared to popular alternatives such as xeger and exrex, IntXeger is up to an order of magnitude faster at generating strings (roughly 3x to 14x across the benchmarks below). It also offers functionality those libraries lack, such as array-like indexing and sampling-without-replacement.

Installation

You can install the latest stable release of IntXeger by running:

pip install intxeger

Quick Start

Let's start with a simple example where our regex specifies a two-character string that only contains lowercase letters.

import intxeger
x = intxeger.build("[a-z]{2}")

You can check the number of strings that can be generated from this regex using the length attribute, and generate the i-th matching string (counting from zero) using the get(i) method.

assert x.length == 26**2 # there are 676 unique strings which match this regex
assert x.get(15) == 'ap' # index 15 (the 16th string, counting from 0) is 'ap'
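For a fixed-length pattern like `[a-z]{2}`, the index behaves like a zero-based, two-digit base-26 number. The sketch below reproduces `get(15) == 'ap'` under the assumption that matches are ordered lexicographically. `get_two_letters` is a hypothetical helper for illustration, not IntXeger's implementation.

```python
import string

# Hypothetical helper (not part of IntXeger): maps an integer index to the
# matching string of "[a-z]{2}" by treating the index as a two-digit
# base-26 number, most significant digit first.
ALPHABET = string.ascii_lowercase  # 'a'..'z', 26 symbols


def get_two_letters(i: int) -> str:
    length = len(ALPHABET) ** 2  # 676 strings in total
    if not 0 <= i < length:
        raise IndexError(f"index {i} out of range for {length} strings")
    first, second = divmod(i, len(ALPHABET))  # high digit, low digit
    return ALPHABET[first] + ALPHABET[second]


print(get_two_letters(0))    # aa
print(get_two_letters(15))   # ap
print(get_two_letters(675))  # zz
```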

Furthermore, you can generate N unique strings which match this regex using the sample(N) method. Note that N must be less than or equal to the length.

print(x.sample(N=10))
# ['xt', 'rd', 'jm', 'pj', 'jy', 'sp', 'cm', 'ag', 'cb', 'yt']
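Because every match has an integer index, drawing N unique strings reduces to drawing N unique integers from range(length) and mapping each one to its string. Here is a minimal sketch of that idea for `[a-z]{2}` using the standard library's `random.sample`. `sample_unique` and `get_two_letters` are hypothetical helpers, and this is not necessarily how IntXeger implements `sample`.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def get_two_letters(i: int) -> str:
    # Hypothetical base-26 mapping for "[a-z]{2}" (not IntXeger's code).
    first, second = divmod(i, len(ALPHABET))
    return ALPHABET[first] + ALPHABET[second]


def sample_unique(n: int, length: int = 26 ** 2, seed=None):
    # Draw n distinct indices without replacement, then map each to a string.
    # random.sample raises ValueError when n > length, mirroring the rule
    # that N must not exceed the number of matching strings.
    rng = random.Random(seed)
    indices = rng.sample(range(length), n)
    return [get_two_letters(i) for i in indices]


print(sample_unique(10, seed=0))
```

Sampling indices rather than strings avoids generating candidates and discarding duplicates, which matters as N approaches the total number of matches.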

Here's a more complicated regex which specifies a timestamp.

x = intxeger.build(r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M")
print(x.sample(N=2))
# ['11:57:12 AM', '01:16:01 AM']
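Counting matches composes over the structure of the regex: alternation adds counts, concatenation multiplies them, and a fixed repetition `{n}` raises a count to the n-th power. For this pattern that gives 12 × 60² × 2 = 86,400 strings. The sketch below builds both the count and an index-to-string mapping from a few hypothetical node classes (`Chars`, `Concat`, `Alt`, `repeat`). It illustrates the counting idea only and is not IntXeger's parser or internal representation.

```python
import re

# Hypothetical node classes: each node knows how many strings it matches
# (length) and can map an index in range(length) to one of them (get).


class Chars:
    """A set of single characters, e.g. [0-5] or the literal ':'."""

    def __init__(self, chars: str):
        self.chars = chars
        self.length = len(chars)

    def get(self, i: int) -> str:
        return self.chars[i]


class Concat:
    """Concatenation: counts multiply (mixed-radix indexing)."""

    def __init__(self, *parts):
        self.parts = parts
        self.length = 1
        for p in parts:
            self.length *= p.length

    def get(self, i: int) -> str:
        out = []
        # Peel digits off from the last part so earlier parts vary slowest.
        for p in reversed(self.parts):
            i, r = divmod(i, p.length)
            out.append(p.get(r))
        return "".join(reversed(out))


class Alt:
    """Alternation: counts add; the index picks a branch, then an offset."""

    def __init__(self, *options):
        self.options = options
        self.length = sum(o.length for o in options)

    def get(self, i: int) -> str:
        for o in self.options:
            if i < o.length:
                return o.get(i)
            i -= o.length
        raise IndexError(i)


def repeat(node, n):
    """Fixed repetition {n} is n-fold concatenation of the same node."""
    return Concat(*([node] * n))


DIGITS = "0123456789"
# (1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M, built by hand
hour = Alt(Concat(Chars("1"), Chars("012")), Concat(Chars("0"), Chars(DIGITS[1:])))
min_sec = Concat(Chars(":"), Chars("012345"), Chars(DIGITS))
timestamp = Concat(hour, repeat(min_sec, 2), Chars(" "), Alt(Chars("A"), Chars("P")), Chars("M"))

print(timestamp.length)  # 86400
print(timestamp.get(0))  # 10:00:00 AM

PATTERN = r"(1[0-2]|0[1-9])(:[0-5]\d){2} (A|P)M"
# Sanity check: a few indices decode to strings the original regex accepts.
for i in (0, 12345, timestamp.length - 1):
    assert re.fullmatch(PATTERN, timestamp.get(i))
```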

You can also print matches on the command line.

$ intxeger --order=desc "[a-c]"
c
b
a
$ python3 -m intxeger -0 'base/[ab]/[12]' | xargs -0 mkdir -p
$ tree base/
base
├── a
│   ├── 1
│   └── 2
└── b
    ├── 1
    └── 2
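In the second command, `xargs -0` reads NUL-separated items, which is why intxeger is run with `-0`. For readers who want the same directory tree from Python, here is a minimal sketch that skips intxeger entirely. It assumes the pattern stays a simple product of character classes like `base/[ab]/[12]`, so `itertools.product` can enumerate the four paths directly. `make_dirs` is a hypothetical helper, and `exist_ok=True` plays the role of `mkdir -p`.

```python
import itertools
import pathlib
import tempfile

# Hypothetical stand-in for `intxeger -0 'base/[ab]/[12]' | xargs -0 mkdir -p`:
# the pattern is a product of two character classes, so itertools.product
# enumerates the same four paths without parsing a regex.


def make_dirs(root: pathlib.Path) -> list:
    created = []
    for first, second in itertools.product("ab", "12"):
        path = root / "base" / first / second
        path.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
        created.append(path.relative_to(root).as_posix())
    return created


with tempfile.TemporaryDirectory() as tmp:
    print(make_dirs(pathlib.Path(tmp)))
    # ['base/a/1', 'base/a/2', 'base/b/1', 'base/b/2']
```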

To learn more about the functionality provided by IntXeger, check out our documentation!

Benchmark

This table, generated by benchmark.py, shows the time in milliseconds required to generate N matching strings for each regular expression using xeger, exrex, and intxeger.

regex | N | xeger (ms) | exrex (ms) | intxeger (ms)
[a-zA-Z]+ | 100 | 7.36 | 3.17 | 1.09
[0-9]{3}-[0-9]{3}-[0-9]{4} | 100 | 11.59 | 6.25 | 0.8
[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4} | 1000 | 208.62 | 91.3 | 18.28
/json/([0-9]{4})/([a-z]{4}) | 1000 | 133.36 | 107.01 | 12.18
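The relative speedups can be read off the table by dividing each competitor's time by intxeger's time. A small sketch of that arithmetic on the published numbers (it does not re-run benchmark.py):

```python
# Timings in milliseconds (xeger, exrex, intxeger), copied from the table.
BENCHMARK = {
    "[a-zA-Z]+": (7.36, 3.17, 1.09),
    "[0-9]{3}-[0-9]{3}-[0-9]{4}": (11.59, 6.25, 0.8),
    "[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}": (208.62, 91.3, 18.28),
    "/json/([0-9]{4})/([a-z]{4})": (133.36, 107.01, 12.18),
}


def speedups(table):
    """Return {regex: (xeger / intxeger, exrex / intxeger)} for each row."""
    return {
        regex: (xeger / intx, exrex / intx)
        for regex, (xeger, exrex, intx) in table.items()
    }


for regex, (vs_xeger, vs_exrex) in speedups(BENCHMARK).items():
    print(f"{regex:50} {vs_xeger:5.1f}x vs xeger  {vs_exrex:5.1f}x vs exrex")
```

By this arithmetic, intxeger is about 7x to 14x faster than xeger and about 3x to 9x faster than exrex on these four patterns.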

Have a regular expression that isn't represented here? Check out our Contributing Guide and submit a pull request!
