test harness can't deal with Unicode regular expressions #395

Closed
otterley opened this issue Feb 14, 2020 · 2 comments · Fixed by #675
Labels: contract tests (I'll make you an offer you can't refuse), schema processing

Comments

@otterley

The test harness (cfn test) can't handle regex patterns in Resource Provider schemas that use Unicode property escapes such as \p{L} (any Unicode letter).

Here's an example that is based on AWS::IAM::Role from the CloudFormation documentation:

        "Description": {
            "description": "A description of the role",
            "type": "string",
            "maxLength": 1000,
            "pattern": "[\\p{L}\\p{M}\\p{Z}\\p{S}\\p{N}\\p{P}]*"
        },

This is likely due to a limitation of Python's built-in re module, which does not support \p{...} Unicode property escapes.

Consider using the third-party regex library (https://pypi.org/project/regex/), which understands these escapes.
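As a rough illustration of the difference, here is a minimal sketch that compiles the Description pattern above with the standard library's re and then, if it happens to be installed, with the third-party regex package. The helper names (stdlib_re_error, third_party_fullmatch) and the sample string are made up for this example and are not part of the test harness.

```python
import re

# Same pattern as the schema's "Description" property, with JSON escaping removed.
PATTERN = r"[\p{L}\p{M}\p{Z}\p{S}\p{N}\p{P}]*"


def stdlib_re_error(pattern):
    """Return the message raised by re.compile, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


def third_party_fullmatch(pattern, text):
    """Full-match with the third-party `regex` package.

    Returns True or False, or None when `regex` is not installed
    (assumption: it may be missing, so the import is guarded).
    """
    try:
        import regex  # pip install regex
    except ImportError:
        return None
    return regex.fullmatch(pattern, text) is not None


# On the interpreter from the report (3.8.1) this prints a "bad escape \p" message.
print(stdlib_re_error(PATTERN))
# Hypothetical sample description containing letters, spaces, punctuation and a digit.
print(third_party_fullmatch(PATTERN, "Rôle d'accès, v2"))
```

This only shows that the pattern itself is acceptable to regex. It says nothing about whether the example generation in the harness can use that library.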

Error follows:

source = <sre_parse.Tokenizer object at 0x10bbc6850>, escape = '\\p'

    def _class_escape(source, escape):
        # handle escape code inside character class
        code = ESCAPES.get(escape)
        if code:
            return code
        code = CATEGORIES.get(escape)
        if code and code[0] is IN:
            return code
        try:
            c = escape[1:2]
            if c == "x":
                # hexadecimal escape (exactly two digits)
                escape += source.getwhile(2, HEXDIGITS)
                if len(escape) != 4:
                    raise source.error("incomplete escape %s" % escape, len(escape))
                return LITERAL, int(escape[2:], 16)
            elif c == "u" and source.istext:
                # unicode escape (exactly four digits)
                escape += source.getwhile(4, HEXDIGITS)
                if len(escape) != 6:
                    raise source.error("incomplete escape %s" % escape, len(escape))
                return LITERAL, int(escape[2:], 16)
            elif c == "U" and source.istext:
                # unicode escape (exactly eight digits)
                escape += source.getwhile(8, HEXDIGITS)
                if len(escape) != 10:
                    raise source.error("incomplete escape %s" % escape, len(escape))
                c = int(escape[2:], 16)
                chr(c) # raise ValueError for invalid code
                return LITERAL, c
            elif c == "N" and source.istext:
                import unicodedata
                # named unicode escape e.g. \N{EM DASH}
                if not source.match('{'):
                    raise source.error("missing {")
                charname = source.getuntil('}', 'character name')
                try:
                    c = ord(unicodedata.lookup(charname))
                except KeyError:
                    raise source.error("undefined character name %r" % charname,
                                       len(charname) + len(r'\N{}'))
                return LITERAL, c
            elif c in OCTDIGITS:
                # octal escape (up to three digits)
                escape += source.getwhile(2, OCTDIGITS)
                c = int(escape[1:], 8)
                if c > 0o377:
                    raise source.error('octal escape value %s outside of '
                                       'range 0-0o377' % escape, len(escape))
                return LITERAL, c
            elif c in DIGITS:
                raise ValueError
            if len(escape) == 2:
                if c in ASCIILETTERS:
>                   raise source.error('bad escape %s' % escape, len(escape))
E                   re.error: bad escape \p at position 1

../../.pyenv/versions/3.8.1/lib/python3.8/sre_parse.py:349: error
@johnttompkins
Contributor

Thanks for raising this. We are using hypothesis strategies to generate the examples. Let me see if the suggested library would play well with hypothesis.

@johnttompkins added the contract tests label Feb 18, 2020
@PatMyron
Contributor
