Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

catch common resource schema issues in cfn validate #675

Merged
merged 10 commits into from
Feb 5, 2021
2 changes: 1 addition & 1 deletion setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ include_trailing_comma = true
combine_as_imports = True
force_grid_wrap = 0
known_first_party = rpdk
known_third_party = boto3,botocore,cfnlint,colorama,docker,hypothesis,jinja2,jsonschema,ordered_set,pkg_resources,pytest,pytest_localserver,setuptools,yaml
known_third_party = boto3,botocore,cfnlint,colorama,docker,hypothesis,jinja2,jsonschema,nested_lookup,ordered_set,pkg_resources,pytest,pytest_localserver,setuptools,yaml

[tool:pytest]
# can't do anything about 3rd part modules, so don't spam us
Expand Down
1 change: 1 addition & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ def find_version(*file_paths):
"docker>=4.3.1",
"ordered-set>=4.0.2",
"cfn-lint>=0.43.0",
"nested-lookup",
],
entry_points={
"console_scripts": ["cfn-cli = rpdk.core.cli:main", "cfn = rpdk.core.cli:main"]
Expand Down
102 changes: 101 additions & 1 deletion src/rpdk/core/data_loaders.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
import json
import logging
import os
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this type of testing. Get rid of some of the java patterns would be helpful. My question is the python re the one we want to standardize on? For instance ^[\d\w-_.+]*$ is technically valid but in python it needs to be ^[\d\w\-_.+]*$.

Copy link
Contributor Author

@PatMyron PatMyron Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, resource schema patterns should be valid and equivalent in both Python and Java (and more)

JSON schema itself recommends sticking to a minimal subset of regular expression syntax. Let's encourage the same

Just emitting warnings for patterns not valid in Python seems like a good start in that direction:

  23 Could not validate regular expression: ^([\p{L}\p{Z}\p{N}_.:/=+\-@]*)$
   6 Could not validate regular expression: ^[a-zA-Z0-9_\-\+\./\(\)\$\p{Zs}]+$
   4 Could not validate regular expression: ^[a-zA-Z0-9-]+{1,255}$
   3 Could not validate regular expression: ^([\p{L}\p{Z}\p{N}_.:=+\/\-@%]*)$
   3 Could not validate regular expression: [\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF  ]*
   2 Could not validate regular expression: ^[\/]+([^~]*(~[01])*)*{1,512}$
   2 Could not validate regular expression: ^((?!aws:)[\p{L}\p{Z}\p{N}_.:/=+\-@]*)$
   2 Could not validate regular expression: [\p{L}\p{M}\p{Z}\p{S}\p{N}\p{P}]*
   1 Could not validate regular expression: ^[a-zA-Z0-9_\-\+\./\(\)\p{Zs}]*$
   1 Could not validate regular expression: ^[a-zA-Z0-9\s-_()\[\]]+$
   1 Could not validate regular expression: ^[a-zA-Z-0-9-:\/]*{1,1000}$
   1 Could not validate regular expression: ^[a-zA-Z-0-9-:\/.]*{1,1000}$
   1 Could not validate regular expression: ^[\p{L}\p{Z}\p{N}_.:\/=+\-@%]*$
   1 Could not validate regular expression: ^[\p{L}\p{Z}\p{N}_.:/=+\-@]*$|resource-groups:Name
   1 Could not validate regular expression: ^[A-Za-z0-9._\-:\/#\+]+{1,255}$
   1 Could not validate regular expression: ^(?!aws:.*)([\p{L}\p{Z}\p{N}_.:=+\/\-@%]*)$
   1 Could not validate regular expression: ^(?!aws:)[\p{L}\p{Z}\p{N}_.:\/=+\-@%]*$
   1 Could not validate regular expression: [^_\p{Z}][\p{L}\p{M}\p{S}\p{N}\p{P}][^_\p{Z}]+
   1 Could not validate regular expression: [\p{L}\p{Z}\p{N}_.:\/=+\-@]+
   1 Could not validate regular expression: [\p{L}\p{Z}\p{N}_.:\/=+\-@\[\]\{\}\$\\"]*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON schema itself recommends sticking to a minimal subset of regular expression syntax. Let's encourage the same

Agreed completely here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Golang has some interesting regex limitations even within that minimal subset:
hashicorp/terraform-provider-awscc#88
golang/go#7252

import shutil
from io import TextIOWrapper
from pathlib import Path
Expand All @@ -9,9 +10,12 @@
import yaml
from jsonschema import Draft7Validator, RefResolver
from jsonschema.exceptions import RefResolutionError, ValidationError
from nested_lookup import nested_lookup

from .exceptions import InternalError, SpecValidationError
from .jsonutils.flattener import JsonSchemaFlattener
from .jsonutils.inliner import RefInliner
from .jsonutils.utils import FlatteningError

LOG = logging.getLogger(__name__)

Expand Down Expand Up @@ -123,7 +127,7 @@ def get_file_base_uri(file):
return path.resolve().as_uri()


def load_resource_spec(resource_spec_file): # pylint: disable=R0912 # noqa: C901
def load_resource_spec(resource_spec_file): # pylint: disable=R # noqa: C901
"""Load a resource provider definition from a file, and validate it."""
try:
resource_spec = json.load(resource_spec_file)
Expand All @@ -141,6 +145,102 @@ def load_resource_spec(resource_spec_file): # pylint: disable=R0912 # noqa: C90
LOG.debug("Resource spec validation failed", exc_info=True)
raise SpecValidationError(str(e)) from e

min_max_keywords = {
"minimum",
"maximum",
"minLength",
"maxLength",
"minProperties",
"maxProperties",
"minItems",
"maxItems",
"exclusiveMinimum",
"exclusiveMaximum",
}
try: # pylint: disable=R
for _key, schema in JsonSchemaFlattener(resource_spec).flatten_schema().items():
for property_name, property_details in schema.get("properties", {}).items():
if property_name[0].islower():
LOG.warning(
"CloudFormation properties don't usually start with lowercase letters: %s",
property_name,
)
try:
property_type = property_details["type"]
property_keywords = property_details.keys()
for types, allowed_keywords in [
(
{"integer", "number"},
{
"minimum",
"maximum",
"exclusiveMinimum",
"exclusiveMaximum",
},
),
(
{"string"},
{
"minLength",
"maxLength",
},
),
(
{"object"},
{
"minProperties",
"maxProperties",
},
),
(
{"array"},
{
"minItems",
"maxItems",
},
),
]:
if (
property_type in types
and min_max_keywords - allowed_keywords & property_keywords
):
LOG.warning(
"Incorrect min/max JSON schema keywords for type: %s for property: %s",
property_type,
property_name,
)
except (KeyError, TypeError):
pass
except FlatteningError:
pass

for pattern in nested_lookup("pattern", resource_spec):
if "arn:aws:" in pattern:
LOG.warning(
"Don't hardcode the aws partition in ARN patterns: %s",
pattern,
)
try:
re.compile(pattern)
except re.error:
LOG.warning("Could not validate regular expression: %s", pattern)

for enum in nested_lookup("enum", resource_spec):
if len(enum) > 15:
LOG.warning(
"Consider not manually maintaining large constantly evolving enums like \
instance types, lambda runtimes, partitions, regions, availability zones, etc. that get outdated quickly: %s",
enum,
)

non_ascii_chars = re.findall(
r"[^ -~]", json.dumps(resource_spec, ensure_ascii=False)
)
if non_ascii_chars:
LOG.warning(
"non-ASCII characters found in resource schema: %s", non_ascii_chars
)

list_options = {
"maxresults",
"maxrecords",
Expand Down
14 changes: 14 additions & 0 deletions tests/data/schema/valid/invalid_lowercase_property.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"typeName" : "AWS::Service::Type",
"description" : "",
"additionalProperties" : false,
"properties" : {
"property" : {
"type" : "string"
}
},
"readOnlyProperties": [
"/properties/property"
],
"primaryIdentifier" : [ "/properties/property" ]
}
14 changes: 14 additions & 0 deletions tests/data/schema/valid/invalid_non_ascii_chars.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"typeName" : "AWS::Service::Type",
"description" : "🤠",
"additionalProperties" : false,
"properties" : {
"Property" : {
"type" : "string"
}
},
"readOnlyProperties": [
"/properties/Property"
],
"primaryIdentifier" : [ "/properties/Property" ]
}
17 changes: 17 additions & 0 deletions tests/data/schema/valid/invalid_property_constraints.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
{
"typeName" : "AWS::Service::Type",
"description" : "",
"additionalProperties" : false,
"properties" : {
"Property" : {
"type" : "string",
"pattern" : "arn:aws:[A-Za-z0-9]+{1,255}",
"enum" : [ "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z" ],
"minimum" : 0
}
},
"readOnlyProperties": [
"/properties/Property"
],
"primaryIdentifier" : [ "/properties/Property" ]
}