Support full reconstruction of HCL from output dictionary (#177)
* Initial commit of a "reverse transformer" to turn HCL2 dicts into Lark trees

* Add tests for the reverse reconstructor

* Add different handling to the reverse reconstructor depending on data type

* Add support for multiple block labels

* Fix accidentally escaping quotes within interpolated strings

* Properly handle escapes within HCL strings (closes #171)

* Standardize string output from transformer within nested structures to match Terraform syntax instead of Python (fixes #172)

* Fix block labels and booleans during reconstruction

* Better handle nested interpolation (fixes #173)

* Begin refactor of whitespace handling (more to come)

* overhaul of whitespace handling, remove old logic.

* Fix Pylint warnings

* Fix a few formatting issues in reconstruction

* Add a "builder" class for constructing HCL files from Python

* Update the docs for reconstruction

* fix suggested by Nfsaavedra
#177 (comment)

* a bit of refactoring

* update interpolation test case to include long non-interpolated  substring

---------

Co-authored-by: Kamil Kozik <[email protected]>
weaversam8 and kkozik-amplify authored Jan 16, 2025
1 parent eb2032a commit f8a2c88
Showing 26 changed files with 1,215 additions and 190 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -44,11 +44,11 @@ with open('foo.tf', 'r') as file:

 ### Parse Tree to HCL2 reconstruction

-With version 5.0.0 the possibility of HCL2 reconstruction from Lark Parse Tree was introduced.
+With version 5.x the possibility of HCL2 reconstruction from the Lark Parse Tree and Python dictionaries directly was introduced.

-Example of manipulating Lark Parse Tree and reconstructing it back into valid HCL2 can be found in [tree-to-hcl2-reconstruction.md](https://github.com/amplify-education/python-hcl2/blob/main/tree-to-hcl2-reconstruction.md) file.
+Documentation and an example of manipulating Lark Parse Tree and reconstructing it back into valid HCL2 can be found in [tree-to-hcl2-reconstruction.md](https://github.com/amplify-education/python-hcl2/blob/main/tree-to-hcl2-reconstruction.md) file.

-More details about reconstruction implementation can be found in this [PR](https://github.com/amplify-education/python-hcl2/pull/169).
+More details about reconstruction implementation can be found in PRs #169 and #177.

 ## Building From Source

13 changes: 12 additions & 1 deletion hcl2/__init__.py
@@ -5,4 +5,15 @@
 except ImportError:
     __version__ = "unknown"

-from .api import load, loads, parse, parses, transform, writes, AST
+from .api import (
+    load,
+    loads,
+    parse,
+    parses,
+    transform,
+    reverse_transform,
+    writes,
+    AST,
+)
+
+from .builder import Builder
23 changes: 18 additions & 5 deletions hcl2/api.py
@@ -2,7 +2,7 @@
 from typing import TextIO

 from lark.tree import Tree as AST
-from hcl2.parser import hcl2
+from hcl2.parser import parser
 from hcl2.transformer import DictTransformer


@@ -25,7 +25,7 @@ def loads(text: str, with_meta=False) -> dict:
     # Lark doesn't support a EOF token so our grammar can't look for "new line or end of file"
     # This means that all blocks must end in a new line even if the file ends
     # Append a new line as a temporary fix
-    tree = hcl2.parse(text + "\n")
+    tree = parser().parse(text + "\n")
     return DictTransformer(with_meta=with_meta).transform(tree)


@@ -42,11 +42,11 @@ def parses(text: str) -> AST:
     """
     # defer this import until this method is called, due to the performance hit
     # of rebuilding the grammar without cache
-    from hcl2.reconstructor import (  # pylint: disable=import-outside-toplevel
-        hcl2 as uncached_hcl2,
+    from hcl2.parser import (  # pylint: disable=import-outside-toplevel
+        reconstruction_parser,
     )

-    return uncached_hcl2.parse(text)
+    return reconstruction_parser().parse(text)


 def transform(ast: AST, with_meta=False) -> dict:
@@ -56,6 +56,19 @@ def transform(ast: AST, with_meta=False) -> dict:
     return DictTransformer(with_meta=with_meta).transform(ast)


+def reverse_transform(hcl2_dict: dict) -> AST:
+    """Convert a dictionary to an HCL2 AST.
+    :param hcl2_dict: a dictionary produced by `load` or `transform`
+    """
+    # defer this import until this method is called, due to the performance hit
+    # of rebuilding the grammar without cache
+    from hcl2.reconstructor import (  # pylint: disable=import-outside-toplevel
+        hcl2_reverse_transformer,
+    )
+
+    return hcl2_reverse_transformer.transform(hcl2_dict)
+
+
 def writes(ast: AST) -> str:
     """Convert an HCL2 syntax tree to a string.
     :param ast: HCL2 syntax tree, output from `parse` or `parses`
63 changes: 63 additions & 0 deletions hcl2/builder.py
@@ -0,0 +1,63 @@
+"""A utility class for constructing HCL documents from Python code."""
+
+from typing import List, Optional
+
+
+class Builder:
+    """
+    The `hcl2.Builder` class produces a dictionary that should be identical to the
+    output of `hcl2.load(example_file, with_meta=True)`. The `with_meta` keyword
+    argument is important here. HCL "blocks" in the Python dictionary are
+    identified by the presence of `__start_line__` and `__end_line__` metadata
+    within them. The `Builder` class handles adding that metadata. If that metadata
+    is missing, the `hcl2.reconstructor.HCLReverseTransformer` class fails to
+    identify what is a block and what is just an attribute with an object value.
+    """
+
+    def __init__(self, attributes: Optional[dict] = None):
+        self.blocks: dict = {}
+        self.attributes = attributes or {}
+
+    def block(
+        self, block_type: str, labels: Optional[List[str]] = None, **attributes
+    ) -> "Builder":
+        """Create a block within this HCL document."""
+        labels = labels or []
+        block = Builder(attributes)
+
+        # initialize a holder for blocks of that type
+        if block_type not in self.blocks:
+            self.blocks[block_type] = []
+
+        # store the block in the document
+        self.blocks[block_type].append((labels.copy(), block))
+
+        return block
+
+    def build(self):
+        """Return the Python dictionary for this HCL document."""
+        body = {
+            "__start_line__": -1,
+            "__end_line__": -1,
+            **self.attributes,
+        }
+
+        for block_type, blocks in self.blocks.items():
+
+            # initialize a holder for blocks of that type
+            if block_type not in body:
+                body[block_type] = []
+
+            for labels, block_builder in blocks:
+                # build the sub-block
+                block = block_builder.build()
+
+                # apply any labels, innermost first; reversed() leaves the
+                # stored list untouched so build() can be called repeatedly
+                for label in reversed(labels):
+                    block = {label: block}
+
+                # store it in the body
+                body[block_type].append(block)
+
+        return body
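
The block-versus-attribute distinction and the label nesting described in the `Builder` docstring can be sketched without the library. This is a minimal, self-contained illustration, not the library's implementation: `is_block` and `nest_labels` are hypothetical helper names, and the detection rule is an assumption based only on the docstring (a dictionary counts as a block when it carries both `__start_line__` and `__end_line__`).

```python
from typing import Any, Dict, List

START, END = "__start_line__", "__end_line__"


def is_block(value: Any) -> bool:
    """Assumed rule: a dict is a block if it carries both line markers."""
    return isinstance(value, dict) and START in value and END in value


def nest_labels(labels: List[str], body: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap body in one dict per label, with the first label outermost."""
    block = body
    for label in reversed(labels):  # the last label wraps the body first
        block = {label: block}
    return block


# A block body with placeholder line metadata, as Builder.build() emits it.
body = {START: -1, END: -1, "ami": "abc-123"}
resource = nest_labels(["aws_instance", "web"], body)
document = {START: -1, END: -1, "resource": [resource]}

print(resource)
print(is_block(body), is_block({"ami": "abc-123"}))
```

The nested shape mirrors a block with two labels, such as `resource "aws_instance" "web" { ... }`, while a plain dictionary without the metadata keys would be treated as an attribute with an object value.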
30 changes: 22 additions & 8 deletions hcl2/hcl2.lark
@@ -1,7 +1,7 @@
 start : body
 body : (new_line_or_comment? (attribute | block))* new_line_or_comment?
 attribute : identifier EQ expression
-block : identifier (identifier | STRING_LIT)* new_line_or_comment? "{" body "}"
+block : identifier (identifier | STRING_LIT | string_with_interpolation)* new_line_or_comment? "{" body "}"
 new_line_and_or_comma: new_line_or_comment | "," | "," new_line_or_comment
 new_line_or_comment: ( NL_OR_COMMENT )+
 NL_OR_COMMENT: /\n[ \t]*/ | /#.*\n/ | /\/\/.*\n/ | /\/\*(.|\n)*?(\*\/)/
@@ -22,12 +22,26 @@ conditional : expression "?" new_line_or_comment? expression new_line_or_comment
 binary_op : expression binary_term new_line_or_comment?
 !binary_operator : BINARY_OP
 binary_term : binary_operator new_line_or_comment? expression
-BINARY_OP : "==" | "!=" | "<" | ">" | "<=" | ">=" | "-" | "*" | "/" | "%" | "&&" | "||" | "+"
+BINARY_OP : DOUBLE_EQ | NEQ | LT | GT | LEQ | GEQ | MINUS | ASTERISK | SLASH | PERCENT | DOUBLE_AMP | DOUBLE_PIPE | PLUS
+DOUBLE_EQ : "=="
+NEQ : "!="
+LT : "<"
+GT : ">"
+LEQ : "<="
+GEQ : ">="
+MINUS : "-"
+ASTERISK : "*"
+SLASH : "/"
+PERCENT : "%"
+DOUBLE_AMP : "&&"
+DOUBLE_PIPE : "||"
+PLUS : "+"

 expr_term : "(" new_line_or_comment? expression new_line_or_comment? ")"
           | float_lit
           | int_lit
           | STRING_LIT
+          | string_with_interpolation
           | tuple
           | object
           | function_call
@@ -42,11 +56,10 @@ expr_term : "(" new_line_or_comment? expression new_line_or_comment? ")"
           | for_tuple_expr
           | for_object_expr

-
-STRING_LIT : "\"" (STRING_CHARS | INTERPOLATION)* "\""
-STRING_CHARS : /(?:(?!\${)([^"\\]|\\.))+/+ // any character except '"" unless inside a interpolation string
-NESTED_INTERPOLATION : "${" /[^}]+/ "}"
-INTERPOLATION : "${" (/(?:(?!\${)([^}]))+/ | NESTED_INTERPOLATION)+ "}"
+STRING_LIT : "\"" STRING_CHARS? "\""
+STRING_CHARS : /(?:(?!\${)([^"\\]|\\.))+/ // any character except '"'
+string_with_interpolation: "\"" (STRING_CHARS)* interpolation_maybe_nested (STRING_CHARS | interpolation_maybe_nested)* "\""
+interpolation_maybe_nested: "${" expression "}"

 int_lit : DECIMAL+
 !float_lit: DECIMAL+ "." DECIMAL+ (EXP_MARK DECIMAL+)?
@@ -77,8 +90,9 @@ get_attr : "." identifier
 attr_splat : ".*" get_attr*
 full_splat : "[*]" (get_attr | index)*

+FOR_OBJECT_ARROW : "=>"
 !for_tuple_expr : "[" new_line_or_comment? for_intro new_line_or_comment? expression new_line_or_comment? for_cond? new_line_or_comment? "]"
-!for_object_expr : "{" new_line_or_comment? for_intro new_line_or_comment? expression "=>" new_line_or_comment? expression "..."? new_line_or_comment? for_cond? new_line_or_comment? "}"
+!for_object_expr : "{" new_line_or_comment? for_intro new_line_or_comment? expression FOR_OBJECT_ARROW new_line_or_comment? expression "..."? new_line_or_comment? for_cond? new_line_or_comment? "}"
 !for_intro : "for" new_line_or_comment? identifier ("," identifier new_line_or_comment?)? new_line_or_comment? "in" new_line_or_comment? expression new_line_or_comment? ":" new_line_or_comment?
 !for_cond : "if" new_line_or_comment? expression
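
The revised `STRING_CHARS` terminal matches a run of ordinary or backslash-escaped characters and stops before any `${`, which lets `interpolation_maybe_nested` parse the interpolation as a real expression. A small sketch of that stopping behavior, using Python's `re` module rather than Lark. The pattern is adapted from the grammar (brace escaped, inner group made non-capturing), and `leading_string_chars` is a hypothetical helper used only for illustration.

```python
import re

# Adapted from the STRING_CHARS terminal: any run of non-quote, non-backslash
# characters or backslash escapes, never crossing the start of "${".
STRING_CHARS = re.compile(r'(?:(?!\$\{)(?:[^"\\]|\\.))+')


def leading_string_chars(text: str) -> str:
    """Return the longest STRING_CHARS run at the start of text ('' if none)."""
    match = STRING_CHARS.match(text)
    return match.group(0) if match else ""


plain_prefix = leading_string_chars("hello ${var.name}!")
escaped = leading_string_chars(r"say \"hi\" now")
interpolation_first = leading_string_chars("${var.name}")

print(repr(plain_prefix))
print(repr(escaped))
print(repr(interpolation_first))
```

A lone `$` that is not followed by `{` is still consumed as an ordinary character, since the negative lookahead only rejects the two-character sequence `${`.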

40 changes: 33 additions & 7 deletions hcl2/parser.py
@@ -1,4 +1,5 @@
 """A parser for HCL2 implemented using the Lark parser"""
+import functools
 from pathlib import Path

 from lark import Lark
@@ -7,10 +8,35 @@
 PARSER_FILE = Path(__file__).absolute().resolve().parent / ".lark_cache.bin"


-hcl2 = Lark.open(
-    "hcl2.lark",
-    parser="lalr",
-    cache=str(PARSER_FILE),  # Disable/Delete file to effect changes to the grammar
-    rel_to=__file__,
-    propagate_positions=True,
-)
[email protected]_cache()
+def parser() -> Lark:
+    """Build standard parser for transforming HCL2 text into python structures"""
+    return Lark.open(
+        "hcl2.lark",
+        parser="lalr",
+        cache=str(PARSER_FILE),  # Disable/Delete file to effect changes to the grammar
+        rel_to=__file__,
+        propagate_positions=True,
+    )
+
+
[email protected]_cache()
+def reconstruction_parser() -> Lark:
+    """
+    Build parser for transforming python structures into HCL2 text.
+    This is duplicated from `parser` because we need different options here for
+    the reconstructor. Please make sure changes are kept in sync between the two
+    if necessary.
+    """
+    return Lark.open(
+        "hcl2.lark",
+        parser="lalr",
+        # Caching must be disabled to allow for reconstruction until lark-parser/lark#1472 is fixed:
+        #
+        # https://github.com/lark-parser/lark/issues/1472
+        #
+        # cache=str(PARSER_FILE), # Disable/Delete file to effect changes to the grammar
+        rel_to=__file__,
+        propagate_positions=True,
+        maybe_placeholders=False,  # Needed for reconstruction
+    )
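
Wrapping each constructor in `functools.lru_cache()` means the grammar is built on first use rather than at import time, and later calls reuse the same object. A minimal sketch of that pattern with a stand-in class instead of `Lark`. `ExpensiveParser` and `get_parser` are hypothetical names used only for illustration.

```python
import functools


class ExpensiveParser:
    """Stand-in for a parser that is costly to construct."""

    instances = 0  # counts how many times the constructor has run

    def __init__(self) -> None:
        ExpensiveParser.instances += 1


@functools.lru_cache()
def get_parser() -> ExpensiveParser:
    """Build the parser on the first call; later calls return the cached object."""
    return ExpensiveParser()


built_before_first_call = ExpensiveParser.instances  # nothing built at definition time
first = get_parser()
second = get_parser()

print(built_before_first_call, first is second, ExpensiveParser.instances)
```

Calling `get_parser.cache_clear()` would force a rebuild on the next call, which can be useful after editing a grammar file during development.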