Grammar Breaks between versions for TF and HCL specifically #163

Open
DevGumbo opened this issue Jun 7, 2024 · 5 comments

DevGumbo commented Jun 7, 2024

I have been trying to create a tool that crawls all the HCL in our Terragrunt directories and ran into an interesting issue.
When I update to version 4.3.3, the HCL parser throws the error shown below.

I also understand that I am doing odd things by swapping out the question mark in the code further down. That was part of my attempt to figure out what was going on with the parser.

terragrunt.hcl: Unexpected token Token('__ANON_3', 'protocol') at line 31, column 7.
Expected one of: 
        * MORETHAN
        * __ANON_4
        * RBRACE
        * __ANON_1
        * __ANON_9
        * LESSTHAN
        * PERCENT
        * STAR
        * SLASH
        * __ANON_7
        * __ANON_0
        * COMMA
        * QMARK
        * __ANON_5
        * __ANON_6
        * __ANON_8
        * __ANON_2
        * MINUS
        * PLUS

This is from code like:

locals {
  common_vars  = read_terragrunt_config(find_in_parent_folders("index1.hcl"))
  account_vars = read_terragrunt_config(find_in_parent_folders("index2.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("index3.hcl"))
  vpc_vars     = read_terragrunt_config(find_in_parent_folders("index4.hcl"))

  defaults = local.common_vars.locals.defaults

  # Get the rules from defaults.yaml
  default_inbound_nacl_rules = {
    for index, rule in local.defaults.vpc_app_inbound_nacl_rules : "default_rule_${index}" => {
      client_cidr_block = rule["client_cidr_block"]
      rule_number       = 10 + index
      protocol          = rule["protocol"]
      from_port         = rule["from_port"]
      to_port           = rule["to_port"]
      icmp_code         = rule["icmp_code"]
      icmp_type         = rule["icmp_type"]
    }
  }
  default_outbound_nacl_rules = {
    for index, rule in local.defaults.vpc_app_outbound_nacl_rules : "default_rule_${index}" => {
      client_cidr_block = rule["client_cidr_block"]
      rule_number       = 10 + index
      protocol          = rule["protocol"]
      from_port         = rule["from_port"]
      to_port           = rule["to_port"]
      icmp_code         = rule["icmp_code"]
      icmp_type         = rule["icmp_type"]
    }
  }

}

However, I am able to run the Terraform parser without errors.

When I downgrade to 4.3.0 to make the HCL work, I get this error for the Terraform file:

test_vpc.tf after replacement: Unexpected token Token('DECIMAL', '1') at line 129, column 71.
Expected one of: 
        * /[a-zA-Z_][a-zA-Z0-9_-]*/
        * EQUAL
        * LBRACE
        * STRING_LIT
Previous tokens: [Token('__ANON_3', '_QUESTION_MARK_')]

The Terraform is as follows:

locals {
  destination_route_tables = compact(concat(
    [module.vpc.public_subnet_route_table_id],
    module.vpc.private_app_subnet_route_table_ids,
    module.vpc.private_persistence_route_table_ids,
  ))

  # Ideally, this will be length(local.destination_route_tables) but terraform has restrictions around count depending
  # on resources that don't exist, so we have to rely on summation logic using information that is available at plan
  # time. Note that this requires some knowledge of route table logic in the vpc module.
  # TODO: expose additional outputs in vpc module to help simplify this.
  num_destination_route_tables = (
    local.num_public_route_tables + local.num_private_app_route_tables + local.num_private_persistence_route_tables
  )
  # 1 route table for all public subnets
  num_public_route_tables = var.create_public_subnets ? 1 : 0  # <<< line 129 in test_vpc.tf
  # 1 route table for each AZ for private app and private persistence subnet tiers
  num_private_app_route_tables         = var.create_private_app_subnets ? module.vpc.num_availability_zones : 0
  num_private_persistence_route_tables = var.create_private_persistence_subnets ? module.vpc.num_availability_zones : 0

}

Here is the code for the tf parser

import hcl2
import os
import lark

def parse_tf_file(file_path):
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return hcl2.loads(content)
    except lark.exceptions.UnexpectedToken as e:
        if '?' in content:
            # Replace '?' with a placeholder
            content = content.replace('?', '_QUESTION_MARK_')
            try:
                return hcl2.loads(content)
            except Exception as inner_e:
                print(f"Error parsing Terraform file {file_path} after replacement: {inner_e}")
                return None
        else:
            print(f"Error parsing Terraform file {file_path}: {e}")
            return None
    except Exception as e:
        print(f"Error parsing Terraform file {file_path}: {e}")
        return None

def main():
    tf_file_path = './test_vpc.tf'  # Change this to the path of your .tf file
    parsed_data = parse_tf_file(tf_file_path)
    if parsed_data:
        print("Parsed data successfully:")
        print(parsed_data)
    else:
        print("Failed to parse the file.")

if __name__ == "__main__":
    main()
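
A note on the second error above: its message says "after replacement" and its previous token is `_QUESTION_MARK_`, so it appears to come from the fallback branch of this script rather than from the first parse attempt. The sketch below uses only string operations to show what the parser would receive for line 129 once the `?` has been swapped out. `PLACEHOLDER` and `apply_placeholder` are illustrative names, not part of python-hcl2.

```python
# Illustration only: shows what the '?' replacement does to a ternary expression.
PLACEHOLDER = "_QUESTION_MARK_"  # same placeholder string the script above uses


def apply_placeholder(source: str) -> str:
    """Mimic the fallback branch: replace every '?' with the placeholder."""
    return source.replace("?", PLACEHOLDER)


line_129 = "num_public_route_tables = var.create_public_subnets ? 1 : 0"
rewritten = apply_placeholder(line_129)
print(rewritten)
# num_public_route_tables = var.create_public_subnets _QUESTION_MARK_ 1 : 0
```

After the replacement the ternary is no longer a ternary: a bare word is followed directly by `1`, which lines up with the reported `Token('DECIMAL', '1')`. Printing the original exception `e` before retrying would probably say more about the real failure on 4.3.0.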

Here is the code for the hcl parser

import hcl2
import lark

def parse_hcl_file(file_path):
    try:
        with open(file_path, 'r') as file:
            content = file.read()
        return hcl2.loads(content)
    except lark.exceptions.UnexpectedToken as e:
        print(f"Error parsing HCL file {file_path}: {e}")
        return None
    except Exception as e:
        print(f"Error parsing HCL file {file_path}: {e}")
        return None

def main():
    hcl_file_path = 'terragrunt.hcl'  # Change this to the path of your .hcl file
    parsed_data = parse_hcl_file(hcl_file_path)
    if parsed_data:
        print("Parsed data successfully:")
        print(parsed_data)
    else:
        print("Failed to parse the file.")

if __name__ == "__main__":
    main()
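
Since the stated goal is a tool that crawls whole Terragrunt directories, it may help to collect failures per file instead of printing them one at a time. The following is a minimal sketch under that assumption. `crawl` is a hypothetical helper, and the parser is passed in as `loads` so the same code can be pointed at different python-hcl2 versions or forks.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple


def crawl(
    root: str,
    loads: Callable[[str], dict],
    suffixes: Tuple[str, ...] = (".hcl", ".tf"),
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Parse every matching file under root and return (parsed, errors) keyed by path."""
    parsed: Dict[str, dict] = {}
    errors: Dict[str, str] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            parsed[str(path)] = loads(path.read_text(encoding="utf-8"))
        except Exception as exc:  # parser errors differ between versions, so catch broadly
            errors[str(path)] = f"{type(exc).__name__}: {exc}"
    return parsed, errors
```

With python-hcl2 installed, `crawl(".", hcl2.loads)` would return the parsed documents plus a map of failing files, which makes it easier to compare which files break on 4.3.0 versus 4.3.3.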

DevGumbo commented Jun 7, 2024

So I am comparing the Lark file and going back and forth between the functions.
If I change line 16 in the Lark file from

binary_op : expression binary_term new_line_or_comment?

to

binary_op : expression binary_term

the Terraform-specific parser will work but the HCL-specific parsing does not.

If I leave both of those settings as they are in the newest version, the HCL proper parser will work but the Terraform parser will break.


DevGumbo commented Jun 7, 2024

So in order to make it work for me, I just forked the version and instantiated different modules, one for TF and one for HCL.

This gets me where I want to go, and you guys are doing great work. There is something unique in the Lark file on the lines marked below:

start : body
body : (new_line_or_comment? (attribute | block))* new_line_or_comment?
attribute : identifier "=" expression

Lines 3/4 change:

block : identifier (identifier | STRING_LIT)* new_line_or_comment? "{" body "}"   ## << 4.3.3 WORKS FOR HCL PROPER ##
OR
block : identifier (identifier | STRING_LIT)* "{" body "}"   ## << 4.3.0 DOESNT WORK FOR TF PROPER FILES ##

new_line_and_or_comma: new_line_or_comment | "," | "," new_line_or_comment
new_line_or_comment: ( /\n/ | /#.*\n/ | /\/\/.*\n/ )+

identifier : /[a-zA-Z_][a-zA-Z0-9_-]*/

?expression : expr_term | operation | conditional

conditional : expression "?" new_line_or_comment? expression new_line_or_comment? ":" new_line_or_comment? expression

?operation : unary_op | binary_op
!unary_op : ("-" | "!") expr_term

Lines 16/15 change:

binary_op : expression binary_term new_line_or_comment?   ## << 4.3.3 WORKS FOR HCL PROPER ##
OR
binary_op : expression binary_term   ## << 4.3.0 DOESNT WORK FOR TF PROPER FILES ##

!binary_operator : "==" | "!=" | "<" | ">" | "<=" | ">=" | "-" | "*" | "/" | "%" | "&&" | "||" | "+"
binary_term : binary_operator new_line_or_comment? expression

expr_term : "(" new_line_or_comment? expression new_line_or_comment? ")"
| float_lit
| int_lit
| STRING_LIT
| tuple
| object
| function_call
| index_expr_term
| get_attr_expr_term
| identifier
| heredoc_template
| heredoc_template_trim
| attr_splat_expr_term
| full_splat_expr_term
| for_tuple_expr
| for_object_expr

STRING_LIT : "\"" (STRING_CHARS | INTERPOLATION)* "\""
STRING_CHARS : /(?:(?!\${)([^"\\]|\\.))+/+ // any character except '"" unless inside a interpolation string
NESTED_INTERPOLATION : "${" /[^}]+/ "}"
INTERPOLATION : "${" (/(?:(?!\${)([^}]))+/ | NESTED_INTERPOLATION)+ "}"

int_lit : DECIMAL+
!float_lit: DECIMAL+ "." DECIMAL+ (EXP_MARK DECIMAL+)?
| DECIMAL+ ("." DECIMAL+)? EXP_MARK DECIMAL+
DECIMAL : "0".."9"
EXP_MARK : ("e" | "E") ("+" | "-")?

tuple : "[" (new_line_or_comment* expression new_line_or_comment* ",")* (new_line_or_comment* expression)? new_line_or_comment* "]"
object : "{" new_line_or_comment? (object_elem (new_line_and_or_comma object_elem )* new_line_and_or_comma?)? "}"
object_elem : (identifier | expression) ("=" | ":") expression

heredoc_template : /<<(?P<heredoc>[a-zA-Z][a-zA-Z0-9._-]+)\n(?:.|\n)*?(?P=heredoc)/
heredoc_template_trim : /<<-(?P<heredoc_trim>[a-zA-Z][a-zA-Z0-9._-]+)\n(?:.|\n)*?(?P=heredoc_trim)/

function_call : identifier "(" new_line_or_comment? arguments? new_line_or_comment? ")"
arguments : (expression (new_line_or_comment* "," new_line_or_comment* expression)* ("," | "...")? new_line_or_comment*)

index_expr_term : expr_term index
get_attr_expr_term : expr_term get_attr
attr_splat_expr_term : expr_term attr_splat
full_splat_expr_term : expr_term full_splat
index : "[" new_line_or_comment? expression new_line_or_comment? "]" | "." DECIMAL+
get_attr : "." identifier
attr_splat : ".*" get_attr*
full_splat : "[*]" (get_attr | index)*

!for_tuple_expr : "[" new_line_or_comment? for_intro new_line_or_comment? expression new_line_or_comment? for_cond? new_line_or_comment? "]"
!for_object_expr : "{" new_line_or_comment? for_intro new_line_or_comment? expression "=>" new_line_or_comment? expression "..."? new_line_or_comment? for_cond? new_line_or_comment? "}"
!for_intro : "for" new_line_or_comment? identifier ("," identifier new_line_or_comment?)? new_line_or_comment? "in" new_line_or_comment? expression new_line_or_comment? ":" new_line_or_comment?
!for_cond : "if" new_line_or_comment? expression

%ignore /[ \t]+/
%ignore /\/\*(.|\n)*?(\*\/)/
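
The workaround described in this comment (one forked module for TF, one for HCL) could be wired up roughly as follows. This is a sketch only: `make_dispatcher` is a hypothetical helper, and `hcl2_tf` / `hcl2_hcl` in the comment are placeholder names for the two forks, which the thread does not name.

```python
from pathlib import Path
from typing import Callable, Dict

Loader = Callable[[str], dict]


def make_dispatcher(loaders: Dict[str, Loader]) -> Callable[[str], dict]:
    """Return a function that parses a file with the loader registered for its suffix."""

    def parse_file(file_path: str) -> dict:
        suffix = Path(file_path).suffix
        if suffix not in loaders:
            raise ValueError(f"no loader registered for {suffix!r}")
        with open(file_path, "r", encoding="utf-8") as handle:
            return loaders[suffix](handle.read())

    return parse_file


# Hypothetical wiring, assuming two separately importable forks:
# parse_file = make_dispatcher({".tf": hcl2_tf.loads, ".hcl": hcl2_hcl.loads})
```

Keeping the choice of grammar in one table means the rest of the crawler does not need to know that two parser builds exist.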

kkozik-amplify (Collaborator) commented

What do you mean by terraform specific parser and hcl specific parser?
The library uses only one parser:

hcl2 = Lark.open(
    "hcl2.lark",
    parser="lalr",
    cache=str(PARSER_FILE),  # Disable/Delete file to effect changes to the grammar
    rel_to=__file__,
    propagate_positions=True,
)

.tf files are written in the HCL2 language.
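
The comment on the `cache` argument matters for anyone editing `hcl2.lark` by hand: while a cached parser file exists, grammar edits may appear to have no effect. A minimal sketch for clearing it is below. `clear_parser_cache` is a hypothetical helper, and the cache location is taken as a parameter because the thread does not show what `PARSER_FILE` resolves to.

```python
from pathlib import Path


def clear_parser_cache(cache_path: str) -> bool:
    """Delete a Lark parser cache file; return True if a file was actually removed."""
    path = Path(cache_path)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Calling this with the resolved `PARSER_FILE` path before re-importing the library should force the grammar to be rebuilt on the next load. Lark also accepts `cache=False`, which skips the cache file entirely at the cost of slower startup.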


RPitt commented Jul 5, 2024

Just to note that we also recently ran into problems parsing Terraform projects after upgrading from 4.3.0 to 4.3.4.

The problem can be reproduced in 4.3.4 using a simple HCL-format data file like so:

somedata = {
  number = 8 * 1024
  number2 = 4
}

which results in: Unexpected token Token('__ANON_3', 'number2') at line 3, column 3.

So presumably the issue is that recent changes broke the parsing of arithmetic expressions such as 8 * 1024

update: it's not just arithmetic expressions, see link added below
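
To compare behaviour across versions, the reproduction above can be wrapped in a small harness. This is a sketch under assumptions: `installed_version` and `first_error_line` are hypothetical helpers, and `python-hcl2` is assumed to be the installed distribution name. The parser is passed in as `loads` so that the harness itself does not depend on any particular release.

```python
from importlib import metadata
from typing import Callable, Optional

# The data file from the comment above, as a string.
SNIPPET = """somedata = {
  number = 8 * 1024
  number2 = 4
}
"""


def installed_version(package: str = "python-hcl2") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def first_error_line(text: str, loads: Callable[[str], dict]) -> Optional[str]:
    """Run the loader on text; return the first line of the error, or None on success."""
    try:
        loads(text)
    except Exception as exc:
        lines = str(exc).splitlines()
        return lines[0] if lines else type(exc).__name__
    return None
```

Running `first_error_line(SNIPPET, hcl2.loads)` alongside `installed_version()` in separate environments gives a one-line result per version that can be pasted into a report like this one.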


DevGumbo commented Jul 6, 2024

> What do you mean by terraform specific parser and hcl specific parser? The library uses only one parser:
>
> hcl2 = Lark.open(
>     "hcl2.lark",
>     parser="lalr",
>     cache=str(PARSER_FILE),  # Disable/Delete file to effect changes to the grammar
>     rel_to=__file__,
>     propagate_positions=True,
> )
>
> .tf files are written in HCL2 language.

I mean that reading the HCL within Terraform and its objects breaks on the specified version, but that same version makes the Terragrunt hybrid HCL work.

So when on the older versions of this language pack, I am able to work with the Terragrunt hybrid HCL/Go components with no issue, but the TF HCL has problems with question marks and ternaries.

If I upgrade, the newer version is able to handle the hybrid Terraform HCL better, but now the Terragrunt hybrid HCL/Go object blocks won't parse and throw the errors stated above.
