Normalizing columns: add some checks #138

Merged 4 commits on Dec 1, 2023
19 changes: 16 additions & 3 deletions times_reader/datatypes.py
@@ -61,7 +61,7 @@ class Tag(str, Enum):
     uc_sets = "~UC_SETS"
     uc_t = "~UC_T"
     # This is used by Veda for unit conversion when displaying results
-    # unitconversion = "~UNITCONVERSION"
+    unitconversion = "~UNITCONVERSION"
Collaborator Author

@olejandro is it okay to uncomment this? This is needed for the check below.

Member

@olejandro, Nov 24, 2023

It is definitely okay to uncomment it. I am wondering how we should handle this one, though, because we won't use the information provided by this type of input table (so there is no need to normalise it or do anything else with it). At the same time, it is good for it to appear in raw_tables.txt for diffing purposes.


     @classmethod
     def has_tag(cls, tag):
@@ -270,15 +270,28 @@ def _read_mappings(filename: str) -> List[TimesXlMap]:
 def _read_veda_tags_info(
     veda_tags_file: str,
 ) -> Tuple[Dict[Tag, Dict[str, str]], Dict[Tag, Dict[str, list]]]:
+    def to_tag(s: str) -> Tag:
+        # The file stores the tag name in lowercase, and without the ~
+        return Tag("~" + s.upper())
+
+    # Read veda_tags_file
     with resources.open_text("times_reader.config", veda_tags_file) as f:
         veda_tags_info = json.load(f)
+
+    # Check that all the tags we use are present in veda_tags_file
+    tags = {to_tag(tag_info["tag_name"]) for tag_info in veda_tags_info}
+    for tag in Tag:
+        if tag not in tags:
+            print(
+                f"WARNING: datatypes.Tag has an unknown Tag {tag} not in {veda_tags_file}"
+            )
+
     column_aliases = {}
     row_comment_chars = {}

     for tag_info in veda_tags_info:
         if "tag_fields" in tag_info:
-            # The file stores the tag name in lowercase, and without the ~
-            tag_name = "~" + tag_info["tag_name"].upper()
+            tag_name = to_tag(tag_info["tag_name"])
             # Process column aliases:
             column_aliases[tag_name] = {}
             names = tag_info["tag_fields"]["fields_names"]
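
The tag check in this hunk can be tried in isolation. The following is a minimal sketch, not the project's code: the two-member `Tag` enum, the `missing_tags` helper, and the `sample_info` list are hypothetical stand-ins for the real enum and the parsed JSON config.

```python
from enum import Enum
from typing import Dict, List, Set


class Tag(str, Enum):
    # Hypothetical two-member subset of the real Tag enum, for illustration only
    uc_t = "~UC_T"
    unitconversion = "~UNITCONVERSION"


def to_tag(s: str) -> Tag:
    # The file stores the tag name in lowercase, and without the ~
    return Tag("~" + s.upper())


def missing_tags(veda_tags_info: List[Dict[str, str]]) -> Set[Tag]:
    """Return the Tag members that have no entry in the tags file (hypothetical helper)."""
    known = {to_tag(info["tag_name"]) for info in veda_tags_info}
    return {tag for tag in Tag if tag not in known}


# Made-up stand-in for the parsed JSON config
sample_info = [{"tag_name": "uc_t"}]
for tag in sorted(missing_tags(sample_info)):
    print(f"WARNING: {tag.value} is not in the tags file")
```

One thing to keep in mind: calling `Tag(...)` with a value that is not an enum member raises `ValueError`, so a conversion like `to_tag` only works if every entry in the file corresponds to a member of the enum.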
4 changes: 4 additions & 0 deletions times_reader/transforms.py
@@ -229,6 +229,10 @@ def normalize_column_aliases(
             )
         else:
             print(f"WARNING: could not find {table.tag} in config.column_aliases")
+        if len(set(table.dataframe.columns)) < len(table.dataframe.columns):
+            raise ValueError(
+                f"Table has duplicate column names (after normalization): {table}"
+            )
     return tables
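
A set can never hold more elements than the sequence it was built from, so a duplicate check based on lengths has to test whether the set is smaller than the column list. The sketch below shows that idea on plain lists of column names, without pandas. `duplicate_columns` and `check_unique_columns` are hypothetical helper names, not part of the project.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_columns(columns: Iterable[str]) -> List[str]:
    """Return the column names that occur more than once, in first-seen order."""
    counts = Counter(columns)
    return [name for name, n in counts.items() if n > 1]


def check_unique_columns(columns: Iterable[str]) -> None:
    """Raise ValueError if any column name is repeated."""
    cols = list(columns)
    dupes = duplicate_columns(cols)
    # Same condition as len(set(cols)) < len(cols), but it also names the offenders
    if dupes:
        raise ValueError(
            f"Table has duplicate column names (after normalization): {dupes}"
        )


check_unique_columns(["region", "year", "value"])  # passes silently
```

Reporting the offending names rather than the whole table might make the error easier to act on, though that is a design choice the diff above does not make.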

