Normalizing columns: add some checks #138

Merged: 4 commits, merged on Dec 1, 2023

Changes from 1 commit
times_reader/datatypes.py (4 changes: 3 additions, 1 deletion)
@@ -61,7 +61,7 @@ class Tag(str, Enum):
uc_sets = "~UC_SETS"
uc_t = "~UC_T"
# This is used by Veda for unit conversion when displaying results
# unitconversion = "~UNITCONVERSION"
unitconversion = "~UNITCONVERSION"
Collaborator (PR author):

@olejandro is it okay to uncomment this? It is needed for the check below.

olejandro (Member), Nov 24, 2023:

It is definitely okay to uncomment it. I am wondering how we should handle this one, though: we won't use the information provided by this type of input table, so there is no need to normalize it or do anything else with it. At the same time, it is good for it to appear in raw_tables.txt for diffing purposes.


@classmethod
def has_tag(cls, tag):
@@ -258,6 +258,8 @@ def _read_veda_tags_info(veda_tags_file: str) -> Dict[Tag, Dict[str, str]]:
if "tag_fields" in tag_info:
# The file stores the tag name in lowercase, and without the ~
tag_name = "~" + tag_info["tag_name"].upper()
if not Tag.has_tag(tag_name):
Member:

@siddharth-krishna I think of veda_tags.json as a reference that may contain more information than the tool needs or uses. In that case, should the direction of the warning change? That is, should the tool alert if a tag is defined but information about it is missing from veda_tags.json?

Collaborator (PR author):

Sounds good. By the way, if I add a warning in the other direction, the following tags are currently missing:

WARNING: datatypes.Tag has an unknown Tag Tag.comagg not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.comemi not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.tfm_fill_r not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.tfm_ins_txt not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.time_slices not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.tradelinks not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.tradelinks_dins not in veda-tags.json
WARNING: datatypes.Tag has an unknown Tag Tag.uc_sets not in veda-tags.json
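A minimal sketch of checking both directions at once, assuming each entry of veda_tags.json is a dict with a lowercase `tag_name` stored without the `~` (as the comment in `_read_veda_tags_info` describes). The three-member `Tag` enum is a hypothetical stand-in for `datatypes.Tag`, and `compare_tags` is an illustrative name, not part of times_reader. It returns the tags the file defines but the enum lacks (the case the current code rejects with `ValueError`) and the enum members the file lacks (the reverse-direction warning proposed above).

```python
from enum import Enum
from typing import Dict, List, Tuple


class Tag(str, Enum):
    # Hypothetical three-member stand-in for datatypes.Tag.
    uc_sets = "~UC_SETS"
    uc_t = "~UC_T"
    unitconversion = "~UNITCONVERSION"


def compare_tags(veda_tags: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (tags only in the JSON, Tag members missing from the JSON).

    Assumes each entry is a dict with a lowercase "tag_name" and no "~".
    """
    # Rebuild the "~UPPER" form used by the Tag values before comparing.
    in_json = {"~" + info["tag_name"].upper() for info in veda_tags}
    in_enum = {tag.value for tag in Tag}
    # Current direction: tags in the file that the enum does not know.
    unknown = sorted(in_json - in_enum)
    # Reverse direction: enum members with no entry in the file.
    missing = [tag.name for tag in Tag if tag.value not in in_json]
    return unknown, missing


veda_tags = [{"tag_name": "uc_t"}, {"tag_name": "fi_t"}]
unknown, missing = compare_tags(veda_tags)
for name in missing:
    print(f"WARNING: datatypes.Tag has an unknown Tag Tag.{name} not in veda-tags.json")
```

With this sample input, `unknown` is `['~FI_T']` and the loop prints warnings for `uc_sets` and `unitconversion`. Returning both lists lets the caller decide separately whether each direction should raise or only warn.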

Member:

OK, cool! At least one of them will be gone after merging.

raise ValueError(f"{veda_tags_file} has an unknown Tag {tag_name}")
column_aliases[tag_name] = {}
names = tag_info["tag_fields"]["fields_names"]
aliases = tag_info["tag_fields"]["fields_aliases"]
times_reader/transforms.py (4 changes: 4 additions, 0 deletions)
@@ -525,6 +525,10 @@ def normalize_column_aliases(
)
else:
print(f"WARNING: could not find {table.tag} in config.column_aliases")
if len(set(table.dataframe.columns)) < len(table.dataframe.columns):
raise ValueError(
f"Table has duplicate column names (after normalization): {table}"
)
return tables
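Column-alias normalization can map two distinct headers onto the same canonical name, which is the situation this check guards against. A set keeps one copy of each name, so it can never be larger than the list it was built from; duplicates exist exactly when `len(set(cols)) < len(cols)`. The pure-Python sketch below shows this on a plain list of column names. `normalize_columns` and the alias dictionary are hypothetical stand-ins for `normalize_column_aliases` and the per-tag entries of `config.column_aliases`, and lowercasing the headers is an assumption of the sketch.

```python
from typing import Dict, List


def normalize_columns(columns: List[str], aliases: Dict[str, str]) -> List[str]:
    """Map each header to its canonical name, then reject duplicate results.

    `aliases` is a hypothetical alias -> canonical-name mapping; lowercasing
    the headers first is an assumption of this sketch.
    """
    normalized = [aliases.get(col.lower(), col.lower()) for col in columns]
    # A set holds each name once, so it is smaller than the list exactly
    # when some name occurs more than once.
    if len(set(normalized)) < len(normalized):
        dupes = sorted({c for c in normalized if normalized.count(c) > 1})
        raise ValueError(f"Duplicate column names after normalization: {dupes}")
    return normalized


print(normalize_columns(["Region", "Year"], {"reg": "region"}))
```

This prints `['region', 'year']`. Calling `normalize_columns(["Region", "Reg"], {"reg": "region"})` instead raises `ValueError`, because both headers normalize to `region`.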