Remove unused code, add Discussions URL and other minor changes #233

Merged (7 commits) on Mar 24, 2024
Changes from all commits
CONTRIBUTING.md (2 changes: 1 addition & 1 deletion)

@@ -38,7 +38,7 @@ to <[email protected]>.

> If you want to ask a question, we assume that you have read the available [Documentation](https://xl2times.readthedocs.io/).

-Before you ask a question, it is best to search for existing [Issues](https://github.com/etsap-TIMES/xl2times/issues) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to search the internet for answers first.
+Before you ask a question, it is best to check [Discussions](https://github.com/etsap-TIMES/xl2times/discussions) and to search for existing [Issues](https://github.com/etsap-TIMES/xl2times/issues) that might help you. In case you have found a suitable issue and still need clarification, you can write your question in this issue. It is also advisable to search the internet for answers first.

If you then still feel the need to ask a question and need clarification, we recommend the following:

README.md (2 changes: 1 addition & 1 deletion)

@@ -143,4 +143,4 @@ python -m twine upload dist/*

## Contributing

-This project welcomes contributions and suggestions. See [CODE_OF_CONDUCT.md](https://github.com/etsap-TIMES/xl2times/blob/main/CODE_OF_CONDUCT.md) and [CONTRIBUTING.md](https://github.com/etsap-TIMES/xl2times/blob/main/CONTRIBUTING.md) for more details.
+This project welcomes contributions and suggestions. See [Code of Conduct](https://github.com/etsap-TIMES/xl2times/blob/main/CODE_OF_CONDUCT.md) and [Contributing](https://github.com/etsap-TIMES/xl2times/blob/main/CONTRIBUTING.md) for more details.
xl2times/transforms.py (17 changes: 10 additions & 7 deletions)

@@ -428,7 +428,7 @@ def merge_tables(
if not all(index):
for _, row in df[~index].iterrows():
region, sets, process = row[["region", "sets", "process"]]
-            print(
+            logger.warning(
f"WARNING: Unknown process set {sets} specified for process {process}"
f" in region {region}. The record will be dropped."
)
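The hunk above replaces a bare print() with logger.warning(), so the message carries a log level and reaches the configured log file rather than only stdout. A minimal stand-in using the stdlib logging module (the project itself obtains a loguru logger via utils.get_logger; the function name here is illustrative only):

```python
import logging

logger = logging.getLogger("xl2times")


def warn_unknown_process_set(region: str, sets: str, process: str) -> None:
    # A level-tagged warning replaces the ad-hoc "WARNING:" prefix a
    # print() call would need.
    logger.warning(
        "Unknown process set %s specified for process %s in region %s. "
        "The record will be dropped.",
        sets,
        process,
        region,
    )
```

Routing through a logger also lets callers filter or redirect these messages centrally instead of grepping captured stdout.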
@@ -1384,7 +1384,8 @@ def _process_comm_groups_vectorised(

Returns
-------
-    Processed DataFrame with a new column "DefaultVedaPCG" set to True for the default pcg in eachregion/process/io combination.
+    Processed DataFrame with a new column "DefaultVedaPCG" set to True for the default pcg in
+    each region/process/io combination.
"""

def _set_default_veda_pcg(group):
@@ -2292,11 +2293,13 @@ def _match_wildcards(
result_col
Name of the column to store the matched results in.
explode
-        Whether to explode the results_col ('process'/'commodities') column into a long-format table. (Default value = False)
+        Whether to explode the results_col ('process'/'commodities') column into a long-format table.
+        (Default value = False)

Returns
-------
-    The table with the wildcard columns removed and the results of the wildcard matches added as a column named `results_col`
+    The table with the wildcard columns removed and the results of the wildcard matches added as a
+    column named `results_col`
"""
wild_cols = list(col_map.keys())

@@ -2331,7 +2334,7 @@ def _match_wildcards(
.drop(columns=wild_cols)
)

-    # TODO TFM_UPD has existing (but empty) 'process' and 'commodity' columns here. Is it ok to drop existing columns here?
+    # TODO TFM_UPD has existing (but empty) 'process' and 'commodity' columns. Is it ok to drop existing columns here?
if f"{result_col}_old" in df.columns:
if not df[f"{result_col}_old"].isna().all():
logger.warning(
@@ -2549,8 +2552,8 @@ def explode_process_commodity_cols(
"""Explodes the process and commodity columns in the tables that contain them as
lists after process_wildcards.

-    We store wildcard matches for these columns as lists and explode them late here for performance reasons - to avoid row-wise processing that
-    would otherwise need to iterate over very long tables.
+    We store wildcard matches for these columns as lists and explode them late here for performance
+    reasons - to avoid row-wise processing that would otherwise need to iterate over very long tables.
"""
for tag in tables:
df = tables[tag]
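The explode_process_commodity_cols hunk above defers `DataFrame.explode` until after all wildcard matching, so list-valued match columns are turned into long format only once. A minimal sketch of that final step (the column names and data here are illustrative, not the project's actual tables):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["REG1", "REG1"],
        # During processing, wildcard matches are kept as lists per row...
        "process": [["COAL_PLANT", "GAS_PLANT"], ["WIND_FARM"]],
    }
)

# ...and exploded once at the end: one row per matched process.
long_df = df.explode("process", ignore_index=True)
print(long_df)
#   region     process
# 0   REG1  COAL_PLANT
# 1   REG1   GAS_PLANT
# 2   REG1   WIND_FARM
```

Doing a single vectorised explode at the end avoids the row-wise iteration the docstring warns about.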
xl2times/utils.py (54 changes: 3 additions & 51 deletions)

@@ -7,9 +7,7 @@
import gzip
import os
import pickle
-import re
import sys
-from collections.abc import Iterable
from dataclasses import replace
from math import floor, log10
from pathlib import Path, PurePath
@@ -173,53 +171,6 @@ def merge_columns(
return numpy.concatenate(columns)


-def apply_wildcards(
-    df: DataFrame, candidates: Iterable[str], wildcard_col: str, output_col: str
-):
-    """Apply wildcards values to a list of candidates. Wildcards are values containing
-    '*'. For example, a value containing '*SOLID*' would include all the values in
-    'candidates' containing 'SOLID' in the middle.
-
-    TODO unused. Remove?
-
-    Parameters
-    ----------
-    df
-        Dataframe containing all values.
-    candidates
-        List of candidate strings to apply the wildcard to.
-    wildcard_col
-        Name of column containing the wildcards.
-    output_col
-        Name of the column to dump the wildcard matches to.
-
-    Returns
-    -------
-    type
-        A dataframe containing all the wildcard matches on its 'output_col' column.
-    """
-    wildcard_map = {}
-    all_wildcards = df[wildcard_col].unique()
-    for wildcard_string in all_wildcards:
-        if wildcard_string is None:
-            wildcard_map[wildcard_string] = None
-        else:
-            wildcard_list = wildcard_string.split(",")
-            current_list = []
-            for wildcard in wildcard_list:
-                if wildcard.startswith("-"):
-                    w = wildcard[1:]
-                    regexp = re.compile(w.replace("*", ".*"))
-                    current_list = [s for s in current_list if not regexp.match(s)]
-                else:
-                    regexp = re.compile(wildcard.replace("*", ".*"))
-                    additions = [s for s in candidates if regexp.match(s)]
-                    current_list = sorted(set(current_list + additions))
-            wildcard_map[wildcard_string] = current_list
-
-    df[output_col] = df[wildcard_col].map(wildcard_map)

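For reference, the core semantics of the removed apply_wildcards helper can be sketched as a standalone function: comma-separated `*` patterns are translated to regexes, and a leading `-` excludes previously matched names. The function name `expand_wildcards` is illustrative only; this is a reading of the deleted code, not part of the remaining codebase:

```python
import re


def expand_wildcards(pattern: str, candidates: list[str]) -> list[str]:
    """Expand a comma-separated wildcard pattern against a candidate list."""
    matched: list[str] = []
    for wildcard in pattern.split(","):
        if wildcard.startswith("-"):
            # Exclusion term: drop names matched so far.
            regexp = re.compile(wildcard[1:].replace("*", ".*"))
            matched = [s for s in matched if not regexp.match(s)]
        else:
            # Inclusion term: add all candidates matching the pattern.
            regexp = re.compile(wildcard.replace("*", ".*"))
            matched = sorted(set(matched) | {s for s in candidates if regexp.match(s)})
    return matched


candidates = ["SOLIDFUEL", "GASSOLID", "LIQUID"]
print(expand_wildcards("*SOLID*,-GAS*", candidates))  # ['SOLIDFUEL']
```

This functionality now lives in the transforms-side wildcard matching (`_match_wildcards`), which is why the utils copy could be dropped as unused.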

def missing_value_inherit(df: DataFrame, colname: str) -> None:
"""For each None value in the specifed column of the dataframe, replace it with the
last non-None value. If no previous non-None value is found leave it as it is. This
@@ -381,7 +332,7 @@ def get_logger(log_name: str = default_log_name, log_dir: str = ".") -> loguru.L
Parameters
----------
log_name
-        Name of the log. Corresponding log file will be called {log_name}.log in the . (Default value = default_log_name)
+        Name of the log. Corresponding log file will be called {log_name}.log. (Default value = default_log_name)
log_dir
Directory to write the log file to. Default is the current working directory.

@@ -452,7 +403,8 @@ def compare_df_dict(
df_after
the second dictionary of DataFrames to compare
sort_cols
-        whether to sort the columns before comparing. Set True if the column order is unimportant. (Default value = True)
+        whether to sort the columns before comparing. Set True if the column order
+        is unimportant. (Default value = True)
context_rows
number of rows to show around the first difference (Default value = 2)
"""