-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create trade links #167
Create trade links #167
Conversation
@siddharth-krishna, any chance you could take a look at the changed added transforms? |
raise ValueError(f"TFM_INS-TS table already has Year column: {table}") | ||
# TODO: can we remove this hacky shortcut? Or should it be also applied to the AT variant? | ||
if set(df.columns) & query_columns == {"cset_cn"} and has_no_wildcards( | ||
df["cset_cn"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olejandro do you understand why we have this condition? Unless there's a compelling performance reason, it's probably better to have one general way of handling all TFM_INS-TS tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's handle this in a subsequent PR.
for table in tables: | ||
if table.tag == datatypes.Tag.tradelinks: | ||
df = table.dataframe | ||
comm = df.columns[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure that the first column is always what we want? The danger of this kind of integer indexing is silent failures if someone adds another transform in the middle that changes column orders. Perhaps better to fetch column by name, which will raise an understandable error if that column isn't present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, yes. We have to rely on the position to retrieve the column name. We could add a check later that the name is on the commodities list.
df = table.dataframe | ||
comm = df.columns[0] | ||
destinations = list(df.columns).remove(comm) | ||
df.rename(columns={comm: "origin"}, inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should try to avoid using the inplace
keyword whenever possible (see this proposal for more details).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the commodity is renamed to origin here, but later in the code origin becomes reg1
which sounds like a region? I'm confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should address the inplace
issue systematically then? Maybe through a subsequent PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the renaming issue, it is due to the table format. What we have is like this:
elc | reg1 | reg 2 |
---|---|---|
REG1 | 1 | |
REG2 | 1 |
And what we want to get is this:
origin | destination | comm1 | comm 2 | comm | process | tradelink |
---|---|---|---|---|---|---|
REG1 | REG2 | ELC | ELC | ELC | TB_ELC_REG1_REG2_01 | b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olejandro : Do you know what is the purpose of the third comm? I have always wondered about that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see! I misread the comm = df.columns[0]
as though comm
was the entire column, not just the column label. So this essentially turns an adjacency matrix for one type of edge into an adjacency list with edge labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olejandro : Do you know what is the purpose of the third comm? I have always wondered about that...
No... Tbh, I was planning to ask you. :-)
xl2times/transforms.py
Outdated
model: datatypes.TimesModel, | ||
) -> List[datatypes.EmbeddedXlTable]: | ||
""" | ||
Transform tradelinks table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transform tradelinks table | |
Transform tradelinks table to tradelinks_dins |
xl2times/transforms.py
Outdated
df.drop(columns=["tradelink"], inplace=True) | ||
df = df.merge(td_type, how="inner", on="regions") | ||
# Drop tradelink (bidirectional) duplicates | ||
df.drop_duplicates(subset=["regions"], keep="last", inplace=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's funny that we drop bidirectional links here but reconstruct them in generate_trade
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's because we want to get rid of the same bidirectional tradelinks that have different process names. Otherwise we would get 4 tradelinks in generate_trade
instead of 2. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's only if you put a b
in the tradelinks
column. :)
I was wondering if it's possible to simplify the code and leave both directions of links here, if it allows removing the code in generate_trade
that "doubles" the rows with a b
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if we don't put b in the column, the links will be processed as unidirectional. :-) This may have an impact on how some other attributes / constraints are applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if it's possible to simplify the code and leave both directions of links here, if it allows removing the code in
generate_trade
that "doubles" the rows with ab
?
Not that I see...
xl2times/transforms.py
Outdated
td_type = df.groupby(["regions"])["tradelink"].agg("count").reset_index() | ||
td_type.replace({"tradelink": {1: "u", 2: "b"}}, inplace=True) | ||
df.drop(columns=["tradelink"], inplace=True) | ||
df = df.merge(td_type, how="inner", on="regions") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block above looks good but I do wonder if this sort of thing (melt + count links + create new DF) would actually be faster in plain Python -- it might also be more understandable! It's okay for now I think but I'd be curious to test it sometime. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have other instances of code blocks like this. Maybe worthwhile investigating!
Co-authored-by: Siddharth Krishna <[email protected]>
@Antti-L, any chance you could share your opinion on this PR? Namely, currently, the changes don't take into account the sheetname a |
I think it is important if one wants that with xl2times one should be able to reproduce a model that has been built under VEDA. I think that was indeed originally the idea, even though options with more flexibility / better functionality could be also supported by xl2times. But if you would treat two unidirectional links as a single bi-directional link, the model would be different, and any process parameters defined for either would become erroneous, no? The model solution would most likely be different and possibly notably incorrect. BTW: In trade link matrices one should also be able to specify the name of the process in each cell having a link. I am using that heavily in my models. Is that supported already by Xl2times? |
I just happened to see this statement in the code (part "Do not create trade links for not included regions"): |
Thanks @Antti-L! This does not yet take into account #162. Only when regions are specifically filtered (e.g. similar to specifying regions to include in Veda case manager). |
Not yet, but it definitely could be. I would probably open a separate issue to clarify all the details though. |
Among the benchmarks available to us, it is only an issue for TIMES-NZ, but you are right... |
This doesn't yet allow generating
|
I am not sure how it is defined in VEDA, if that's what you ask. I have always hoped for a FI_Process table for the trade processes as well, where I would be able to define the PCG and TSLVL, process subtype, units and vintaging. But no, KanOrs has thus far declined it. I suggest you test it under VEDA to see how VEDA works. Possibly the level is taken as the finest among the commodities traded?
Again, I am not sure how it is defined in VEDA, if there are several commodity types traded. But I think you could assume that they are all of the same type (I have never used mixed types in trade processes myself). So you could just take the first commodity type of the traded commodities, or if you like, use some priority order as deemed best or based on VEDA testing.
I guess that is indeed what VEDA does. |
No description provided.