Dev-Meeting working on results #1138

Open
AntonellaGia opened this issue Nov 22, 2024 · 13 comments

Comments

@AntonellaGia
Contributor

Hey everyone,

As discussed in the dev meeting, we want to meet as a group of developers who are interested in extending the options for results.

One idea is to meet a few days before the next user meeting (in spring 2025), maybe arriving on Monday evening and coding on Tuesday and Wednesday morning.

Another option is to meet earlier for a few days.

By liking this issue you give a non-binding commitment to participate in the small results dev meeting.

Feel free to share your opinion on the meeting ideas or suggest your own. Or send this link to people who might also be interested.

Best regards,

Antonella

@AntonellaGia
Contributor Author

For some reason I could not add you, @Bachibouzouk

@Bachibouzouk
Contributor

For some reason I could not add you, @Bachibouzouk

@AntonellaConsolinno - I believe this is because I am not part of the repo or the organization (I would be happy to join if that is OK :))

@fwitte
Member

fwitte commented Nov 22, 2024

@AntonellaConsolinno - I believe this is because I am not part of the repo or the organization (I would be happy to join if that is OK :))

Sent you an invite!

@lensum
Contributor

lensum commented Nov 23, 2024

I think it is probably easiest for everyone involved if we simply meet for a few days before the next user meeting. This way we already have a location and a time and can avoid extra trips (to and from another location). We could even ask the organizers of the next user meeting if they have rooms available, come up with our own ideas for meeting spaces, or simply meet at an AirBnB/hotel. I would like to keep the organizational overhead as small as possible and, if possible and acceptable to everyone, try to work in a small-ish, flexible and (where possible) fast-moving team.

@lensum
Contributor

lensum commented Nov 23, 2024

As a side note: I already have an idea of what the results could look like after we work on them (though there is still a lot of room for alternatives, I think). If you want, I could already share it here, so that we can discuss it before we meet and focus on the implementation and compatibility during our development meeting.

@p-snft
Member

p-snft commented Nov 23, 2024

I'd agree to extend the next dev/user meeting if possible. Still, we might find an earlier date if we meet online. (Alternatively: I'll also be at RET.Con, which is in February.)

Sunke from DLR already implemented something some time ago. Unfortunately, he did not manage to share it at the meeting. You can have a look at his fork: https://github.com/scus1/oemof-solph/tree/feature/result_processing

@Bachibouzouk
Contributor

The meeting's summary notes: https://cloud.oemof.org/s/pT47XJdAe7peY2R?dir=undefined&openfile=10425

My take on the result handling: rl-institut/oemof-tabular-plugins#10 and https://github.com/rl-institut/oemof-tabular-plugins/blob/e00531ea920339d305f35c3508c3aaba616359eb/src/oemof_tabular_plugins/datapackage/post_processing.py#L647 (this is not generalised to all possible oemof uses and would need to be adapted, so at this point I would say the discussion is more interesting)

@Bachibouzouk
Contributor

Bachibouzouk commented Nov 24, 2024

@lensum
Contributor

lensum commented Nov 25, 2024

Proposal and benefits

I would propose using 'pure' dictionaries for the results. By 'pure' I mean that they shouldn't contain more complex data types, such as the currently used pandas.DataFrame; I assume that strings and floats (and nested dictionaries) would suffice. The benefits I see in using this limited set of data types are:

B1) Easy access and handling in Python
B2) Good convertibility to other data types (pandas.DataFrame)
B3) Easily storable as a JSON file
B4) Both dictionaries and JSON are (afaik) not subject to large changes, making them robust for the future and more or less independent of software versions (major Python version changes like Python 2 to Python 3 maybe excluded)
B5) Great options for interoperability with other programs / programming languages, as JSON is widely adopted
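
To make B1 and B3 concrete, here is a minimal sketch using only the standard library. The node names and values are made up for illustration and are not taken from a real model run.

```python
import json

# Hypothetical results; node names and values are made up for illustration.
results = {
    "chp": {
        "el_bus": {
            "flow": {
                "2020-01-01 00:00:00": 0.0,
                "2020-01-01 00:15:00": 1.5,
            },
        },
    },
}

# Serialise to a JSON string and read it back again.
as_json = json.dumps(results, indent=2)
restored = json.loads(as_json)

# Only str keys and float values are involved, so the round trip is lossless.
round_trip_ok = restored == results

# Plain nested indexing is enough to read a single value.
flow_value = restored["chp"]["el_bus"]["flow"]["2020-01-01 00:15:00"]
```

Writing to a file instead of a string would work the same way with json.dump and json.load.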

Exemplary results

The results could look, for example, like this:

{
    "chp": {
        "el_bus": {
            "flow": {
                "2020-01-01 00:00:00": 0.0,
                "2020-01-01 00:15:00": 0.0,
                "2020-01-01 00:30:00": 0.0,
                ...,
            },
            "status": {
                "2020-01-01 00:00:00": 0.0,
                "2020-01-01 00:15:00": 0.0,
                "2020-01-01 00:30:00": 0.0,
                ...,
            },
        },
        "th_bus":{
            ...,
        },
    },
    "boiler": {
        ...,
    },
    ...,
}

The structure here (if not obvious) would be res[from_node][to_node][type_key][timestamp] (see Shortcomings below for an alternative), with:

from_node: str
    The name of the node from which e.g. a flow flows
to_node: str
    The name of the node to which e.g. the flow flows.
type_key: str
    The name of the variable ("flow", "status",
    "status_nominal").
timestamp: str
    The timestamp of the value (e.g. "2020-01-01 00:15:00")

Converting the timestamp to a string allows storing the dictionary as a JSON file (see benefits above).
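
A minimal sketch of the timestamp handling, assuming the format shown in the example above ("YYYY-MM-DD HH:MM:SS"). The helper names to_key and from_key are hypothetical and only serve this illustration.

```python
from datetime import datetime

# Assumed string format, matching the example keys above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_key(timestamp: datetime) -> str:
    """Convert a datetime into the string used as dictionary key."""
    return timestamp.strftime(TIMESTAMP_FORMAT)


def from_key(key: str) -> datetime:
    """Convert a dictionary key back into a datetime."""
    return datetime.strptime(key, TIMESTAMP_FORMAT)


key = to_key(datetime(2020, 1, 1, 0, 15))

# Hypothetical result entry following res[from_node][to_node][type_key][timestamp].
res = {"chp": {"el_bus": {"flow": {key: 0.0}}}}
value = res["chp"]["el_bus"]["flow"]["2020-01-01 00:15:00"]
```

Because the keys are plain strings, anyone who needs real datetime objects again can recover them with from_key after loading the file.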

Experimental code

The example below depicts the results handling for a single node (the el_bus). In practical terms, we would need to loop over all nodes of the energy system. While this might seem inefficient, the try-except branching should prevent duplicate entries and (I hope) make the procedure a little more efficient. Obviously, the ValueError would need to be handled differently than in this example.

Also, this is based on the current results structure. We could instead access the results coming from pyomo directly and mould them into this form.

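from oemof import solph

# `res` is assumed to be the existing oemof.solph results dictionary,
# e.g. as returned by solph.processing.results(model)

# get exemplary node results
foo = solph.views.node(res, "el_bus")["sequences"].to_dict()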


# define function to add entry to a nested dict
def add_nested_entry(d, keys, value):
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = value


# this holds the results
new_results = {}

# loop to write all results to the nested dict
for ((input_key, output_key), type_key), value_dict in foo.items():
    for timestamp, value in value_dict.items():
        timestamp = str(timestamp)
        try:
            existing_value = new_results[input_key][output_key][type_key][timestamp]
            raise ValueError(
                "Entry for [{}][{}][{}][{}] already exists: {}".format(
                    input_key, output_key, type_key, timestamp, existing_value
                )
            )
        except KeyError:
            keys = [input_key, output_key, type_key, timestamp]
            add_nested_entry(new_results, keys, value)
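
For trying this out without a solved model, here is a self-contained variant. It is a sketch, not the code above: the hand-written foo dictionary is a stand-in that assumes the same key layout, ((from_node, to_node), type_key), and the duplicate check is moved into the helper instead of using try-except. Node names and values are made up.

```python
def add_nested_entry(d, keys, value):
    """Set d[k0][k1]...[kn] = value, creating intermediate dicts as needed.

    Raises ValueError if the final key already holds a value.
    """
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    if keys[-1] in d:
        raise ValueError(
            "Entry for {} already exists: {}".format(keys, d[keys[-1]])
        )
    d[keys[-1]] = value


# Stand-in for the dict obtained from the node view (assumed key layout,
# made-up node names and values).
foo = {
    (("chp", "el_bus"), "flow"): {
        "2020-01-01 00:00:00": 0.0,
        "2020-01-01 00:15:00": 2.0,
    },
    (("el_bus", "demand"), "flow"): {
        "2020-01-01 00:00:00": 1.0,
        "2020-01-01 00:15:00": 1.0,
    },
}

new_results = {}
for ((input_key, output_key), type_key), value_dict in foo.items():
    for timestamp, value in value_dict.items():
        # str() keeps the keys JSON-compatible even if nodes or
        # timestamps arrive as objects rather than strings.
        keys = [str(input_key), str(output_key), type_key, str(timestamp)]
        add_nested_entry(new_results, keys, value)
```

Checking for the key inside the helper avoids raising and catching a KeyError for every new entry, at the cost of one membership test per value.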

Shortcomings and open questions

In this example, the scalar values and meta-results are not stored in the new dictionary structure, but I believe that to be easily implementable (I was just lazy). This raises the question of whether the proposed structure is well chosen. I could also imagine that res[from_node][type_key][to_node][timestamp] would be more practical, where [to_node][timestamp] only applies to the flow-related, time-indexed variables and something like the investment would be listed as res[from_node][investment][capacity]. If we decide on this approach (the 'pure' dictionaries) in general, we can discuss which structure makes the most sense.
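
To compare the two layouts side by side, here is a minimal sketch that regroups one into the other. The function name regroup_by_type, the node names and the numbers are hypothetical.

```python
def regroup_by_type(res):
    """Turn res[from_node][to_node][type_key][timestamp]
    into alt[from_node][type_key][to_node][timestamp]."""
    alt = {}
    for from_node, targets in res.items():
        for to_node, variables in targets.items():
            for type_key, series in variables.items():
                by_type = alt.setdefault(from_node, {}).setdefault(type_key, {})
                by_type[to_node] = dict(series)  # copy, keep the input untouched
    return alt


# Made-up example data in the originally proposed layout.
res = {
    "chp": {
        "el_bus": {"flow": {"2020-01-01 00:00:00": 0.0}},
        "th_bus": {"flow": {"2020-01-01 00:00:00": 3.0}},
    },
}

alt = regroup_by_type(res)

# In the alternative layout a scalar could sit next to the time series.
# The value is invented for illustration.
alt["chp"]["investment"] = {"capacity": 100.0}
```

With this layout, alt["chp"]["flow"] lists all flows leaving the chp, while scalar entries such as the investment do not need a to_node level.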

I haven't checked backwards compatibility, but it might be possible to offer this type of result dict as an alternative next to the old results and mark the old results as deprecated or something like this. Because the old results dict was somewhat complicated and I know at least 3 different methods to access the old results, I'm not sure that we can offer a way that is backwards compatible for every access method. However, the (to my knowledge) most-widely used solph.views.node() method could be adjusted, ensuring a somewhat smooth transition for what I believe to be most of the users.

Open questions from my side are:

Q1) Do you have questions concerning this approach?
Q2) Do you see problems with the general idea of using dicts of str and float type for the results?
Q3) Do you agree that this kind of dictionary would be easier to grasp for (new) users than the current structure, or am I on the wrong track?

@srhbrnds srhbrnds pinned this issue Nov 25, 2024
@srhbrnds srhbrnds unpinned this issue Nov 25, 2024
@p-snft
Member

p-snft commented Nov 25, 2024

Thanks for the clean suggestion. For the transition, I'd just add a new result function and keep the old one side by side for a while. The structure might also solve the problem that storage requires a None in the key, which is rather confusing.

My suggestion, however, would be to use DataFrames in a flat structure. We'd need one DataFrame per Index (Timesteps, Timepoints, Periods) and hierarchical indexes for the columns in the Flow DataFrame.

@Bachibouzouk
Contributor

The summary of the meeting was that many (the majority) of the people transform the output into a single/double DataFrame. To me it would make sense to deliver in DataFrame format, and to discuss and describe the format as clearly as possible (this can then directly be used in the RTD to let the users know what to expect out of the output)

@lensum
Contributor

lensum commented Nov 25, 2024

The summary of the meeting was that many (the majority) of the people transform the output into a single/double DataFrame. To me it would make sense to deliver in DataFrame format, and to discuss and describe the format as clearly as possible (this can then directly be used in the RTD to let the users know what to expect out of the output)

Imo, it wouldn't be difficult to add a function that returns a DataFrame with preselected values. Using pd.DataFrame.from_dict(res, orient="index") should do the trick. I see this as similar to what the oemof.solph.views.node function returns. But I might be off here.
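
One possible intermediate step towards a flat, column-oriented table is to collapse the nesting into tuple keys first. This is a sketch using only the standard library; the function name flatten and the example data are hypothetical. As far as I know, pandas turns such tuple keys into hierarchical columns when the dict is passed to pd.DataFrame, but that is an assumption and not tested here.

```python
def flatten(res):
    """Map res[from_node][to_node][type_key][timestamp]
    to flat[(from_node, to_node, type_key)][timestamp]."""
    flat = {}
    for from_node, targets in res.items():
        for to_node, variables in targets.items():
            for type_key, series in variables.items():
                flat[(from_node, to_node, type_key)] = dict(series)
    return flat


# Made-up example data in the proposed nested layout.
res = {
    "chp": {
        "el_bus": {
            "flow": {"2020-01-01 00:00:00": 0.0, "2020-01-01 00:15:00": 2.0},
            "status": {"2020-01-01 00:00:00": 0.0, "2020-01-01 00:15:00": 1.0},
        },
    },
}

flat = flatten(res)
columns = sorted(flat)  # one tuple per would-be column
```

Each tuple then identifies one time series, which is close to the one-column-per-variable layout discussed above.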

@AntonellaGia
Contributor Author

Hey everybody,
the date and place of the next dev meeting are now fixed (Flensburg, 26.02.25 - 28.02.25). What do you think of already meeting on 25.02.25? If you are interested, please comment or like this comment. If a few (or more) people are interested, I will contact Malte and Jonas and try to organise a room.
Best regards and Merry Christmas
Antonella


7 participants