Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include optimization parameter #94

Merged
merged 19 commits into from
Oct 3, 2024
Merged

Conversation

glatterf42
Copy link
Member

@glatterf42 glatterf42 commented Jun 28, 2024

This is the final PR bringing all changes regarding optimization.Parameter together. Originally, the goal was to have this PR be very quick because all precursors had been reviewed, but this didn't happen. After merging #82, I had to rework all these branches because for some reason, the propagation of changes upon rebasing did not happen as I expected. During that reworking phase, some errors were introduced and I didn't want to fix these errors individually on all precursor PRs, which I why I collapsed all of them to this one PR, that is now quite large (again), sorry for that.

To spare you the trip to the old PRs, let me summarize some discussions here:

  • There is a DeprecationWarning arising from the httpx code, it seems. When I discovered it, I thought it should be fixed by starlette 0.37.2, which was allowed by fastapi 0.111.0, but this doesn't seem to be the case. I'm not entirely sure why or how urgent it is to remove this warning.
  • ixmp.__init__ gains some imports, which @danielhuppmann did not like too much. We can potentially remove all except for Platform and Datapoint, but this is probably going into subsequent removal-PRs (see also Clean up top-level namespace #84).
  • Table and Parameter are very similar, as will be Variable and Equation. For every single layer (but @danielhuppmann asked specifically for the Core layer first), these could inherit from each other (or be composed elegantly) to avoid repetition of very similar or identical code. I agree that this is a very useful step and I already have an idea for the Core layer. I would like to implement this pretty much right away, but in a new PR, to keep it manageable.

Working on the transport tutorial, I noticed that I had completely forgotten about adding parameters = db.relationship() in the Run DB-table. I have even run migrations for these other PRs, but no error came up. I'm wondering if we even need these relationship() declarations at all.

@glatterf42 glatterf42 added the enhancement New feature or request label Jun 28, 2024
@glatterf42 glatterf42 self-assigned this Jun 28, 2024
@glatterf42 glatterf42 force-pushed the include/optimization-parameter branch from 77d9fc8 to 0debc9f Compare August 22, 2024 13:47
@glatterf42
Copy link
Member Author

Let me quickly explain why I raised the minimum pandas version here: since we're using pandas for the upsert-functionality of parameter.add_data(), the workflow is like this:

        # Gather all indexsets/columns
        index_list = [column.name for column in parameter.columns]
        # Convert existing data to DataFrame
        existing_data = pd.DataFrame(parameter.data)
        # Set the index to be constructed as each "key", i.e. row of indexset-values
        # This allows quick comparison of keys for existence and only updates what we actually want to update, i.e. values and units.
        # Can't do that for empty dataframes, though (i.e. first addition of data to a parameter)
        if not existing_data.empty:
            existing_data.set_index(index_list, inplace=True)
        # Upsert the new data
        parameter.data = (
            data.set_index(index_list).combine_first(existing_data).reset_index()
        ).to_dict(orient="list")

I don't see any issue with that code, but pandas < 2.1.0 does: if existing_data is empty, we can't set an index on it. But if the indices of the two dataframes in combine_first() don't at least overlap, pandas doesn't know how to combine them.
In our case, this should not be an issue, though: existing data will always have the same multiindex as new data, we validate all data to ensure precisely this. The one exception, of course, is if existing_data is empty, but in that case, pandas should just take all values from the new data and don't bother about upserting -- which it does, but only as of 2.1.0.
This version was released a year ago, so I'm fine with bumping our minimum version. If you have a better idea of how to resolve this, I'm happy to hear it, too :)

(Going to make exactly the same commit on #97 and #98, too.)

Copy link
Contributor

@meksor meksor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright so this lgtm from a code point of view. I noticed Im starting to have trouble discerning all the boilerplate code from the important stuff.
I hope I can get to some bigger refactoring soon, the tests now take very long to run and its probably smart to encapsulate some stuff/move it to scse-toolkit instead of testing it for each data type (im thinking filters and docs maybe).

@glatterf42
Copy link
Member Author

Thanks for the review :)
I agree that a lot of code seem to be boilerplate, which is one of the reasons I opened #96. So I'm happy to help with some refactoring, if I can :)

I would likely merge this PR then, if you change your review to "approve", as @danielhuppmann doesn't seem to have enough time to review it. No one will immediately start using it, anyway, so I think the most time-efficient version for @danielhuppmann is to review #101 and/or #108, which showcase most of the added syntax. If this is not to our liking, we can still introduce clean-up PRs before merging the tutorials to main.
Besides, I've tried to make the syntax consistent with what we already have, so I think it's unlikely that there are major problems hiding in there.

Finally, @meksor, may I recommend reviewing #97 and #98 soon, too, if time permits? Both PRs are similar to this one and even more so to one another, so doing it soon would keep you from having to chew through all the boilerplate again as you probably still remember the parts that are important.

glatterf42 and others added 19 commits September 30, 2024 10:40
* Covers:
* run__id, data, name, uniqueness of name together with run__id
* Adapts tests since default order of columns changes
* Make Column generic enough for multiple parents
* Introduce optimization.Parameter
* Add tests for add_data
* Enable remaining parameter tests (#86)
* Enable remaining parameter tests
* Include optimization parameter api layer (#89)
* Bump several dependency versions
* Let api/column handle both tables and parameters
* Make api-layer tests pass
* Include optimization parameter core layer (#90)
* Enable parameter core layer and test it
* Fix things after rebase
* Ensure all intended changes survive the rebase
* Adapt data validation function for parameters
* Allow tests to pass again
@glatterf42 glatterf42 force-pushed the include/optimization-parameter branch from acc783f to a9e5ac8 Compare September 30, 2024 09:17
@glatterf42 glatterf42 merged commit 769a198 into main Oct 3, 2024
10 checks passed
@glatterf42 glatterf42 deleted the include/optimization-parameter branch October 3, 2024 08:19
glatterf42 added a commit that referenced this pull request Jan 16, 2025
* Include optimization parameter basis (#79)
* Fix references to DB filters in docs
* Streamline naming in tests
* Fix and test parameter list and tabulate for specific runs
* Make indexset-creation a test utility
* Incorporate changes from #110
* Use pandas for updated add_data behaviour
* Raise minimum pandas version to enable add_data upsert
* Generalize UsageError for more optimization items
* Use generalized UsageError for Table
* Use own errors for Parameter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants