Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the nexus module #106

Merged
merged 57 commits into from
Sep 12, 2024
Merged

Update the nexus module #106

merged 57 commits into from
Sep 12, 2024

Conversation

awais307
Copy link
Contributor

@awais307 awais307 commented Jun 19, 2023

This PR addresses issue#93 and updates data to the nexus module, addresses the tests issue

The code has been restructured to be more test-friendly

two tests has been marked with @pytest.mark.xfail(reason="Temporary, for #106"), namely
test_report_full in test_report.py
and test_build in test_build.py
as they both require a fully parametrized scenario as a starting point, which is not the case for the "bare" scenario for tests.

Since we are about to do major revision and update to the module, we decided to postpone investing further time in those tests.

all tests seem to pass for py3.12, not for py3.11

How to review

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both ✅
  • Add, expand, or update documentation.
  • Update doc/whatsnew.

@awais307 awais307 self-assigned this Jun 19, 2023
@awais307 awais307 added the water MESSAGEix-Nexus (water) variant label Jun 19, 2023
@glatterf42
Copy link
Member

Currently, the linting test is failing (among others). Please see our guide for contributing code to see what code style we expect. Please also note that black, flake8, mypy, and isort can all be configured to be applied automatically whenever you save your code (the details of which can vary, but we can work it out together if you want to). This helps passing theses tests tremendously.

@glatterf42
Copy link
Member

The current issue with the other tests seems to be that they still rely on code in message_data (which they shouldn't, as explained before, since all code here should be open source):

message_ix_models/tests/model/water/test_nexus.py:5: in
from message_data.model.water import build
E ModuleNotFoundError: No module named 'message_data'

@glatterf42
Copy link
Member

Also to make sure: how large are these data files that you are updating here? If they are too large (and there are lots of them), it might be better to upload them to Zenodo or so (see the second point in the comment by @khaeru).

@khaeru khaeru marked this pull request as draft June 21, 2023 08:44
@khaeru khaeru changed the title [WIP] Updates to the nexus module Update the nexus module Jun 21, 2023
@khaeru
Copy link
Member

khaeru commented Jun 21, 2023

  • Changed to "draft" status; prefer this over putting "[WIP] " in the title, as it means the same thing.
  • I would recommend to update the branch by "Update with rebase" and then git pull --rebase on your local.

@glatterf42
Copy link
Member

The tests are mostly working, despite what it looks like here. There are some general issues however that we need to address to fully see their potential:

  • The most generic problems arise from the Code quality check. This tells us that the code is not formatted correctly and that there are some issues like unsorted import blocks and unused variables. As mentioned before, tools like ruff (superseding isort and flake8) and black can be set up to automatically run whenever you save a file, eliminating most of these problems.
  • Some of the Code quality problems are not as easy to resolve. Things like unused variables need manual attention, even though they are simple: if you truly don't need the return value of a function, you can use an underscore: _ = some_function(). The mypy errors can be more tedious to resolve as we need to understand why the mentioned modules don't have the missing attributes and whether they should have or where else they should get them from.
  • As mentioned before, the tests only contain a limited number of problems:
ERROR message_ix_models/tests/model/water/test_cooling.py
ERROR message_ix_models/tests/model/water/test_utils.py
FAILED message_ix_models/tests/model/water/test_irrigation.py::test_add_irr_structure - AttributeError: 'dict' object has no attribute 'regions'
FAILED message_ix_models/tests/model/water/test_water_supply.py::test_map_basin_region_wat - AttributeError: 'dict' object has no attribute 'time'
FAILED message_ix_models/tests/model/water/test_water_supply.py::test_add_water_supply - ModuleNotFoundError: No module named 'your_module'
FAILED message_ix_models/tests/model/water/test_water_supply.py::test_add_e_flow - ModuleNotFoundError: No module named 'your_module'
  • For the first two FAILED, the issue is the same: the actual function is using the context object incorrectly:
    info = context["water build info"]

    if "year" in context.time:

Are the lines from the actual code. I don't know if this should work, but it's treating context in two different ways: the first reminds me of a dictionary, the second of class. In your tests, you set up a mock Context object (which would normally be a class) as a dictionary, which is why the info = ... line works, but the second one complains: you cannot access dictionary data that way. Since I doubt you want to rewrite your actual code function to use both dict and Context for the context argument, I suggest to study how other tests in this repo solve this problem. You should be able to use a test_context out of the box which you can then adapt.

  • The last two FAILED result from trying to call a module called your_module, which doesn't exist. You probably want to replace your_module with message_ix_model.model.water.data in both cases.
  • Finally, both ERRORs are due to missing imports. Both of these issues can probably be solved by addressing all mypy complaints, although one of them is trying to import message_data. This should not happen in message-ix-models and might be a sign that the clean-up process is still incomplete.

@awais307 awais307 marked this pull request as ready for review November 16, 2023 21:42
@awais307 awais307 requested a review from adrivinca November 16, 2023 21:42
@adrivinca
Copy link
Contributor

@awais307 can you please update on what your commits did, if anything is still missing and how you expect others to review? thanks

@glatterf42
Copy link
Member

@awais307, @adrivinca: the tests are almost working, you're on a good path there! Almost all errors seem to originate from the same problem: that you are not using a 'real' Context object. As pointed out above, using those is preferable since it makes the tests easier to understand and maintain and since you won't have to rewrite your actual code to deal with both Contexts and dicts.
With the latest commit, I adapted the test in tests/model/water/test_cooling.py to show you what I mean by that. We do have an object called test_context and we can use it just like that our test functions. Please take a look and try to adapt your other tests in a similar way, I suspect it will solve most of the remaining errors :)
If you don't want to write the same setup of the text_context all the time, you could introduce another fixture like it e.g. in the tests/model/water/__init__.py: you can define a function that takes the test_context, does whatever setup all your tests frequently share, and return the modified test_context. You can then use the pytest decorator @pytest.fixture(scope="function") to make this function work identical to test_context. Then, you can use it in all your other tests as well :)

One test error might remain, though, and that is this one:

___________________________________________ test_map_yv_ya_lt ___________________________________________

    def test_map_yv_ya_lt():
        periods = (2010, 2020, 2030, 2040)
        lt = 20
        ya = 2020
    
        expected = pd.DataFrame(
            {"year_vtg": [2010, 2020, 2020, 2030], "year_act": [2020, 2020, 2030, 2040]}
        )
    
        result = map_yv_ya_lt(periods, lt, ya)
    
>       pd.testing.assert_frame_equal(result, expected)
E       AssertionError: DataFrame are different
E       
E       DataFrame shape mismatch
E       [left]:  (8, 2)
E       [right]: (4, 2)

message_ix_models/tests/model/water/test_utils.py:124: AssertionError

The shape of the compared DataFrames won't be fixed by the different context, I assume. Please check that expected is really what we expect, e.g. year_vtg contains 2020 twice, is that on purpose?

One final note: I removed a few markers that told rust to ignore functions being too complex. Please take another look at those and try to reduce their complexity! In most cases, it is enough to take a part of the function that includes a lot of if ... elif ... else ... and move that to a new function that takes some input data and returns different output depending on the condition.

Please let me know if you need help with any of the above :)

@glatterf42
Copy link
Member

Hi @adrivinca and @awais307, I've taken another look at this PR. I started by addressing all code quality issues: applying some formatting and reducing code complexity. The latter worked quite well in parts because you already had all the puzzle pieces present: e.g. in water_for_ppl.py, your immensely complex cool_tech() function was in part only complex because you defined various helper functions inside that same function. So by moving the definitions outside, the code quickly became less complex.
Unfortunately, the same approach did not work as well for infrastructure.py. While moving stuff outside the main function also resolved the issue, there's still much room for improvements. Almost all steps in the main function that work with the input DataFrame seem to be very similar except for tiny variations, so according to the DRY principle (Don't Repeat Yourself), this code is most likely generalisable.

Next, I took a look at the failing tests. For some of them, I was able to fix them myself, though even in those cases, there are some open questions. Please check e.g. the TODOs in test_map_yv_ya_lt().
For some cases, trying to use the test_context or otherwise fixing the tests led to problems to such a degree that I didn't know how to fix them. Please search for FIXME to find descriptions of these errors with probable causes and determine how you want to fix them.

Finally, your reporting.py contains these lines:

try:
from message_data.tools.post_processing.iamc_report_hackathon import (
report as legacy_reporting,
)
except ImportError: # message_data not installed
legacy_reporting = None
log = logging.getLogger(__name__)
def run_old_reporting(sc=False):
mp2 = sc.platform
log.info(
" Start reporting of the global energy system (old reporting scheme)"
f"for the scenario {sc.model}.{sc.scenario}"
)
legacy_reporting(
mp=mp2,
scen=sc,
merge_hist=True,
merge_ts=False,
)

run_old_reporting() is part of your main workflow (via report_full()), so legacy_reporting is actually required to be there. My version of mypy is already marking this as an error because legacy_reporting really cannot be None, so your code still requires message_data to be present, although no code in this repo should. Please find a solution for that, e.g. by not using the legacy_reporting at all and instead using the reporting system that message_ix_models provides. At the very least, you should add a clause/flag to report_full so that run_old_reporting() is only run if message_data is present. For this, you could reuse the HAS_MESSAGE_DATA flag that already exists in the code:
HAS_MESSAGE_DATA = True

@adrivinca
Copy link
Contributor

adrivinca commented Mar 18, 2024

Thanks a lot @glatterf42 for spending time on this and for the thorough overview.
I am going through the tests and the code, here the main changes I did

  • Taking from another commit I did in the leap-re-wks branch, I replaced all the .append() with pd.concat([]) and also another couple of parts where the code was obsolete
  • concerning test_map_yv the test and functions work as intended. I just don't know why, when running locally the reset_index() in the main function does not seem to work when called from another module.
  • I used if HAS_MESSAGE_DATA: in the reporting.py script to use the legacy reporting. Not sure the the reporting features in message_ix_models are outputting the same as legacy

Some FIXME are related to making a more general configuration for the module, and also maybe a testing config.
I would like to work on it for the SSP update, where I need anyway to change the code and for instance add the SSP config.
Therefore, I left some FIXME for the future, and I would merge them with the FIXME memo for now
I refer to the comments in test_irrigation, test_water_supply.

Other points are also open questions to me and I would like @awais307 to please intervene and fix.
in test_water_supply:

  • test_add_e_flow function does not make sense to me, how is the function supposed to work with fictitious regions if the files that get loaded are data files?

currently, the function is checking to match this df

   Region  value  year  time  Unnamed: 0  BCU_name
0  test_Region      1  2020  year           0  test_BCU

and df_sw in the function read_water_availability() before it breaks

     Region     years        value
0  test_BCU    Region  test_Region
1  test_BCU     value            1
2  test_BCU      year         2020
3  test_BCU      time         year
4  test_BCU  BCU_name     test_BCU

I think these functions need to be tested with actual existing data

  • similar for test_add_water_supply(), please look at it, because it breaks, ut not sure if it is supposed to work with the input data that you give
  • same for test_map_basin_region_wat()

unfortunately I am not familiar with the with patch and mocking approach @awais307 have you tried those tests you wrote?
When running pytest in the folder the al break
please also have a look to the annotation issue in test_utils.test_add_commodity_and_level(). and test_utils.test_read_config

@glatterf42
Copy link
Member

glatterf42 commented Apr 2, 2024

@awais307 I rebased this PR on top of the latest main branch to keep the linear commit history and used this opportunity to make two functions less complex. You should now be able to use the latest commit here, make changes and merge to main without 'out-of-date' issues; please let me know if you can't.

@glatterf42 glatterf42 force-pushed the nexus-upd branch 2 times, most recently from 18b3648 to c3701f5 Compare April 3, 2024 06:36
@glatterf42
Copy link
Member

glatterf42 commented Apr 3, 2024

@awais307, I'm trying to keep the commit history linear by using git rebase main. A linear history means it will be very easy to understand how and why which changes were introduced and the changes will integrate with the rest of the code base neatly. So if you see that this branch is out-of-date with the main branch, I'd appreciate if you could also use git rebase instead of git merge :)
If you don't feel comfortable using git rebase, please check out e.g. this tutorial.

@awais307
Copy link
Contributor Author

awais307 commented Apr 7, 2024

@awais307, I'm trying to keep the commit history linear by using git rebase main. A linear history means it will be very easy to understand how and why which changes were introduced and the changes will integrate with the rest of the code base neatly. So if you see that this branch is out-of-date with the main branch, I'd appreciate if you could also use git rebase instead of git merge :) If you don't feel comfortable using git rebase, please check out e.g. this tutorial.

okay, for some reason, my local tries to merge it whenever I fetch the changes.

@glatterf42
Copy link
Member

@awais307, I'm trying to keep the commit history linear by using git rebase main. A linear history means it will be very easy to understand how and why which changes were introduced and the changes will integrate with the rest of the code base neatly. So if you see that this branch is out-of-date with the main branch, I'd appreciate if you could also use git rebase instead of git merge :) If you don't feel comfortable using git rebase, please check out e.g. this tutorial.

okay, for some reason, my local tries to merge it whenever I fetch the changes.

Could you please try to configure your local git interaction not to do that, then? If you tell me how you interact with git locally, I might be able to help you. Do you use Github Desktop? GitKraken? The command line?

To get the changes from up here to your system and overwrite the changes you have locally, you might need to perform something like a "force pull", which you can do following these instructions. Please let me know if you need any help with that :)

@adrivinca adrivinca merged commit ae7b63f into main Sep 12, 2024
28 checks passed
@adrivinca adrivinca deleted the nexus-upd branch September 12, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
water MESSAGEix-Nexus (water) variant
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants