Add Marktstammdatenregister (MaStR) #165

Merged
merged 9 commits into from
Jan 10, 2025

Conversation

lkstrp
Member

@lkstrp lkstrp commented Jun 10, 2024

Closes #16

Change proposed in this Pull Request

Adds Marktstammdatenregister via open-MaStR.

There are a few issues:

  • open-mastr provides a bulk download of all the cleaned datasets on zenodo, but only as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token.
  • These datasets are huge with many small power plants. I have now filtered out all plants with a capacity of less than 1 MW. Otherwise powerplant.aggregate_units() takes too long. Solar and wind are also currently not included.
    • Performance can probably be improved, but the main bottleneck is likely Duke and not on our side.
  • Validation is not done yet. I am waiting for the ENTSOE token to run compare-with-entsoe-stats.py, but below is a first plot.
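
The 1 MW cut-off mentioned above can be sketched as follows. This is only a minimal illustration, not the code in this PR: the column names `name` and `capacity_mw` and the sample rows are assumptions made up for the example and do not reflect the MaStR schema.

```python
import csv
import io

# Hypothetical sample data; column names are assumptions, not the MaStR schema.
SAMPLE_CSV = """name,capacity_mw
plant_a,0.03
plant_b,1.0
plant_c,12.5
plant_d,0.999
"""


def filter_by_capacity(rows, min_capacity_mw=1.0):
    """Keep only rows whose capacity is at least `min_capacity_mw`."""
    return [row for row in rows if float(row["capacity_mw"]) >= min_capacity_mw]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_by_capacity(rows)
kept_names = [row["name"] for row in kept]
print(kept_names)
```

Keeping the threshold as a parameter makes it easy to experiment with lower cut-offs without touching the filter itself.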

Dataset

| File name | Number of entries | Entries with less than 1 MW capacity |
| --- | --- | --- |
| _biomass.csv | 22284 | 21240 (95.32%) |
| _combustion.csv | 85424 | 81776 (95.73%) |
| _nuclear.csv | 6 | 0 (0.00%) |
| _hydro.csv | 8657 | 7859 (90.78%) |
| _wind.csv | 34798 | 6729 (19.34%) |
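
The percentages in the table are the share of entries below 1 MW in each file. As a quick sanity check, the sketch below recomputes them from the counts listed above and also derives how many entries remain after filtering. The dictionary layout and the function name are purely illustrative.

```python
# Counts copied from the table: file name -> (total entries, entries below 1 MW)
COUNTS = {
    "_biomass.csv": (22284, 21240),
    "_combustion.csv": (85424, 81776),
    "_nuclear.csv": (6, 0),
    "_hydro.csv": (8657, 7859),
    "_wind.csv": (34798, 6729),
}


def share_below_threshold(total, below):
    """Return the percentage of entries below the threshold, rounded to 2 decimals."""
    return round(100 * below / total, 2)


# Share of small units per file, and the number of entries left after filtering.
shares = {name: share_below_threshold(total, below) for name, (total, below) in COUNTS.items()}
remaining = {name: total - below for name, (total, below) in COUNTS.items()}

for name in COUNTS:
    print(f"{name}: {shares[name]:.2f}% below 1 MW, {remaining[name]} entries remain")
```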

[Image: output]

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have added a note to release notes doc/release_notes.rst.
  • I have used pre-commit run --all to lint/format/check my contribution
  • I have documented the effects of my code changes in the documentation doc/.
  • I have adjusted the docstrings in the code appropriately.

@FlorianK13

Hi @lkstrp and other devs from powerplantmatching, I'm one of the developers of open-mastr. I like your work on harmonizing different sources into one European dataset. If there are issues on your side that are relevant to open-mastr development, I'm happy to discuss them.

One remark on your comment above:
"We could use the API instead, but then the user has to pass a token."
This is not really a good idea. With the API you are limited to a small number of requests per day, so fetching large amounts of data takes a long time. You could, however, run the bulk download to get an sqlite or postgres database and extract the relevant information from there.

from open_mastr import Mastr

db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()

@lkstrp
Member Author

lkstrp commented Aug 6, 2024

Hey @FlorianK13,
Thanks for reaching out!

So far the idea was basically to just use the zenodo download you provide, which is quite time-consuming.

from open_mastr import Mastr

db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()

Does this approach have any advantages over the zenodo download? For example, does it run faster or allow downloading only selected data? The API reference reads like it downloads the same zip in bulk, but allows data selection. Which means it downloads everything and just strips away unselected data?

@FlorianK13

When using the Python download method, you get the most recent data (from the day before). On zenodo you get the data from our last update, which is a few months old. However, with zenodo your code is reproducible, whereas the Python download changes every day because the dataset from BNetzA changes every day. To achieve reproducibility with Python, you would need to specify date="existing" (Reference) after you have downloaded the dataset once, so that you use your existing local dataset from then on.

Both approaches take rather long, as you need to download the whole dataset. Afterwards you can specify which data you want to parse. So you are right with your last sentence: 'Which means it downloads everything and just strips away unselected data.'
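
Once the archive is on disk, reading only the files of interest is cheap even though the download itself is not. A minimal sketch with Python's standard `zipfile` module follows. The member names and the in-memory demo archive are stand-ins invented for illustration and do not reflect the actual layout of the open-MaStR zenodo archive.

```python
import io
import zipfile


def build_demo_archive():
    """Create a small in-memory zip that stands in for a downloaded archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical member names and contents.
        zf.writestr("demo_hydro.csv", "id,capacity\n1,2.5\n")
        zf.writestr("demo_nuclear.csv", "id,capacity\n2,1400\n")
        zf.writestr("demo_solar.csv", "id,capacity\n3,0.01\n")
    buffer.seek(0)
    return buffer


def read_selected(archive, wanted_suffixes):
    """Return {member name: text} for members whose name ends with a wanted suffix."""
    selected = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(tuple(wanted_suffixes)):
                # Only the selected members are decompressed; the rest are skipped.
                selected[name] = zf.read(name).decode("utf-8")
    return selected


selected = read_selected(build_demo_archive(), ["_hydro.csv", "_nuclear.csv"])
print(sorted(selected))
```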

@fneum
Member

fneum commented Aug 23, 2024

open-mastr provides a bulk download of all the cleaned datasets on zenodo, but only as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token.

Based on the discussion above, let's take the zenodo releases. If those are updated at least on an annual basis, that's fine. I am also not too worried about the large download size, as updating is usually not a frequent action and the data is cached locally as well. @FlorianK13, it could be an option for upcoming releases to upload the individual CSV files unzipped into the zenodo repository, which would allow selective downloads (even though you lose the ZIP compression). This could be in addition to the ZIP.

These datasets are huge with many small power plants. I have now filtered out all plants with a capacity of less than 1 MW. Otherwise powerplant.aggregate_units() takes too long. Solar and wind are also currently not included.

Yes, that's also what Global Energy Monitor does. Perhaps they will also integrate open-MaStR, then we wouldn't have to.

Validation is not done yet. I am waiting for the ENTSOE token to run compare-with-entsoe-stats.py, but below is a first plot.

I requested one today and got it the same day.

@FlorianK13

@fneum I created OpenEnergyPlatform/open-MaStR#558 to discuss whether we can upload single files to zenodo.

@fneum
Member

fneum commented Jan 3, 2025

Pretty good (if using 50 kW as threshold):

[Image]

Total solar capacity would be 71 GW (for August 2023), but the missing 30 GW are units < 50 kW (rooftop PV).

Has much better coverage for wind than GEM.

Adds 7 GW of biogas we had been missing.

When selectively reading columns, the performance issues also disappear for the cleaning (the matching I did not test yet).
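
One way to picture "selectively reading columns": only the columns needed downstream are kept while parsing, so the wide rows never accumulate in memory. The sketch below uses the standard `csv` module with made-up column names. In pandas the same idea is usually expressed with the `usecols` argument of `read_csv`. It is an illustration under those assumptions, not the code from this PR.

```python
import csv
import io

# Hypothetical wide CSV; the column names are invented for this example.
SAMPLE = """unit_id,name,capacity_kw,operator,street,comment
u1,Plant One,1200,Op A,Main St 1,none
u2,Plant Two,75,Op B,Side St 2,retrofit planned
"""


def read_columns(handle, columns):
    """Parse a CSV stream and keep only the requested columns of each row."""
    reader = csv.DictReader(handle)
    missing = set(columns) - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in CSV header: {sorted(missing)}")
    return [{col: row[col] for col in columns} for row in reader]


slim = read_columns(io.StringIO(SAMPLE), ["unit_id", "capacity_kw"])
print(slim)
```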

@fneum
Member

fneum commented Jan 3, 2025

Ok, @lkstrp! This could go in. I also checked the updated powerplants.csv after the matching process.

As a future TODO: Lowering the threshold from 1000 kW to 50 kW, or even lower, would bring the total solar capacity for Germany much closer to the figures from https://openenergytracker.org/docs/germany/electricity. Maybe we can add a config option so that (parts of) data sets marked as "fully included sources" are excluded from the matching process.
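
A rough sketch of what such a config switch could look like. Everything here is hypothetical: the key `fully_included_sources`, the function name, the source labels and the data layout are invented for illustration and are not part of powerplantmatching.

```python
def split_sources(datasets, config):
    """Split datasets into those to be matched and those passed through as-is."""
    fully_included = set(config.get("fully_included_sources", []))
    to_match = {k: v for k, v in datasets.items() if k not in fully_included}
    passthrough = {k: v for k, v in datasets.items() if k in fully_included}
    return to_match, passthrough


# Placeholder data: each (hypothetical) source maps to a list of plant names.
datasets = {
    "SOURCE_A": ["plant_1"],
    "SOURCE_B": ["plant_2", "plant_3"],
    "SOURCE_C": ["plant_4"],
}
# Hypothetical config: sources listed here would bypass the matching step.
config = {"fully_included_sources": ["SOURCE_B"]}

to_match, passthrough = split_sources(datasets, config)
print(sorted(to_match), sorted(passthrough))
```

With an empty or missing `fully_included_sources` entry, every source goes through matching, so the behaviour without the option stays unchanged.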

@fneum fneum marked this pull request as ready for review January 10, 2025 09:33
@lkstrp
Member Author

lkstrp commented Jan 10, 2025

@fneum Okay sounds good to me! Feel free to merge

@fneum fneum enabled auto-merge (squash) January 10, 2025 10:03
@fneum fneum merged commit fa8b827 into PyPSA:master Jan 10, 2025
16 checks passed
Development

Successfully merging this pull request may close these issues.

Introduce/Include MaSTR data
3 participants