Add options for automated sub-national aggregation #46
Comments
Hi @FLomb |
Good idea, shouldn't be too difficult to implement. Just a sketch off the top of my head, assuming

    import geopandas as gpd

    # `predicate` replaces the older `op` keyword (geopandas >= 0.10)
    merged = gpd.sjoin(gdf, regions, how="inner", predicate="within")
    merged.groupby('index_right').sum()  # or more specific aggregation strategies

One could have this as a frontend function where the user passes the |
Ok, here's how it could look in more detail. @FLomb could you check whether this would fit your use case or what would be missing? It should allow any GeoDataFrame.

    import numpy as np
    import geopandas as gpd
    import powerplantmatching as pm


    def assign_to_shape(df, shapes, index_col):
        """
        Group power plants by shapes, e.g. NUTS2, NUTS1, GADM.

        Parameters
        ----------
        df : pd.DataFrame
            power plant list with coordinates 'lat', 'lon'
        shapes : gpd.GeoDataFrame
            GeoDataFrame with polygons as geometry,
            e.g. NUTS2, NUTS1, GADM
        index_col : str
            column of shapes to group by
        """
        CRS = 'EPSG:4326'
        # Turn the plant list into point geometries
        gdf = gpd.GeoDataFrame(
            df,
            geometry=gpd.points_from_xy(df.lon, df.lat),
            crs=CRS,
        )
        # Keep only plants that lie within one of the shapes.
        # `predicate` replaces the older `op` keyword (geopandas >= 0.10)
        merged = gpd.sjoin(gdf, shapes, how="inner", predicate='within').to_crs(CRS)
        # Aggregation strategy per column
        strategies = {
            'Capacity': np.sum,
            'Efficiency': np.mean,
            'Duration': np.mean,
            'Volume_Mm3': np.sum,
            'DamHeight_m': np.mean,
            'lat': np.mean,
            'lon': np.mean,
            'DateIn': np.mean,
            'DateRetrofit': np.mean,
        }
        groupers = [index_col, "Fueltype", "Technology", "Set", 'Country']
        return merged.groupby(groupers, as_index=False).agg(strategies)

This can be run as:

    df = pm.powerplants(from_url=True)
    nuts0 = gpd.read_file("nuts/NUTS_RG_01M_2016_4326_LEVL_0.geojson")
    df = assign_to_shape(df, nuts0, 'NUTS_ID')

which would output:
[output table not preserved]
If merged, it should probably go into https://github.com/FRESNA/powerplantmatching/blob/master/powerplantmatching/export.py |
Hi @fneum, thanks for the quick reply! At first sight, yes, this seems pretty much what I was looking for! In your example you are still outputting at NUTS0, so I should possibly test it myself and see what happens when applied to some NUTS2/GADM shapefile. I'll try to do so asap |
Yes, that would be good if you test it. Just used NUTS0 to compare with the Country column. You could use NUTS_RG_01M_2016_4326_LEVL_2.geojson from https://gisco-services.ec.europa.eu/distribution/v2/nuts/download/#nuts21 |
Ok, I tested it but it looks like I'm having some issues. I've tried both NUTS0 data for the whole EU and GADM data for a single country. In both cases it kind of works, but it skips several plants in the aggregation, e.g. all renewables and bioenergy. Not sure if this could be because these techs tend to have a "nan" Technology type, unlike others such as Hydro. |
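A small sketch of the suspected cause, using made-up plant data rather than the real powerplantmatching list: by default, pandas `groupby` drops every row that has NaN in one of the grouping columns, so plants without a Technology entry would vanish from the totals. The `dropna=False` argument (pandas 1.1 or newer) keeps them as their own group. Column names follow the thread; the values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Toy plant list (made-up values); wind and solar have no Technology entry.
plants = pd.DataFrame({
    "NUTS_ID": ["PT", "PT", "PT", "ES"],
    "Fueltype": ["Hydro", "Wind", "Solar", "Wind"],
    "Technology": ["Reservoir", np.nan, np.nan, np.nan],
    "Capacity": [100.0, 50.0, 30.0, 20.0],
})
groupers = ["NUTS_ID", "Fueltype", "Technology"]

# Default behaviour: rows with NaN in any grouping column are silently dropped.
default = plants.groupby(groupers)["Capacity"].sum()

# dropna=False (pandas >= 1.1) keeps NaN as a group key of its own.
kept = plants.groupby(groupers, dropna=False)["Capacity"].sum()

print("capacity with default groupby:", default.sum())
print("capacity with dropna=False:   ", kept.sum())
```

Comparing the two totals against `plants["Capacity"].sum()` is a quick way to check whether any plants were lost during aggregation.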
Ok, quick follow-up: I had some time to debug the problem and managed to make it work by indeed avoiding nan values in the Technology column. That said, there are still some issues when using different shapefiles. For instance, there is a Hydro-Reservoir power plant in Portugal (id: 5468, name: "Foz tua") whose coordinates (erroneously) lie outside the inland area of Portugal. With the EU-NUTS0 shapefile, the merge still counts it towards the "PT" total capacity; with a GADM shapefile of Portugal, the plant is dropped and excluded from the total capacity of the corresponding GADM region, and hence of Portugal itself. This is just one example, and there could be many similar ones elsewhere; any ideas for fixing this kind of issue? Thanks |
You could create a small buffer around each polygon - but then you will have to deal with the issue of power plants lying in more than one region. |
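One possible way to get the effect of a buffer without double-counting, sketched here with plain shapely and two made-up square regions: assign each plant to the polygon that contains it, and fall back to the single nearest polygon only if it lies within a tolerance. The function name `assign_region`, the region IDs and the tolerance value are hypothetical, and the distance is measured in coordinate units (degrees for EPSG:4326), so this is only a rough illustration and not a tested solution for real NUTS or GADM shapes.

```python
from shapely.geometry import Point, Polygon

# Two made-up, adjacent square "regions" (hypothetical IDs, not real shapes).
regions = {
    "A": Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    "B": Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
}


def assign_region(point, regions, tolerance=0.1):
    """Return the ID of the region containing `point`; otherwise the ID of
    the nearest region if it is within `tolerance` (coordinate units),
    else None. A point is never assigned to more than one region."""
    for region_id, polygon in regions.items():
        if polygon.contains(point):
            return region_id
    # Not inside any polygon: pick the single closest one.
    nearest_id = min(regions, key=lambda rid: regions[rid].distance(point))
    if regions[nearest_id].distance(point) <= tolerance:
        return nearest_id
    return None


inside = assign_region(Point(0.5, 0.5), regions)         # within region A
just_outside = assign_region(Point(-0.05, 0.5), regions)  # slightly off A's edge
far_away = assign_region(Point(-5.0, 0.5), regions)       # beyond the tolerance
print(inside, just_outside, far_away)
```

Because the fallback picks exactly one nearest polygon, a plant that sits slightly offshore is attached to a single region rather than to every buffered region it happens to touch.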
Hmm, yes the buffer is an option but could be very fiddly. Was it a coarse shapefile (e.g. 60M)? Does it also occur with a highly resolved shapefile, e.g. 10M or 1M? I would see the coordinates as the ground truth rather than the country label. Could you share how you addressed the Technology nans? |
Yeah, I had also thought of creating a buffer around the polygons, which would work in this case because the plant in question is erroneously placed in the sea by its coordinates, while it should be a hydro plant inland. Yet, for a broader application, this trick might easily lead to problems with neighbouring regions/countries. As for the shapefile, I have tried a couple of different ones, the latest being the official file from the GADM website (not sure what the resolution is; they don't seem to state it explicitly as NUTS does, do they?). None of them changes the outcome. As for the Technology nans, what I did, as a quick workaround, was to fill all Technology nans with the Fueltype values. This is because most of the Technology nans (at least in the subset of countries I was considering) were related to Wind/Solar, with a few Hydro plants. In this way, you get Wind / Solar as the Technology for Fueltype Wind / Solar rather than nan. And, for Hydro, you get a generic "Hydro" as a placeholder until you figure out which type of plants those are or how you want to allocate them to the rest of the Hydro plant types. |
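The workaround described above could look roughly like this in pandas. The data frame is a made-up stand-in for the power plant list, and only the column names `Fueltype`, `Technology` and `Capacity` are taken from the thread.

```python
import numpy as np
import pandas as pd

# Toy plant list (made-up values) with missing Technology entries.
plants = pd.DataFrame({
    "Fueltype": ["Wind", "Solar", "Hydro", "Hydro"],
    "Technology": [np.nan, np.nan, "Reservoir", np.nan],
    "Capacity": [50.0, 30.0, 100.0, 10.0],
})

# Fill missing Technology values with the Fueltype of the same row.
filled = plants.assign(Technology=plants["Technology"].fillna(plants["Fueltype"]))

# With no NaN left in the grouping columns, no plant is dropped by groupby.
totals = filled.groupby(["Fueltype", "Technology"])["Capacity"].sum()
print(totals)
```

The generic "Hydro" technology then acts as a placeholder group that can be reallocated to specific hydro types later.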
It would be cool to have an automated option for aggregating power plants into sub-national clusters within each country, based on standard sub-national units.
For instance, it would be really nice if the user could choose, as an option, the level of spatial aggregation, e.g.:
So, instead of just the aggregate capacity of each European country, the output could give the aggregate capacity of each sub-national region of interest. This would greatly facilitate the functional coupling of the project to any power system model.