Add options for automated sub-national aggregation #46

Open
FLomb opened this issue Jun 30, 2021 · 10 comments

Comments

@FLomb

FLomb commented Jun 30, 2021

It would be cool to have an automated option for aggregating power plants into sub-national clusters within each country, based on standard sub-national units.

For instance, it would be really nice if the user could choose, as an option, the level of spatial aggregation, e.g.:

  • NUTS2
  • NUTS3
  • GADM

So, instead of having as an output just the aggregate capacity of each European country, one could have the aggregate capacity of each sub-national region of interest. This would make it much easier to couple the project to any power system model.

@kais-siala

Hi @FLomb
Not sure whether this feature is in the pipeline of powerplantmatching, but you could check this tool:
https://github.com/tum-ens/pyPRIMA/
which I created specifically to be able to aggregate data (in particular power plant capacities from here) as I wish.

@fneum
Member

fneum commented Jun 30, 2021

Good idea, shouldn't be too difficult to implement:

Just a sketch off the top of my head: assuming gdf is the powerplantmatching database as a geopandas GeoDataFrame and regions is a GeoDataFrame with the NUTS2/3 or GADM shapes, it should just be:

import geopandas as gpd
# geopandas >= 0.10 takes predicate=; older versions use op='within'
merged = gpd.sjoin(gdf, regions, how="inner", predicate='within')
merged.groupby('index_right').sum(numeric_only=True) # or more specific aggregation strategies

One could have this as a frontend function where the user passes the region shapes. I wouldn't necessarily have the shapefiles themselves built into powerplantmatching.

@fneum
Member

fneum commented Jun 30, 2021

Ok, here's how it could look in more detail.

@FLomb could you check whether this would fit your use case or what would be missing? It should allow any GeoDataFrame.

import numpy as np
import geopandas as gpd
import powerplantmatching as pm

def assign_to_shape(df, shapes, index_col):
    """
    Group power plants by the polygons in `shapes`,
    e.g. NUTS2, NUTS1, GADM.
    
    Parameters
    ----------
    df : pd.DataFrame
        power plant list with coordinates 'lat', 'lon'
    shapes : gpd.GeoDataFrame
        GeoDataFrame with polygons as geometry,
        e.g. NUTS2, NUTS1, GADM
    index_col : str
        column of shapes to group by
    """
    
    CRS = 'EPSG:4326'
    
    gdf = gpd.GeoDataFrame(df,
        geometry=gpd.points_from_xy(df.lon, df.lat),
        crs=CRS
    )

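    # Reproject the shapes to the CRS of the points before the spatial join.
    # geopandas >= 0.10 takes predicate=; older versions use op='within'.
    merged = gpd.sjoin(gdf, shapes.to_crs(CRS), how="inner", predicate='within')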

    strategies = {
        'Capacity': np.sum,
        'Efficiency': np.mean,
        'Duration': np.mean,
        'Volume_Mm3': np.sum,
        'DamHeight_m': np.mean,
        'lat': np.mean,
        'lon': np.mean,
        'DateIn': np.mean,
        'DateRetrofit': np.mean,
    }
    groupers = [index_col, "Fueltype", "Technology", "Set", 'Country']
    return merged.groupby(groupers, as_index=False).agg(strategies)

This can be run as:

df = pm.powerplants(from_url=True)

nuts0 = gpd.read_file("nuts/NUTS_RG_01M_2016_4326_LEVL_0.geojson")

df = assign_to_shape(df, nuts0, 'NUTS_ID')

which would output:

   NUTS_ID  Fueltype   Technology      Set    Country  Capacity  Efficiency  Duration  Volume_Mm3  DamHeight_m  lat      lon      DateIn   DateRetrofit
0  AT       Hard Coal  CCGT            PP     Austria  704       nan         0         0           0            48.3269  15.9198  1987     1987
1  AT       Hard Coal  Steam Turbine   CHP    Austria  246       nan         0         0           0            46.9082  15.4922  1986     1986
2  AT       Hard Coal  Steam Turbine   PP     Austria  287.539   nan         0         0           0            48.0034  13.2309  1970     1987
3  AT       Hydro      Pumped Storage  PP     Austria  389       nan         0         0           0            46.9684  10.0599  1943     2018
4  AT       Hydro      Pumped Storage  Store  Austria  3852.3    nan         78.6592   361.35      55.3         47.0987  11.8791  1984.17  1997.5

If merged, it should probably go into https://github.com/FRESNA/powerplantmatching/blob/master/powerplantmatching/export.py

@FLomb
Author

FLomb commented Jul 1, 2021

Hi @fneum, thanks for the quick reply!

At first sight, yes, this seems to be pretty much what I was looking for! Your example still outputs at NUTS0, so I should test it myself and see what happens when it is applied to a NUTS2/GADM shapefile. I'll try to do so asap.

@fneum
Member

fneum commented Jul 1, 2021

Yes, it would be good if you could test it. I just used NUTS0 to compare with the Country column. You could use NUTS_RG_01M_2016_4326_LEVL_2.geojson from https://gisco-services.ec.europa.eu/distribution/v2/nuts/download/#nuts21

@FLomb
Author

FLomb commented Jul 5, 2021

Ok, I tested it, but it looks like I'm having some issues. I've tried both NUTS0 data for the whole EU and GADM data for a single country. In both cases it kinda works, but it skips several plants in the aggregation, e.g. all renewables and bioenergy. Not sure if this could be because these techs tend to have a "nan" Technology type, unlike others such as Hydro.
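
A possible explanation for the skipped plants, sketched with made-up toy data rather than the real database: pandas `groupby` drops every row whose group key is NaN by default, so plants without a Technology entry would never reach the aggregation. The column names follow the thread; the values are purely illustrative.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration (not taken from powerplantmatching).
plants = pd.DataFrame({
    "Fueltype": ["Hydro", "Wind", "Solar", "Wind"],
    "Technology": ["Reservoir", np.nan, np.nan, np.nan],
    "Capacity": [100.0, 50.0, 20.0, 30.0],
})

groupers = ["Fueltype", "Technology"]

# Default behaviour: rows whose group key contains NaN are silently dropped.
default = plants.groupby(groupers, as_index=False).agg({"Capacity": "sum"})

# dropna=False (pandas >= 1.1) keeps NaN as a group of its own.
kept = plants.groupby(groupers, as_index=False, dropna=False).agg({"Capacity": "sum"})

print(default["Capacity"].sum())  # only the Hydro/Reservoir plant survives
print(kept["Capacity"].sum())     # all four plants are counted
```

If this is what happens here, passing `dropna=False` to the `groupby` call would be one alternative to cleaning the Technology column beforehand.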

@FLomb
Author

FLomb commented Jul 12, 2021

Ok, quick follow-up: I had some time to debug the problem, and the nan values were indeed the cause. I managed to make it work by getting rid of the nans for techs whose Technology type was 'nan'.

This said, there are still some issues when adopting different shapefiles. For instance, there is a Hydro-Reservoir power plant in Portugal (id: 5468, name: "Foz tua") whose coordinates (erroneously) lie outside the inland area of Portugal. When using the EU-NUTS0 shapefile, the merge still counts it towards the "PT" total capacity; when using a GADM shapefile of Portugal instead, the plant is dropped, so it is missing from the total capacity of the given GADM region and hence of Portugal itself.

While this is one example, there could be tons of similar ones elsewhere; any ideas for fixing this kind of issue?

Thanks

@kais-siala

You could create a small buffer around each polygon - but then you will have to deal with the issue of power plants lying in more than one region.
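
A minimal sketch of the buffer idea, assuming shapely is available (it is a geopandas dependency). The helper `assign_region`, the two square regions and the tolerance are hypothetical and only for illustration. An exact hit always wins; otherwise a point is assigned to the nearest region within `tolerance`, which is roughly equivalent to buffering each polygon by that amount and breaking ties by distance. The tolerance is in the units of the CRS, so degrees for EPSG:4326.

```python
from shapely.geometry import Point, Polygon

# Two hypothetical, adjacent square regions (coordinates are made up).
regions = {
    "A": Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    "B": Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
}


def assign_region(point, regions, tolerance):
    """Return the name of the region for `point`, or None if none is close enough."""
    # An exact match comes first, so the buffer never overrides a clean hit.
    for name, polygon in regions.items():
        if polygon.contains(point):
            return name
    # Otherwise keep the regions within `tolerance` of the point ...
    distances = {name: polygon.distance(point) for name, polygon in regions.items()}
    candidates = {name: d for name, d in distances.items() if d <= tolerance}
    if not candidates:
        return None
    # ... and resolve plants that fall into several buffers by picking the closest.
    return min(candidates, key=candidates.get)


print(assign_region(Point(0.5, 0.5), regions, tolerance=0.1))     # inside A
print(assign_region(Point(1.02, -0.05), regions, tolerance=0.1))  # near both, closer to B
print(assign_region(Point(5.0, 5.0), regions, tolerance=0.1))     # too far from everything
```

Picking the closest region is only one way to settle the overlap; falling back to the plant's Country label would be another.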

@fneum
Member

fneum commented Jul 12, 2021

Hmm, yes the buffer is an option but could be very fiddly.

Was it a coarse shapefile (e.g. 60M)? Does it also occur with a highly resolved shapefile, e.g. 10M or 1M? I would see the coordinates as the ground truth rather than the country label.

Could you share how you addressed the Technology nans?

@FLomb
Author

FLomb commented Jul 13, 2021

Yeah, I had also thought of creating a buffer around the polygons, which would work in this case because the plant in question is erroneously placed in the sea by its coordinates, while it should be a hydro plant inland. Yet, for a broader application, this trick might easily lead to problems with neighbouring regions/countries. As for the shapefile, I have tried a couple of different ones, the latest being the official file from the GADM website (not sure what the resolution is; they don't seem to state it as explicitly as NUTS does, do they?). None of them changes the outcome.

As for the Technology nans, what I did, as a quick workaround, was to just fill all Technology nans with the Fueltype values. This is because most of the Technology nans (at least in the subset of countries I was considering) were related to Wind/Solar, with a few Hydro plants. In this way, you get Wind / Solar as the Technology for Fueltype Wind / Solar rather than nan. And, for Hydro, you get a generic "Hydro" as a placeholder until you figure out which type of plants those are or how you want to allocate them to the rest of the Hydro plant types.
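
The workaround described above could look roughly like this in pandas. This is a sketch on made-up toy data, not the code that was actually run; only the column names come from the thread.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration (not taken from powerplantmatching).
plants = pd.DataFrame({
    "Fueltype": ["Hydro", "Hydro", "Wind", "Solar"],
    "Technology": ["Reservoir", np.nan, np.nan, np.nan],
    "Capacity": [100.0, 10.0, 50.0, 20.0],
})

# Fill missing Technology entries with the Fueltype of the same row.
filled = plants.copy()
filled["Technology"] = filled["Technology"].fillna(filled["Fueltype"])

# With no NaN keys left, every plant ends up in some group.
totals = filled.groupby(["Fueltype", "Technology"], as_index=False).agg({"Capacity": "sum"})
print(totals)
```

The filled frame can then be passed to the aggregation function unchanged, since no group key is missing any more.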
