Migrate existing local scenarios to derived datasets #2319

Closed
ChaelKruip opened this issue Jan 10, 2017 · 20 comments

@ChaelKruip

@AlexanderWirtz can you identify and list the important local scenarios in this issue, please?

@grdw @ploh please reserve some time in your project to migrate the relevant local scenarios so that they are based on their own datasets rather than relying on the national one.

@AlexanderWirtz
Contributor

@ChaelKruip I will do my best to track these down asap.

I am curious as to how this 'migration' will work and what you mean by it. Please have a look at the questions below.

  1. How will the migrated scenario refer to the 'dataset'? Will this be a local dataset or just an NL_2013 dataset?
  2. Will all input statements that were used to 'tweak' things (all present and both input statements at the very least) be moved elsewhere to work their magic, or will they still be in the original scaled scenario's 'user_values'?
  3. Will people who base their scenario on such a (publicly available) scaled scenario use the same independent dataset?

@grdw
Contributor

grdw commented Jan 11, 2017

How will the migrated scenario refer to the 'dataset'? Will this be a local dataset or just an NL_2013 dataset?

The area code for those scenarios can hopefully be set to the local dataset; however, this might depend on how customised all these local scenarios have become. If there's a single 'best' 'Groningen' or a single 'best' 'Ameland', then that would be great.

Will all input statements that were used to 'tweak' things (all present and both input statements at the very least) be moved elsewhere to work their magic, or will they still be in the original scaled scenario's 'user_values'?

They can be set either from inside ETSource or through the user_values. Preferably they would be put in the .ad file that accompanies a local dataset.

Will people who base their scenario on such a (publicly available) scaled scenario use the same independent dataset?

What do you mean exactly?

@ploh

ploh commented Jan 11, 2017

Will people who base their scenario on such a (publicly available) scaled scenario use the same independent dataset?

Yes, they will! From a scenario's perspective, a local/derived dataset is no different from a full dataset. In particular, it has to be created in ETSource (at least for stage 0), i.e. there will not be too many of them and they will not be created automatically when a scenario is created.

@ploh ploh changed the title Make existing local scenario's independent of national dataset Migrate existing local scenario's to derived datasets Jan 16, 2017
@ploh ploh changed the title Migrate existing local scenario's to derived datasets Migrate existing local scenarios to derived datasets Jan 16, 2017
@ploh

ploh commented Jan 16, 2017

Things to keep in mind:

@ploh

ploh commented Jan 26, 2017

From @AlexanderWirtz:
These are the scenarios on etengine staging that I would like to have migrated (for Paddepoel):
607975
607980
607984
607505

@ploh

ploh commented Jan 26, 2017

Talked to @ChaelKruip about making the older scaled scenarios (IABR, GEA, Ameland) independent of the nl dataset: we can create an nl2013 dataset. No objections 😄

@jorisberkhout
Member

jorisberkhout commented Jan 26, 2017

We can create an nl2013 dataset. No objections 😄

My minor objection to creating an nl2013 dataset is that there is a flaw in the dataflow going from ETDataset to ETSource. Each nl dataset on ETDataset (i.e. the ones for 2011, 2012 and 2013) contains a file called nl.ad (obviously), and exporting one of the older datasets is not automated. Currently you have to export an older dataset to ETSource, rename the dataset folder to nl20xx, rename the corresponding nl.ad to nl20xx.ad, and update the attribute area in this nl20xx.ad to nl20xx. I have forgotten the last step a number of times, leading to very frustrating debugging. As discussed this morning, we do not update the energy data of older datasets, but we do change their structure so that it stays compatible with changes to the graph.
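
The three manual steps described here (rename the folder, rename the .ad file, update the area attribute) could be scripted roughly as follows. This is only a sketch in Python, not an existing ETSource or ETDataset tool: the function name is hypothetical, and it assumes the .ad file stores the attribute on a single line of the form "- area = nl". It validates the file before touching anything, so a missing area line cannot leave a half-renamed dataset behind.

```python
import re
from pathlib import Path

# Assumption: the area attribute is a single line such as "- area = nl".
AREA_LINE = re.compile(r"^(-[ \t]*area[ \t]*=[ \t]*).*$", re.MULTILINE)


def rename_exported_dataset(dataset_dir: Path, new_key: str) -> Path:
    """Rename an exported dataset folder (e.g. nl -> nl2013), rename its
    .ad file to match, and set the area attribute to the new key.

    Hypothetical helper; returns the path of the renamed .ad file.
    """
    old_key = dataset_dir.name
    old_ad = dataset_dir / f"{old_key}.ad"
    target_dir = dataset_dir.with_name(new_key)

    if target_dir.exists():
        raise FileExistsError(f"{target_dir} already exists")

    # Validate and prepare the new file contents before renaming anything.
    text = old_ad.read_text()
    updated, count = AREA_LINE.subn(lambda m: m.group(1) + new_key, text)
    if count != 1:
        raise ValueError(f"expected exactly one area line in {old_ad}, found {count}")

    dataset_dir.rename(target_dir)                      # step 1: folder
    new_ad = target_dir / f"{new_key}.ad"
    (target_dir / f"{old_key}.ad").rename(new_ad)       # step 2: .ad file
    new_ad.write_text(updated)                          # step 3: area attribute
    return new_ad
```

Calling rename_exported_dataset(Path("datasets/nl"), "nl2013") on a freshly exported older dataset would then perform all three steps at once, which removes the easily forgotten last one.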

Long story short, would it be possible to come up with a more robust solution to maintaining older datasets on ETSource? Shooting from the hip I can imagine a structure like this:

datasets
|
-- nl
   |
   -- 2012
   -- 2013
   -- 2014

Where only the latest dataset can be selected from the front-end and older datasets are there to support derived datasets that rely on these older datasets.

What do you think, @ploh , @ChaelKruip , @antw ?

@grdw
Contributor

grdw commented Jan 26, 2017

What do you think, @ploh , @ChaelKruip , @antw ?

If I can throw in my 2 cents. Can't we solve this with git tags? This will obviously take some changes to ETEngine, i.e. if you select a different start year for NL, the correct git tag needs to correspond to the correct packed datafile 🤔 but it might be a cool project to do, though.

The reason I'm suggesting this is that all the files for an nl dataset will be the same, and they'll persist. If the structure of the graph changes, for instance, then that wouldn't be noticeable in the old nl dataset because the git tag points to the correct commit in the git history.

So you'd have (much like you'd have now):

datasets
-- nl

Except you can do a git checkout tags/nl2012 or something like that.

Some people might need a lesson in advanced git, which is a downside of this approach.

@antw
Contributor

antw commented Jan 26, 2017

If I can throw in my 2 cents. Can't we solve this with git tags? [...] If the structure of the graph changes, for instance, then that wouldn't be noticeable in the old nl dataset because the git tag points to the correct commit in the git history.

This sounds confusing to me. If the structure of the graph were to change, what would the workflow look like to update the NL2012 dataset?

@grdw
Contributor

grdw commented Jan 26, 2017

This sounds confusing to me. If the structure of the graph were to change, what would the workflow look like to update the NL2012 dataset?

Good question. Isn't it already an issue that the graph 'does' change? If it has to change, then it's going to be annoying. Not impossible, but annoying (checking out the branch with the tag, updating the graph, moving the tag...).

In Joris's setup you don't have that problem.

@antw
Contributor

antw commented Jan 27, 2017

Shooting from the hip I can imagine a structure like this:

datasets
-- nl
   -- 2012
   -- 2013
   -- 2014

I quite like this, provided it is applied consistently to all datasets, i.e. the directory structure for datasets becomes :dataset_key/:analysis_year. I imagine it is fairly easy to support this in the VBA scripts?

datasets
├ de
│ └ 2013
├ nl
│ ├ 2012
│ ├ 2013
│ └ 2014
└ uk
  └ 2013

In an ideal world, ETEngine's API would not differentiate between "nl" and "nl2012", but would instead take an area code and an optional start/analysis year, and would map that to the correct dataset in the backend.
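
To illustrate the mapping, here is a minimal Python sketch of such a lookup over the :dataset_key/:analysis_year layout. It is not ETEngine code: the function name is hypothetical, and falling back to the most recent year when no year is given is an assumption, chosen to match the idea that only the latest dataset is selectable from the front-end.

```python
from pathlib import Path
from typing import Optional


def resolve_dataset_dir(root: Path, area_code: str, year: Optional[int] = None) -> Path:
    """Map an area code and optional analysis year to a dataset directory.

    Hypothetical helper. Assumes the layout <root>/<area_code>/<year>/.
    Without a year, the most recent available year is used (an assumption).
    """
    area_dir = root / area_code
    if not area_dir.is_dir():
        raise LookupError(f"no datasets for area {area_code!r}")

    # Only sub-directories whose names are plain years count as datasets.
    years = sorted(
        int(entry.name)
        for entry in area_dir.iterdir()
        if entry.is_dir() and entry.name.isdigit()
    )
    if not years:
        raise LookupError(f"area {area_code!r} has no analysis years")

    chosen = years[-1] if year is None else year
    if chosen not in years:
        raise LookupError(f"no {chosen} dataset for area {area_code!r}")
    return area_dir / str(chosen)
```

With the example tree above, resolving "nl" without a year would pick nl/2014, while a derived dataset that depends on older data could ask for ("nl", 2012) explicitly.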

@jorisberkhout
Member

I quite like this, provided it is applied consistently to all datasets, i.e. the directory structure for datasets becomes :dataset_key/:analysis_year. I imagine it is fairly easy to support this in the VBA scripts?

I think no changes to the VBA scripts are required. On ETDataset, the very same directory structure already exists. The only thing that needs to be changed is the rake import task, which currently only exports the datasets defined in datasets.yml.

In an ideal world, ETEngine's API would not differentiate between "nl" and "nl2012", but would instead take an area code and an optional start/analysis year, and would map that to the correct dataset in the backend.

Love it! All in favour of this.

@grdw
Contributor

grdw commented Jan 27, 2017

[..] provided it is applied consistently to all datasets.

I have one question: what if, for example, onshore_suitable_for_wind is going to change for the nl dataset for all start years? Would you then need to update all the .ad files (so 2012/nl.ad, 2013/nl.ad, etc.) individually? Not that this is my concern at all, but I'm just wondering how that would be achieved. Would that be a VBA script from an Excel workbook just replacing those values in all the loose .ad files in each start year folder?

If that is the case, then this sounds fine to me. 👍

@jorisberkhout
Member

I have one question: what if, for example, onshore_suitable_for_wind is going to change for the nl dataset for all start years? Would you then need to update all the .ad files (so 2012/nl.ad, 2013/nl.ad, etc.) individually?

This would never happen. As said before, we never update any data for old datasets (onshore_suitable_for_wind is data); we only maintain them so that they stay in line with changes to the graph. For this, ETDataset, or more specifically analysis_manager.xlsm, has some nice features to make the user's life easier.

@ploh

ploh commented Feb 1, 2017

Note to self:

@ChaelKruip
Author

@ploh what is the status here?

@ploh

ploh commented Feb 2, 2017

@ploh what is the status here?

Yesterday, I tried re-running my GEA trial migration from last week, now that we have made the initializer inputs separate from the normal inputs. I ran into some trouble with non-existent init. inputs. I will try the same for a Paddepoel scenario next and see how hard it is to fix the potential problems there.

@ploh

ploh commented Feb 4, 2017

Update: Even before trying to migrate user_values to init. inputs, I encountered problems when trying to base Paddepoel scenario 607984 on a new derived dataset (without ETE scenario scaling) instead of on nl (with ETE scenario scaling): #2343

@ploh

ploh commented Feb 4, 2017

Like I described in #2343 (comment), there are some issues that should probably be addressed before the Paddepoel migration is continued.

@AlexanderWirtz @grdw @ChaelKruip Alternatively, you could just try to execute the migration rake task and inside of it tinker around with the user_values / init. inputs manually (as described in #2343 (comment)) until you think that the remaining gquery differences are small enough.
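
One way to judge whether the remaining gquery differences are "small enough" is to compare the results before and after the migration against a relative tolerance. A minimal Python sketch, assuming the gquery results have already been fetched into plain dictionaries; the function name, the 1% default tolerance and the gquery keys in the usage note are hypothetical and not part of ETEngine or the migration rake task:

```python
from typing import Dict


def gquery_differences(
    before: Dict[str, float],
    after: Dict[str, float],
    rel_tol: float = 0.01,
) -> Dict[str, float]:
    """Return the relative difference for every gquery that changed by
    more than rel_tol between the original and the migrated scenario.

    Only gqueries present in both result sets are compared.
    """
    differences = {}
    for key in sorted(before.keys() & after.keys()):
        old, new = before[key], after[key]
        scale = max(abs(old), abs(new))
        # Two zero values are identical; avoid dividing by zero.
        relative = 0.0 if scale == 0 else abs(new - old) / scale
        if relative > rel_tol:
            differences[key] = relative
    return differences
```

For example, with before = {"co2_emissions": 100.0, "total_costs": 50.0} and after = {"co2_emissions": 100.5, "total_costs": 60.0}, only "total_costs" is reported, because its change exceeds the 1% tolerance. An empty result would mean the migrated scenario matches the original within that tolerance.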

I am handing this over to you now, but I will gladly answer questions and explain my findings or the migration rake task in more detail.

@ploh ploh removed their assignment Feb 4, 2017
@AlexanderWirtz
Contributor

This issue is now purely technical and has moved beyond its original scope. I cannot tell whether it should remain open. Unassigning myself.

@AlexanderWirtz AlexanderWirtz removed their assignment Jun 15, 2017

6 participants