ENH: add projections on the level of individuals #9

Open
dkopasker opened this issue Nov 23, 2022 · 7 comments

Comments

@dkopasker
Contributor

Hi @vkhodygo

Here is a data sample for the individual-level dataset.

heed_mortality data sample_23nov22dk.xlsx

@vkhodygo changed the title from "Individual-level dataset" to "ENH: add projections on the level of individuals" on Nov 23, 2022
@vkhodygo self-assigned this on Nov 23, 2022
@vkhodygo added the labels "enhancement" (New feature or request) and "question" (Further information is requested) on Nov 23, 2022
@vkhodygo
Member

Household numbers

@vkhodygo
Member

@dkopasker I'd like to ask you a favour: please don't distribute data in *.xlsx files unless the file relies on specific formatting that you can't avoid. It's a relatively minor inconvenience, but still.

I also think we could drop some data. We don't actually need the prob_death column since it has too many empty entries. In addition, if this column is the complement of survive, some of its values should be present but are missing, and the two columns are correlated with each other anyway. Reducing the number of dimensions is always a good thing.

Other than that, we group the data by hh_id and count the number of entries in each group. That gives the number of people per household, and the sum of those counts gives the population/sample size. What are my next steps?
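
As a minimal sketch of the grouping step described above (plain Python, standard library only): the rows below are made up for illustration, and only the hh_id and survive column names come from the data sample. The helper name household_sizes is hypothetical.

```python
from collections import Counter

# Made-up rows for illustration; only the column names come from the thread.
rows = [
    {"hh_id": 1, "survive": 1},
    {"hh_id": 1, "survive": 1},
    {"hh_id": 2, "survive": 0},
    {"hh_id": 3, "survive": 1},
    {"hh_id": 3, "survive": 1},
    {"hh_id": 3, "survive": 1},
]


def household_sizes(rows):
    """Count how many individuals are recorded under each hh_id."""
    return Counter(row["hh_id"] for row in rows)


# Number of people per household, keyed by hh_id.
sizes = household_sizes(rows)

# Summing the household sizes recovers the sample size.
sample_size = sum(sizes.values())

# Number of households of each size (size -> number of households).
size_distribution = Counter(sizes.values())

print(dict(sizes), sample_size, dict(size_distribution))
```

The size distribution is the quantity that can later be compared with external household counts.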

@dkopasker
Contributor Author

dkopasker commented Nov 28, 2022

The prob_death column is intended to be populated with estimates from the mortality data you have formed. I think it is worth keeping in order to error-check the code. It can also be checked against counts from the survive column. Having ways to check the data and code is more important, at least at this stage, than reducing dimensions.
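
One way such a check could look, as a rough sketch only: it assumes survive is a 1/0 indicator and that missing prob_death entries are read as None. The age_group key, the helper name compare_death_rates, and all values are hypothetical and not taken from the data sample.

```python
from collections import defaultdict

# Made-up records. Assumptions: survive is 1 (alive) or 0 (dead);
# prob_death may be missing (None). age_group is a hypothetical grouping key.
records = [
    {"age_group": "60-69", "survive": 1, "prob_death": 0.25},
    {"age_group": "60-69", "survive": 1, "prob_death": 0.25},
    {"age_group": "60-69", "survive": 0, "prob_death": None},
    {"age_group": "60-69", "survive": 1, "prob_death": 0.25},
    {"age_group": "70-79", "survive": 0, "prob_death": 0.5},
    {"age_group": "70-79", "survive": 1, "prob_death": 0.5},
]


def compare_death_rates(records, key="age_group"):
    """Per group, compare the observed death share with the mean prob_death."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    report = {}
    for name, members in groups.items():
        # Observed share of deaths, derived from the survive indicator.
        observed = sum(1 - m["survive"] for m in members) / len(members)
        # Mean estimated probability, ignoring missing entries.
        probs = [m["prob_death"] for m in members if m["prob_death"] is not None]
        expected = sum(probs) / len(probs) if probs else None
        report[name] = {
            "observed": observed,
            "expected": expected,
            "n": len(members),
            "n_missing": len(members) - len(probs),
        }
    return report


report = compare_death_rates(records)
for name, row in sorted(report.items()):
    print(name, row)
```

Large gaps between observed and expected, or a high n_missing, would point at either the data or the code that fills prob_death.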

The next steps are to assign hh_id to your mortality model such that the relevant sums equal estimates from external data.
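
A deliberately simplified sketch of one reading of this step: it treats "the relevant sums" as the number of households of each size and hands out hh_id values so that those counts are matched exactly. The target counts and the function name assign_household_ids are hypothetical, and the sketch ignores who lives with whom (age, sex, relationship), which a real assignment would need to handle.

```python
from collections import Counter


def assign_household_ids(n_people, households_by_size):
    """Return one hh_id per person so that household-size counts match the targets.

    households_by_size maps household size -> number of households of that size
    (hypothetical targets standing in for external estimates).
    """
    required = sum(size * count for size, count in households_by_size.items())
    if required != n_people:
        raise ValueError(
            f"targets imply {required} people, but {n_people} were given"
        )

    hh_ids = []
    next_id = 1
    for size in sorted(households_by_size):
        for _ in range(households_by_size[size]):
            # Every member of the same household shares one identifier.
            hh_ids.extend([next_id] * size)
            next_id += 1
    return hh_ids


# Hypothetical targets: two 1-person, one 2-person, one 3-person household.
targets = {1: 2, 2: 1, 3: 1}
hh_ids = assign_household_ids(7, targets)

# Check: the realised size distribution equals the targets.
realised = Counter(Counter(hh_ids).values())
print(hh_ids, dict(realised))
```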

@vkhodygo
Copy link
Member

> The next steps are to assign hh_id to your mortality model such that the relevant sums equal estimates from external data.

Can I get this data first? At least for Wales/NI as they are relatively small.

@dkopasker
Contributor Author

@vkhodygo
Member

Those are relevant, but far from complete. I can make some guesses and use those aggregates, but I have no knowledge of the actual household composition. We can safely assume that a one-person household consists of a single person aged 18 or above. For two people it already becomes more difficult:

  • a couple;
  • a parent and a kid;
  • a parent and an adult kid.

This is clearly a case of combinatorial explosion, which requires some external constraints to be introduced.
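
To illustrate how quickly the number of possible compositions grows, here is a small sketch. The four age bands are hypothetical, and the rule that every household contains at least one adult is an assumption that extends the one-person case above to larger households.

```python
from itertools import combinations_with_replacement

# Hypothetical age bands; real data would likely use finer ones.
AGE_BANDS = ["0-17", "18-39", "40-64", "65+"]
ADULT_BANDS = {"18-39", "40-64", "65+"}


def compositions(size, bands=AGE_BANDS, adult_bands=ADULT_BANDS):
    """List unordered age-band compositions with at least one adult member."""
    return [
        combo
        for combo in combinations_with_replacement(bands, size)
        if any(band in adult_bands for band in combo)
    ]


# Number of admissible compositions for household sizes 1 to 5.
counts = {size: len(compositions(size)) for size in range(1, 6)}
for size, n in counts.items():
    print(f"{size} person(s): {n} compositions")
```

Even with only four bands the count grows with every extra household member, and adding sex or relationship types multiplies it further, which is why external constraints or documented assumptions are needed.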

What I meant was something similar to what you had provided originally.

@dkopasker
Contributor Author

"Households by type of household and family, regions of England and GB constituent countries" gives more detail on household composition. Beyond this you can make and document assumptions.

ONS data is usually the best quality for population-level statistics, but you could look for other data sources.
