puma argument not working with 5-year 2018-2022 ACS PUMS #555

Open
walkerke opened this issue Jan 26, 2024 · 8 comments

@walkerke
Owner

The 2022 PUMS uses the new 2020 PUMAs for the first time. This will be an issue for the next few years: samples from before 2022 use the 2010 PUMAs, while samples from 2022 onward use the 2020 PUMAs, so 5-year files that span both periods mix the two definitions.

Census doesn't reconcile this in the data; instead it marks unused PUMA definitions as -0009. See here:

[screenshot]

For prior years, we just throw an error message when PUMAs are requested and don't attempt to deal with it. See here: https://github.com/walkerke/tidycensus/blob/master/R/pums.R#L99-L102

This feels unsatisfactory to me as this will impact everything through the 2021-2025 ACS 5-year (which will be released in 2027!).

I'd like to think through how to handle this appropriately.

For users: you can still pass a vector of PUMAs using variables_filter to either PUMA10 or PUMA20 in the new data. That said, it may make sense to pull the whole state first (I know that's a hefty download for TX / CA) and then filter carefully within R.
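Filtering server-side on only PUMA10 or only PUMA20 keeps just the records coded to that vintage, since the other vintage is -0009. The local filtering step after a state-level pull can be sketched in base R as follows. This is a minimal sketch, not part of tidycensus: `filter_mixed_pumas()` is a hypothetical helper, the toy PUMA codes are made up, and it assumes the pulled data frame has character columns PUMA10 and PUMA20. Which 2010 and 2020 codes cover your area has to be worked out separately; a code in one vintage does not necessarily describe the same area in the other.

```r
# Hypothetical helper, not part of tidycensus.
filter_mixed_pumas <- function(pums, puma10 = character(0), puma20 = character(0)) {
  # A record matches if its 2010 code is in puma10 or its 2020 code is in
  # puma20. The vintage that doesn't apply to a record is coded "-0009",
  # which is excluded explicitly so it can never match.
  in10 <- pums$PUMA10 != "-0009" & pums$PUMA10 %in% puma10
  in20 <- pums$PUMA20 != "-0009" & pums$PUMA20 %in% puma20
  pums[in10 | in20, , drop = FALSE]
}

# Toy stand-in for a state-level pull (codes are illustrative only):
# rows a and b are pre-2022 records, rows c and d are 2022 records.
toy <- data.frame(
  SERIALNO = c("a", "b", "c", "d"),
  PUMA10   = c("01301", "01302", "-0009", "-0009"),
  PUMA20   = c("-0009", "-0009", "01301", "01305"),
  stringsAsFactors = FALSE
)

# Keep 2010 PUMA 01301 from the older years and 2020 PUMA 01305 from 2022.
filter_mixed_pumas(toy, puma10 = "01301", puma20 = "01305")
```

Passing both code lists in one call keeps records from every year of the 5-year sample, which a single-vintage server-side filter cannot do.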

@walkerke
Owner Author

Handling with an error for now: https://github.com/walkerke/tidycensus/blob/master/R/pums.R#L99-L106. Need to think through a better solution, though.

@mtworth

mtworth commented Feb 1, 2024

This is a bummer. I submitted a ticket to Census; perhaps we can encourage others to do the same? As I understand it, this effectively makes PUMA-level microdata useless for the next several years.

@walkerke
Owner Author

walkerke commented Feb 1, 2024

Yeah, I'd like to come up with some sort of solution, though it seems like it'd be a novel one, as I don't see PUMA reconciliation done by Census or by IPUMS. We could possibly build a crosswalk between the two. The tricky part is that going backwards from 2020 to 2010 is easier in faster-growing areas, where a 2010 PUMA is typically split into multiple new PUMAs, but harder in slower-growing areas (common in rural regions), where PUMAs are consolidated or reorganized.
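The split-versus-reorganized distinction can be checked mechanically from a crosswalk. A 2020 PUMA whose population falls entirely within one 2010 PUMA can be relabeled to that 2010 PUMA; one that straddles 2010 boundaries cannot. A minimal base-R sketch, assuming a hypothetical crosswalk with one row per overlapping (puma20, puma10) pair and an allocation factor `afact` giving the share of the 2020 PUMA's population in that 2010 PUMA; the codes and factors are invented, and `nested_2020_to_2010()` is a hypothetical name.

```r
# Hypothetical 2020 -> 2010 crosswalk (invented codes and factors).
# 00100 was split into 00101 and 00102; 00200 was reorganized and now
# straddles the old 00200 and 00300.
xwalk <- data.frame(
  puma20 = c("00101", "00102", "00200", "00200"),
  puma10 = c("00100", "00100", "00200", "00300"),
  afact  = c(1, 1, 0.6, 0.4),
  stringsAsFactors = FALSE
)

# For each 2020 PUMA, take its largest overlap; if (nearly) all of its
# population is in one 2010 PUMA, it nests. tol absorbs rounding in the
# published factors.
nested_2020_to_2010 <- function(xwalk, tol = 0.001) {
  best <- xwalk[order(xwalk$puma20, -xwalk$afact), ]
  best <- best[!duplicated(best$puma20), ]
  best[best$afact >= 1 - tol, c("puma20", "puma10")]
}

nested <- nested_2020_to_2010(xwalk)
nested
# 2022 records in these 2020 PUMAs could carry their 2010 PUMA code; the
# remaining 2020 PUMAs straddle 2010 boundaries and need another approach.
```

In this toy case, the split PUMAs (00101, 00102) map back cleanly while the reorganized one (00200) does not, mirroring the fast-growing versus rural contrast described above.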

@mtworth

mtworth commented Feb 1, 2024

How about using Geocorr?

https://mcdc.missouri.edu/applications/geocorr2022.html
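One way a Geocorr-style crosswalk could be used for PUMAs that don't nest is to apportion each older record's weight across the 2020 PUMAs it overlaps. This is an approximation, not something Census or tidycensus does: it assumes each record is distributed within its 2010 PUMA in proportion to population. A minimal base-R sketch, assuming a crosswalk whose `afact` column (modeled on Geocorr's allocation factor) gives the share of each 2010 PUMA's population in each 2020 PUMA; all codes, weights, and the `apportion_to_2020()` name are hypothetical.

```r
# Hypothetical 2010 -> 2020 crosswalk: afact is the share of the 2010
# PUMA's population falling in each 2020 PUMA.
xwalk <- data.frame(
  puma10 = c("00100", "00100", "00200"),
  puma20 = c("00101", "00102", "00200"),
  afact  = c(0.7, 0.3, 1),
  stringsAsFactors = FALSE
)

# Toy pre-2022 person records coded to 2010 PUMAs, with person weights.
records <- data.frame(
  SERIALNO = c("a", "b"),
  PUMA10   = c("00100", "00200"),
  PWGTP    = c(100, 50),
  stringsAsFactors = FALSE
)

# Duplicate each record once per overlapping 2020 PUMA and scale its weight
# by the allocation factor, so each record's weights still sum to the original.
apportion_to_2020 <- function(records, xwalk) {
  out <- merge(records, xwalk, by.x = "PUMA10", by.y = "puma10")
  out$PWGTP_alloc <- out$PWGTP * out$afact
  out
}

alloc <- apportion_to_2020(records, xwalk)
tapply(alloc$PWGTP_alloc, alloc$puma20, sum)
```

Replicate weights would need the same scaling for variance estimates. Because the allocation is population-based, it can blur subgroups (such as race subpopulations) that are unevenly distributed within a PUMA.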

@mtworth

mtworth commented Feb 8, 2024

@walkerke, see the Census guidance here: page 14, section F.

I can try to take a stab at a PR in the next month if that's of interest.

@elisemarie1120

Hey @walkerke.
I've noticed a problem when I try to create a data frame of housing-only variables that includes the VACS (vacancy) variable. When I add VACS, or even when I try to pull ONLY VACS, it throws an error:
Downloading: 2 MB
Error in dplyr::mutate():
! Can't transform a data frame with duplicate names.
Run rlang::last_trace() to see where the error occurred.

When I pull any other variable without VACS, either alone or in combinations of up to 30 variables, it downloads just fine. But when I add VACS, even with return_vacant = TRUE, it still won't run.
I'm wondering whether this is one of the many issues caused by the new PUMA geographies used in 2022.
When I set year = 2021, VACS works and gives me values b through 7.

Just flagging this. I am a pretty new user, so I thought it was something I was doing, but after a day of investigation I think the problem is with the new geographic designations.

@walkerke
Owner Author

walkerke commented Mar 4, 2024

@elisemarie1120 I don't think this has anything to do with PUMA geographies; see #560.

Try re-installing from GitHub with remotes::install_github("walkerke/tidycensus") and see whether your code runs now; I just pushed a couple of fixes.

@kirstinhuiber-ac

Quoting @walkerke: "Handling with an error for now: https://github.com/walkerke/tidycensus/blob/master/R/pums.R#L99-L106. Need to think through a better solution, though."

@walkerke, hello! I work for a county-level local health department. Because get_pums doesn't have a "county" argument, I have been filtering PUMS data using the "puma" argument (regardless of whether I map or summarize at the PUMA level). And because a county is a relatively small geography and we want to look at racial subpopulations, we almost exclusively use the 5-year samples. So this issue affects us greatly; I worry that we won't be able to see 5-year data specific to our county more recent than 2017-2021 until 2026, which isn't ideal.

Is there a way you could add a "county" argument to get_pums? Or is there a geography nesting problem? I would imagine that PUMAs are nested within county boundaries, but I don't know that. Having to filter on PUMA values that aren't consistent across the 5-year period, for half of every decade, delays our access to data that are important to us (particularly around social determinants of health).

I appreciate any other advice you have to give as well!
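On the nesting question: PUMS records identify state and PUMA but not county, and PUMAs are not guaranteed to nest within counties. A populous county typically contains several PUMAs, while in less populous areas one PUMA often combines several counties, because each PUMA needs at least about 100,000 people. A county can therefore only be approximated from the PUMAs that fall inside it. A minimal base-R sketch of that check, assuming a hypothetical PUMA-to-county crosswalk (for example, exported from Geocorr) whose `afact` column gives the share of each PUMA's population in each county; the codes, factors, and `pumas_for_county()` name are invented for illustration. For a 5-year file that mixes vintages, this check would need to be run separately for the 2010 and 2020 PUMAs.

```r
# Hypothetical PUMA -> county crosswalk: afact is the share of each PUMA's
# population living in each county (codes and factors are invented).
puma_county <- data.frame(
  puma   = c("01101", "01102", "01103", "01103", "01200"),
  county = c("001", "001", "001", "003", "005"),
  afact  = c(1, 1, 0.8, 0.2, 1),
  stringsAsFactors = FALSE
)

# Split the PUMAs touching a county into those lying entirely inside it and
# those that also include residents of other counties. tol absorbs rounding.
pumas_for_county <- function(xwalk, county, tol = 0.001) {
  hits <- xwalk[xwalk$county == county, ]
  list(
    within  = hits$puma[hits$afact >= 1 - tol],
    partial = hits$puma[hits$afact < 1 - tol]
  )
}

pumas_for_county(puma_county, "001")
```

The `within` codes are the ones that could safely be passed to the puma argument for a county-only pull. Including the `partial` PUMAs trades completeness for contamination from neighboring counties.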
