Retired characteristic names - corrected for WQX data but not USGS data #307

Closed
6 tasks
ehinman opened this issue Jul 10, 2023 · 4 comments · Fixed by #365
Labels
Good First Issue, Ref: Harmonization

Comments

@ehinman
Contributor

ehinman commented Jul 10, 2023

Describe the bug
Several characteristic names in WQX-submitted data have been retired in favor of more concise or clearer names. These names follow the pattern [old name] ***retired***use [new name], and the WQX QAQC validation table lists the corrected target name (the new name) for each of them. The TADA team recently created a function that automatically substitutes the target name into the TADA.CharacteristicName field. However, USGS data may still carry the old name without the ***retired***use [new name] suffix in the CharacteristicName field. Those rows do not join to the WQX QAQC validation table and therefore are not corrected.
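To make the failure mode concrete, here is a minimal base R sketch using toy data rather than the real validation table. It assumes retired names carry a `***retired***use` marker (the later code in this thread splits on `*`, which suggests this form), and the column name `Target.TADA.CharacteristicName` is a hypothetical stand-in for whatever the validation table actually uses.

```r
# Hypothetical miniature of the WQX validation table: the retired name carries
# the "***retired***use <new name>" suffix (toy data, not the real table).
ref <- data.frame(
  CharacteristicName = "Inorganic nitrogen (nitrate and nitrite) ***retired***use Nitrate + Nitrite",
  Target.TADA.CharacteristicName = "NITRATE + NITRITE", # hypothetical column name
  stringsAsFactors = FALSE
)

# Toy results: one WQX-style row with the suffix, one USGS-style row without it.
results <- data.frame(
  Source = c("WQX", "USGS"),
  CharacteristicName = c(
    "Inorganic nitrogen (nitrate and nitrite) ***retired***use Nitrate + Nitrite",
    "Inorganic nitrogen (nitrate and nitrite)"
  ),
  stringsAsFactors = FALSE
)

# Left join on the exact name: only the suffixed row finds a match,
# so the USGS-style row is left with NA as its target name.
joined <- merge(results, ref, by = "CharacteristicName", all.x = TRUE)
joined[, c("Source", "Target.TADA.CharacteristicName")]
```

Because the join key is the full string, the un-suffixed USGS-style name has nothing to match against, which is the behavior reported here.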

To Reproduce
I found this issue with the characteristic names Inorganic nitrogen (nitrate and nitrite) and Inorganic nitrogen (nitrate and nitrite) ***retired***use Nitrate + Nitrite.

Code to reproduce the behavior:

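library(TADA)

# Pull Oregon data from 2022-10-01 onward, then keep the inorganic nitrogen rows.
test <- TADA_DataRetrieval(startDate = "2022-10-01", statecode = "OR")
test <- subset(test, grepl("Inorganic nitrogen", test$CharacteristicName))
# Compare the original names with the harmonized TADA names.
unique(test$CharacteristicName)
unique(test$TADA.CharacteristicName)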

Expected behavior
All of the results in the example above should be converted to TADA.CharacteristicName "NITRATE + NITRITE" for data harmonization purposes.

Reminders for TADA contributors addressing this issue

New features should include all of the following work:

  • Create the function/code.

  • Document all code using comments to describe what it does.

  • Create tests in tests folder.

  • Create help file using roxygen2 above code.

  • Create working examples in help file (via roxygen2).

  • Add to appropriate vignette (or create new one).

@ehinman
Contributor Author

ehinman commented Aug 17, 2023

Proposed solution: Use the WQX characteristic domain table to identify retired characteristics, extract everything before the "retired" marker in each name, and join the result to the NWIS data to convert NWIS characteristics to their current WQX names. This would happen in autoclean.

@cristinamullin added the Good First Issue label Nov 13, 2023
@hillarymarler
Collaborator

I'd like to work on this issue.

@hillarymarler
Collaborator

hillarymarler commented Dec 4, 2023

This solution works (I tried it on the OR example above and on random state and national datasets), but I am not sure whether it is similar to what you originally had in mind. I used TADA_GetCharacteristicRef() to create a data frame, filtered it for "retired" CharacteristicNames, and then modified those names to be compatible with the NWIS names. These additional CharacteristicName options can then be joined to the original ref.table of deprecated CharacteristicNames.

  # Read in the characteristic reference table (with deprecation information) and
  # keep deprecated terms whose CharacteristicName contains "retired".
  # Remove all characters after the first "*" in CharacteristicName and trim leading
  # or trailing white space so the name matches the deprecated NWIS CharacteristicName.
  nwis.table <- TADA_GetCharacteristicRef() %>%
    dplyr::filter(
      Char_Flag == "Deprecated",
      grepl("retired", CharacteristicName)
    ) %>%
    dplyr::mutate(CharacteristicName = trimws(stringr::str_split(CharacteristicName, "\\*", simplify = TRUE)[, 1]))

  # Read in the characteristic reference table again and filter to deprecated terms.
  # Join with the deprecated NWIS CharacteristicName data frame.
  ref.table <- TADA_GetCharacteristicRef() %>%
    dplyr::filter(Char_Flag == "Deprecated") %>%
    dplyr::full_join(nwis.table)

  rm(nwis.table)
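For readers without TADA installed, the name-trimming step can be sketched in base R on a toy table. This is only an illustration under stated assumptions: the rows, the second name, and the `Target` column are hypothetical, and `sub()` stands in for the `stringr::str_split()` call used in the comment.

```r
# Toy stand-in for the deprecated rows of the characteristic reference table.
# The second row and the Target column are hypothetical.
ref <- data.frame(
  CharacteristicName = c(
    "Inorganic nitrogen (nitrate and nitrite) ***retired***use Nitrate + Nitrite",
    "Hypothetical deprecated name"
  ),
  Char_Flag = "Deprecated",
  Target = c("NITRATE + NITRITE", "HYPOTHETICAL NEW NAME"),
  stringsAsFactors = FALSE
)

# Keep only the names flagged with "retired".
retired <- ref[grepl("retired", ref$CharacteristicName, fixed = TRUE), ]

# Drop everything from the first "*" onward and trim surrounding white space,
# leaving the bare name that NWIS data would carry.
retired$CharacteristicName <- trimws(sub("\\*.*$", "", retired$CharacteristicName))

# Append the trimmed names to the original rows so both spellings map to the target.
lookup <- unique(rbind(ref, retired))
lookup[, c("CharacteristicName", "Target")]
```

The resulting lookup holds both the suffixed and the bare spelling of the retired name, each pointing at the same target, so a join on CharacteristicName can match either form.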

@cristinamullin
Collaborator

This solution makes sense to me. When you are ready, you can create a pull request, and I can review and test the code before we merge it in. Happy to walk through this process together on our call later this week.

@hillarymarler linked a pull request Dec 6, 2023 that will close this issue
3 participants