Getting all combinations of characteristic-fraction-speciation-unit in harmonization table #319

Open
6 tasks
ehinman opened this issue Aug 4, 2023 · 7 comments

Comments

@ehinman
Contributor

ehinman commented Aug 4, 2023

Describe the bug

We want the harmonization table to be as complete as possible for nutrients and priority parameters, so that users get the most support for synonyms and data harmonization. We used a char-frac-spec combination spreadsheet that Kevin pulled from WQX to find the most common combinations of all three. I then wrote a test that checks random datasets for missing combos, and it found many new combinations that weren't in the WQX spreadsheet. At first, I thought this was because WQX does not account for NWIS char-frac-spec combinations, but the problem may be more extensive than the NWIS data stream alone: the WQX combinations (and the WQX char validation table) do not consider blanks or NAs in any of the columns. Thus, for characteristics for which a fraction and speciation are NOT required, we would need to separately add all of those combinations to the harmonization template.
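To illustrate the gap described above, here is a minimal base R sketch. It uses invented toy tables with simplified, hypothetical column names, not the real TADA synonym reference or the WQX spreadsheet. It shows how a combination with a blank fraction and speciation goes unmatched until a row with NA in those columns is added to the reference.

```r
# Toy reference table: every row has a fraction and speciation filled in.
# All names and values here are invented for illustration.
ref <- data.frame(
  CharacteristicName = c("NITRATE", "NITRATE"),
  Fraction = c("DISSOLVED", "TOTAL"),
  Speciation = c("AS N", "AS N"),
  HarmonizationGroup = c("grp1", "grp2"),
  stringsAsFactors = FALSE
)

# Toy observed combinations: the second row has no fraction or speciation.
observed <- data.frame(
  CharacteristicName = c("NITRATE", "NITRATE"),
  Fraction = c("DISSOLVED", NA),
  Speciation = c("AS N", NA),
  stringsAsFactors = FALSE
)

key_cols <- c("CharacteristicName", "Fraction", "Speciation")

# Left join on the shared key columns; unmatched rows get NA in HarmonizationGroup.
joined <- merge(observed, ref, all.x = TRUE)
missing_combos <- joined[is.na(joined$HarmonizationGroup), key_cols]
print(missing_combos)

# Adding an explicit NA row to the reference lets the blank combination match.
ref_with_na <- rbind(
  ref,
  data.frame(
    CharacteristicName = "NITRATE", Fraction = NA, Speciation = NA,
    HarmonizationGroup = "grp3", stringsAsFactors = FALSE
  )
)
joined_again <- merge(observed, ref_with_na, all.x = TRUE)
still_missing <- joined_again[is.na(joined_again$HarmonizationGroup), key_cols]
print(nrow(still_missing))
```

In this sketch the second join finds the blank combination because base R's `merge` treats NA key values as matching each other by default.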

To Reproduce

Code to reproduce the behavior:

remotes::install_github("USEPA/TADA", ref = "develop")
library(TADA)

# Columns that define a characteristic-fraction-speciation-unit combination
combo_cols <- c("TADA.CharacteristicName", "TADA.ResultSampleFractionText",
                "TADA.MethodSpecificationName", "TADA.ResultMeasure.MeasureUnitCode")

# Pull a random dataset, run the flag functions, and load the synonym reference
test <- TADA_RandomTestingSet()
test1 <- TADA_RunKeyFlagFunctions(test)
ref <- TADA_GetSynonymRef()
ref_chars <- unique(ref$TADA.CharacteristicName)

# Unique combinations in the test data for characteristics covered by the reference
test_chars <- unique(test1[test1$TADA.CharacteristicName %in% ref_chars, combo_cols])

# Left join to the reference; combinations with no match have NA in HarmonizationGroup
test_chars_ref <- merge(test_chars, ref, all.x = TRUE)
new_combos <- test_chars_ref[is.na(test_chars_ref$HarmonizationGroup), combo_cols]
if (nrow(new_combos) > 0) {
  print("New combinations found in random dataset test:")
  print(new_combos)
}

Expected behavior

Ideally, we could pull all combinations of these characteristics from the Water Quality Portal, where NWIS and WQX data are combined with all allowable values.

Reminders for TADA contributors addressing this issue

New features should include all of the following work:

  • Create the function/code.

  • Document all code using comments to describe what it does.

  • Create tests in tests folder.

  • Create help file using roxygen2 above code.

  • Create working examples in help file (via roxygen2).

  • Add to appropriate vignette (or create new one).

@cristinamullin
Collaborator

cristinamullin commented Aug 7, 2023 via email

@ehinman
Contributor Author

ehinman commented Aug 7, 2023 via email

@ehinman
Contributor Author

ehinman commented Aug 7, 2023

We could also change all NONEs in the validation table to NA. That might be a solid solution.

@cristinamullin
Collaborator

cristinamullin commented Aug 7, 2023 via email

@ehinman
Contributor Author

ehinman commented Aug 7, 2023

Ok, got it. Sounds good. Thanks!

@ehinman
Contributor Author

ehinman commented Aug 15, 2023

TADA_Autoclean converts all NONE to NA in the fraction and speciation columns prior to validation, and the validation table has NONE set to INVALID. This still does not address additional combos coming in from NWIS (NONE vs NA aside).
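For readers unfamiliar with the NONE-to-NA step, here is a rough base R sketch of that kind of conversion on toy data. It is not how TADA_Autoclean is actually implemented; `none_to_na` is a hypothetical helper and the values are invented. Only the column names are taken from the code earlier in this issue.

```r
# Toy data; values are invented for illustration.
df <- data.frame(
  TADA.ResultSampleFractionText = c("TOTAL", "NONE", NA),
  TADA.MethodSpecificationName = c("NONE", "AS N", "NONE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a TADA function): replace the literal
# string "NONE" with NA, leaving existing NAs and other values alone.
none_to_na <- function(x) {
  x[!is.na(x) & x == "NONE"] <- NA
  x
}

# Apply the helper to the fraction and speciation columns only.
cols <- c("TADA.ResultSampleFractionText", "TADA.MethodSpecificationName")
df[cols] <- lapply(df[cols], none_to_na)
print(df)
```

After a step like this, NONE and blank values look the same to any later join or validation, which is why the remaining gap is the extra NWIS combinations rather than the NONE vs NA distinction.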
