Getting all combinations of characteristic-fraction-speciation-unit in harmonization table #319
This won't fix this issue, but it would help reduce total combinations. We could change all NONE to NA for speciation and fraction as part of autoclean, assuming NONE is equivalent to NA.
Cristina A Mullin, PhD (she/her)
Water Data Integration Branch
Watershed Restoration, Assessment, and Protection Division
Office of Wetlands, Oceans and Watersheds
US EPA|Office of Water
***@***.******@***.***>
From: Elise H. ***@***.***>
Sent: Friday, August 4, 2023 5:32 PM
To: USEPA/TADA ***@***.***>
Cc: Subscribed ***@***.***>
Subject: [USEPA/TADA] Getting all combinations of characteristic-fraction-speciation-unit in harmonization table (Issue #319)
Describe the bug
We want to make the harmonization table as complete as possible for nutrients and priority parameters, so that we give users the most support in terms of synonyms and harmonizing data. We used a char-frac-spec combination spreadsheet Kevin pulled from WQX to see the most common combinations of all three. I then wrote a test that uses random datasets to check for missing combos, and it found many new combinations that weren't in the WQX spreadsheet. At first, I thought this was because WQX does not account for NWIS char-frac-spec combinations, but I realized the gap might extend beyond the NWIS data stream: the WQX combinations (and the WQX char validation table) do not consider blanks or NAs in any of the columns. Thus, for characteristics for which a fraction and speciation are NOT required, we would need to separately add all of those combinations to the harmonization template.
To Reproduce
Code to reproduce the behavior:
# Install the development version of TADA
remotes::install_github("USEPA/TADA", ref = "develop")
library(TADA)
# Pull a random dataset and run the key flagging functions
test = TADA_RandomTestingSet()
test1 = TADA_RunKeyFlagFunctions(test)
# Characteristics covered by the synonym (harmonization) reference table
ref = TADA_GetSynonymRef()
ref_chars = unique(ref$TADA.CharacteristicName)
# Unique char-frac-spec-unit combinations in the test data for those characteristics
combo_cols = c("TADA.CharacteristicName", "TADA.ResultSampleFractionText", "TADA.MethodSpecificationName", "TADA.ResultMeasure.MeasureUnitCode")
test_chars = unique(subset(test1, test1$TADA.CharacteristicName %in% ref_chars)[, combo_cols])
# Left join to the reference; combinations with no match have NA in HarmonizationGroup
test_chars_ref = merge(test_chars, ref, all.x = TRUE)
new_combos = subset(test_chars_ref, is.na(test_chars_ref$HarmonizationGroup))[, combo_cols]
if (nrow(new_combos) > 0) {
  print("New combinations found in random dataset test:")
  print(new_combos)
}
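For readers without TADA installed, the same check can be sketched on toy data. This is only an illustration under stated assumptions: the two small data frames and their shortened column names are hypothetical stand-ins for the real test data and synonym reference, and only `HarmonizationGroup` is taken from the code above.

```r
# Toy reference table (hypothetical): one known combination with a group.
ref <- data.frame(
  CharacteristicName = "NITRATE",
  FractionText = "DISSOLVED",
  SpeciationName = "AS N",
  UnitCode = "MG/L",
  HarmonizationGroup = "Nitrate",
  stringsAsFactors = FALSE
)

# Toy observed data (hypothetical): the known combination plus one row
# where fraction and speciation are missing.
observed <- data.frame(
  CharacteristicName = c("NITRATE", "NITRATE"),
  FractionText = c("DISSOLVED", NA),
  SpeciationName = c("AS N", NA),
  UnitCode = c("MG/L", "MG/L"),
  stringsAsFactors = FALSE
)

key_cols <- c("CharacteristicName", "FractionText", "SpeciationName", "UnitCode")

# Left join on the key columns; rows without a match in ref get NA
# in HarmonizationGroup.
joined <- merge(unique(observed[, key_cols]), ref, by = key_cols, all.x = TRUE)
new_combos <- joined[is.na(joined$HarmonizationGroup), key_cols]
print(new_combos)
```

The row with missing fraction and speciation is reported as new because the toy reference has no row with NA in those columns, which mirrors the gap described in this issue.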
Expected behavior
Ideally, we could pull all combinations of these characteristics from the water quality PORTAL, where NWIS and WQX mix with all allowable values.
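Once such a combined dataset is in hand, tallying the combinations needs to keep rows with missing values, since some grouping helpers (base R's `aggregate()`, for example) omit rows whose grouping values are NA. A minimal sketch follows, assuming a data frame that has already been downloaded; `count_combos`, the `portal` toy data, and its column names are hypothetical and not part of TADA.

```r
# Count unique combinations of the given columns, keeping NA as its own value.
count_combos <- function(df, cols) {
  # Build one string key per row; NA becomes the literal "<NA>" so it is kept.
  parts <- lapply(df[cols], function(x) {
    x <- as.character(x)
    ifelse(is.na(x), "<NA>", x)
  })
  key <- do.call(paste, c(unname(parts), sep = " | "))
  first <- !duplicated(key)
  combos <- df[first, cols, drop = FALSE]
  # Look up each unique key's frequency by name in the tabulated keys.
  combos$n <- as.integer(table(key)[key[first]])
  rownames(combos) <- NULL
  # Most frequent combinations first.
  combos[order(-combos$n), ]
}

# Toy stand-in for a portal download (hypothetical values).
portal <- data.frame(
  CharacteristicName = c("AMMONIA", "AMMONIA", "AMMONIA"),
  FractionText = c(NA, NA, "TOTAL"),
  SpeciationName = c("AS N", "AS N", "AS N"),
  stringsAsFactors = FALSE
)

combo_counts <- count_combos(portal, names(portal))
print(combo_counts)
```

The resulting table could then be compared against the harmonization reference to see which frequent combinations are still missing.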
Reminders for TADA contributors addressing this issue
New features should include all of the following work:
* [ ] Create the function/code.
* [ ] Document all code using comments to describe what it does.
* [ ] Create tests in tests folder.
* [ ] Create help file using roxygen2 above code.
* [ ] Create working examples in help file (via roxygen2).
* [ ] Add to appropriate vignette (or create new one).
I cringe a little but if we change all NA to NONE, it would be represented in the validation tables.
Elise Hinman, Ph.D. (she/her)
ORISE Participant
Water Data Integration Branch
Watershed Restoration, Assessment, and Protection Division
Office of Wetlands, Oceans and Watersheds
US EPA|Office of Water
***@***.***
In reply to Cristina Mullin's comment above (sent Monday, August 7, 2023 9:18 AM).
We could also change all NONEs in the validation table to NA. That might be a solid solution.
That would be my preference, to change NONE to NA in autoclean and in the reference table.
In reply to Elise H.'s comment above (sent Monday, August 7, 2023 9:40 AM).
Ok, got it. Sounds good. Thanks!
TADA_Autoclean converts all NONE to NA in the fraction and speciation columns prior to validation, and the validation table has NONE set to INVALID. This still does not address additional combos coming in from NWIS (NONE vs NA aside).
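As a rough illustration of the NONE-to-NA step, here is a minimal sketch. It is not the package's actual implementation: `none_to_na` is a hypothetical helper, and the case-insensitive, whitespace-trimmed match is an assumption. The two column names are the fraction and speciation columns used in the reproduction code.

```r
# Hypothetical helper: replace the text "NONE" with NA in selected columns.
none_to_na <- function(df,
                       cols = c("TADA.ResultSampleFractionText",
                                "TADA.MethodSpecificationName")) {
  # Only touch columns that are actually present in the data frame.
  for (col in intersect(cols, names(df))) {
    values <- as.character(df[[col]])
    # Assumption: match "NONE" regardless of case or surrounding spaces.
    is_none <- !is.na(values) & toupper(trimws(values)) == "NONE"
    values[is_none] <- NA_character_
    df[[col]] <- values
  }
  df
}

# Toy data (hypothetical values).
toy <- data.frame(
  TADA.ResultSampleFractionText = c("NONE", "TOTAL", NA),
  TADA.MethodSpecificationName = c("AS N", "none", "NONE"),
  stringsAsFactors = FALSE
)

cleaned <- none_to_na(toy)
print(cleaned)
```

After this step, NONE and NA collapse into a single missing value, so the reference table only needs NA rows for combinations where fraction or speciation is absent.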