add water type + speed up fetchATTAINS(); {arcgislayers}-ify fetchNHD #536
Co-authored-by: Matt Brousil <[email protected]>
WaterType update + {arcgislayers}-ification
Minor mods based on MB suggestions
Added two commits to bring in upstream changes from the develop branch. In addition, they:
- add missing dependencies
- clarify the TADA_GetATTAINS and TADA_MakeSpatial docs
- make small improvements to TADAModule2.Rmd, including removing eval = F from certain chunks (this part can be reverted if needed)
TADA_GetATTAINS may cause the dataframe to grow (i.e., contain more rows than the original TADA dataframe) because Water Quality Portal observations can fall within an NHD catchment that contains more than one ATTAINS assessment unit. The new “index” column links these duplicate observations for sites that fall within more than one AU. This is important: should we print a message for users when this occurs? I can foresee users running into issues in subsequent analyses if they are not aware of it (e.g., if they don't read the documentation closely). Also, is the column name “index” too vague? Any ideas on how to make it more user-friendly? How about TADA.SameResultMultipleAUs? Currently the column is placed at the start of the dataframe; is that where we want it to go?
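A minimal sketch of why the row count grows, written in plain Python only to keep the logic language-neutral (TADA itself is R). It assumes each observation already carries a catchment ID and that a lookup maps each catchment to the AU IDs it contains; the field names `catchment_id` and `au_id` and the helper `join_observations_to_aus` are hypothetical, not TADA code. Because the catchment-to-AU join is one-to-many, an observation in a catchment with two AUs comes back as two rows, and the difference in row counts is exactly what a user-facing message could report.

```python
def join_observations_to_aus(observations, catchment_aus):
    """Attach every AU in an observation's catchment (one row per observation/AU pair).

    observations:  list of dicts, each with "ResultIdentifier" and "catchment_id" (hypothetical field)
    catchment_aus: dict mapping catchment_id -> list of AU IDs in that catchment
    Returns (joined_rows, n_extra_rows).
    """
    joined = []
    for obs in observations:
        aus = catchment_aus.get(obs["catchment_id"], [])
        if not aus:
            # No AU in this catchment: keep the observation once, unassigned
            joined.append({**obs, "au_id": None})
            continue
        # One output row per AU: this is where duplicates come from
        for au in aus:
            joined.append({**obs, "au_id": au})
    n_extra = len(joined) - len(observations)
    return joined, n_extra


obs = [
    {"ResultIdentifier": "R1", "catchment_id": "C1"},
    {"ResultIdentifier": "R2", "catchment_id": "C2"},
]
lookup = {"C1": ["AU-A"], "C2": ["AU-B", "AU-C"]}  # C2 contains two AUs

rows, extra = join_observations_to_aus(obs, lookup)
if extra > 0:
    print(f"Joined data has {extra} more row(s) than the input because "
          "some catchments contain more than one AU.")
```

Here observation R2 is returned twice, so the message reports one extra row.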
Some topics that @cristinamullin and I discussed to think through:
Are the sites falling within more than one AU, or just falling within a catchment that contains more than one AU? I am wondering if we could modify the function so that the AU ID and catchment are added without creating any duplicates. I am still working on my review of this PR, but I'll keep thinking about this as I do.
I agree that renaming "index" to something more explanatory could help. Would this sort of identification tracking be useful across all of TADA, and therefore something that could be added to all TADA WQP pulls? Something like "TADA.ID" as the very first column... just a thought! Otherwise, I think moving the column next to the other ATTAINS columns makes sense. I also really like the idea of printing a message when this duplicate-observation situation happens; it would be easy to add!
To reply briefly: you are correct that this happens when sites fall within a catchment that contains more than one AU, not necessarily when a site falls within multiple raw ATTAINS features.
@katiehealy @wokenny13 @hillarymarler
Actually, now that you mention it, there is a WQP column, "ResultIdentifier", that holds a unique ID for every observation. I believe it should serve the same function as the new "index" column.
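Following the ResultIdentifier suggestion, a flag column could be derived directly from repeated identifiers instead of adding a separate index. This plain-Python sketch assumes ResultIdentifier is unique per observation before the AU join, so any repeat afterwards comes from the join itself; the flag name reuses the TADA.SameResultMultipleAUs idea floated earlier in the thread, and the helper `flag_multi_au_results` is hypothetical.

```python
from collections import Counter


def flag_multi_au_results(rows, flag_name="TADA.SameResultMultipleAUs"):
    """Add a boolean flag marking rows whose ResultIdentifier appears more than once.

    Assumes ResultIdentifier was unique per result before the AU join, so a
    repeated value means the same result was matched to more than one AU.
    """
    counts = Counter(row["ResultIdentifier"] for row in rows)
    return [{**row, flag_name: counts[row["ResultIdentifier"]] > 1} for row in rows]


rows = [
    {"ResultIdentifier": "R1", "au_id": "AU-A"},
    {"ResultIdentifier": "R2", "au_id": "AU-B"},
    {"ResultIdentifier": "R2", "au_id": "AU-C"},
]
flagged = flag_multi_au_results(rows)
for row in flagged:
    print(row["ResultIdentifier"], row["au_id"], row["TADA.SameResultMultipleAUs"])
```

Both R2 rows are flagged True and R1 is flagged False; grouping on ResultIdentifier then recovers the duplicate sets without a new column.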
I originally assumed that this occurs where states have overlapping AUs (overlapping raw ATTAINS features). That does occur as well. However, the scenario where sites fall within a catchment that contains more than one AU could happen frequently (e.g., at tributaries). I think this is a point in the workflow where we need a flag of some sort, so that users can decide what to do with a site that could be matched to multiple AUs in the same catchment: A) pick one of those AUs to assign the site to (would the current function make that possible in a future R Shiny app?), or B) reuse the data for both AUs (which seems to be what the current functionality enables). Note: we plan to leverage these functions in a future companion R Shiny app that would help users review/QC these associations.
I have been thinking quite a bit about whether we should incorporate the catchment information from the ATTAINS web services as the default option in TADA. The catchments themselves are not involved in the assessment process (states are free to define assessment units at the catchment level, but this would then be reflected in the assessment unit geometry). I can understand how having the catchment info available would be useful for other purposes, but I don't think it belongs in the assessment workflow, especially if it creates duplicate records that need to be identified and dealt with. I chatted about this briefly with Wendy, and she verified that catchments are used to summarize assessment data, but that this is a behind-the-scenes process and should not interfere with state assessment methods.
I feel that using the catchments to grab AUs is a good way to be more confident that the raw AUs are within the same drainage area as the WQP observation (and more likely associated with the same NHD feature). But I do agree that the raw ATTAINS features are the most important features returned in
A completely different approach would be to drop the use of catchments altogether. In this workflow, we could grab the nearest raw ATTAINS feature to each WQP observation within some search radius (maybe user-defined?) and/or all the AUs within that radius, then add a column giving their distances, as in the approach above... Also happy to discuss this over a call if that would be easier!
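The search-radius idea could look roughly like this. It is a plain-Python sketch under simplifying assumptions: coordinates are in a projected CRS with metre units, and each AU is reduced to a single representative point (real AUs are lines or polygons, so a real version would need a point-to-geometry distance from a spatial library). The helper `aus_within_radius` and its parameters are hypothetical.

```python
import math


def aus_within_radius(obs_xy, au_points, radius_m):
    """Return (au_id, distance_m) for every AU within radius_m of an observation, nearest first.

    obs_xy:    (x, y) of the observation in a projected CRS with metre units
    au_points: dict mapping au_id -> (x, y); a simplification, since real AUs are
               lines or polygons and would need a point-to-geometry distance
    radius_m:  search radius, potentially user-defined
    """
    hits = []
    for au_id, (x, y) in au_points.items():
        d = math.hypot(x - obs_xy[0], y - obs_xy[1])  # planar Euclidean distance
        if d <= radius_m:
            hits.append((au_id, d))
    return sorted(hits, key=lambda hit: hit[1])


aus = {"AU-A": (0.0, 300.0), "AU-B": (400.0, 0.0), "AU-C": (5000.0, 0.0)}
print(aus_within_radius((0.0, 0.0), aus, radius_m=500.0))
```

The first element of the result is the nearest AU, and the full list doubles as the distance column for users who want every AU within the radius.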
@kathryn-willi I think your proposed solution of a return_nearest = TRUE argument sounds great, as it would both keep the benefits of using catchments to grab AUs and remove the issue of duplicate results when a WQP observation falls in a catchment with multiple AUs. Most of our users automating assessments will want only the nearest AU returned, so this sets things up nicely for them while still allowing other users to return the additional catchment/AU information for other purposes. @cristinamullin, what do you think?
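One way the proposed return_nearest = TRUE behaviour could collapse catchment-matched AUs, sketched in plain Python: the argument name comes from the discussion, but the helper `select_aus`, the `distance_m` field, and the tie-breaking rule (first row wins) are assumptions, not TADA's implementation.

```python
def select_aus(rows, return_nearest=True):
    """Optionally collapse catchment-matched AUs to the nearest one per observation.

    rows: list of dicts with "ResultIdentifier", "au_id" and "distance_m"
          (distance from the observation to that AU, computed upstream)
    With return_nearest=False every observation/AU pair is kept, matching
    the current duplicate-producing behaviour.
    """
    if not return_nearest:
        return list(rows)
    nearest = {}
    for row in rows:
        key = row["ResultIdentifier"]
        # Strict "<" means the first row wins on a distance tie
        if key not in nearest or row["distance_m"] < nearest[key]["distance_m"]:
            nearest[key] = row
    return list(nearest.values())


rows = [
    {"ResultIdentifier": "R1", "au_id": "AU-A", "distance_m": 120.0},
    {"ResultIdentifier": "R2", "au_id": "AU-B", "distance_m": 850.0},
    {"ResultIdentifier": "R2", "au_id": "AU-C", "distance_m": 40.0},
]
print([r["au_id"] for r in select_aus(rows)])
print(len(select_aus(rows, return_nearest=False)))
```

With the default, R2 keeps only AU-C (the closer AU), giving one row per observation; with return_nearest=False all three rows survive for users who want the extra catchment/AU information.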
I agree and think we should proceed with the proposed solutions! In addition, we should make sure to capture our logic/decisions made here either in the function documentation or the Module 2 vignette. Thank you all for the review & discussion! |
Should we merge this PR and I can submit a new one with these updates, or should I try to incorporate them into this one? |
kathryn-willi
Hi EPA TADA team!
This PR addresses the following:
- Reverted fetchNHD() to use {arcgislayers}. This CRAN package should now be stable, and it speeds up the download drastically!
- Added an argument to fetchATTAINS() to download raw ATTAINS features only when explicitly requested. This speeds up calls to TADA_GetATTAINS() where the raw data is not requested.
- Included new code in fetchATTAINS() that adds water type info to each assessment unit (closes #520).
- Ensured {testthat} tests work across our geospatial functions.
Looking forward to your review!
Katie