This repository has been archived by the owner on Jun 1, 2023. It is now read-only.

Fix wqp pulls #11

Merged
merged 24 commits into USGS-R:master from fix_wqp_pulls
Oct 9, 2019

Conversation

limnoliver
Member

This PR fixes old issues where a subset of calls to WQP fail. It implements the POST script from Laura/Jordan, which seems to resolve the failures. The POST solution is also faster: each pull from WQP takes 1-2 minutes (times 365 pulls), whereas it seemed more like 3-4 minutes per pull with dataRetrieval::readWQPdata.

Additionally, the workflow now uses a combiner and no longer puts intermediate files into the shared cache. The combined file is now stored here.
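For context, a POST-based pull along these lines might look like the following sketch. This is hypothetical: the function name, body format, and query parameters below are illustrative, and the actual wqp_POST in this PR may differ. The key idea is that POSTing the site list in the request body sidesteps the URL-length limit that long GET queries can hit.

```r
library(httr)
library(jsonlite)
library(readr)

# Hypothetical sketch of a POST-based WQP pull; not the merged implementation.
wqp_post_sketch <- function(site_ids, characteristic = 'Temperature, water') {
  zip_path <- tempfile(fileext = '.zip')
  # Send the (possibly very long) site list in the JSON body rather than the URL
  resp <- httr::POST(
    'https://www.waterqualitydata.us/data/Result/search?mimeType=tsv&zip=yes',
    body = jsonlite::toJSON(list(
      siteid = site_ids,
      characteristicName = list(characteristic))),
    httr::content_type('application/json'),
    httr::write_disk(zip_path, overwrite = TRUE))
  httr::stop_for_status(resp)
  # WQP returns a zipped TSV; unzip and read it
  tsv_path <- utils::unzip(zip_path, exdir = tempdir())
  readr::read_delim(tsv_path, delim = '\t')
}
```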

…umn for lat/long. Also generalized the "bad orgs" that are tripping up calls to WQP
…e limit. Now, it filters partition to those with n sites < 1000 and then finds the partition with the smallest record count. Partitions with >1000 sites were tripping the POST call up.
…rance and upping n obs allowed in single call.
…e out of date according to remake, but are essentially up to date and do not need a repull. Used sc_bless!
… done yet, particularly for WQP, but this is first stab. End of 5_munge step puts everything into single file, except ignores wqp with depth measures for now.
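The partition-selection tweak described in the commit messages above (drop partitions with 1000 or more sites, since those trip up the POST call, then pull the partition with the smallest record count first) could be sketched like this; the tibble columns and partition names are made up for illustration:

```r
library(dplyr)

# Made-up inventory summary: one row per candidate partition.
partitions <- tibble::tibble(
  partition = c('pull_001', 'pull_002', 'pull_003'),
  n_sites   = c(1200, 950, 800),
  n_records = c(500000, 40000, 30000)
)

# Partitions with >= 1000 sites were tripping up the POST call, so keep
# only the smaller ones, then start with the smallest record count.
next_pull <- partitions %>%
  filter(n_sites < 1000) %>%
  arrange(n_records) %>%
  slice(1)
```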
@limnoliver
Member Author

@aappling-usgs -- okay, ready to go! I've added a 5_munge step to QA (first rough cut), reduce to dailies (first rough cut), and combine all of the data together. The final combined file is daily_temperatures.rds.

@@ -3,5 +3,5 @@
 target_inv_size: 1000

 # Approximate maximum number of records to pull in each call to WQP.
-# Recommended number is 500000 unless that causes exceedance of max URL size allowed by the WAF
+# Recommended number is 25000 unless that causes exceedance of max URL size allowed by the WAF
Member

Wow, that's a lot smaller! Is this still required even after the improvements to WQP this year?

Member

Oh, and I note that below it's 250,000 whereas in this comment it's 25,000. 250,000 isn't that much smaller.

Member Author

Yeah, this was originally just an oversight on my part (copied over from NWIS), but I don't think we actually know what this number should be.

Member

@aappling-usgs aappling-usgs left a comment

Looks good! I have only one issue of real concern below (filtering temps before converting F to C), but there's enough stuff to talk about that I'll leave this PR open while we discuss.


# filter out site types that are not of interest

wqp_inventory <- wqp_inventory %>%
  filter(!(ResolvedMonitoringLocationTypeName %in% wqp_pull_params$DropLocationTypeName))

# filter out bad org names
Member

These are bad names because WQP queries can't handle them, right? Might want to add a comment on that if you still have documentation or memory of what errors you were getting.

Member Author

Yep - they cause WQP to fail. Have added a comment.
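A sketch of what that filter might look like ('ORG_A'/'ORG_B' are placeholders; the real bad-org list lives in the repo config):

```r
library(dplyr)

# Placeholder IDs; in the pipeline these are organizations whose
# identifiers cause WQP queries to fail outright.
bad_orgs <- c('ORG_A', 'ORG_B')

# Toy inventory standing in for the real WQP inventory table.
wqp_inventory <- tibble::tibble(
  MonitoringLocationIdentifier = c('site1', 'site2', 'site3'),
  OrganizationIdentifier       = c('USGS-WI', 'ORG_A', 'ORG_B')
)

# Drop sites belonging to organizations that break WQP queries.
wqp_inventory <- wqp_inventory %>%
  filter(!OrganizationIdentifier %in% bad_orgs)
```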

wqp_dat_time <- tryCatch(
  {
    time_start <- Sys.time()
    wqp_dat <- wqp_POST(wqp_args)
Member

Did you end up getting any errors with wqp_POST? And if so, did they also show up with readWQPdata?

Member Author

Yes, sometimes you do. It's mostly because the call times out (according to Jim), but it is successful once the call is made again.

It didn't look like any of the calls were made with readWQPdata.

file1 <- tempdir()
doc <- utils::unzip(temp, exdir=file1)
unlink(temp)
retval <- suppressWarnings(read_delim(doc,
Member

FWIW, my personal preference is to add the newline right after the parenthesis so that subsequent lines can start with a 2-character indent, which leaves them a lot more room to be complete lines themselves. I think both spacing/indent patterns are pretty common on our team, but just in case you've deep down always wanted to do it my way, here's my vote of support =)

Member Author

Yeah, I think I like your style. This was just copied and pasted :) so thanks for paying attention to style!

httr::write_disk(temp))

headerInfo <- httr::headers(x)
file1 <- tempdir()
Member

temp, file1, and doc are pretty darn hacky variable names

Member Author

Agreed. Modified to be more descriptive. (Also forced me to really understand what was going on here!)

activity_start_times = paste(`ActivityStartTime/Time`, collapse = ', '),
n_day = n())

dat_daily_meta <- filter(dat_reduced, !is.na(`ResultMeasure/MeasureUnitCode`)) %>%
Member

what case is handled by the !is.na() filter in this line? Note that

> grepl('deg C|deg F', NA)
[1] FALSE

so I'd expect that line 35 above would filter these out already

Member Author

Good catch - this is redundant.

dat_daily_meta <- filter(dat_reduced, !is.na(`ResultMeasure/MeasureUnitCode`)) %>%
  select(-ResultMeasureValue, -`ActivityStartTime/Time`) %>%
  group_by(MonitoringLocationIdentifier, ActivityStartDate, sample_depth) %>%
  summarize_all(first)
Member

wow, I didn't know about first - that's really tidy!

mutate(`ResultMeasure/MeasureUnitCode` = 'deg C') %>%
filter(ResultMeasureValue < max_value)

dat_daily <- group_by(dat_reduced, MonitoringLocationIdentifier, ActivityStartDate, sample_depth) %>%
Member

Hmm, I hadn't realized that WQP provided depth-specific values for water quality. Does this happen in rivers as well as lakes/reservoirs? If so, seems we'll want to accommodate this when collecting observations for stream temperature modeling.

Member Author

Yeah, for now, without much digging into metadata, I've just handled records with and without depths independently. My guess is that it's mostly lakes/reservoirs, with some streams mixed in. The final pushed daily temperature product does not include these observations (obs with depth) because I wasn't quite ready to tackle that problem.

temp_value < max_value) %>%
mutate(dateTime = as.Date(dateTime)) %>%
group_by(site_no, col_name, dateTime) %>%
summarize(temperature_mean_daily = round(mean(temp_value), 1), n_day = n())
Member

Are all NWIS temperatures only reported to tenth-degree precision? It probably doesn't matter much for these daily values, but I saw unnecessary choppiness in sub-daily timeseries when I was doing metabolism modeling. If we could avoid adding any superfluous rounding here that might be nice...could round to the minimum number of sig figs among the values in temp_value, for example.

Member Author

I'm going to round to 3 digits to give a bit of buffer.
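The alternative the reviewer suggested (matching the precision of the raw values rather than rounding to a fixed digit count) could be sketched roughly as follows; min_decimals is a hypothetical helper, not code from this PR:

```r
# Hypothetical helper: fewest decimal places among the raw values,
# used as the rounding precision for the daily mean.
min_decimals <- function(x) {
  n_dec <- vapply(x, function(v) {
    s <- format(v, scientific = FALSE)
    # Count characters after the decimal point (0 if there is none)
    if (grepl('\\.', s)) nchar(sub('^[^.]*\\.', '', s)) else 0L
  }, integer(1))
  min(n_dec)
}

temp_value <- c(12.53, 13.1, 12.88)
round(mean(temp_value), min_decimals(temp_value))  # 12.8
```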

ungroup() %>%
mutate(date = as.Date(dateTime),
source = 'nwis_uv') %>%
select(site_id = site_no,
Member

What do you think about making these NWIS site_ids be USGS-XXX here so they mesh with WQP conventions early on in the munging?

Member Author

Good idea.
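The convention change would be a one-liner along these lines (sketch; the example site numbers are illustrative):

```r
# Prefix NWIS site numbers with 'USGS-' so they match WQP's
# MonitoringLocationIdentifier convention early in munging.
site_no <- c('01646500', '05553700')
site_id <- paste0('USGS-', site_no)
```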

…conversion, remove redundant is.na filtering, avoid rounding to too few sig figs, add "USGS" prefix to NWIS sites
@limnoliver
Member Author

limnoliver commented Oct 9, 2019

All changes made, @aappling-usgs! Also rebuilt the munge step because of the fix that moves the filter after the units conversion. sc_bless was perfect for this use case -- I made changes to wqp_pull that weren't consequential but wanted to update downstream products.

@aappling-usgs
Member

Woot!

@aappling-usgs aappling-usgs merged commit 02a423e into USGS-R:master Oct 9, 2019
@limnoliver limnoliver deleted the fix_wqp_pulls branch October 13, 2020 14:00