match_area / area_lookup get out of date #71
Not me I'm afraid - @jvillacampa wrote the SPARQL. At the time we decided not to build the API query into the function because we were concerned about performance. The query can be quite slow to run at times on the server, and there are the expected timeout issues with the VPN on desktop. It doesn't solve the longer-term issue of how to keep the area lookup up to date, but maybe a one-off update of it would provide a temporary fix.
Yeah, the function is a really nice idea, but keeping the data behind it current is a really thorny issue! We could probably set up a GitHub Action to regularly (daily/weekly/monthly) run the scripts in data-raw, then commit or open a PR to commit any changes... I've used Actions a bit before on some other repos/packages, but never anything that custom.
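A workflow along those lines might look roughly like this - very much a sketch, not tested against this repo. The script path `data-raw/area_lookup.R` is an assumption about where the rebuild code lives, and it uses the `r-lib/actions/setup-r` and `peter-evans/create-pull-request` actions (installing the script's package dependencies would also be needed, e.g. via `r-lib/actions/setup-r-dependencies`):

```yaml
# .github/workflows/update-area-lookup.yaml (hypothetical file name)
name: Update area_lookup
on:
  schedule:
    - cron: "0 6 1 * *"   # 06:00 UTC on the 1st of every month
  workflow_dispatch:       # allow manual runs too
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Rebuild the lookup
        run: Rscript data-raw/area_lookup.R   # assumed script name
      - name: Open a PR with any changes
        uses: peter-evans/create-pull-request@v6
        with:
          title: "Monthly area_lookup refresh"
          branch: update-area-lookup
```

If the rebuilt data is identical, create-pull-request simply doesn't open a PR, which is handy for a scheduled job like this.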
Good point! I remember this was one of the things Jack and I discussed the most when creating the function. The idea of a GitHub Action sounds very good (though I've never set one up, I'm afraid). We could also add a warning message to advise users to update their version of the package when using this function, to ensure they have the latest lookup?
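For the warning idea, something like the following sketch might work - assuming we save the date the lookup was built as an internal object (the name `extraction_date` and the 6-month threshold are both made up here):

```r
# Hypothetical sketch: warn when the bundled lookup is getting stale.
# Assumes a Date object `extraction_date` is stored alongside area_lookup
# in R/sysdata.rda when the data is rebuilt.
match_area <- function(x) {
  if (Sys.Date() - extraction_date > 180) {
    warning(
      "The area lookup bundled with phsmethods is more than 6 months old. ",
      "Consider updating the package to get the latest geography codes."
    )
  }
  # ... existing matching logic unchanged ...
}
```

The warning would fire at most once per call, and the threshold could be tuned to however often the open-data platform actually changes.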
I have no time right now but will hopefully have some over Christmas... would you want to look at a warning if I looked at a GH Action? (I'd first look at opening a monthly PR, I think.)
@jvillacampa I was feeling inspired, so I had a look at creating a GH Action for this. However, I first tried running the script (as it would have to be totally automated). I noticed quite a few of the areas have new names, including some affected by the unicode issue you highlighted in the script. I found a site explaining the error - https://www.i18nqa.com/debug/utf8-debug.html - which also has a handy lookup table - unicode_fix_lookup.csv (I made the csv). I tried to find a nicer way to do the replacement, but this was the best I could do:

```r
# Requires library(magrittr) for %<>%. The patterns are the mojibake forms
# produced by reading UTF-8 bytes as Windows-1252 (per the i18nqa table above).
area_lookup %<>%
  dplyr::mutate(area_name =
    stringr::str_replace_all(area_name, "Ã¬", "ì") %>%
      stringr::str_replace_all("Ã²", "ò") %>%
      stringr::str_replace_all("Ã¨", "è") %>%
      stringr::str_replace_all("Ã™", "Ù") %>%
      stringr::str_replace_all("Ãˆ", "È") %>%
      stringr::str_replace_all("Ã\\s", "a") %>%
      stringr::str_replace_all("Â", "") %>%
      stringr::str_replace_all("â€™", "'") %>%
      stringr::str_squish())
```

So only slightly more 'hands-free' than your solution... It seems the encoding issue should probably be solved further upstream: either the Scot gov open data is returning the wrong encoding, or ...
@Tina815 I tried running the script to get / update the data
Probably one for @jackhannah95 as he wrote all the fancy SPARQL code!
The intention with area_lookup was that it should be updated with every package release, but there haven't been any package releases. There's also the issue of people not updating to the latest version of phsmethods.

Could the solution be to have match_area do the API query live? Ideally, it could cache the data so only one call would usually be needed. Another option (not sure how feasible these are) could be to include data with the package, then check how old it is (or, even better, check whether there is a newer version available at the API). If the data in the package is old, use the live call instead.
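The live-query-with-cache idea could be sketched like this - purely illustrative, where `get_area_lookup_from_api()` is a stand-in for the existing SPARQL query code and the bundled-data fallback assumes a copy ships with the package:

```r
# Hypothetical caching sketch: query the open-data API once per session,
# falling back to the bundled data if the call fails (e.g. timeouts on VPN).
.lookup_cache <- new.env(parent = emptyenv())

get_area_lookup <- function() {
  if (is.null(.lookup_cache$data)) {
    .lookup_cache$data <- tryCatch(
      get_area_lookup_from_api(),  # stand-in for the SPARQL query
      error = function(e) {
        warning("Live lookup failed; using the data bundled with the package.")
        area_lookup  # assumes the bundled copy is available internally
      }
    )
  }
  .lookup_cache$data
}
```

A package-level environment like this survives for the R session, so repeated match_area calls would only hit the API once, which should keep the performance concern manageable.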