Data download is unreliable and sometimes (!) yields incomplete data #32

Open
arne1921KF opened this issue Nov 1, 2020 · 6 comments

Comments

@arne1921KF

Today (2020-11-01), timeseries data downloaded via the usual get_RKI_timeseries() with the standard parameter url = "https://opendata.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0.csv" delivers only some data from Hamburg, Schleswig-Holstein and Niedersachsen.

The page https://hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0 says that the download options are currently being changed and that https://www.arcgis.com/home/item.html?id=f10774f1c63e40168479a1feb6c7ca74 should be used in the meantime.

The download link there is currently hidden behind the links/buttons on the page.

@nevrome
Owner

nevrome commented Nov 1, 2020

@stschiff already observed a similar issue last week. It resolved itself overnight. Maybe we will have to switch to the alternative download option eventually, but for now I suggest waiting once more.

@nevrome
Owner

nevrome commented Nov 2, 2020

So right now it seems to work again:

> rki_timeseries <- get_RKI_timeseries()
> unique(rki_timeseries$Bundesland)
 [1] "Brandenburg"            "Bayern"                
 [3] "Niedersachsen"          "Nordrhein-Westfalen"   
 [5] "Baden-Württemberg"      "Saarland"              
 [7] "Rheinland-Pfalz"        "Schleswig-Holstein"    
 [9] "Hessen"                 "Hamburg"               
[11] "Bremen"                 "Sachsen"               
[13] "Thüringen"              "Berlin"                
[15] "Mecklenburg-Vorpommern" "Sachsen-Anhalt" 
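
The check above can be automated so that a truncated download is noticed right away. The following is only a minimal sketch: `expected_states` and `missing_states()` are hypothetical names, not part of the package, and it assumes the `Bundesland` column uses the spellings shown in the output above.

```r
# All 16 German federal states, spelled as in the console output above.
expected_states <- c(
  "Baden-Württemberg", "Bayern", "Berlin", "Brandenburg", "Bremen",
  "Hamburg", "Hessen", "Mecklenburg-Vorpommern", "Niedersachsen",
  "Nordrhein-Westfalen", "Rheinland-Pfalz", "Saarland", "Sachsen",
  "Sachsen-Anhalt", "Schleswig-Holstein", "Thüringen"
)

# Hypothetical helper (not part of the package): returns the states
# that are absent from the Bundesland column of a downloaded timeseries.
missing_states <- function(timeseries, expected = expected_states) {
  setdiff(expected, unique(timeseries$Bundesland))
}

# Example with a deliberately truncated data frame that mimics the
# incomplete download reported in this issue.
partial <- data.frame(
  Bundesland = c("Hamburg", "Schleswig-Holstein", "Niedersachsen")
)
length(missing_states(partial))
# [1] 13
```

A bot that pulls the data on a schedule could refuse to overwrite its last good copy whenever `missing_states()` returns anything.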

@arne1921KF
Author

....and gone again. Now they changed something in the data itself, it seems. I get parsing failures. Looks like the date columns changed. That breaks your code.

I hate it when data providers do this.
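
One way to make the parsing step less brittle against this kind of change is to try several date formats in turn. This is only a sketch: `parse_dates_flexibly()` is a hypothetical helper, and the two format strings are assumptions for illustration, since the thread does not say what the old and new date columns actually look like.

```r
# Hypothetical helper (not part of the package): parse a character
# vector of dates by trying each candidate format in turn.
# The default formats are assumed examples, not the confirmed RKI formats.
parse_dates_flexibly <- function(x,
                                 formats = c("%Y-%m-%dT%H:%M:%S",
                                             "%Y/%m/%d %H:%M:%S")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)  # only retry entries that are still unparsed
    out[todo] <- as.Date(strptime(x[todo], format = fmt, tz = "UTC"))
  }
  out
}

parse_dates_flexibly(c("2020-11-01T00:00:00.000Z", "2020/11/02 00:00:00"))
# [1] "2020-11-01" "2020-11-02"
```

Entries that match none of the formats stay `NA`, so a sudden jump in `NA` counts would itself signal that the provider changed the format again.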

@nevrome changed the title from "download of timeseries data seems to be moving" to "Data download is unreliable and sometimes (!) yields incomplete data" on Nov 4, 2020
@nevrome
Owner

nevrome commented Nov 4, 2020

Hm - can't confirm right now. Seems to work again.

But I get the feeling this download feature breaks multiple times a day. Maybe it's because the file has grown to more than 55 MB and the way we download it is just not suitable any more.

Maybe we should copy it automatically to an extra branch here on GitHub once a day and point the default path of get_RKI_timeseries() to our mirror.
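
The mirror idea could be combined with a fallback, roughly as sketched below. `read_first_complete()` is a hypothetical helper, not the package's implementation. The reader function and the completeness check are supplied by the caller, so the sketch makes no assumption about how the file is actually fetched.

```r
# Hypothetical sketch (not part of the package): try each source in turn
# and return the first result that passes a completeness check.
# `read_fun` reads one URL; `is_complete` returns TRUE for usable data.
read_first_complete <- function(urls, read_fun, is_complete) {
  for (url in urls) {
    # A failing source yields NULL instead of aborting the loop
    result <- tryCatch(read_fun(url), error = function(e) NULL)
    if (!is.null(result) && is_complete(result)) {
      return(result)
    }
  }
  stop("No source returned complete data")
}
```

With a mirror in place, the call might look like `read_first_complete(c(mirror_url, official_url), read.csv, function(x) length(unique(x$Bundesland)) == 16)`, where `mirror_url` and `official_url` are placeholders.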

@nevrome nevrome pinned this issue Nov 4, 2020
@arne1921KF
Author

Aaaaand dead again. Only Schleswig-Holstein is present in the timeseries. It was like this at 5 am, when my bot tried to pull the current data, and it is still the case at 9 am.

A Git repository of the data would be rad. I seriously would like to know why the RKI isn't doing this themselves: just pushing the data to GitHub as soon as it is in. That way, changes to the dataset could even be monitored transparently through version control.

@nevrome
Owner

nevrome commented Nov 16, 2020

I have now merged #34 to permanently enable the download from the alternative source. This seems to be more reliable.
