Problem pulling more than 10,000 clientIDs #283
One other note: I have another view that saves the clientID via a customTask to custom dimension 19. If I use that instead of clientID, I am able to get 51,436 results when I pull ga:dimension19, ga:pagePath. |
I can confirm this, and it's weird. Can you see if it occurs via another tool such as the GA Query Explorer: https://ga-dev-tools.appspot.com/query-explorer/ If so, then it's an API bug to be reported to Google. |
Query Explorer doesn't have the clientId dimension yet either. But I tried it with a property that is capturing clientId in dimension75, and it downloaded 1 million entries (the max total) with no issue, while the same viewId only returned 10,000 rows via the clientId dimension. For now stick to using the custom dimension if you have it, but I think this is a bug to report to Google. |
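For reference, pulling the custom dimension instead of clientId looks something like this. This is a sketch: the viewId is a placeholder, dimension75 is wherever you stored the clientId (as in the comment above), and it assumes you have already authenticated with ga_auth():

```r
library(googleAnalyticsR)
ga_auth()  # authenticate first

# Placeholder viewId; dimension75 holds the clientId captured via customTask
df <- google_analytics(
  "123456",
  date_range = c("2019-01-01", "2019-12-31"),
  dimensions = c("ga:dimension75", "ga:pagePath"),
  metrics    = "ga:pageviews",
  max        = -1   # request all rows, not just the default first 1,000
)
```

The custom dimension pages through the full result set, whereas the same query with ga:clientId stops at 10,000 rows.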
I put it on this issue; if you have more details please add them there: https://issuetracker.google.com/issues/142795352 |
The second page of the API always seems to return only 1 row, and the total row count is always 10001. Change the viewId and try it on your own viewId here: |
Thanks for looking into this! When I run that with a different viewId I definitely get more data, and it ends with nextPageToken: "19999" |
Hello. I have the same problem. Any updates here? |
No updates yet; the issue occurs within the API itself, so we need to wait for any updates there. |
@MarkEdmondson1234 thanks. Do you have a link to the API issue? Has work on it started? |
Not really; the private thread at the link just has an example. The dimension is officially in beta, so if it comes out of beta it should appear in the GA API news feed or become available in the Query Explorer online. |
It looks like this may be a limit per API call, so if the API call is broken down into calls under 10k rows you can get all the data. If this is confirmed I'll do this automatically in the function. Something like the below should work:

ga_call <- function(date_range, ...){
  per_day <- seq(date_range[[1]], date_range[[2]], by = 1)
  calls <- lapply(per_day, function(x){
    message("Fetching: ", x)
    google_analytics(..., date_range = c(x, x))
  })
  Reduce(rbind, calls)
}

my_date_range <- c(as.Date("2019-01-01"), as.Date("2020-01-01"))
ga_call(date_range = my_date_range, {put other google_analytics() arguments here})

It may also be doable via google_analytics({etc}, slow_fetch = TRUE). This won't help if you have more than 10k users a day; that will need to wait for the API to update. |
Awesome. Thank you for continuing to look at this, @MarkEdmondson1234 ! I look forward to hearing if you get confirmation and are able to update the function as you note. |
Maciej Franas has this work around if you need more than 10k a day:
# Assumes: view is your viewId and z is a vector of dates to pull,
# both defined beforehand
hours  <- sprintf("%02d", 0:23)      # ga:hour values are "00".."23"
output <- vector("list", length(z))

for (i in seq_along(z)) {
  day <- vector("list", length(hours))
  for (j in seq_along(hours)) {
    day[[j]] <- google_analytics(
      view,
      date_range = c(z[i], z[i]),
      dimensions = c("ga:deviceCategory",
                     "ga:clientId",
                     "ga:hour"),
      metrics = c("ga:users"),
      filtersExpression = paste0("ga:users>0;ga:hour==", hours[j]),
      anti_sample = TRUE
    )
  }
  output[[i]] <- do.call("rbind", day)
}
|
Thanks for the workaround! |
Anyone try this lately? I have tried it a couple of times tonight and I have not been running into the limit. I wonder if this has been magically fixed!! |
Agreed, I'm no longer hitting the 10,000 row limit when pulling clientId. No official update published in the API changelog though: https://developers.google.com/analytics/devguides/changelog |
I notice that I still run into the limit at certain times - I don’t have a great feel for when it works and when it doesn’t. |
@everleazy is still getting the limit: Hello. I have a problem: I can't get more than 10,000 rows when requesting clientId data. I tried with the parameter max = -1 and with anti_sample. |
I am finding that I get limited out if I pull today or yesterday’s client IDs. But if I pull 2 or more days in the past, it seems to work great. Not sure why this happens. |
Hi! I've always downloaded clientId without hitting limits, but suddenly I just got 10,000 rows. |
For the past 9 months or so, I have been able to pull clientId without much problem. However, starting yesterday, it appears like I am hitting the 10k limit once again. |
I have the same problem, and one interesting thing: when I remove "campaign" from my code, it works just fine. |
@PedjaV Great find! Just tried without the campaign dimension and it appears to work well. Bummer that I can't get campaign data, but it's easier than pulling everything by the hour for now. |
I still can't get more than 10k rows when using just |
Yes. It's not really a bug, more a beta feature that may or may not be supported in the future. Best is to put your clientId into a custom dimension and pull that instead. |
The only thing that mostly helped me overcome this limit was to write my R code like this:
|
So looping over hours works? |
@paullevchuk, I do not speak R, but do I understand correctly that you are using an hour-based filter, collecting data for each hour 0-23, and merging everything? |
From what I learned working with the GA API, there is a limit of 10K clientIds per request call. So switching from day granularity to hour helped me get accurate data. But even then, some days have one hour with more than 10K clientIds; in that case the googleAnalyticsR package unfortunately made only 2 requests and I got something like 10,001 records. |
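One way to detect that failure mode is to check each hourly result against the cap. A minimal sketch (check_capped is a hypothetical helper, not part of googleAnalyticsR; the 10,000 threshold is the observed per-request cap from this thread):

```r
# After each hourly google_analytics() call, warn if the result looks capped,
# i.e. the row count is at or above the ~10k per-request limit
check_capped <- function(df, cap = 10000) {
  if (!is.null(df) && nrow(df) >= cap) {
    warning("Returned ", nrow(df),
            " rows; this hour likely hit the clientId cap and is incomplete")
  }
  df
}
```

Wrapping each day[[j]] assignment in check_capped() would flag the hours whose data is silently truncated.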
Yes. But this is not a filter, it's a dimension. |
It's a similar approach to a comment further up the thread. It's an API issue, so the same problem/solution should apply to other GA SDKs. |
What goes wrong
I am trying to pull the number of pageviews each clientID has over a period of time (the longer-term goal is to get ga:clientID and ga:pagePath together, but I run into this issue in that report too), and I get a result of just 10,000 rows. This happens regardless of the date range, and GA tells me I have more users than that in the period. Other dimensions report more rows, but for some reason this one is limited.
I had max = -1 before and just tried max = 999999 based on previous posts.
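A minimal reproduction of the call described above (a sketch: the viewId is the redacted placeholder from the log below, and it assumes authentication via ga_auth()):

```r
library(googleAnalyticsR)
ga_auth()  # authenticate first

# viewId is redacted in the log below; substitute your own
df <- google_analytics(
  "1911XXXXX",
  date_range = c("2019-10-12", "2019-10-14"),
  dimensions = "clientId",
  metrics    = "pageviews",
  max        = -1   # request all rows; the result is still capped at 10,000
)
nrow(df)
```

Despite max = -1, the batch log below shows the API reporting a total of 10,001 rows and returning only 10,000.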
Steps to reproduce the problem
Expected output
Actual output
Before you run your code, please run:
2019-10-15 07:47:11> Multi-call to API 2019-10-15 07:47:12> Calling APIv4.... 2019-10-15 07:47:12> Multiple v4 batch 2019-10-15 07:47:12> Fetching v4 data batch... 2019-10-15 07:47:12> Request: https://analyticsreporting.googleapis.com/v4/reports:batchGet/ 2019-10-15 07:47:12> Body JSON parsed to: {"reportRequests":[{"viewId":"ga:1911XXXXX","dateRanges":[{"startDate":"2019-10-12","endDate":"2019-10-14"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:clientID"}],"metrics":[{"expression":"ga:pageviews","alias":"ga:pageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"0","pageSize":10000,"includeEmptyRows":true},{"viewId":"ga:1911XXXXX","dateRanges":[{"startDate":"2019-10-12","endDate":"2019-10-14"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:clientID"}],"metrics":[{"expression":"ga:pageviews","alias":"ga:pageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"10000","pageSize":10000,"includeEmptyRows":true},{"viewId":"ga:1911XXXXX","dateRanges":[{"startDate":"2019-10-12","endDate":"2019-10-14"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:clientID"}],"metrics":[{"expression":"ga:pageviews","alias":"ga:pageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"20000","pageSize":10000,"includeEmptyRows":true},{"viewId":"ga:1911XXXXX","dateRanges":[{"startDate":"2019-10-12","endDate":"2019-10-14"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:clientID"}],"metrics":[{"expression":"ga:pageviews","alias":"ga:pageviews","formattingType":"METRIC_TYPE_UNSPECI....[{"expression":"ga:pageviews","alias":"ga:pageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"980000","pageSize":10000,"includeEmptyRows":true},{"viewId":"ga:1911XXXXX","dateRanges":[{"startDate":"2019-10-12","endDate":"2019-10-14"}],"samplingLevel":"DEFAULT","dimensions":[{"name":"ga:clientID"}],"metrics":[{"expression":"ga:pageviews","alias":"ga:pageviews","formattingType":"METRIC_TYPE_UNSPECIFIED"}],"pageToken":"990000","pageSize":9
999,"includeEmptyRows":true}]} 2019-10-15 07:49:22> Downloaded [10000] rows from a total of [10001].
Session Info
Please run sessionInfo() so we can check what versions of packages you have installed.
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] factoextra_1.0.5 cluster_2.0.7-1 skmeans_0.2-11
[4] forcats_0.4.0 stringr_1.4.0 dplyr_0.8.2
[7] purrr_0.3.2 readr_1.3.1 tidyr_0.8.3
[10] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.2.1
[13] googleAnalyticsR_0.6.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 cellranger_1.1.0 pillar_1.4.2 compiler_3.4.4 googleAuthR_0.8.0
[6] tools_3.4.4 digest_0.6.20 packrat_0.5.0 clue_0.3-57 lubridate_1.7.4
[11] jsonlite_1.6 memoise_1.1.0 nlme_3.1-137 gtable_0.3.0 lattice_0.20-38
[16] pkgconfig_2.0.2 rlang_0.4.0 cli_1.1.0 rstudioapi_0.10 curl_3.3
[21] ggrepel_0.8.1 haven_2.1.1 withr_2.1.2 xml2_1.2.2 httr_1.4.0
[26] askpass_1.1 generics_0.0.2 hms_0.4.2 grid_3.4.4 tidyselect_0.2.5
[31] glue_1.3.1 R6_2.4.0 readxl_1.3.1 modelr_0.1.4 magrittr_1.5
[36] backports_1.1.4 scales_1.0.0 rvest_0.3.4 assertthat_0.2.1 colorspace_1.4-1
[41] stringi_1.4.3 openssl_1.4 lazyeval_0.2.2 munsell_0.5.0 slam_0.1-45
[46] broom_0.5.2 crayon_1.3.4