This repository has been archived by the owner on Jul 12, 2020. It is now read-only.

Pagination with MAX_ITEM_COUNT -1 #115

Open
Mortana89 opened this issue Sep 27, 2019 · 3 comments

@Mortana89

Currently Cosmonaut does not allow fetching paginated data unless an initial page size is defined for the first request. This goes against a recommendation we received from Cosmos DB:
You are using the query page size of 100 for queries for your Azure Cosmos DB container. We recommend using a page size of -1 for faster scans

After reading the following article: https://docs.microsoft.com/nl-nl/azure/cosmos-db/sql-api-query-metrics#max-item-count

It's clear that it's better to specify -1 as our max_item_count for the first fetch, so that Cosmos DB can determine the maximum size of the response set. After that, we can use the continuation token and follow the normal path.

However, the SDK currently does not support specifying -1 due to the following lines:

Can these be changed to also support -1 as the page size, so that Cosmos DB can determine the maximum size?
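
The fetch pattern described above (a first request with a max item count of -1, then follow-up requests driven by the continuation token) might look roughly like the sketch below. It is only an illustration in Python against an in-memory list: `query_page`, `fetch_all`, and the `server_page_size` default of 2000 are hypothetical names and values, not Cosmonaut or Cosmos DB SDK APIs, and the real service chooses its page size dynamically.

```python
from typing import List, Optional, Tuple


def query_page(items: List[int], max_item_count: int,
               continuation: Optional[str] = None,
               server_page_size: int = 2000) -> Tuple[List[int], Optional[str]]:
    """Return one page of results plus a continuation token (None when done).

    Hypothetical stand-in for a paged query; server_page_size is an assumed
    value for what the server would pick when max_item_count is -1.
    """
    if max_item_count == 0 or max_item_count < -1:
        raise ValueError("max_item_count must be -1 or a positive integer")
    start = int(continuation) if continuation else 0
    # -1 means "let the server decide how many items to return".
    size = server_page_size if max_item_count == -1 else max_item_count
    page = items[start:start + size]
    next_start = start + len(page)
    token = str(next_start) if next_start < len(items) else None
    return page, token


def fetch_all(items: List[int], max_item_count: int = -1) -> Tuple[List[int], int]:
    """Drain the query and return all results with the number of round trips."""
    results: List[int] = []
    calls = 0
    token: Optional[str] = None
    while True:
        page, token = query_page(items, max_item_count, token)
        results.extend(page)
        calls += 1
        if token is None:
            break
    return results, calls
```

With a fixed page size the caller pays one round trip per page, while -1 leaves the number of round trips up to the (here simulated) server.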

@Elfocrash
Owner

This best practice must be new. It definitely wasn't the recommendation before. I've personally seen scenarios where setting this to -1 can cause some serious performance issues. I am willing to change that, but I will also need to benchmark the behavior with the pageSize set to -1. Can you give me a hand by branching Cosmonaut and testing it with the -1 pageSize?

@Mortana89
Author

Mortana89 commented Sep 30, 2019

I've done some tests. It's difficult to say, and it surely depends on the scenario:

MaxItemCount 100 (1665 items)

Call    Time (ms)   Items   RU
1       331         100     14.37
2       260         100     14.78
3       529         100     12.61
4       270         100     13.85
5       211         100     12.08
6       182         100     12.13
7       182         100     12.05
8       250         100     11.93
9       197         100     11.76
10      459         100     12.29
11      259         100     12.47
12      192         100     12.46
13      314         100     12.18
14      207         100     12.52
15      257         100     12.48
16      257         100     12.26
17      324         65      8.94
Total   4681        1665    211.16

Spread-out RU (total RU / time in seconds): 45.11002

MaxItemCount -1 (1665 items)

Call    Time (ms)   Items   RU
1       694         1665    167.48
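
The totals and the "spread-out RU" figure in the table above can be reproduced with a few lines of arithmetic. The sketch below just re-enters the measured values and computes the same summary for both runs; the `summarize` helper and its field names are illustrative, not part of any SDK.

```python
from typing import Dict, List, Tuple

# (time in ms, items returned, RU charge) per call, copied from the table above.
paged_calls: List[Tuple[int, int, float]] = [
    (331, 100, 14.37), (260, 100, 14.78), (529, 100, 12.61), (270, 100, 13.85),
    (211, 100, 12.08), (182, 100, 12.13), (182, 100, 12.05), (250, 100, 11.93),
    (197, 100, 11.76), (459, 100, 12.29), (259, 100, 12.47), (192, 100, 12.46),
    (314, 100, 12.18), (207, 100, 12.52), (257, 100, 12.48), (257, 100, 12.26),
    (324, 65, 8.94),
]
single_call: List[Tuple[int, int, float]] = [(694, 1665, 167.48)]


def summarize(calls: List[Tuple[int, int, float]]) -> Dict[str, float]:
    """Aggregate per-call measurements into totals and RU per second."""
    total_ms = sum(c[0] for c in calls)
    total_items = sum(c[1] for c in calls)
    total_ru = sum(c[2] for c in calls)
    return {
        "calls": len(calls),
        "total_ms": total_ms,
        "items": total_items,
        "total_ru": round(total_ru, 2),
        # Same formula as the table: total RU divided by elapsed seconds.
        "ru_per_second": round(total_ru / (total_ms / 1000), 2),
        # Largest RU charge of any single request in the run.
        "peak_ru": max(c[2] for c in calls),
    }


paged = summarize(paged_calls)
single = summarize(single_call)
```

On these numbers the single -1 request costs fewer RU in total (167.48 versus 211.16) and finishes sooner, but the whole charge lands in one request instead of being spread over 17 smaller ones.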

The advantage of using a fixed bucket size is, of course, the spread in RU consumption.
When using a MaxItemCount of -1, you'll get bigger RU spikes, which could also cause performance issues if the RU spike is higher than the available RU on the collection, which causes throttling, etc.

An advantage of the second approach is that it avoids the network call time that adds up when a user requests many records in a list page. But in our case it would take more than 300 items before fetching more data at once becomes worthwhile.

PS: those are local development timings ;)

@Elfocrash
Owner

Thanks for this @Mortana89.

When using a MaxItemCount of -1, you'll get bigger RU spikes, which could also cause performance issues if the RU spike is higher than the available RU on the collection, which causes throttling, etc.

This is the reason why I disabled that in the first place. I could remove the -1 limitation to allow for this use case, but I am not sure that's the right approach.
I will re-wire the page extensions to use the added Skip and Take support in the SDK, so I think this whole issue will probably go away.

I will put this on hold until I settle on the approach.
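
As a rough illustration of what Skip/Take style paging means, a page number and page size map to an offset and a limit, which the Cosmos DB SQL dialect expresses as an `OFFSET ... LIMIT ...` clause. The sketch below is in Python and only builds the query text; the helper names are hypothetical, and how Cosmonaut's page extensions would actually be wired up is not shown in this thread.

```python
from typing import Tuple


def page_to_offset_limit(page_number: int, page_size: int) -> Tuple[int, int]:
    """Translate a 1-based page number into (skip, take) values."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    return (page_number - 1) * page_size, page_size


def build_paged_query(base_query: str, page_number: int, page_size: int) -> str:
    """Append an OFFSET/LIMIT clause to a query string (illustrative only)."""
    offset, limit = page_to_offset_limit(page_number, page_size)
    return f"{base_query} OFFSET {offset} LIMIT {limit}"


# Third page of 100 items: skip the first 200, take the next 100.
query = build_paged_query("SELECT * FROM c", page_number=3, page_size=100)
```

With this style of paging the caller asks for an explicit window of results, so no continuation token has to be passed between requests.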
