Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Proposal] Compute cursor for each document #156

Open
simararneja opened this issue Apr 22, 2020 · 18 comments
Open

[Feature Proposal] Compute cursor for each document #156

simararneja opened this issue Apr 22, 2020 · 18 comments

Comments

@simararneja
Copy link

Feature request & proposal

Highlighting the relay specification for the cursor based pagination which specifies a cursor for each document. As things are today, this library returns pagination metadata as separate fields (next, previous, etc..) which points to the last document of the list.

I propose to add an additional field _cursor for each document which will allow the user to move in forward/backward direction from any node in a list.

Example -

{
 "results":
   [ 
      { 
        " _id" : "580fd16aca2a6b271562d8bb", 
        "counter": 4, 
        "_cursor": "eyIkb2lkIjoiNTgwZmQxNmFjYTJhNmIyNzE1NjJkOGJiIn0" },
      {
       " _id": "580fd16aca2a6b271562d8ba",
       "counter": 3, 
        "_cursor": "eyIkb2lkIjoiNTgwZmQxNmFjYTJhNmIyNzE1NjJkOGJhIn0" 
      } 
  ],
  "next": "eyIkb2lkIjoiNTgwZmQxNmFjYTJhNmIyNzE1NjJkOGJhIn0",
  "hasNext": "true" 
}
@simararneja
Copy link
Author

Let me know your thoughts on this. If this seems ok, don't mind opening a PR for this.

@bradvogel
Copy link
Contributor

Interesting and could be useful. Would this be a replacement for the current API? What are the advantages of having _cursors on each result?

@simararneja
Copy link
Author

What are the advantages of having _cursor on each result

  1. A step closer to relay specification for connection based pagination for backend services ;- )

  2. Allows to paginate forwards and backwards from any node - also virtually assists in slicing and add more to the result

  3. Allows to cache smaller lists or section of a result for faster loading of User interfaces. Example -For scrolling pagination it helps to fetch the rest of the list (or more) without having to cache the whole result and metadata.

Would this be a replacement for the current API?

I don't foresee it to be a replacement, more like an addition to the existing. Can be optionally enabled by adding another param say computeCursorOnEveryNode( or a better name) of type boolean (default can be false since it does additional computation post retrieving the results from db).

@bradvogel
Copy link
Contributor

Yes, sounds useful. I'd accept a PR. Perhaps for now make it an option?

1 similar comment
@bradvogel
Copy link
Contributor

Yes, sounds useful. I'd accept a PR. Perhaps for now make it an option?

@ScreamZ
Copy link

ScreamZ commented May 9, 2020

Also, interest in, relay is very opinionated in terms of data structure that's why @simararneja is talking about cursor per document.

Relay is driving pagination spec for Graphql, a lot of services take it as "best practice". Having the ability to use document cursor would help to implement relay compliant webservers

@ScreamZ
Copy link

ScreamZ commented May 9, 2020

Also is there any way to get the total count instead of HasNext ? I can mimic that by running countDocuments with the query before the Mongo query but maybe you have another suggestion ? :)

@bradvogel
Copy link
Contributor

bradvogel commented May 9, 2020

Also is there any way to get the total count instead of HasNext ? I can mimic that by running countDocuments with the query before the Mongo query but maybe you have another suggestion ? :)

This library doesn't provide a way to get a total count. I don't think we'd want that - even if it was opt-in-only - given the performance cost of running such query.

@cerinoligutom
Copy link

Any updates on this?

@ayrtonvwf
Copy link

Forked the project to create a PR for this, but npm install is showing me this error:
image
@bradvogel any tips on how to make it work?

@ayrtonvwf
Copy link

For a project of mine we're sorting by the createdAt field, so the document cursor encoding is being done like that:

image

@bradvogel any tips on how to re-use some internal functions to implement it in a generic way?

@ayrtonvwf
Copy link

I'm implementing a POC based on this logic:
image

First I implemented a generic cursor builder function:
image

My guess is that in future we could replace that first logic with this function:
image

Or maybe removing these if's at all, using the cursors already computed for the first and last result:
image

What do you think about this approach @bradvogel?

I still got no chance to test it because of the npm install error though.

@bradvogel
Copy link
Contributor

@ayrtonvwf

Thanks for letting us know about the issue npm installing. That should now be fixed.

Can you share more about what you're trying to do? Are you putting a cursor on each document returned?

@ayrtonvwf
Copy link

@bradvogel thanks, npm install now is working fine!

I don't think putting a cursor on each document returned would be the best approach, that could introduce some hard to debug issues, but I think it's a good starting point as a prototype to show it working.

The optimal approach would be instead of returning an array of documents, return an array of results which each result would include the document and some metadata, including the cursor.

[
    {
        "cursor": "string",
        "document": {
            "_id": "string"
        }
    }, {
        "cursor": "string",
        "document": {
            "_id": "string"
        }
    }
]

That's the approach seen in Relay Connections specification, so a good slice of the community would be already familiar with it. The cursor of course would be the first metadata, but I can imagine it growing in the future with other requirements like a score property (like Elasticsearch's). This approach wouldn't be backwards compatible though, and I feel like maintaining
two similar set of functions just to keep compatibility would lead to some headaches.

Another valid approach would be returning a separated array of cursors, where the relationship would be maintained with the indexes of the documents on the result:

{
    "results": [
        { "_id": "string" },
        { "_id": "string" }
    },
    "cursors": [
        "string",
        "string"
    ]
}

I see some friction with the developer experience here, but this approach would be backwards compatible and wouldn't mix the cursor with the document fields.

I'm looking forward to discuss the options here, I would love to see other's considerations about each approach:

  • document cursor injection: inject the cursor inside the document
  • result object: create one result object per document returned, containing the document itself and the cursor as metadata
  • cursor list: return a list of cursors maintaining the relationship with the list of documents based on the index of the arrays

@ayrtonvwf
Copy link

ayrtonvwf commented Dec 8, 2020

I'm trying to run the tests, but when I run npm run test I get the following output where it seems to be downloading MongoDB:

image

I do have a freshly intalled MongoDB instance runing on my machine available at localhost on the default port, as pointed in the readme.

After 5 seconds the tests seem to timeout:
image

@ayrtonvwf
Copy link

Added the POC of this feature, waiting for guidance on which approach to take on this feature.

@bradvogel
Copy link
Contributor

Re: mongodb installing when running npm test. Strange. I'm not seeing that. My repro steps:

git clone [email protected]:mixmaxhq/mongo-cursor-pagination.git
cd mongo-cursor-pagination
npm install
npm test

Perhaps mongodb-memory-server (https://www.npmjs.com/package/mongodb-memory-server) isn't able to cache its version of Mongo.

Re: adding _cursor to each document. Looks like a useful feature. Perhaps make it opt-in via a parameter so we don't need to change the API?

@ayrtonvwf
Copy link

ayrtonvwf commented Dec 10, 2020

Pushed some updates in my PR to include the _cursor property on the results based on a new includeCursor param. It should be working but I still couldn't test it. Tried tweaking the version property of the MongoMemoryServer, it Didn't try to download it again but the results were the same when I:

  • Set MongoMemoryServer to the version of Mongo I was using (4.4.2)
  • Installed Mongo 3.6.21 on my machine (latests 3.6 available)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants