Call stack size exceeded with r.args #11

Open
Calavoow opened this issue Jul 14, 2016 · 10 comments

Calavoow commented Jul 14, 2016

We want to update 100k documents by id at once, which means performing a selection on 100k ids. However, the JavaScript library throws a RangeError when constructing a getAll(...ids) where ids.length === 1e6:

    RangeError: Uncaught error: Maximum call stack size exceeded
    at /.../node_modules/rethinkdb/ast.js:1087:43
    at Table.RDBVal.getAll (/.../node_modules/rethinkdb/ast.js:1089:7)
    ....

To resolve this error I made a proof of concept modification to the ast.coffee library file that supports such large operations without giving an error. It is available here: https://gist.github.com/Calavoow/714705fa6bdeeb2479af6f2db531be76/. When calling getAll(ids) (note: no rest arguments) the query completes successfully. Thus RethinkDB itself is able to handle this operation, but the JavaScript interface is not.

Of course, this poses a problem for indices whose values are arrays. Still, I think supporting large selections would be a good addition, so that batch updates do not have to be chunked. Honestly, it seems to us that getAll should take its keys as elements of an array rather than as separate arguments, since it maps a list of ids to a list of results. It would be the list analogue of get, i.e. getAll = [ids].map(get). Alternatively, a new operator that requires the ids as a list could be introduced, for example a getBatch function.
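To make the proposed "list analogue of get" concrete, here is a minimal sketch over a toy in-memory table. Everything in it is an assumption for illustration: `table`, `get`, and `getBatch` are hypothetical names, not part of the RethinkDB driver, and returning null for a missing id is a choice made only for this sketch.

```javascript
// Toy in-memory "table" keyed by primary key. Purely illustrative,
// not how RethinkDB stores or looks up documents.
const table = new Map([
  [1, { id: 1, name: "a" }],
  [2, { id: 2, name: "b" }],
  [3, { id: 3, name: "c" }],
]);

// Hypothetical single-key lookup; returns null when the id is unknown
// (an assumption of this sketch).
function get(id) {
  return table.has(id) ? table.get(id) : null;
}

// Hypothetical getBatch: takes the ids as ONE array argument and maps
// each id to its document, so the number of ids never becomes the
// number of function arguments.
function getBatch(ids) {
  return ids.map((id) => get(id));
}

console.log(getBatch([1, 3]));
```

Because the ids travel as a single array value, an index key that is itself an array would need a different spelling in such an API, which is the ambiguity mentioned above.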

@danielmewes
Member

danielmewes commented Jul 15, 2016

@Calavoow you can already pass an array to getAll, but you have to use the special r.args command:

    .getAll(r.args(ids))

The explicit r.args avoids the ambiguity with index values that are arrays.

That being said, I would not recommend passing 100k IDs into a getAll (or any ReQL command) at once. The ReQL implementation was designed with terms receiving only a handful of arguments (maybe 100s) in mind, and I have no idea how it behaves for 100k arguments. It might stall the server for a moment, generate high latencies for other concurrent queries, or require a lot of RAM.

It would be better to split the IDs up into batches of maybe 100-1000 or so on the client side, and then issue one such getAll(...).update(...) at a time.
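The client-side batching suggested above could be sketched roughly as follows. The helper names `chunk` and `updateInBatches` and the `runBatch` callback are hypothetical and introduced only for illustration; the sketch assumes `runBatch` returns a promise for one batch's result.

```javascript
// Split `items` into consecutive arrays of at most `size` elements.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `runBatch` on one batch at a time, waiting for each batch to
// finish before starting the next. `runBatch` is a hypothetical
// caller-supplied function that performs the query for one batch.
async function updateInBatches(ids, size, runBatch) {
  const results = [];
  for (const batch of chunk(ids, size)) {
    results.push(await runBatch(batch));
  }
  return results;
}
```

With the official driver, the callback would presumably look something like `batch => r.table('test').getAll(r.args(batch)).update(patch).run(conn)`, where `patch` and `conn` are placeholders for the update document and an open connection.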

@danielmewes
Member

I'm going to close this since the functionality already exists.
If you think that we should optimize for the case of very large numbers of IDs better, we can open a separate issue for that.

@Calavoow
Author

Calavoow commented Jul 15, 2016

@danielmewes Unfortunately, the r.args command does not resolve the stack overflow. If I pass the ids list with r.args instead of ...ids, I still get the same error. I believe that internally r.args simply does the same thing as ...ids (i.e. calls function.apply).

Moreover, the issue is caused by the way RDBOp in the JS library handles its incoming arguments: it expects variadic arguments, which is precisely what is not possible with 100k arguments. No matter how the input is transformed, constructing the AST still requires calling RDBOp to create the operator for the selection query.
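The difference between spreading a huge array into a variadic call and passing it as one value can be illustrated in isolation. This is only a sketch of the general JavaScript behaviour, not of the driver's actual code: `variadicTerm` and `arrayTerm` are hypothetical stand-ins, and whether the variadic call fails depends on the engine and its stack size.

```javascript
// Hypothetical stand-in for a term constructor that expects its
// arguments variadically (one function argument per id).
function variadicTerm(...args) {
  return { type: "GET_ALL", args };
}

// Hypothetical alternative that receives all ids as a single array.
function arrayTerm(args) {
  return { type: "GET_ALL", args: args.slice() };
}

const ids = new Array(1e6).fill(0);

// Spreading via apply turns every element into a separate call
// argument; engines cap how many arguments a call may carry, and
// V8 typically reports exceeding that as a RangeError.
let variadicOutcome;
try {
  variadicTerm.apply(null, ids);
  variadicOutcome = "ok";
} catch (err) {
  variadicOutcome = err.name;
}
console.log("variadic call:", variadicOutcome);

// Passing the array as one value involves a single argument,
// regardless of how many ids it holds.
const term = arrayTerm(ids);
console.log("array call holds", term.args.length, "ids");
```

If the driver's term constructor behaves like the variadic stand-in, the failure would occur on the client before anything is sent to the server, which would be consistent with the stack trace pointing into ast.js.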

@chrisvariety

We've found RethinkDB can become unresponsive when doing updates across a large set of documents. As Daniel recommended, I would definitely batch on the client side instead of attempting to update all 100,000 at once. I can share some JS code we are using for that if you would find it helpful.

@Calavoow
Author

For now I work around this issue by chunking the ids into lists of 10k and running the updates on those. From my profiling, more ids per query gives better performance, although the gain is marginal beyond 1k. Still, being able to build the query in one step, instead of working with a list of queries, would greatly simplify some of our query construction code.

@danielmewes
Member

@Calavoow r.args should be different on the driver-side from using function.apply. Did you pass an array with the IDs to r.args?

@Calavoow
Author

@danielmewes Yes, I did. Here is a minimal example:

> r.db('test').table('test').getAll(r.args(Array(1e6).fill(0)))
RangeError: Maximum call stack size exceeded
    at ...

@danielmewes
Member

Hmm interesting. It's possible that we can change something in the JavaScript driver to make this work. Let me rename the issue and re-open it...

danielmewes changed the title from "Support for batch selections" to "Call stack size exceeded with r.args" on Jul 18, 2016
@danielmewes
Member

Renamed from "Support for batch selections" to "Call stack size exceeded with r.args"

@danielmewes
Member

Putting into backlog for now because passing a very high number of arguments into a single getAll call isn't currently recommended for other reasons anyway.

danielmewes reopened this on Jul 18, 2016
gabor-boros transferred this issue from rethinkdb/rethinkdb on May 5, 2022