add a synchronous api #94

Open
ceymard opened this issue May 15, 2021 · 3 comments

@ceymard

ceymard commented May 15, 2021

As the Node.js driver better-sqlite3 shows, with in-process databases that do little to no I/O, performance is vastly better when the database is used synchronously instead of having everything go through the event loop.

Since DuckDB is pretty focused on analytics, I actually doubt that this driver will be used in many settings other than a single-user process crunching numbers.

Maybe it would be a good idea to add a synchronous way of talking to DuckDB?

@rprovodenko
Contributor

rprovodenko commented May 19, 2021

There are two parts to this:

  • the execute call, which indeed causes DuckDB to run the query in another thread, leaving the event loop unblocked
  • the fetch call (used by the streaming API), which retrieves data from DuckDB and runs on the main thread

Query execution can take a long time, and I don't think starving the event loop for that long is a good thing. I also don't see how doing so would "vastly" improve performance, since the long-running operation is scheduled off the main thread only once.
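
To illustrate the starvation concern, here is a minimal sketch that uses no database at all. `blockingQuery` is a hypothetical stand-in for a synchronous query call (it just spins the CPU), and `measureTimerDelay` is an illustrative helper; neither is part of node-duckdb. It schedules a 10 ms timer and then blocks the main thread, so the timer cannot fire until the blocking call returns.

```typescript
// Hypothetical stand-in for a synchronous query: spins the CPU for `ms` milliseconds.
// This is NOT a node-duckdb call; it only models "the main thread is busy".
function blockingQuery(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: no timers, I/O callbacks, or promise continuations can run here.
  }
}

// Schedules a 10 ms timer, then blocks the main thread for `blockMs`.
// Resolves with how long the timer actually took to fire.
function measureTimerDelay(blockMs: number): Promise<number> {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), 10);
    blockingQuery(blockMs);
  });
}

measureTimerDelay(200).then((elapsed) => {
  // The timer was due after 10 ms but had to wait for the blocking call to finish.
  console.log(`10 ms timer fired after ${elapsed} ms`);
});
```

With a 200 ms block, the reported delay should be at least about 200 ms rather than 10 ms. In a server process, every other request would be held up in the same way for the duration of a synchronous query.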

@ceymard
Author

ceymard commented May 19, 2021 via email

@jupiter
Contributor

jupiter commented Jul 19, 2021

It would be interesting to see a benchmark, because the situation for this library is a little different from the SQLite drivers (sqlite and sqlite3), especially for multiple-row fetches. Those drivers do an async call for every row they fetch, whereas node-duckdb only does an async call for the execute call (as per @rostislavdeepcrawl's comment above). The async call here allows for parallelism, which, in the case of lots of single-row calls as per your suggestion above, might actually be more performant.
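
As a starting point for such a benchmark, the two access patterns contrasted above can be sketched with in-memory stand-ins. Everything here is hypothetical (`nextTurn`, `makeTable`, `perRowAsync`, `executeThenSyncFetch`) and none of it is the actual sqlite3 or node-duckdb API. The sketch only counts how many times each pattern has to wait for a later event-loop turn. It does not measure real query time or any parallelism inside the database.

```typescript
// Hypothetical stand-ins only; not the API of sqlite3 or node-duckdb.
type Row = { id: number };
type Tally = { rows: number; asyncHops: number };

// Resolves on a later event-loop turn, the way a callback from a worker thread would.
function nextTurn<T>(produce: () => T): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(produce()), 0));
}

function makeTable(n: number): Row[] {
  return Array.from({ length: n }, (_, id) => ({ id }));
}

// Pattern A: every row fetch is its own async call.
async function perRowAsync(table: Row[]): Promise<Tally> {
  let asyncHops = 0;
  let rows = 0;
  for (let i = 0; i < table.length; i++) {
    await nextTurn(() => {
      asyncHops++;
      return table[i];
    });
    rows++;
  }
  return { rows, asyncHops };
}

// Pattern B: one async "execute", then synchronous fetches on the main thread.
async function executeThenSyncFetch(table: Row[]): Promise<Tally> {
  let asyncHops = 0;
  let pos = 0;
  const fetchRow = await nextTurn(() => {
    asyncHops++;
    return (): Row | undefined => table[pos++];
  });
  let rows = 0;
  while (fetchRow() !== undefined) rows++;
  return { rows, asyncHops };
}

async function main(): Promise<void> {
  console.log(await perRowAsync(makeTable(200)));          // { rows: 200, asyncHops: 200 }
  console.log(await executeThenSyncFetch(makeTable(200))); // { rows: 200, asyncHops: 1 }
}
main();
```

A real comparison would need to run the same queries through the actual drivers, since the hop count says nothing about how long each query spends inside the database.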
