add a synchronous api #94
There are two parts to this:
- the execute call, which indeed causes duckdb to perform the query in another thread, unblocking the event loop
- and the fetch call (which the streaming API uses), which retrieves data from duckdb and happens on the main thread
Query execution can be long, and I don't think that starving the event loop for such massive amounts of time is a good thing. But I also don't see how doing so would "vastly" improve performance, since you're scheduling the long-running operation to be processed outside the main thread only once.
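The split described above can be illustrated in miniature: even when the data is already local and a promise resolves immediately, every await still yields control back to the event loop before the continuation runs. A minimal sketch — fetchRow and main are stand-ins, not node-duckdb's real API:

```javascript
// Stand-in for a fetch whose data is already in memory; the names here
// are invented for illustration, not the real node-duckdb API.
const order = [];

async function fetchRow() {
  return { id: 1 };
}

async function main() {
  order.push('before fetch');
  const row = await fetchRow(); // control yields to the event loop here
  order.push('after fetch, row ' + row.id);
}

const done = main();
// This synchronous line runs before the awaited continuation above:
order.push('other synchronous work');

done.then(() => console.log(order));
// logs: [ 'before fetch', 'other synchronous work', 'after fetch, row 1' ]
```

The point is only about scheduling: each awaited call costs a trip through the microtask queue, regardless of whether the underlying work was instant.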
The gains are mostly in the case where you exchange lots of successive queries with the database; for instance, hundreds or thousands of consecutive updates that depend on the results of select queries.
Look at the benchmarks (and what is being benchmarked) by the better-sqlite3 library: https://github.com/JoshuaWise/better-sqlite3
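The pattern behind that kind of benchmark can be sketched with plain in-memory stubs — selectSync and updateSync are invented names standing in for a synchronous driver, not a real API. Each update depends on the previous select, so the operations are inherently sequential and cannot be collapsed into one big async call:

```javascript
// In-memory stand-ins for a synchronous driver's calls; invented names,
// not better-sqlite3's or node-duckdb's actual API.
const table = new Map([['counter', 0]]);

function selectSync(key) {
  return table.get(key);
}

function updateSync(key, value) {
  table.set(key, value);
}

// 10 000 dependent read-modify-write steps run back to back, with no
// event-loop deferral between them.
for (let i = 0; i < 10000; i++) {
  const current = selectSync('counter');
  updateSync('counter', current + 1);
}

console.log(table.get('counter')); // 10000
```

With an async API, each of those 10 000 iterations would instead suspend and resume through the event loop, which is where the per-call overhead adds up.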
The way I look at it, the asynchronous API is really only useful:
- if you have long-running queries and actual stuff to do while waiting for the result (usually not the case when scripting stuff)
- if you have duckdb as a backend to a web application (or any server, really), which is not the *intended* purpose of duckdb, though it can be used as such
I think asynchronous is *useful* but probably not the main approach I'd be
using when dealing with a database meant for processing data and not simply
storing it.
I have scripts that handle big loads of data but that don't process anything in parallel. In that case, I don't mind blocking the process, since it would otherwise just be one async function waiting on the result of execute.
I also have scripts that do a bunch of individual updates in quick sequence, in which case I do appreciate the speedup from not deferring to the event loop an operation that is almost instant, since everything is local.
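That overhead is easy to see in a rough micro-benchmark sketch — the stubs below are illustrative only, not a node-duckdb benchmark. Both paths return local data; the async path still pays a microtask hop per call:

```javascript
// Compare N dependent increments through a sync stub vs. an async stub.
// syncGet/asyncGet are invented stand-ins that return local data.
const N = 100000;

function syncGet(state) { return state.v; }
async function asyncGet(state) { return state.v; }

function runSync() {
  const state = { v: 0 };
  const t0 = process.hrtime.bigint();
  for (let i = 0; i < N; i++) {
    state.v = syncGet(state) + 1;
  }
  return { v: state.v, ns: process.hrtime.bigint() - t0 };
}

async function runAsync() {
  const state = { v: 0 };
  const t0 = process.hrtime.bigint();
  for (let i = 0; i < N; i++) {
    state.v = (await asyncGet(state)) + 1;
  }
  return { v: state.v, ns: process.hrtime.bigint() - t0 };
}

const syncResult = runSync();
runAsync().then((asyncResult) => {
  console.log(`sync:  ${syncResult.ns} ns for ${syncResult.v} updates`);
  console.log(`async: ${asyncResult.ns} ns for ${asyncResult.v} updates`);
});
```

Absolute timings will vary by machine and Node version; the comparison only shows that per-call async scheduling has a cost even when nothing actually waits.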
Anyways, I think there is a place for both approaches, and I'm also pretty certain that synchronous access to a local database is much more efficient in a lot of the scenarios (if not most) that this particular database will end up being used in.
On Wed, May 19, 2021 at 6:43 PM Rostislav Provodenko <***@***.***> wrote:
There are two parts to this:
- the execute call which indeed causes duckdb to perform the query in
another thread, unblocking the event loop
- and the fetch call (which the streaming API uses) which retrieves
data from duckdb and this happens on the main thread
Query execution can be long, I don't think that starving the event
loop for such massive amounts of time is a good thing. But also, I don't
see how doing so would "vastly" improve performance, since you're
scheduling the long running operation to be processed outside the main
thread only once.
--
Christophe Eymard
It would be interesting to see a benchmark, because the situation is a little different for this library in comparison to the SQLite drivers (sqlite and sqlite3), especially for multiple-row fetches. The sqlite and sqlite3 drivers do an async call for every row they fetch, whereas node-duckdb only does an async call for the […]
As the node.js driver better-sqlite3 shows, when using in-process databases that do little to no IO, performance is vastly improved by using the database synchronously instead of having everything go through the event loop.
Since DuckDB is pretty focused on analytics, I actually doubt that this driver will be used in many settings other than a single user process crunching numbers.
Maybe it would be a good idea to add a synchronous way of speaking with duckdb?
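Purely as a sketch of what that could look like — the names below are invented, not node-duckdb's actual surface — a synchronous call would return rows directly instead of a promise. An in-memory stub makes the calling pattern visible:

```javascript
// Hypothetical synchronous surface, backed by an in-memory stub;
// StubConnection and executeSync are invented names for illustration.
class StubConnection {
  constructor(rows) {
    this.rows = rows;
  }

  // A real implementation would block until the query completes,
  // then return all rows at once. The stub just echoes canned rows.
  executeSync(_sql) {
    return this.rows;
  }
}

const conn = new StubConnection([{ n: 1 }, { n: 2 }]);
const rows = conn.executeSync('SELECT n FROM numbers');
console.log(rows.length); // 2 -- no await, no callback
```

The appeal for scripting is exactly that last line: the result is usable on the next statement, with no event-loop round trip in between.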