-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OPFS support #1490
base: main
Are you sure you want to change the base?
Add OPFS support #1490
Conversation
Thanks a lot for this PR! I will have to go properly through it and do a review, but looks a very cool addition |
This PR seems to me addresses 2 separate features, OPFS and support for threads in the example. I think it's clearer to have them land separately, I did a partial cherry-pick of this PR in #1501. OPFS part I would need to more closely review and experiment with, but I will get to it in the next days. |
6772718
to
3ca5b16
Compare
Thanks a lot! I just did a rebase to resolve the conflict in |
@carlopi Hi, checking in on the progress here, any updates? or any assistance i can offer? |
Howdy @dengkunli! We are pretty excited about this capability - thank you! One quick question - does this also enable you to query a file that is located in OPFS? Could this enable a user to drop in a .csv file for example, and then query that file with DuckDB Wasm in a way that persists across browser sessions? If so, mind adding a test to show that sweet feature working? |
Hi @dengkunli, sorry for the delay, work was undergoing in other areas of the code, and this has not gotten yet the required attention. I will send a proper review in the next few days, but this is very interesting to have in duckdb-wasm. |
@Alex-Monahan of course! I've pushed a commit adding tests including:
@carlopi Thanks a lot for taking the time to review ! |
does this PR merged in release version? |
Is there any update on getting this merged/released? We are currently bringing in our own version of the package which is not ideal. I understand you are open source but if the PR is ready then can we get an ETA on merging it please? |
Hi, I am also interested to know the ETA. Thanks |
I am experimenting with this PR, and tried to |
There have been very recent rework of the file system in mainline duckdb, see duckdb/duckdb#11297 or duckdb/duckdb#11343. Those changes of the interface are not compatible with current duckdb-wasm, that will need to be slightly changed to support those new APIs. I do not know, but it's possible that with the newer interface more stuff will work out of the box, or possible some different methods needs to be supported. |
First and foremost: awesome work/library, thank you so much!! On the topic of this PR: would love to use this feature asap, even if it is released as experimental and/or there are upcoming changes that may (or may not) streamline this use-case. A bird in the hand is worth two in the bush? It seems like there's high demand for this feature based on:
🙏 |
Hi @Alex-Monahan . @ravwojdyla and @carlopi Actually we have maintained a DuckDB WASM with OPFS support since last year October. It seems like that it supports the future changes @carlopi mentioned. And our forked version support opening db, attaching db, reading/writing files from OPFS. and it is similar to this PR: import { createOPFSFileHandle } from '@datadocs/duckdb-wasm';
...
const dirHandle = await navigator.storage.getDirectory();
const dbHandle = await createOPFSFileHandle(`attach.db`, dirHandle, {
create: true,
emptyAsAbsent: true,
});
const walHandle = await createOPFSFileHandle(`attach.db.wal`, dirHandle, {
create: true,
emptyAsAbsent: true,
});
...
await duckdbConnection.query('ATTACH 'attach.db'); The repositories at here:
I originally intended to create these support as a PR into DuckDB repository. However, after several months app testing, I found that this PR already exists. |
@carlopi Got it, thanks for the info, I'll provide an update accordingly |
Hi @dengkunli! Do you need any help on getting this over the finish line? |
yes actually, following recent rework of the file system in mainline duckdb, i've work out the mvp & eh build, but not the coi build (cross-origin isolation build that uses pthreads). In coi mode, an error is frequently thrown (not always, but very likely) in pthread worker: the error occurs when js invokes i've spent some time digging in, but could not find the cause, any idea what kind of error it may be? |
@carlopi have you approved this merge or should we work from the branch? |
Based on this article, it may be best to have a separate OPFS DB per web worker and not share it scross browser tabs: |
That would be a huge mistake IMO, for the same reason they didn't do that in the article. Multiple tabs should be able to share the same backing dataset, without duckdb-wasm enforcing a per tab restriction. That being said, while it might be nice for this package to handle concurrency internally, I think it would be preferable to ship something first, with the caveat being that users have to manage concurrency / race conditions themselves. |
any updates on this PR? What's required to land this? |
@dengkunli @derekperkins is there any reason that this isn't yet merged? Happy to pitch in if there's any additional work needed to land this in the next couple weeks. |
gentle bump @carlopi |
If @hangxingliu has a working implementation, and this still requires significant reworking with the v1 filesystem changes, should we look at just upstreaming that? |
Sorry @derekperkins , I just read all changes between our forked version and the latest code in this repo, and confirmed that our implementation is also outdated with the latest DuckDB's version. But the good news is that the migration is not very difficult and we will update our code recently because our app also needs to use the latest version of DuckDB as one of frontend storage solution. But I am not sure does DuckDB official team has any plan to implement this feature. We would use official solution if they have. |
@hangxingliu thanks for the update. If this PR isn't updated by the time you update your fork, you should submit a separate PR. Maybe we can take the best ideas from both. |
@derekperkins upstreaming the changes made by @hangxingliu would be ideal. I'm very interested in having some kind of built-in persistence story, and I think there have been a few others threads on GH about this - would make duckdb-wasm much more powerful! |
gentle bump again @hangxingliu and @derekperkins :) Wondering if anyone is planning on getting these changes upstreamed. Would be a big win! I don't yet have the duckdb expertise to do this without spending some time digging in, so it would be lovely if someone who's more familiar is planning to merge the changes Apologies for all of the comments on the thread. Just would be good to understand if there are plans to make this happen, or if I should assume that it's not on anyone's near-term roadmap. |
Any news? |
I am reviewing this PR right now, trying to remove the needs to change the signature of |
This PR adds OPFS (Origin Private File System) support for duckdb-wasm. With OPFS support, we could persist the database in the browser 🔥
API
To remain simple and intuitive, no new api is added, just pass a
path
starting withopfs://
todb.open
and we'll know we're using OPFS and finish all the preparation work needed:List of Changes
-g
option to enable all DWARF debugging capabilities (before this, we could breakpoint but debug symbol information are missing)duckdb-browser-coi.pthread.worker.ts
override the global onmessage handler but forgot to setstartWorker
in theload
cmd (when missing, the coi worker will not start properly)