Add sync flag to flatfs #30
Ugh -- writes to disk should be durable... @whyrusleeping didn't you find that the syncs weren't the bottleneck? |
Perf improves by an order of magnitude (1.5s -> 0.13s) with 10 1.1MB files, and goes from 6min to 1.7min with 999 1MB files. Of course, 0.13s is still much slower than git/cp. A reproducible test script is coming soon, like the one @ion1 did in ipfs/kubo#1750 (unless someone has already written a benchmark script for git vs cp vs rsync etc). [1] https://github.com/coreutils/coreutils/search?utf8=%E2%9C%93&q=fsync
|
Hm, how do @tv42 @whyrusleeping @cryptix @chriscool feel about this? As far as IPFS is concerned, maybe we can make this a config setting, or even be able to do it per-command (like |
I feel comparing The git config option seems to be only about loose objects; there still is an Would like to hear what @whyrusleeping has to say about his experiments. |
👍 |
Ok the fsync flag in git is only used in loose object writes. |
When writing many files or creating many directories, I think it is probably overkill to fsync after each write. It might be a good idea to sync when all the writes are done though. |
I mean that it is impossible anyway to prevent crashes from happening while writing files (or between writing files), but it could be a good idea to sync after writing all the stuff and before advertising it on the network, to make sure that everything advertised has been written to disk at some point. Though it is of course impossible to ensure that a disk crash will not wipe out the content anyway. |
@chriscool yeah, i believe that's what @whyrusleeping's attempt did previously -- fsync at the end of all writes. would that be good enough for you rht? also if we had a write-ahead log style arena-storage repo, it might work well. |
The current implementation is already to fsync after all files have been written to tmpfiles, then I have no idea what magic git uses, but I'd opt for no-sync as default, and only do Still relevant today: https://web.archive.org/web/20150315020954/http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ ? |
I think right now, if we add the option to disable fsyncs, it should be an explicit opt in. I think later, we can move to mmapped files and perhaps a journal. That will likely have a different sync model with different knobs to turn. |
}

var _ datastore.Datastore = (*Datastore)(nil)

-func New(path string, prefixLen int) (*Datastore, error) {
+func New(path string, prefixLen int, sync bool) (*Datastore, error) {
perhaps we should keep New as is, setting sync to true, and just add another constructor, NewSync(...), with the bool flag option. this prevents changing the interface on whoever depends on this code
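A minimal sketch of what that suggestion could look like, assuming a cut-down `Datastore` struct. The field names, the validation, and the error text are illustrative and not taken from flatfs; only the `New` signature and the `NewSync` name come from the thread.

```go
package main

import "errors"

// Datastore is a stand-in for the flatfs type; only the fields needed
// for this sketch are shown, and their names are assumptions.
type Datastore struct {
	path      string
	prefixLen int
	sync      bool // when true, writes are fsynced before being reported as done
}

// New keeps the original two-argument signature so existing callers
// compile unchanged, and defaults to durable (synced) writes.
func New(path string, prefixLen int) (*Datastore, error) {
	return NewSync(path, prefixLen, true)
}

// NewSync is the additional constructor proposed in the review comment.
// Callers that accept the durability trade-off pass sync=false explicitly.
func NewSync(path string, prefixLen int, sync bool) (*Datastore, error) {
	if prefixLen <= 0 {
		return nil, errors.New("flatfs sketch: prefixLen must be positive")
	}
	return &Datastore{path: path, prefixLen: prefixLen, sync: sync}, nil
}
```

With this shape, disabling fsync is an explicit opt-in: `New(path, 2)` behaves as before, and only a call such as `NewSync(path, 2, false)` turns syncing off.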
Enabled for the paranoid
(assuming sync is disabled.
Mind to elaborate? |
If this is about |
@cryptix the per-file sync is the most significant bottleneck. Even if the add perf issue is addressed later, the sync cost will have to come down first. |
Yeah, this might be worth trying. It should just call: https://www.cl.cam.ac.uk/cgi-bin/manpage?2+sync and might be more efficient than many calls to fsync |
With |
@rht did you try with 10 1.1MB files? What about with 999 1MB files? |
It would be very useful to have good benchmarks and graphs of all this, across disk types, host fses, and OSes. Maybe we can include this in our thinking re golang/build
|
10 1MB files (actually it is 1): 1.3s -> 0.9s. (making the benchmark. should include adding updated files as well) |
Does that calling code use whyrusleeping's batching feature? |
Iirc the batching is used in the (but then again, fsync is required for ipdb, but only for ipdb use case) |
What is the decision on the Notice that even sqlite has http://www.sqlite.org/pragma.html#pragma_synchronous. Here is the rationale behind the rare usage of fsync in git: https://lwn.net/Articles/326505/, https://plus.google.com/+JonathanCorbet/posts/JBxiKPe3VXa (+comments). |
(duplicate of #32) |
(should only be enabled for the paranoid, cc: @tv42 and ipfs-inactive/archives#20)