Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sync flag to flatfs #30

Closed
wants to merge 1 commit into from
Closed

Add sync flag to flatfs #30

wants to merge 1 commit into from

Conversation

rht
Copy link
Contributor

@rht rht commented Sep 27, 2015

(should only be enabled for the paranoid, cc: @tv42 and ipfs-inactive/archives#20)

@jbenet
Copy link
Member

jbenet commented Sep 27, 2015

Ugh -- writes to disk should be durable...

@whyrusleeping didn't you find that the syncs weren't the bottleneck?

@rht
Copy link
Contributor Author

rht commented Sep 27, 2015

Ugh -- writes to disk should be durable...

cp doesn't do this [1], and not git by default either [2].

Perf increases by an order magnitude (1.5s -> 0.13s) with 10 1.1MB files, and 6min -> 1.7min with 999 1MB files. Of course 0.13s is still much slower compared to git/cp.

Reproducible test script (or if someone already wrote a benchmark script for git vs cp vs rsync etc) coming soon like @ion1 did in ipfs/kubo#1750.

[1] https://github.com/coreutils/coreutils/search?utf8=%E2%9C%93&q=fsync
[2] ipfs-inactive/archives#20 (comment) copied here for convenience

Also, found this http://git.kernel.org/cgit/git/git.git/tree/Documentation/config.txt#n693:

This is a total waste of time and effort on a filesystem that orders data writes properly, but can be useful for filesystems that do not use journalling (traditional UNIX filesystems) or that only journal metadata and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").

@jbenet
Copy link
Member

jbenet commented Sep 27, 2015

Hm, how do @tv42 @whyrusleeping @cryptix @chriscool feel about this?


As far as IPFS is concerned, maybe we can make this a config setting, or even be able to do it per-command (like ipfs add --no-sync type of thing).

@tv42
Copy link
Contributor

tv42 commented Sep 28, 2015

I feel comparing cp to something providing services to other hosts is apples to oranges wrt crash safety.

The git config option seems to be only about loose objects; there still is an fsync_or_die in e.g. pack creation -- that is, there is an fsync for pushed data. Git is fast because of a series of smart design decisions, not because it's unsafe.

Would like to hear what @whyrusleeping has to say about his experiments.

@jbenet
Copy link
Member

jbenet commented Sep 28, 2015

I feel comparing cp to something providing services to other hosts is apples to oranges wrt crash safety.

👍

@rht
Copy link
Contributor Author

rht commented Sep 28, 2015

I feel comparing cp to something providing services to other hosts is apples to oranges wrt crash safety.

ipfs add is used under several conditions, some of which intersects with cp (offline add where the blocks being added don't have to be streamed live), while some others (distributed db on ipfs) don't.

Ok the fsync flag in git is only used in loose object writes.
Maybe fsync is not git's bottleneck (since its perf ~ cp perf), but it is for flatfs.
edit: but git is fast even with fsync.

@chriscool
Copy link

When writing many files or creating many directories, I think it is probably overkill to fsync after each write. It might be a good idea to sync when all the writes are done though.

@chriscool
Copy link

I mean that it is impossible anyway to prevent crashes from happening while writing files (or between writing files), but it could be a good idea to sync after writing all the stuff and before advertising it on the network, to make sure that everything advertised has been writen on disk at one point. Though it is of course impossible to ensure that a disk crash will not wipe out the content anyway.

@jbenet
Copy link
Member

jbenet commented Sep 28, 2015

@chriscool yeah, i believe that's what @whyrusleeping's attempt did previously -- fsync at the and of all writes. would that be good enough for you rht?

also if we had a write-ahead log style arena-storage repo, it might work well.

@rht
Copy link
Contributor Author

rht commented Sep 28, 2015

The current implementation is already to fsync after all files have been written to tmpfiles, then mv to 'blocks/'.
This PR is meant to make these explicit fsync calls optional.

I have no idea what magic git uses, but I'd opt for no-sync as default, and only do ipfs add --sync for ipdb. My reason: the former case ("offline" batch add) is more common today.

Still relevant today: https://web.archive.org/web/20150315020954/http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/ ?

@jbenet
Copy link
Member

jbenet commented Sep 28, 2015

I think right now, if we add the option to disable fsyncs, it should be an explicit opt in.

I think later, we can move to mmapped files and perhaps a journal. That will likely have a different sync model with different knobs to turn.

}

var _ datastore.Datastore = (*Datastore)(nil)

func New(path string, prefixLen int) (*Datastore, error) {
func New(path string, prefixLen int, sync bool) (*Datastore, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should keep New as is, setting sync to true and just add another constructor:

NewSync(...) 

with the bool flag option. this prevents changing the interface on whoever depends on this code

Enabled for the paranoid
@rht
Copy link
Contributor Author

rht commented Sep 28, 2015

(assuming sync is disabled.
Still with the 10 1.1MB files test, the bulk of the 0.13s comes from merkledag pb Marshal, in particular the line https://github.com/ipfs/go-ipfs/blob/master/merkledag/pb/merkledag.pb.go#L557 (~100ms).
One option is to use unsafe_marshaler_all in the protobuf generation.
The doc says, https://github.com/gogo/protobuf/blob/master/plugin/marshalto/marshalto.go#L44, but currently checking how fast it could be.
e.g. syscall.Read in zsyscall_linux_*.go opts to use unsafe pointer.
)

we can move to mmapped files and perhaps a journal.

Mind to elaborate?
For the latter, what if the host fs is already journaled?
Here is another compromise: run this once after all the files have been written to tmpfiles https://golang.org/pkg/syscall/#Sync.

@cryptix
Copy link
Contributor

cryptix commented Sep 28, 2015

If this is about ipfs add performance and not a consistency issue, it feels like a band-aid patch to me.

@rht
Copy link
Contributor Author

rht commented Sep 28, 2015

@cryptix the per-file sync is the most significant bottleneck. Even if add perf issue were to be addressed later, it will have to go down first.

@chriscool
Copy link

Here is another compromise: run this once after all the files have been written to tmpfiles
https://golang.org/pkg/syscall/#Sync.

Yeah, this might be worth trying. It should just call: https://www.cl.cam.ac.uk/cgi-bin/manpage?2+sync and might be more efficient than many calls to fsync

@rht
Copy link
Contributor Author

rht commented Sep 28, 2015

(This still does not guarantee data integrity: modern disks have large caches.)

With syscall.Sync(): 1.3s -> ~0.9s.

@chriscool
Copy link

@rht did you try with 10 1.1MB files? What about with 999 1MB files?

@jbenet
Copy link
Member

jbenet commented Sep 28, 2015

It would be very useful to have good benchmarks and graphs of all this, across disk types, host fses, and OSes. Maybe we can include this in our thought re golang/build


Sent from Mailbox

On Mon, Sep 28, 2015 at 4:15 PM, Christian Couder
[email protected] wrote:

@rht did you try with 10 1.1MB files? What about with 999 1MB files?

Reply to this email directly or view it on GitHub:
#30 (comment)

@rht
Copy link
Contributor Author

rht commented Sep 29, 2015

10 1MB files (actually it is 1): 1.3s -> 0.9s.
999 1MB files: 6 min -> 1.7.

(making the benchmark. should including adding updated files as well)

@tv42
Copy link
Contributor

tv42 commented Sep 29, 2015

Does that calling code use whyrusleeping's batching feature?

@rht
Copy link
Contributor Author

rht commented Sep 29, 2015

Iirc the batching is used in the Commit() call, since flatfsBatch uses putMany, then yes.
I modified doPut and putMany to use syscall.Read().

(but then again, fsync is required for ipdb, but only for ipdb use case)

@rht
Copy link
Contributor Author

rht commented Nov 6, 2015

What is the decision on the --no-sync flag?
Benchmark is shown in #31, and it hasn't changed much in the latest master ipfs/kubo@b508c23.

Notice that even sqlite has http://www.sqlite.org/pragma.html#pragma_synchronous.

Here is the rationale behind the rare usage of fsync in git: https://lwn.net/Articles/326505/, https://plus.google.com/+JonathanCorbet/posts/JBxiKPe3VXa (+comments).

@rht rht closed this Nov 13, 2015
@rht
Copy link
Contributor Author

rht commented Nov 13, 2015

(duplicate of #32)

@rht rht deleted the syncflag branch November 13, 2015 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants