Performance notes are wrong #6
Well, when creating a DB, Neo4j is not simply writing data sequentially to disk, so I wouldn't expect it to reach the disk's max throughput. In my tests the disk made a huge difference, which is why I called it the "critical factor" (not the "bottleneck"). But thanks for suggesting lbzip2, I will add it to the README.
I only described what I thought was wrong with the description of the first step of the importing process, which is read -> regexp -> write (creating the intermediate XML file). The second part is still running (7 hours, and it has only imported 70M links); I have no idea how you managed to do it in only 10 minutes.

I've run jvisualvm, iotop and htop and discovered that at the beginning the process mostly performs read/write operations (org.neo4j.io.fs.StoreFileChannel.write/read). It creates 50K links per 3 seconds, and at that pace the whole thing would take 1 hour and 40 minutes. After a while it starts to run more MuninnPageCache operations (flushAtIORatio, parkUntilEvictionRequired) and slows down significantly. In the first part of the operation the CPU usage was maxed out (95-100% on 4 cores) and the read/write throughput was 10 MB/s and 5 MB/s respectively. Now, in the second part, the CPU usage is really low (around 10%) and the write throughput is around 10 MB/s. iostat (avg-cpu: %user %nice %system %iowait %steal %idle) shows that the CPU is mostly waiting on IO or idle.

I think there is something wrong with the caching mechanism in neo4j. What do you think?
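For reference, the "1 hour and 40 minutes" estimate above follows directly from the observed batch rate. A minimal sketch of that arithmetic (the 100M-link total is a hypothetical figure chosen to be consistent with the quoted estimate, not a number stated in the thread):

```python
# Figures quoted in the comment above.
links_per_batch = 50_000           # links created per batch
seconds_per_batch = 3              # observed time per batch at the start
rate = links_per_batch / seconds_per_batch  # ~16,667 links/s

# Hypothetical total, picked to illustrate the quoted ETA.
total_links = 100_000_000
eta_seconds = total_links / rate
print(f"{eta_seconds / 3600:.2f} hours")  # 1.67 hours, i.e. ~1 h 40 min
```

At 70M links after 7 hours, the actual rate had clearly dropped well below this initial pace.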
Fair point, I didn't realise you were talking about the first step only. Yes, the second part, i.e. creating the graph DB, is where having an SSD really helps. I haven't investigated much, but I guess it must be doing a lot of random-access operations.
Hello,
if it took 30 minutes to process the 9.1 GB file, the throughput was about 5.18 MB/s
(9.1 GiB = 9.1 × 1024 MiB ≈ 9318 MiB; 9318 MiB / (30 × 60 s) ≈ 5.18 MB/s).
5400 RPM disks have around 40 MB/s sequential read/write throughput, so the disk is not the bottleneck. To speed things up you can use lbzip2, which is multi-threaded (it helped me a lot).
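The throughput arithmetic can be sanity-checked in a few lines (this reads "9.1 GB" as 9.1 GiB, i.e. 9.1 × 1024 MiB; the exact figure shifts slightly if GB is taken as 10^9 bytes):

```python
size_mib = 9.1 * 1024             # 9.1 GiB expressed in MiB (~9318 MiB)
elapsed_s = 30 * 60               # 30 minutes in seconds
throughput = size_mib / elapsed_s
print(f"{throughput:.2f} MiB/s")  # 5.18 MiB/s
```

Either way, the result is roughly an eighth of what a 5400 RPM disk can sustain sequentially, which supports the point that the disk is not the limiting factor for this first step.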
Best regards