Skip to content
This repository has been archived by the owner on Aug 13, 2023. It is now read-only.

High disk IO when running WHO query #9

Open
RX14 opened this issue Nov 16, 2016 · 5 comments
Open

High disk IO when running WHO query #9

RX14 opened this issue Nov 16, 2016 · 5 comments

Comments

@RX14
Copy link

RX14 commented Nov 16, 2016

When a WHO query is run through znc, for the entire time who responses are being received, ~5mbps of disk IO happens.

strace shows this (who on a channel with 975 members):

$ strace -c -p 23199
Process 23199 attached - interrupt to quit
^CProcess 23199 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.04    0.028303           0    584939           write
  1.40    0.000407           2       247        16 open
  1.03    0.000300           1       231           close
  0.42    0.000122           1       149           poll
  0.07    0.000020           0        56           munmap
  0.05    0.000015           0       477           read
  0.00    0.000000           0       291        16 stat
  0.00    0.000000           0       112           fstat
  0.00    0.000000           0        56           lseek
  0.00    0.000000           0        56           mmap
  0.00    0.000000           0       350           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00    0.029167                586964        32 total

The culprit seems to be an open("/data/znc/znc-data//users/RX14/networks/esper/moddata/clientbuffer/.registry", O_WRONLY|O_CREAT|O_TRUNC, 0600), followed by a crazy amount of write calls (15433 for a who on a channel with only 7 members).

See znc/znc#1275 for more details.

@michalrus
Copy link

This is indeed insane, when you look at the code. =) E.g. for each entry in the buffer, when playing it back, we’re reading the last seen timestamp again and again…

@michalrus
Copy link

michalrus commented Dec 8, 2016

We’d be way better off if all the data was kept in memory with occasional (even as often once a second) dump to disk.

@psychon
Copy link

psychon commented Dec 9, 2016

I'd like to mention the optional third parameter for SetNV(sName, sValue, bWriteToDisk). If you set it to false, ZNC will not rewrite the .registry-file.
https://github.com/znc/znc/blob/a9a7f17910471482640132a5c22cb037bd12a7cc/include/znc/Modules.h#L1156-L1157

You could then always call SetNV() with bWriteToDisk=false and at the same time start a timer (of course, don't start a new timer when one is already running). When the timer fires, it calls SaveRegistry() on the module.
https://github.com/znc/znc/blob/a9a7f17910471482640132a5c22cb037bd12a7cc/include/znc/Modules.h#L1154

(Does it make sense to make this change in ZNC instead, so that it always writes .registry-files in such a delayed way?)

Edit: Oh and ZNC does already keep all the NV-data in memory. :-)

@michalrus
Copy link

@psychon, awesome! ❤️

So this is trivial to fix (cc @RX14):

diff --git a/clientbuffer.cpp b/clientbuffer.cpp
index 7a9079a..925ae7c 100644
--- a/clientbuffer.cpp
+++ b/clientbuffer.cpp
@@ -29,6 +29,21 @@
 #error The clientbuffer module requires ZNC version 1.6.0 or later.
 #endif

+class CClientBufferCacheJob : public CTimer
+{
+public:
+    CClientBufferCacheJob(CModule* pModule, unsigned int uInterval, unsigned int uCycles, const CString& sLabel, const CString& sDescription)
+        : CTimer(pModule, uInterval, uCycles, sLabel, sDescription) {}
+    virtual ~CClientBufferCacheJob() {}
+
+protected:
+    virtual void RunJob();
+};
+
+void CClientBufferCacheJob::RunJob() {
+    GetModule()->SaveRegistry();
+}
+
 class CClientBufferMod : public CModule
 {
 public:
@@ -38,6 +53,7 @@ public:
         AddCommand("AddClient", static_cast<CModCommand::ModCmdFunc>(&CClientBufferMod::OnAddClientCommand), "<identifier>", "Add a client.");
         AddCommand("DelClient", static_cast<CModCommand::ModCmdFunc>(&CClientBufferMod::OnDelClientCommand), "<identifier>", "Delete a client.");
         AddCommand("ListClients", static_cast<CModCommand::ModCmdFunc>(&CClientBufferMod::OnListClientsCommand), "", "List known clients.");
+        AddTimer(new CClientBufferCacheJob(this, 60 /* sec */, 0, "ClientBufferCache", "Periodically save ClientBuffer registry to disk"));
     }

     void OnAddClientCommand(const CString& line);
@@ -211,7 +227,7 @@ CModule::EModRet CClientBufferMod::OnPrivBufferPlayLine2(CClient& client, CStrin
 // returns true if the new client was successfully written to disk
 bool CClientBufferMod::AddClient(const CString& identifier)
 {
-    return SetNV(identifier, "");
+    return SetNV(identifier, "", false);
 }

 bool CClientBufferMod::DelClient(const CString& identifier)
@@ -224,7 +240,7 @@ bool CClientBufferMod::DelClient(const CString& identifier)
     }
     bool success = true;
     for (const CString& key : keys)
-        success &= DelNV(key);
+        success &= DelNV(key, false);
     return success;
 }

@@ -270,7 +286,7 @@ bool CClientBufferMod::SetTimestamp(const CString& identifier, const CString& ta
 {
     char timestamp[32];
     std::snprintf(timestamp, 32, "%ld.%06ld", tv.tv_sec, tv.tv_usec);
-    return SetNV(identifier + "/" + target, timestamp);
+    return SetNV(identifier + "/" + target, timestamp, false);
 }

 timeval CClientBufferMod::GetTimestamp(const CString& identifier, const CString& target)

@psychon, yes, I would argue that this seems like a common pattern and it should be abstracted away to the ZNC core. A good flow might be:

  1. Change the default value of bWriteToDisk to false.
  2. Read the NV registry in CModule’s constructor (already happens).
  3. Since the destructor (with SaveRegistry) might not always be called (when not exiting cleanly), have a default of how often we dump the registry to the disk—if it is needed—say, once every 15 seconds. But, perhaps, let the author change it.
  4. During SetNV(bWriteToDisk = true) (and DelNV(true), and ClearNV(true)), since the module author wants to write right away, do write right away.
  5. During SetNV(bWriteToDisk = false) (the default), check if the last write to disk happened more than 15 seconds ago, and if so, write right away, and reschedule the automatic timer to now() + 15 seconds. If it happened during the last 15 seconds, save it only in memory, don’t touch the disk.
  6. In the automatic more-or-less-every-15-seconds timer, SaveRegistry to the disk — ONLY if it was changed since the last SaveRegistry.

This minimizes IO and benefits all.

There’s this slight downside: if a module tries to save something with the new default bWriteToDisk == false, and the last SaveRegistry happened less than 15 seconds ago, and during those 15 seconds ZNC exits uncleanly (e.g. power loss or SIGKILL), this last write will be lost. These conditions seem improbable/rare enough to justify implementing this, given the benefits. (Keep in mind, that module authors that want to save crucial data, can call SetNV with bWriteToDisk to be sure it gets written right away and with no potential data loss).

I’ll copy this to the ZNC repo.

@psychon, as you clearly know ZNC’s innards, could you, please, also take a look at #7 (comment) ?

MAGICCC added a commit to MAGICCC/znc-clientbuffer that referenced this issue Feb 23, 2017
BtbN added a commit to BtbN/znc-clientbuffer that referenced this issue Aug 8, 2017
BtbN added a commit to BtbN/znc-clientbuffer that referenced this issue Aug 8, 2017
@CyberShadow
Copy link

@michalrus Thanks! I've improved on your patch (which @BtbN incorporated as a commit) by only flushing when we edited the registry. I agree that making this the standard behaviour for ZNC in general would be much better, though.

BtbN pushed a commit to BtbN/znc-clientbuffer that referenced this issue May 15, 2020
makefile: respect PREFIX, LIBDIR and DESTDIR
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants