merge history db file into a common db file to avoid db file booming #518
base: master
Conversation
…ming Signed-off-by: lixiaoyu <[email protected]>
I don't believe this is safe; nothing stops something else with that pid coming along while this is running. In addition, there's nothing stopping this being run concurrently.
#443 describes the same problem I hit. I used multiprocess mode for a few days and found the scrape endpoint latency becoming slower and slower, so I checked prometheus_multiproc_dir and found there were 1000+ db files; because the workers would sometimes exit, the number of db files increased dramatically.
We can't make any assumptions about how this code is used, and given it's in a multi-process environment it must avoid any race conditions. How would this work for gauges?
As you can see in the code, we use the same method to merge metrics as when they are being scraped. I have used this commit in my production environment and it works well; the latency of the scrape endpoint has been reduced significantly. We can use a file lock to avoid the race condition.
We should point out that this code should only be used in multiprocess mode.
I have added process lock code, which is taken from django.core.files.locks.
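For readers who have not used that module: django.core.files.locks is, on POSIX, a thin wrapper around fcntl.flock (and around the Windows locking APIs on NT). A minimal POSIX-only sketch of the kind of advisory lock involved, with illustrative names that are not part of this PR or prometheus_client:

import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock file living next to the per-process db files.
LOCK_PATH = os.path.join(os.environ.get('prometheus_multiproc_dir', '.'), 'merge.lock')

@contextmanager
def merge_lock():
    fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # block until the exclusive lock is held
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)   # release, then close the descriptor
        os.close(fd)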
MmapedDict(merge_file).close()

# do merge, here we use the same method to merge
metrics = MultiProcessCollector.merge(files + merge_files, accumulate=False)
@brian-brazil here I use the same method to merge metrics
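For context, MultiProcessCollector.merge(files, accumulate=...) is the same routine the collector runs for every scrape. The documented scrape-side usage of the stock prometheus_client API (shown here for orientation, not code from this PR) is roughly:

import os

# prometheus_multiproc_dir must point at an existing directory holding the
# per-process db files; the path below is only for illustration.
os.environ.setdefault('prometheus_multiproc_dir', '/tmp/prom_multiproc')
os.makedirs(os.environ['prometheus_multiproc_dir'], exist_ok=True)

from prometheus_client import CollectorRegistry, generate_latest, multiprocess

registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)   # globs *.db in that directory and merges them
payload = generate_latest(registry)            # text exposition format served at /metrics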
I believe this will cause a deadlock: mark_process_dead is already holding the lock, and calling merge will try to acquire the same lock again.
I don't believe it will -- it looks like lock acquisition via flock is reentrant safe, but unlocking via flock isn't, so I think this will prematurely release the lock after the merge call here, at least on POSIX systems. I don't know what the locking behaviour on NT will be.
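To make that concrete: with fcntl.flock there is no acquisition counting on a file descriptor, so a second LOCK_EX succeeds but the first LOCK_UN drops the lock entirely. A standalone illustration of the behaviour being described (my own sketch assuming Linux flock semantics, not code from the PR):

import fcntl
import os

fd = os.open('/tmp/demo.lock', os.O_CREAT | os.O_RDWR)

fcntl.flock(fd, fcntl.LOCK_EX)   # outer acquisition (think mark_process_dead)
fcntl.flock(fd, fcntl.LOCK_EX)   # inner acquisition (think merge) -- succeeds, nothing is counted
fcntl.flock(fd, fcntl.LOCK_UN)   # inner release -- the lock is gone now
# ...any outer critical section continuing here is no longer protected...
fcntl.flock(fd, fcntl.LOCK_UN)   # outer release is effectively a no-op
os.close(fd)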
Signed-off-by: lixiaoyu <[email protected]>
What happens if a process starts, sets a gauge, exits, and a new process comes along with the same pid?
# Inside MultiProcessCollector's merge/accumulate step: 'samples' is a
# defaultdict(float) keyed by (sample name, label pairs), and samples_setdefault
# is bound to samples.setdefault before this loop.
for s in metric.samples:
    name, labels, value, timestamp, exemplar = s
    if metric.type == 'gauge':
        # Drop the 'pid' label so per-process series can be combined.
        without_pid_key = (name, tuple([l for l in labels if l[0] != 'pid']))
        if metric._multiprocess_mode == 'min':
            current = samples_setdefault(without_pid_key, value)
            if value < current:
                samples[without_pid_key] = value
        elif metric._multiprocess_mode == 'max':
            current = samples_setdefault(without_pid_key, value)
            if value > current:
                samples[without_pid_key] = value
        elif metric._multiprocess_mode == 'livesum':
            samples[without_pid_key] += value
        else:  # all/liveall: keep the pid label, one series per process
            samples[(name, labels)] = value
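For reference, the mode used in that branch is chosen when the gauge is declared. Standard prometheus_client usage in an application running with prometheus_multiproc_dir set (not part of this PR):

from prometheus_client import Gauge

# multiprocess_mode is one of 'all', 'liveall', 'min', 'max', 'livesum';
# the 'live*' variants only report values from processes that are still alive.
IN_PROGRESS = Gauge('requests_in_progress', 'Requests currently being handled',
                    multiprocess_mode='livesum')

IN_PROGRESS.inc()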
It's a certainty at scale, though.
Yes, Gauge is really a problem in multiprocess mode; developers should think about it carefully. Do you have a good idea for reducing the latency in multiprocess mode when the number of db files increases?
Currently the main approach is avoiding creating lots of files in the first place.
I'm also hitting this issue (uWSGI restarts often, resulting in lots of files which cause the /metrics endpoint to become too slow). I'm interested and willing to help to see this PR merged. With the exception of tests that fail due to:
What blocks this PR?
That's not the case; mark_process_dead runs in a different process.
I'm using this feature in my production environment now and there is no problem at present.
mark_process_dead runs when a process is about to exit, and it only does its own merge work; using a lock will serialize this.
In the uWSGI case, mark_process_dead will probably run at process exit (so with the worker's own PID); at least that's how I did it (using a Python atexit handler). But in Gunicorn, mark_process_dead will be called by the master process for another PID. This does open a (small) race condition: mark_process_dead is run with the PID of a terminated worker AND a new worker starts using the same PID. The lock won't help there, since the new worker keeps an open file descriptor to the file that mark_process_dead will remove. This should however be quite rare, as it requires the OS to re-assign the PID of a recently terminated process. But nothing forbids that from happening.
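For readers following along, the Gunicorn side of this is the documented child_exit server hook: the master calls mark_process_dead with the PID of the worker that just exited. Standard prometheus_client/Gunicorn usage (not code from this PR):

# gunicorn.conf.py
from prometheus_client import multiprocess

def child_exit(server, worker):
    # Runs in the master process after a worker exits.
    multiprocess.mark_process_dead(worker.pid)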
BTW, this race condition is not new. Gauge livesum and liveall already had this issue (a new worker opens the file, then mark_process_dead run by the master removes it). Or did I miss something that avoids this race condition and could be reused?
I think we can use pid + process create time to name the file; this would avoid two processes with the same pid ending up with the same file name.
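A minimal sketch of that naming idea, assuming psutil is available; the filename pattern is illustrative and not what prometheus_client actually uses:

import os
import psutil

def db_filename(metric_type='counter'):
    pid = os.getpid()
    # psutil returns the start time as a float number of seconds since the epoch;
    # truncating to whole seconds here mirrors the proposal under discussion.
    created = int(psutil.Process(pid).create_time())
    return '{0}_{1}_{2}.db'.format(metric_type, pid, created)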
There's nothing stopping two processes being created and terminated in the same second with the same PID.
True, we should look at fixing that.
In psutil, create_time has more precision than in our case (it's a float number of seconds). But I agree this event looks very unlikely (and in my opinion acceptable; maybe the merge should be an opt-in option?). The best solution I can think of that doesn't rely on "low probability" is to add a new method that must be called by the worker itself at exit, not by the master. It would merge the files. Users may or may not register this method depending on whether the merge is wanted.
I doubt that our users will have that level of control. This needs to work with all forms of multiprocess setups, including gunicorn and multiprocessing - and handling things like a segfault of a child correctly.
Why can't a gunicorn worker register this method with atexit?
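What the suggestion amounts to, as I read it, is the pattern the uWSGI commenter described above: each worker registers its own cleanup at startup, for example (a sketch of the proposal, using the existing mark_process_dead API as a stand-in for the merge-at-exit method, which does not exist upstream):

import atexit
import os
from prometheus_client import multiprocess

# Registered from inside the worker itself, so it runs with the worker's own PID.
atexit.register(multiprocess.mark_process_dead, os.getpid())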
atexit is not reliable generally, plus things might happen before it gets registered.
I still don't understand how to handle the growing number of db files. When is it safe to remove them? The documentation states that the files should be cleared on every deploy, which suggests they can be removed without any special hesitation unless a process is currently writing to them. So what happens if I remove files? Do I risk losing data? Then how come it works to purge the files on a hard restart? Wouldn't an easy solution be to just have some kind of cleanup cron that removes old files? I feel the documentation regarding this issue is a bit lacking.
The documentation says that you should delete them when the process restarts; there's nothing more to it. Anything else is unsafe. This issue is also not the place for usage questions, please take those to the prometheus-users mailing list.
def collect(self) should be guarded by the lock too. Otherwise the directory listing and the subsequent merge can have different views of the directory contents.
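In other words, something along these lines, reusing the hypothetical merge_lock helper sketched earlier (illustrative only, not the PR's actual diff):

import glob
import os

def collect(self):
    # Take the same lock the merge path uses so the directory listing and the
    # subsequent merge see a consistent set of db files.
    with merge_lock():
        files = glob.glob(os.path.join(self._path, '*.db'))
        return self.merge(files, accumulate=True)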
The issue from 2017, #204, is still not resolved, suggesting this client library is designed to serve metrics with multi-second latency. I appreciate that @brian-brazil is careful about unlikely race conditions that may happen, but it means this library is in no way production-ready.
Have you guys looked into a shared database as storage? For example, local Redis or memcached? Their semantics when it comes to missing values align with Prometheus' (new/expired initializes to 0).

The documentation correctly states that in Python, multiprocessing is the dominant solution to handling concurrency. Unfortunately, it doesn't also mention that, due to how Python manages memory, another dominant solution is rotating workers after a certain number (+/- jitter) of requests have been handled. This makes the orphaned pidfiles issue more visible.

The suggested integration (which removes the db pidfile) suffers from inflating counter values. When only one process gets rotated and its metrics file purged, Prometheus will scrape counters with lower values (lower by the amount stored in the removed file). Consequently, it will assume the new counter value is a full increment, which usually is not true. We also have no way to determine how much the counter had increased since the purge (if it's below the previously observed value).

A shared database complicates the deployment slightly (you need to maintain local Redis/memcached). For cases where you already have local Redis/memcached, this might be a good option, though.
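A minimal sketch of the shared-database idea for counters, using redis-py against a local Redis; the key naming and helper are illustrative only:

import redis

r = redis.Redis(host='localhost', port=6379)

def inc_counter(name, amount=1.0, **labels):
    # Missing keys start at 0, which matches Prometheus counter semantics.
    key = 'prom:{0}:{1}'.format(
        name, ','.join('{0}={1}'.format(k, v) for k, v in sorted(labels.items())))
    r.incrbyfloat(key, amount)

# A /metrics handler would then SCAN these keys and render the text exposition format.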