[Monitoring] Extend TESTCASE_UPLOAD_TRIAGE_DURATION to account for fuzzer generated test cases #4481

vitorguidi · 2024-12-10T00:47:46Z

Motivation

#4364 added tracking for the time ClusterFuzz takes to complete several steps in the lifecycle of a manually uploaded testcase.

At Chrome's request, the metric now carries an 'origin' label, which indicates whether the testcase was 'manually_uploaded' or generated by a 'fuzzer'.

The code is also simplified by reusing the get_age_in_seconds method from the TestCase entity.

This PR also adds a 'stuck_in_triage' boolean field to the testcase entity, which makes it easier to find testcases that are stuck so that follow-up actions can be taken.

Part of #4271

letitz · 2024-12-10T11:18:19Z

I'll let @alhijazi review this first. Let me know if you'd still like my review afterwards.

jonathanmetzman · 2024-12-16T19:02:35Z

src/clusterfuzz/_internal/common/testcase_utils.py

@@ -31,6 +31,8 @@


 def emit_testcase_triage_duration_metric(testcase_id: int, step: str):
+  '''Finds out if a testcase is fuzzer generated or manually uploaded,


Wrong docstring style.

jonathanmetzman · 2024-12-16T19:03:37Z

src/clusterfuzz/_internal/common/testcase_utils.py

@@ -61,15 +60,30 @@ def emit_testcase_triage_duration_metric(testcase_id: int, step: str):
                 ' failed to emit TESTCASE_UPLOAD_TRIAGE_DURATION metric.')
    return

+  from_fuzzer = not get_testcase_upload_metadata(testcase_id)
+
+  assert step in [


So we can throw an assertion failure if our metrics collection isn't working? I don't think this is a good idea.

+1 to removing this.
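A minimal sketch of the alternative the reviewers are pointing at: validate the step, log, and return early instead of asserting, so that a bug in metrics collection cannot crash the task that called it. The step names in `KNOWN_STEPS` and the `emit_metric` helper are hypothetical stand-ins; the real step list is truncated in the diff above and is not reproduced here.

```python
import logging

# Hypothetical step names, for illustration only. The real list is
# truncated in the diff above.
KNOWN_STEPS = frozenset({'analyze_launched', 'analyze_completed'})


def is_known_step(step: str) -> bool:
  """Returns True for a known step; logs an error and returns False otherwise."""
  if step in KNOWN_STEPS:
    return True
  logging.error('Unknown triage step %r, skipping metric emission.', step)
  return False


def emit_metric(step: str, emitted: list) -> None:
  """Records the step in `emitted` (a stand-in for the real metric sink)."""
  if not is_known_step(step):
    # Metrics are best effort: never raise from here.
    return
  emitted.append(step)
```

With this shape, an unexpected step produces a log line that can be alerted on, while the surrounding task keeps running.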

jonathanmetzman · 2024-12-16T19:04:06Z

src/clusterfuzz/_internal/common/testcase_utils.py

+                 ' failed to emit TESTCASE_UPLOAD_TRIAGE_DURATION metric.')
+    return
+
+  testcase_age_in_hours = testcase.get_age_in_seconds() / 3600


nit: / (60 * 60)

alhijazi · 2024-12-19T17:33:52Z

src/clusterfuzz/_internal/metrics/monitoring_metrics.py

    bucketer=monitor.GeometricBucketer(),
    field_spec=[
        monitor.StringField('step'),
        monitor.StringField('job'),
+        monitor.StringField('origin'),


As discussed, this can be a boolean field.
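To illustrate the suggestion, here is a small sketch that contrasts the two label shapes. It assumes, as in the `from_fuzzer = not get_testcase_upload_metadata(testcase_id)` line from this PR, that a testcase without upload metadata came from a fuzzer. The helpers `origin_label` and `build_labels` are hypothetical and do not use the real `monitor` module.

```python
from typing import Any, Dict, Optional


def origin_label(upload_metadata: Optional[object]) -> str:
  """String form of the label, as in the 'origin' field of this PR."""
  return 'manually_uploaded' if upload_metadata else 'fuzzer'


def build_labels(step: str, job: str,
                 upload_metadata: Optional[object]) -> Dict[str, Any]:
  """Boolean form of the label: a single from_fuzzer flag."""
  return {
      'step': step,
      'job': job,
      # No upload metadata means the testcase was not manually uploaded.
      'from_fuzzer': not upload_metadata,
  }
```

A boolean leaves only two possible label values, which avoids typos in string constants and keeps the metric's cardinality fixed.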

alhijazi · 2024-12-19T17:35:19Z

src/clusterfuzz/_internal/datastore/data_types.py

@@ -686,6 +689,8 @@ def get_created_time(self) -> ndb.DateTimeProperty:

  def get_age_in_seconds(self):
    current_time = datetime.datetime.utcnow()
+    if not self.get_created_time():


I don't think this check is needed, since get_created_time never seems to return None.
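For reference, a standalone sketch of what the guarded age computation looks like, including the hours conversion written as `60 * 60` as suggested in the earlier nit. This is not the `Testcase` method itself: the free functions, the `now` parameter, and the use of timezone-aware datetimes are assumptions made so the example can run on its own. Whether the `None` branch is reachable in ClusterFuzz is exactly what this comment questions.

```python
import datetime
from typing import Optional

SECONDS_PER_HOUR = 60 * 60


def age_in_seconds(created: Optional[datetime.datetime],
                   now: Optional[datetime.datetime] = None) -> Optional[float]:
  """Returns seconds elapsed since `created`, or None if it is missing."""
  if created is None:
    return None
  if now is None:
    now = datetime.datetime.now(datetime.timezone.utc)
  return (now - created).total_seconds()


def seconds_to_hours(seconds: float) -> float:
  """Converts seconds to hours."""
  return seconds / SECONDS_PER_HOUR
```

If the creation time can never be missing, the first branch is dead code and can be dropped, which is the reviewer's point.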

alhijazi · 2024-12-20T11:30:26Z

src/clusterfuzz/_internal/common/testcase_utils.py

  logs.info('Emiting TESTCASE_UPLOAD_TRIAGE_DURATION metric for testcase '
-            f'{testcase_id} (age = {elapsed_time_since_upload}) '
+            f'{testcase_id} (age = {testcase_age_in_hours} hours.) '
            'in step {step}.')

  monitoring_metrics.TESTCASE_UPLOAD_TRIAGE_DURATION.add(


The name of this metric is now inaccurate, since it covers both fuzzed and uploaded testcases; it needs to be changed.
Does BUG_FILING_FROM_TESTCASE_ELAPSED_TIME also cover both?

…stuck in analyze (#4547)

### Motivation

We currently have no way to tell if analyze task was successfully executed. The TESTCASE_UPLOAD_TRIAGE_DURATION metric from #4364 would only track duration for tasks that did finish.

An analyze_pending field is added to the Testcase entity in datastore, which is set to False by default, to True for manually uploaded testcases, and to False once analyze task postprocess runs.

It also increments the UNTRIAGED_TESTCASE_AGE metric from #4381 with a status label, so we can know at what step the testcase is stuck, thus allowing us to alert if analyze is taking longer to finish than expected. The alert itself could be, for instance, P50 age of untriaged testcase (status=analyze_pending) > 3h.

Also, this retroactively addresses comments from #4481:

* Fixes docstring for emit_testcase_triage_duration_metric
* Removes assertions
* Renames TESTCASE_UPLOAD_TRIAGE_DURATION to TESTCASE_TRIAGE_DURATION, since it now accounts for fuzzer generated testcases
* Use a boolean "from_fuzzer" field, instead of "origin" string, in TESTCASE_TRIAGE_DURATION
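The analyze_pending lifecycle and the example alert from the #4547 commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the ClusterFuzz implementation: the `Testcase` dataclass, the `age_hours` attribute, and all function names are hypothetical, and a real alert would be evaluated by the monitoring backend rather than in Python.

```python
import statistics
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Testcase:
  """Toy stand-in for the datastore entity."""
  age_hours: float = 0.0
  analyze_pending: bool = False  # False by default.


def on_manual_upload(testcase: Testcase) -> None:
  """Manually uploaded testcases start out waiting for analyze."""
  testcase.analyze_pending = True


def on_analyze_postprocess(testcase: Testcase) -> None:
  """Analyze postprocess clears the flag."""
  testcase.analyze_pending = False


def p50_pending_age(testcases: List[Testcase]) -> Optional[float]:
  """Median age in hours of testcases still waiting for analyze."""
  ages = [t.age_hours for t in testcases if t.analyze_pending]
  return statistics.median(ages) if ages else None


def should_alert(testcases: List[Testcase],
                 threshold_hours: float = 3.0) -> bool:
  """Mirrors the example alert: P50 age (analyze_pending) > 3h."""
  p50 = p50_pending_age(testcases)
  return p50 is not None and p50 > threshold_hours
```

Because the flag is only cleared in postprocess, a testcase whose analyze task never finishes keeps contributing to the pending age, which is what makes the stuck case visible.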

Extend testcase triage upload metric to account for fuzzer generated …

f88b11b

…test cases

vitorguidi changed the title ~~Extend TESTCASE_UPLOAD_TRIAGE_DURATION to account for fuzzer generated test cases~~ [Monitoring] Extend TESTCASE_UPLOAD_TRIAGE_DURATION to account for fuzzer generated test cases Dec 10, 2024

vitorguidi requested review from letitz, alhijazi and jonathanmetzman December 10, 2024 00:47

Account for when there is no timestamp on the testcase

aaa8e45

letitz removed their request for review December 10, 2024 11:17

vitorguidi and others added 6 commits December 13, 2024 10:56

Merge branch 'master' into feature/triage-lifecycle-for-fuzzers

3f0644f

Adding a stuck_in_triage field to testcase entity, so we can query mi…

4b86f25

…sbehaving ones

Merge branch 'master' into feature/triage-lifecycle-for-fuzzers

7190553

Fix lint

73d3272

Fix lint

cb8b390

Fix lint again

76d2cff

vitorguidi merged commit 8da25b5 into master Dec 16, 2024
7 checks passed

vitorguidi deleted the feature/triage-lifecycle-for-fuzzers branch December 16, 2024 13:59

vitorguidi added a commit that referenced this pull request Dec 16, 2024

Merge #4499 and #4481 into chrome branch (#4505)

19fea40

Running CI checks with a PR prior to deployment

jonathanmetzman reviewed Dec 16, 2024

alhijazi reviewed Dec 19, 2024

alhijazi reviewed Dec 20, 2024

vitorguidi added a commit that referenced this pull request Dec 23, 2024

Retroactive reviews from #4481

ffb37cc

vitorguidi mentioned this pull request Dec 23, 2024

[Monitoring] Enrich UNTRIAGED_TESTCASE_AGE metric to track testcases stuck in analyze #4547

Merged
