
CEPH-83575437: Test to verify rgw bilog trimming with down OSDs #4263

Open
wants to merge 1 commit into base: master
Conversation

@harshkumarRH (Contributor) commented Nov 27, 2024

Description

CEPH-83575437: Tier-3 test to verify scrub errors on PGs of RGW metadata pools after OSDs are rebooted

Before the fix, partial recovery did not set clean_omap to false for CEPH_OSD_OP_OMAPRMKEYRANGE operations. This left replicated pools with inconsistent PGs because omap recovery was incomplete.

Jira tracker: RHCEPHQE-15945
Customer Bug: 2056818
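The incomplete omap recovery surfaces as deep-scrub inconsistencies on the index pool PGs. A minimal sketch of how such PGs could be spotted, assuming `ceph pg dump pgs_brief` output in the pgid/state column form shown in the inlined sample (the pool id 7 and the states are made up for illustration):

```shell
# Sketch: filter PGs flagged inconsistent out of `ceph pg dump pgs_brief`
# output. The inlined sample stands in for a live cluster; on a real
# cluster the pipeline would be:
#   ceph pg dump pgs_brief | awk '$2 ~ /inconsistent/ {print $1}'
sample_pg_dump='7.0 active+clean [0,1,2] 0 [0,1,2] 0
7.1 active+clean+inconsistent [1,2,0] 1 [1,2,0] 1
7.2 active+clean [2,0,1] 2 [2,0,1] 2'

# Print the pgid (column 1) of every PG whose state (column 2) mentions
# "inconsistent".
printf '%s\n' "$sample_pg_dump" | awk '$2 ~ /inconsistent/ {print $1}'
```

On a live cluster, `rados list-inconsistent-obj <pgid>` can then show the objects behind each reported inconsistency.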

Test suites modified:

  • suites/squid/rgw/tier-2_rgw_rados_multisite_ecpool.yaml

Test suites added:

  • suites/reef/rgw/tier-2_rgw_rados_multisite_ecpool.yaml

Test cases added:

  • tests/rados/test_bilog_trim.py

Steps:
Refer:
- https://bugzilla.redhat.com/show_bug.cgi?id=2056818#c42
- https://bugzilla.redhat.com/show_bug.cgi?id=2056818#c50
1. Deploy a minimal RGW multisite setup
2. Add data to S3 buckets on both sites and keep minimal client IO running
3. Keep running deep-scrub on the index pool PGs
4. Set the noout flag
5. Stop the secondary OSD on the primary site (site1)
6. Run the following command:
   radosgw-admin bilog autotrim
7. Once the above command finishes, start the OSD that was stopped in step 5
8. Re-trigger deep-scrub for the eight PGs in the index pool
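Steps 4 through 8 can be sketched as a dry-run shell sequence. The OSD id, the index pool PG ids, and the `ceph orch` daemon names are assumptions for illustration; a real run would look them up from the cluster. `run` echoes each command so the sequence can be previewed without a cluster; replacing it with direct execution would perform the steps for real.

```shell
# Dry-run sketch of steps 4-8 (osd id and PG ids are assumed, not taken
# from a real cluster). `run` only prints the command it is given.
run() { echo "+ $*"; }

osd_id=3                                  # assumed id of the OSD on site1

run ceph osd set noout                    # step 4: prevent rebalancing
run ceph orch daemon stop osd.$osd_id     # step 5: stop the OSD
run radosgw-admin bilog autotrim          # step 6: trim the bucket index logs
run ceph orch daemon start osd.$osd_id    # step 7: bring the OSD back
# step 8: re-trigger deep-scrub on each index pool PG (ids 7.0-7.7 assumed)
for pg in 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7; do
  run ceph pg deep-scrub $pg
done
run ceph osd unset noout                  # cleanup: clear the flag from step 4
```

Setting noout before the stop keeps the cluster from marking the down OSD out and backfilling around it, so the trim in step 6 runs while that OSD's copy of the index omap is stale.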

Logs-
Reef:
Squid:

Signed-off-by: Harsh Kumar [email protected]

Please include Automation development guidelines. Source of Test case - New Feature/Regression Test/Close loop of customer BZs

click to expand checklist
  • Create a test case in Polarion and get it reviewed and approved.
  • Create a design/automation approach doc (optional for tests similar to ones already automated).
  • Review the automation design.
  • Implement the test script and perform test runs.
  • Submit the PR for code review and approval.
  • Update the Polarion test with automation script details and update the automation fields.
  • If the automation is part of close loop, set the BZ flag qe-test_coverage to "+" and link the Polarion test.

@harshkumarRH added labels on Nov 27, 2024: DNM (Do Not Merge), RADOS (Rados Core), Tier-3, close-loop-automation (customer BZ automated as part of close-loop)
@harshkumarRH self-assigned this on Nov 27, 2024
@openshift-ci-robot

@harshkumarRH: No Jira issue with key CEPH-83575437 exists in the tracker at https://issues.redhat.com/.
Once a valid jira issue is referenced in the title of this pull request, request a refresh with /jira refresh.

In response to this: (the PR description quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Nov 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harshkumarRH

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

