
CEPH-83575437: Test to verify rgw bilog trimming with down OSDs #4263

Open
wants to merge 1 commit into base: master
Conversation

@harshkumarRH (Contributor) commented Nov 27, 2024

Description

CEPH-83575437: Tier-3 test to verify scrub errors on PGs of RGW metadata pools after OSDs are rebooted

Before the fix, partial recovery did not set clean_omap to false for CEPH_OSD_OP_OMAPRMKEYRANGE operations. This left replicated pools with inconsistent PGs because omap recovery was incomplete.

Jira tracker: RHCEPHQE-15945
Customer Bug: 2056818
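The incomplete omap recovery surfaces as deep-scrub inconsistencies on the index pool PGs. A minimal sketch of how such PGs could be spotted, assuming `ceph pg dump pgs_brief` output in the pgid/state column form shown in the inlined sample (the pool id 7 and the states are made up for illustration):

```shell
# Sketch: filter PGs flagged inconsistent out of `ceph pg dump pgs_brief`
# output. The inlined sample stands in for a live cluster; on a real
# cluster the pipeline would be:
#   ceph pg dump pgs_brief | awk '$2 ~ /inconsistent/ {print $1}'
sample_pg_dump='7.0 active+clean [0,1,2] 0 [0,1,2] 0
7.1 active+clean+inconsistent [1,2,0] 1 [1,2,0] 1
7.2 active+clean [2,0,1] 2 [2,0,1] 2'

# Print the pgid (column 1) of every PG whose state (column 2) mentions
# "inconsistent".
printf '%s\n' "$sample_pg_dump" | awk '$2 ~ /inconsistent/ {print $1}'
```

On a live cluster, `rados list-inconsistent-obj <pgid>` can then show the objects behind each reported inconsistency.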

Test suites modified:

  • suites/squid/rgw/tier-2_rgw_rados_multisite_ecpool.yaml

Test suites added:

  • suites/reef/rgw/tier-2_rgw_rados_multisite_ecpool.yaml

Test cases added:

  • tests/rados/test_bilog_trim.py

Steps:
Refer:
- https://bugzilla.redhat.com/show_bug.cgi?id=2056818#c42
- https://bugzilla.redhat.com/show_bug.cgi?id=2056818#c50
1. Deploy a minimal RGW multisite setup
2. Add data to S3 buckets on both sites and keep minimal client IO running
3. Keep running deep-scrub on the index pool PGs
4. Set the noout flag
5. Stop the secondary OSD on the primary site (site1)
6. Run the following command:
   radosgw-admin bilog autotrim
7. Once the above command finishes, start the OSD that was stopped in step 5
8. Re-trigger deep-scrub for the eight PGs in the index pool
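Steps 4 through 8 can be sketched as a dry-run shell sequence. The OSD id, the index pool PG ids, and the `ceph orch` daemon names are assumptions for illustration; a real run would look them up from the cluster. `run` echoes each command so the sequence can be previewed without a cluster; replacing it with direct execution would perform the steps for real.

```shell
# Dry-run sketch of steps 4-8 (osd id and PG ids are assumed, not taken
# from a real cluster). `run` only prints the command it is given.
run() { echo "+ $*"; }

osd_id=3                                  # assumed id of the OSD on site1

run ceph osd set noout                    # step 4: prevent rebalancing
run ceph orch daemon stop osd.$osd_id     # step 5: stop the OSD
run radosgw-admin bilog autotrim          # step 6: trim the bucket index logs
run ceph orch daemon start osd.$osd_id    # step 7: bring the OSD back
# step 8: re-trigger deep-scrub on each index pool PG (ids 7.0-7.7 assumed)
for pg in 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7; do
  run ceph pg deep-scrub $pg
done
run ceph osd unset noout                  # cleanup: clear the flag from step 4
```

Setting noout before the stop keeps the cluster from marking the down OSD out and backfilling around it, so the trim in step 6 runs while that OSD's copy of the index omap is stale.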

Logs-
Reef:
Squid:

Signed-off-by: Harsh Kumar [email protected]

Please include Automation development guidelines. Source of Test case - New Feature/Regression Test/Close loop of customer BZs

click to expand checklist
  • Create a test case in Polarion and get it reviewed and approved.
  • Create a design/automation approach doc (optional for tests similar to ones already automated).
  • Review the automation design.
  • Implement the test script and perform test runs.
  • Submit the PR for code review and approval.
  • Update the Polarion test with automation script details and update the automation fields.
  • If the automation is part of close loop, set the BZ flag qe-test_coverage to "+" and link the Polarion test.

@harshkumarRH added labels on Nov 27, 2024: DNM (Do Not Merge), RADOS (Rados Core), Tier-3, close-loop-automation (customer BZ automated as part of close-loop)
@harshkumarRH self-assigned this on Nov 27, 2024
@openshift-ci-robot

@harshkumarRH: No Jira issue with key CEPH-83575437 exists in the tracker at https://issues.redhat.com/.
Once a valid jira issue is referenced in the title of this pull request, request a refresh with /jira refresh.

In response to this: (the PR description quoted above)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Nov 27, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: harshkumarRH

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

