Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pg17 #635

Merged
merged 1 commit into from
Nov 21, 2024
Merged

Support pg17 #635

merged 1 commit into from
Nov 21, 2024

Conversation

kathia-barahona
Copy link
Contributor

@kathia-barahona kathia-barahona commented Nov 19, 2024

About this change - What it does

Includes necessary changes for supporting PostgreSQL 17.

This PR updates the behavior of pg_walfile_name based on changes introduced in PostgreSQL 17 (PG17). Instead of calling `(pg_current_lsn), the function now uses the LSN obtained from the result of pg_backup_stop.

Why this way

For versions earlier than PG17, pg_walfile_name returns the previous WAL file name in certain edge cases, such as when the backup’s ending LSN falls on a segment boundary. This issue was resolved in PG17, making the get_backup_end_segment_and_time function largely obsolete for newer versions. As a result, for PG17 and later, backup end segments now rely directly on the pg_backup_stop result.

Note on Testing
While an edge case test was considered (where the backup’s ending LSN falls on a boundary), it was ultimately not included. This test requires a fresh PostgreSQL service to reliably reproduce the scenario. However, in shared pipeline environments, where other tests may have already committed transactions, reproducing the issue is highly inconsistent. Therefore, the test was deemed impractical for inclusion.

class TestPGBaseBackup:
        def test_backup_end_segment_and_time(self, db, pg_version: str):
        conn = db.connect()
        conn.autocommit = True
        cursor = conn.cursor()

        # Force a WAL switch to guarantee that the next LSN is at the start of a new WAL segment
        cursor.execute("SELECT pg_switch_wal();")

        # Generate enough activity to ensure we approach a segment boundary (16 MB)
        cursor.execute("DROP TABLE IF EXISTS wal_test;")
        cursor.execute("CREATE TABLE wal_test (id serial, data text);")
        for _ in range(1024):
            cursor.execute("INSERT INTO wal_test (data) SELECT 'x' || repeat('y', 1024) FROM generate_series(1, 1024);")
        cursor.execute("CHECKPOINT;")
        cursor.execute("SELECT pg_switch_wal();")

        # basic PGBaseBackup, just use backup_end_segment_and_time for noticing discrepancies among versions
        # when LSN falls on a boundary

        pgb = PGBaseBackup(
            config=SIMPLE_BACKUP_CONFIG,
            site="foosite",
            connection_info=None,
            basebackup_path=None,
            compression_queue=None,
            storage=None,
            transfer_queue=None,
            metrics=metrics.Metrics(statsd={}),
            pg_version_server=int(pg_version) * 10000,
        )

        basebackup_name = "test_backup"

        # Manually start and stop backup (could be better, but just need to test basic behavior)
        # pylint: disable=protected-access
        pgb._start_backup(cursor, basebackup_name)
        stop_backup_result = pgb._stop_backup(cursor)

        expected_segment = self._calculate_wal_segment(lsn=stop_backup_result.lsn)

        # for >=PG17 must rely on stop_backup_result LSN, not on current_lsn. For demo this,
        # will change pg_version_server to PG16 and show what will happen if fix in pg_walfile_name is not considered
        if int(pg_version) >= 17:
            pgb.pg_version_server = 160000
            result_end_segment, _ = pgb.get_backup_end_segment_and_time(stop_backup_result, conn)
            assert result_end_segment != expected_segment  # relying on steps for <PG17, is not accurate.

            # set back and test again
            pgb.pg_version_server = int(pg_version) * 10000

        result_end_segment, _ = pgb.get_backup_end_segment_and_time(stop_backup_result, conn)
        assert result_end_segment == expected_segment

    def _calculate_wal_segment(self, lsn: str, timeline: str = "00000001") -> str:
        # Extract XLog ID and Offset from LSN (format: XLogID/Offset)
        xlog_id, xlog_offset = map(lambda x: int(x, 16), lsn.split("/"))

        # Calculate the WAL segment number
        segment_number = (xlog_id << 32 | xlog_offset) // 0x01000000

        # Format the segment number into XLog ID and Offset
        xlog_id_formatted = f"{segment_number >> 32:08X}"
        xlog_offset_formatted = f"{segment_number & 0xFFFFFFFF:08X}"

        return f"{timeline}{xlog_id_formatted}{xlog_offset_formatted}"

Resolves: #BF-2568

UPDATE:

Decided to remove changes in get_backup_end_segment_and_time for considering changes in pg_walfile_name for PG17, was discussed offline and we agreed to keep old behavior rather than having two different ways for getting the end segment.

@kathia-barahona kathia-barahona requested a review from a team November 19, 2024 17:10
@kathia-barahona kathia-barahona force-pushed the support_pg17 branch 13 times, most recently from 471516b to 0f83d6b Compare November 19, 2024 22:37
In PostgreSQL versions prior to 17, the `pg_walfile_name` function could
return the previous WAL file if the provided LSN fell at the boundary between WAL segments.
This was fixed in PG17, and now gets the actual wal file name instead. This change affects
the way we are retrieving the backup_end_wal_segment for the backups metadata.
Therefore, instead of relying on our old behavior, we are now using the LSN yield by
pg_backup_stop.
@rdunklau rdunklau merged commit def54ec into main Nov 21, 2024
6 checks passed
@rdunklau rdunklau deleted the support_pg17 branch November 21, 2024 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants