Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature](backup)(cooldown) backup/restore for cooldown data #43642

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

justfortaste
Copy link
Contributor

@justfortaste justfortaste commented Nov 11, 2024

What problem does this PR solve?

https://doc.weixin.qq.com/doc/w3_AGkArQbjACc5rBcdFOxQcaBSzESxW?scode=AJEAIQdfAAoRB9aawYAGkArQbjACc
Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Currently, backup/resore is not supported for cooldown data.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

be/src/olap/rowset/beta_rowset.cpp Outdated Show resolved Hide resolved
@justfortaste
Copy link
Contributor Author

run buildall

be/src/olap/snapshot_manager.cpp Outdated Show resolved Hide resolved
be/src/olap/snapshot_manager.cpp Outdated Show resolved Hide resolved
be/src/olap/snapshot_manager.cpp Outdated Show resolved Hide resolved
be/src/olap/rowset/beta_rowset.cpp Outdated Show resolved Hide resolved
be/src/olap/tablet.cpp Outdated Show resolved Hide resolved
be/src/runtime/snapshot_loader.cpp Outdated Show resolved Hide resolved
be/src/runtime/snapshot_loader.cpp Outdated Show resolved Hide resolved
be/src/runtime/snapshot_loader.cpp Outdated Show resolved Hide resolved
@xy720
Copy link
Member

xy720 commented Nov 27, 2024

I suggest that in the first version, let the users specify the resource or policy in restore job, no need to recreate them.

@justfortaste justfortaste force-pushed the backup_cooldown_data_master branch 3 times, most recently from 2fba7fa to 005403a Compare November 29, 2024 04:23
@justfortaste justfortaste requested a review from xy720 November 29, 2024 06:17
be/src/runtime/snapshot_loader.cpp Outdated Show resolved Hide resolved
}

std::vector<std::string> remote_index_files;
RETURN_IF_ERROR(list_segment_inverted_index_file(cold_fs, remote_tablet_path, rowset_id,
Copy link
Member

@xy720 xy720 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we don't need to list remote files here. We can get index_id in rowset meta, index_version in tablet schema. And use remote_idx_v1_path/remote_idx_v2_path function in storage_policy.h to get the remote path.

be/src/runtime/snapshot_loader.cpp Outdated Show resolved Hide resolved
@justfortaste justfortaste requested a review from xy720 December 3, 2024 04:36
@justfortaste justfortaste force-pushed the backup_cooldown_data_master branch from 13a365e to 9e2f47b Compare December 4, 2024 06:06
@justfortaste
Copy link
Contributor Author

run rebuildall

xy720
xy720 previously approved these changes Dec 5, 2024
Copy link
Member

@xy720 xy720 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xy720
Copy link
Member

xy720 commented Dec 5, 2024

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 5, 2024
Copy link
Contributor

github-actions bot commented Dec 5, 2024

PR approved by at least one committer and no changes requested.

Copy link
Contributor

github-actions bot commented Dec 5, 2024

PR approved by anyone and no changes requested.

lide-reed
lide-reed previously approved these changes Dec 5, 2024
Copy link
Contributor

@lide-reed lide-reed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.46% (10006/26019)
Line Coverage: 29.49% (83861/284350)
Region Coverage: 28.60% (43105/150713)
Branch Coverage: 25.20% (21907/86948)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8258cf6d8c0ee77e312d0c99fdceda1b227d4135_8258cf6d8c0ee77e312d0c99fdceda1b227d4135/report/index.html

@justfortaste justfortaste dismissed stale reviews from lide-reed and xy720 via 0db7dbb December 5, 2024 12:06
@justfortaste justfortaste force-pushed the backup_cooldown_data_master branch from 796ed9b to 8a6c454 Compare December 12, 2024 10:59
@justfortaste
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.74% (10104/26080)
Line Coverage: 29.68% (84771/285649)
Region Coverage: 28.74% (43515/151400)
Branch Coverage: 25.30% (22107/87372)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8a6c45451ea029ab6fbed515ae40ea832f48e895_8a6c45451ea029ab6fbed515ae40ea832f48e895/report/index.html

@justfortaste
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.74% (10107/26088)
Line Coverage: 29.68% (84856/285888)
Region Coverage: 28.75% (43555/151514)
Branch Coverage: 25.30% (22128/87456)
Coverage Report: http://coverage.selectdb-in.cc/coverage/328a6e5bab9045bfe4dbe511bf671477923c37f4_328a6e5bab9045bfe4dbe511bf671477923c37f4/report/index.html

Copy link
Contributor

@lide-reed lide-reed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@xy720 xy720 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@justfortaste
Copy link
Contributor Author

squash pr
#45589

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a huge pr, we should review carefully.

There is some low level change, like abs_path = std::filesystem::path(fmt::format("s3://{}/{}", _bucket, _prefix)) / path;

I am not sure its impact.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 25, 2024
@justfortaste
Copy link
Contributor Author

It is a huge pr, we should review carefully.

There is some low level change, like abs_path = std::filesystem::path(fmt::format("s3://{}/{}", _bucket, _prefix)) / path;

I am not sure its impact.

RemoteFileSystem::download will call absolute_path two times

RemoteFileSystem::download(const Path& remote_file, const Path& local) {
     S3FileSystem::absolute_path // first ime
     S3FileSystem::download_impl
     FileSystem::file_size
         S3FileSystem::absolute_path // second time

when "path" is not absolue, the result will return with double "_prefix"

original code:

    Status absolute_path(const Path& path, Path& abs_path) const override {
        if (path.string().find("://") != std::string::npos) {
            // the path is with schema, which means this is a full path like:
            // s3://bucket/path/to/file.txt
            // so no need to concat with prefix
            abs_path = path;
        } else {
            // path with no schema
            abs_path = _prefix / path;
        }
        return Status::OK();
    }

This function is only effect with S3 file system, the path will be parsed by "S3URI::parse" to get the key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants