Skip to content

Merge native backup #4381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 2, 2025
Merged

Merge native backup #4381

merged 13 commits into from
Jun 2, 2025

Conversation

Michal-Leszczynski
Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski commented May 12, 2025

This PR contains subset of changes from ml/scylla-api-2 needed in order to support native backup api.
The main change comes with replacing rclone sync/movedir api with native scylla storage_service/backup api when uploading snapshot-ed files into s3. This PR also makes some changes regarding configuring SM test env to support the native scylla backup api. The most interesting part from user perspective is the new sctool backup --method flag (previously --api-hint), which is documented as below:

   - name: method
      default_value: auto
      usage: |-
        Specify API used for uploading files.
        This flag can be set to either:
        - 'auto' means that native backup API will be used when possible, otherwise Rclone API will be used.
        - 'rclone' means that only Rclone API will be used.
        - 'native' means that only native API will be used.
        ScyllaDB Manager can use either Rclone API or native Scylla API for uploading snapshot-ed files.
        In both cases, it requires to configure ScyllaDB Manager Agent 'scylla-manager-agent.yaml' config.
        In case of native backup API, it also requires to configure 'object_storage_endpoints' in 'scylla.yaml'.

Initially it was supposed to be set to rclone by default, but I decided to go with auto because:

  • we still want to have it set to auto before release
  • it increases the scope of native backup api tests
  • changing the default value is not just changing a single string - it also requires changes in the tests, which takes some minor time

Fixes #4143
Fixes #4138
Fixes #4141
Fixes #4384
Fixes #4386
Fixes #4388

@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/merge-native-backup branch 3 times, most recently from 2939c5c to df12f50 Compare May 13, 2025 08:08
@Michal-Leszczynski
Copy link
Collaborator Author

Michal-Leszczynski commented May 13, 2025

The test failure is most likely a flake connected to: #4382 (need to verify).

Update: running this test 20 times where RcloneMoveDir is called with context.Background() succeeded, so it looks like #4382 is indeed connected.

@Michal-Leszczynski Michal-Leszczynski marked this pull request as ready for review May 13, 2025 08:57
@Michal-Leszczynski
Copy link
Collaborator Author

@Michal-Leszczynski
Copy link
Collaborator Author

@karol-kokoszka @VAveryanov8 Please take a look. This is important in terms of leveraging the new native backup api from Scylla 2025.2.

Copy link
Collaborator

@VAveryanov8 VAveryanov8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 14, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 14, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 14, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 15, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 15, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 16, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request May 26, 2025
Added a support for the api-hint parameter of scylla-manager backup.
It is set by an SCT parameter of object_storage_upload_mode with an enum of ObjectStorageUploadMode:
auto/rclone/native.
refs: scylladb/scylla-manager#4381
@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/merge-native-backup branch 2 times, most recently from 89ee907 to 4b7e582 Compare May 27, 2025 12:26
@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/merge-native-backup branch 2 times, most recently from 0743352 to 4dd7b64 Compare May 28, 2025 13:07
@Michal-Leszczynski
Copy link
Collaborator Author

New commits:

@VAveryanov8 @karol-kokoszka please take a look.
I will implement #4393 in a follow-up PR, but it would be good to have all functionality ready on master ASAP.

Copy link
Collaborator

@VAveryanov8 VAveryanov8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Michal-Leszczynski
Copy link
Collaborator Author

@karol-kokoszka please take a look!

@karol-kokoszka
Copy link
Collaborator

integration-tests-latest are validating the native backup right ?

@karol-kokoszka
Copy link
Collaborator

@Michal-Leszczynski Overall, looks very good.
Please resolve the comment about unnecessary calls to the AWS instance metadata endpoint.

@Michal-Leszczynski
Copy link
Collaborator Author

integration-tests-latest are validating the native backup right ?

Yes, but I will also work on #4393 (in a separate PR) to make it more visible.

This commit also uses sizesuffix.SizeSuffix instead of managerclient.SizeSuffix in command/repair.
This type has been moved across modules, and we need to adjust for it.
@Michal-Leszczynski
Copy link
Collaborator Author

Added 3 commits that needs to be squashed before merge:

@karol-kokoszka please take a look

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Please squash and merge. It's ok to to have more than 1 commit, but don't leave micro commits, rather rely on features.
Commits like:

[feat(restore): adjust model to recreated restore_run_progress]
[refactor(scyllaclient): remove unused ToCanonicalIP]
[feat(sstable): describe sstable specification]

Are not informative without the context. Linking the issue is not enough.

As there are no plans of adding http[s] scheme constants to http pkg,
it should be fine to hardcode them instead of adding constants to some util pkg.
Ref: golang/go#40587
Previously when using sctool, --skip-schema was set to --purge-only by mistake.
A separate column for Scylla task ID is needed because:
- it has a different type from agent job ID
- it makes it clear which API was used

In case of restore run progress, we need it to be a part
of the clustering key. Otherwise, we won't be able to store multiple
run progresses with the same remote sstable dir and host in the DB.

That's why we need to recreate the restore_run_progress table via
MigrateToRestoreRunProgressWithSortKeyWithScyllaTaskID, which ensures
that restore_run_progress data is not lost in the process.

Since we are recreating the table, we can also rename manifest_path
and versioned_progress to more suitable names.
Config is passed by value, so even if it's filled before registering
provider, this change is not reflected in other places. Because of that,
it's best to fill it at the beginning.
This commit adds code for using Scylla backup API.
Luckily for us, handling pause/resume and progress
is analogous to the Rclone API handling.
It allows for configuring which API (native/rclone) will be used
for uploading backup files.

Fixes #4143
Fixes #4138
Fixes #4141
Fixes #4384
Fixes #4386
@Michal-Leszczynski Michal-Leszczynski merged commit 76a2c58 into master Jun 2, 2025
42 checks passed
@Michal-Leszczynski Michal-Leszczynski deleted the ml/merge-native-backup branch June 2, 2025 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants