Implement Kopia #1723
Hi @perfectra1n. Thanks for your PR. I'm waiting for a backube member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
That's a very scary test file!
b6c2732 to 44665a2
@perfectra1n hey thanks for this PR! Obviously a lot of work went into this - please be patient as it may take some time before we can get around to reviewing... One thing just from glancing over the description: The fix for "controller sync fix" - I think I'd prefer if there was an issue created for this and it was handled separately - I'm not sure I understand the need for a finalizer? However it does sound like we most likely do need a check for the deletion timestamp... An issue would help to put more info around what is getting stuck (for example is there a specific resource that is always getting stuck?) and putting in a targeted fix.
4343746 to cd668cb
cd668cb to 95fa4e3
I was able to kick the tires on this and can confirm that backups and restores work the same as when I use restic. I tested on Kubernetes v1.33.3. I noticed Kopia is much faster than restic and we don't run into the repository locking issues. There are a lot of features to cover though, so maybe we need more bodies on it for testing. @tesshuflower @JohnStrunk I understand this is a lengthy PR and the support burden it can add, but I really hope you can find time to review and test as well. This would be a very nice addition to the project and I am sure Red Hat/OpenShift customers currently using restic would be happy to have Kopia as an option.
95fa4e3 to 2394f56
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: perfectra1n The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@perfectra1n can you create a separate issue with details? I think this is something we'd like to fix, but would like more information on exactly what resources are holding things up, etc. Are there PVCs getting stuck in terminating for a long time or something of the sort?
Sure! I can put it in another PR, I don't want to derail the Kopia discussion though :)
I'd like an issue/bug report that explains it first please.
Sorry, I meant to say that I'll open an Issue / bug report for it (not a PR) :)
I'll rebase after this has been reviewed :)
.github/workflows/docs-deploy.yml
Are these changes to .github/workflows intended to be part of this PR? Updates to do extra things like publish documentation to GitHub Pages should instead be an enhancement request and not part of the Kopia implementation.
Additionally, it looks like the other GitHub workflows have been renamed to *.disabled, so I'm assuming these changes were just accidentally added to the PR.
Just to warn you, we are not yet ready to move to golang 1.24 - there's actually an issue with building on arm64 right now with golang 1.24. I would also like to move to golang 1.24 separately from the mover PR. So this would either have to wait, or you can start with a build that works with 1.23.
related golang issue: golang/go#75074
The latest Kopia releases only build with Golang 1.24, so unless we get the release binary instead of building from source, I'm not quite sure...
I did fix that in my Dockerfile though, so I'm able to build on ARM64 via this change, but I'm not sure if that aligns.
ah, thanks for that - I may try using that myself as the golang issue seems to still be unresolved for the moment. At least that would unblock us for moving to 1.24.
config/rbac/role.yaml
Is this intentional? These permissions are required and should not be removed from the role
Ahh, that was unintentional. That was from me removing the "sync controller" fix earlier, I'll revert.
Note that the DCO check is failing because the commits are not signed off - please see the instructions here on how to resolve: https://github.com/backube/volsync/pull/1723/checks?check_run_id=48522602445
Signed-off-by: Mend Renovate <bot@renovateapp.com> Signed-off-by: perf3ct <jonfuller2012@gmail.com>
Signed-off-by: perf3ct <jonfuller2012@gmail.com>
…t recent" first Signed-off-by: perf3ct <jonfuller2012@gmail.com>
…ot selection is correct Signed-off-by: perf3ct <jonfuller2012@gmail.com>
67b0816 to b869768
eb6c88b to b869768
Alright, I'm done messing around with the rebase - I believe I've addressed the concerns you highlighted @tesshuflower. I can also remove the Dockerfile change for Golang 1.24 if you'd prefer to just keep it at Golang 1.23.
… overflow

The Kopia mover logs were filling up the 1GB cache PVC due to high default retention settings and verbose debug logging. This commit adds environment variable controls for log configuration with sensible defaults:

- Set default file log level to 'warn' instead of 'debug'
- Limit log retention to 10 files and 24 hours by default
- Add environment variables for users to override these settings:
  - KOPIA_FILE_LOG_LEVEL (default: warn)
  - KOPIA_LOG_DIR_MAX_FILES (default: 10)
  - KOPIA_LOG_DIR_MAX_AGE (default: 24h)
  - KOPIA_CONTENT_LOG_DIR_MAX_FILES (default: 10)
  - KOPIA_CONTENT_LOG_DIR_MAX_AGE (default: 24h)

These settings prevent the cache PVC from filling up while still maintaining useful logs for debugging when needed. Users can adjust these values through their Kopia repository secret if different retention is required.

Signed-off-by: perf3ct <jonfuller2012@gmail.com>
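The defaulting behaviour described in this commit message can be sketched as follows. This is a minimal Python illustration only: the variable names and default values come from the commit message, while the function name `resolve_log_settings` is hypothetical and the real mover applies these settings in its own entry script.

```python
import os

# Defaults as listed in the commit message.
LOG_DEFAULTS = {
    "KOPIA_FILE_LOG_LEVEL": "warn",
    "KOPIA_LOG_DIR_MAX_FILES": "10",
    "KOPIA_LOG_DIR_MAX_AGE": "24h",
    "KOPIA_CONTENT_LOG_DIR_MAX_FILES": "10",
    "KOPIA_CONTENT_LOG_DIR_MAX_AGE": "24h",
}


def resolve_log_settings(env=None):
    """Return the effective log settings: a user override if set, else the default.

    Hypothetical helper; empty values are treated as unset.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in LOG_DEFAULTS.items()}


# Example: a user raises the log level for debugging and keeps the other defaults.
settings = resolve_log_settings({"KOPIA_FILE_LOG_LEVEL": "debug"})
```

With this approach a user only needs to set the variables they want to change, for example in the repository secret, and everything else falls back to the conservative defaults.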
…ocs for logging settings Signed-off-by: perf3ct <jonfuller2012@gmail.com>
0e3df3b to 90fb931
I think maybe you used testing.T rather than ginkgo to avoid the suite setup/teardown, but I think we'd prefer to have all tests using ginkgo to be consistent. Almost any time these are run will be for the entire suite or at least the package, so I'd expect the suite setup/teardown to be invoked anyway.
We run the "make test" target to run tests in our CI which invokes ginkgo, and I'm not sure it'll even pick up these tests by default.
Describe what this PR does
This PR implements Kopia as a mover.
Is there anything that requires special attention?
To give this a spin yourself to see if/how it works, you can review the top of the fork's README. You can also see the deployed Kopia documentation here. I have a handful of commits in this branch so that the fork works as expected, but I've reverted them so that this branch can be used for the explicit purpose of merging.
Related issues:
Closes #320
Kopia
Unlike the other mover tools in VolSync, Kopia operates as a content-addressable backup system with fundamentally different approaches to data storage, deduplication, and repository management.
How Kopia works:
Kopia automatically deduplicates identical content across all backups, while Restic/Rsync store incremental changes.
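As a rough illustration of what content-addressable deduplication means, here is a minimal Python sketch. It is not Kopia's implementation: the `ContentStore` class is hypothetical, and it uses tiny fixed-size chunks for readability, whereas real backup tools typically split data with more sophisticated chunking.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, for illustration only


class ContentStore:
    """Toy content-addressable store: each blob is keyed by the SHA-256 of its bytes."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op because the key already exists.
        self.blobs.setdefault(key, data)
        return key

    def snapshot(self, data: bytes) -> list:
        """Split data into chunks and return the ordered list of chunk keys."""
        return [self.put(data[i:i + CHUNK_SIZE]) for i in range(0, len(data), CHUNK_SIZE)]

    def restore(self, keys) -> bytes:
        """Reassemble the original bytes from an ordered list of chunk keys."""
        return b"".join(self.blobs[k] for k in keys)


store = ContentStore()
first = store.snapshot(b"AAAABBBBCCCC")
second = store.snapshot(b"AAAABBBBDDDD")  # shares two chunks with the first backup
```

The two snapshots reference six chunks in total, but the store holds only four distinct blobs, because the shared chunks are stored once.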
How Concurrent Access Works
Kopia's approach:
Kopia's content-addressable design means multiple clients can write simultaneously because identical content gets the same hash - no lock conflicts.
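The idea that identical content maps to the same key, so concurrent writers do not conflict, can be sketched in Python. This is a simplified model under stated assumptions: the in-memory `blobs` dictionary stands in for a shared repository, `write_blob` is a hypothetical helper, and none of Kopia's index or maintenance handling is modelled.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

blobs = {}  # hypothetical shared repository: content hash -> bytes


def write_blob(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    # Writers that race on the same key all store identical bytes,
    # so the outcome is the same whichever one lands first.
    blobs.setdefault(key, data)
    return key


# Two clients upload the same content; two others upload unique content.
payloads = [b"shared-config", b"shared-config", b"client-a-only", b"client-b-only"]
with ThreadPoolExecutor(max_workers=4) as pool:
    keys = list(pool.map(write_blob, payloads))
```

Because a write is idempotent for a given key, no writer needs an exclusive lock on the repository just to add content that another writer may also be adding.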
Kopia's multi-tenant design:
Benefits include shared storage with isolation, cross-tenant deduplication that saves space, and per-tenant policies and access control.
Kopia uses pluggable storage drivers that abstract the underlying storage while providing consistent repository semantics:
Unified Repository Format
Native Cloud Integration
Unlike Restic, which uses a generic S3 API, Kopia has native drivers for backends such as S3, GCS, and Azure.
S3 Intelligent Path Parsing:
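The original example for this heading is not present in this description. As a purely hypothetical sketch of what splitting an S3 location into a bucket and an object prefix could look like (the function name `split_s3_location` and the accepted input forms are assumptions, not the PR's actual logic):

```python
from urllib.parse import urlsplit


def split_s3_location(location: str):
    """Return (bucket, prefix) for 's3://bucket/prefix' or 'bucket/prefix'.

    Hypothetical helper for illustration; the mover's real parsing may differ.
    """
    if "://" in location:
        parts = urlsplit(location)
        bucket, path = parts.netloc, parts.path
    else:
        bucket, _, path = location.partition("/")
    if not bucket:
        raise ValueError(f"no bucket in {location!r}")
    prefix = path.strip("/")
    # A non-empty prefix gets a trailing slash so objects are grouped under it.
    return bucket, (prefix + "/" if prefix else "")


bucket, prefix = split_s3_location("s3://my-bucket/backups/app1")
```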
Multi-Backend Credential Flexibility:
Configuration Options
The ReplicationSource spec includes Kopia-specific options:
Smart Cache Strategy Selection:
Instead of requiring users to understand cache implications, the mover automatically chooses the best strategy:
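The selection rules themselves are not included in this description. As a hypothetical illustration only, a rule that picks between the two cache types named in the metrics (pvc and emptydir) might look like the Python sketch below; the `CacheOptions` fields and the decision rule are assumptions, not the mover's actual behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheOptions:
    # Hypothetical fields; the real spec field names may differ.
    capacity: Optional[str] = None
    storage_class: Optional[str] = None


def choose_cache_type(opts: CacheOptions) -> str:
    """Assumed rule: any explicit cache setting implies a persistent cache."""
    if opts.capacity or opts.storage_class:
        return "pvc"
    # Nothing configured: fall back to an ephemeral cache.
    return "emptydir"


default_choice = choose_cache_type(CacheOptions())
explicit_choice = choose_cache_type(CacheOptions(capacity="1Gi"))
```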
The mover implements lifecycle-based metrics collection throughout the backup process:
How metrics are structured with labels:
Available metrics categories:
backup_duration_seconds, compression_ratio, data_transfer_rate
backup_success_total, backup_failure_total, job_retries_total
cache_type (pvc/emptydir), cache_size_bytes, policy_compliance
repository_connectivity, maintenance_operations_total
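To show how a labelled counter such as backup_success_total behaves, here is a small stdlib-only Python sketch. The label names (`namespace`, `repository`) and the helper functions are hypothetical, and the actual mover registers its metrics with Prometheus in Go.

```python
from collections import Counter

counters = Counter()  # (metric name, sorted label pairs) -> value


def inc(name, **labels):
    """Increment the counter identified by its name plus its label set."""
    key = (name, tuple(sorted(labels.items())))
    counters[key] += 1
    return key


def render(key):
    """Format one sample in the Prometheus text exposition style."""
    name, labels = key
    label_str = ",".join(f'{k}="{v}"' for k, v in labels)
    return f"{name}{{{label_str}}} {counters[key]}"


key = inc("backup_success_total", namespace="db", repository="s3")
inc("backup_success_total", namespace="db", repository="s3")
inc("backup_failure_total", namespace="db", repository="s3")
line = render(key)
```

Each distinct combination of label values is a separate time series, which is why the two successful backups above accumulate on one sample while the failure is counted on another.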
Policy Configuration
Support for structured repository configuration and policy files:
Core Implementation:
internal/controller/mover/kopia/mover.go - Main mover implementation
internal/controller/mover/kopia/metrics.go - Prometheus metrics
internal/controller/mover/kopia/builder.go - Builder pattern for mover creation
mover-kopia/entry.sh - Container entry point with debug support
mover-kopia/Dockerfile - Kopia container with all backends
CRD Extensions:
api/v1alpha1/replicationsource_types.go - ReplicationSourceKopiaSpec
api/v1alpha1/replicationdestination_types.go - ReplicationDestinationKopiaSpec
api/v1alpha1/common_types.go - KopiaPolicySpec
Documentation:
docs/usage/kopia/index.rst - User guide
docs/usage/kopia/database_example.rst - Database backup examples
Flexible Job Configuration:
Multi-Backend Environment Variables:
Database Backup:
Multi-Cloud with Policies:
Restore Example: