-
Notifications
You must be signed in to change notification settings - Fork 70
WIP π Workload should still resilient when catalog is deleted #2439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WIP π Workload should still resilient when catalog is deleted #2439
Conversation
β Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds comprehensive end-to-end tests to verify that installed OLM extensions continue functioning correctly when their source catalog is deleted. The tests cover both standard runtime and experimental Boxcutter runtime scenarios.
Changes:
- Added new feature file with 8 scenarios testing catalog deletion resilience
- Implemented
CatalogIsDeletedfunction to support catalog deletion in tests - Added step registrations for ClusterExtension update operations
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/e2e/steps/steps.go | Adds CatalogIsDeleted function and step registrations for testing catalog deletion and ClusterExtension updates |
| test/e2e/features/catalog-deletion-resilience.feature | Defines 8 test scenarios covering extension resilience, resource restoration, config changes, version upgrades, and revision behavior when catalog is deleted |
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
d3cbb5a to
f31b184
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f31b184 to
dce6d68
Compare
| l.Info("skipping unpack - using installed bundle content") | ||
| // imageFS will remain nil - the applier will use the existing installed content | ||
| return nil, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PROBLEM: Always tries to pull image, even when using installed bundle
|
|
||
| // If contentFS is nil, we're maintaining the current state without catalog access. | ||
| // In this case, reconcile the existing Helm release if it exists. | ||
| if contentFS == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fail if is not able to get the content.
PROBLEM: Immediately tries to build chart from contentFS
FIX: Reconcile the existing release and watch the release objects to ensure they're maintained
dce6d68 to
b15c262
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
b15c262 to
b1d259e
Compare
b1d259e to
c6870c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Outdated
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Show resolved
Hide resolved
c6870c5 to
36e9069
Compare
36e9069 to
6799025
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Outdated
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Outdated
Show resolved
Hide resolved
986e945 to
95f39fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Show resolved
Hide resolved
95f39fb to
8851183
Compare
8851183 to
3f99ac5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Outdated
Show resolved
Hide resolved
3f99ac5 to
20ba323
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextension_controller_test.go
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterextension_controller_test.go
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Outdated
Show resolved
Hide resolved
043d689 to
773f156
Compare
773f156 to
84e6cc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
π‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextension_reconcile_steps.go
Outdated
Show resolved
Hide resolved
Enables installed extensions to continue working when their source catalog becomes unavailable or is deleted. When resolution fails due to catalog unavailability, the operator now continues reconciling with the currently installed bundle instead of failing. Changes: - Resolution falls back to installed bundle when catalog unavailable - Unpacking skipped when maintaining current installed state - Helm and Boxcutter appliers handle nil contentFS gracefully - Version upgrades properly blocked without catalog access This ensures workloads remain stable and operational even when the catalog they were installed from is temporarily unavailable or deleted, while appropriately preventing version changes that require catalog access.
84e6cc6 to
865ac9b
Compare
What This Fixes
Installed extensions now keep working when their catalog becomes unavailable or is deleted.
TL'DR: Detailed Scenarios Examples
Scenario 1: Registry Server Goes Offline
Setup: You have Prometheus operator v1.0.0 installed and running
What happens: The container registry where the catalog image is stored becomes unreachable (network issue, registry maintenance, etc.)
On Main Branch β
Impact: Prometheus stops monitoring your cluster
With This PR β
Impact: Prometheus continues monitoring without interruption
## Scenario 2: Catalog Deleted, Resource Gets Deleted
Setup: Prometheus operator installed
What happens:
On Main Branch β
Why: Catalog deleted β reconciliation stopped β Apply step never runs β resources not maintained
Impact: Manual recovery needed (recreate ConfigMap manually)
With This PR β
Impact: Self-healing! Prometheus automatically recovers
Scenario 3: Update Configuration Without Catalog
Setup: Prometheus installed, catalog unavailable
What happens: You want to change CRD upgrade safety enforcement from "Strict" to "None"
On Main Branch β
Why: Reconciliation requires catalog lookup
Impact: Can't modify any settings
With This PR β
Impact: Configuration management works offline
Scenario 4: Try to Upgrade Version Without Catalog
Setup: Prometheus v1.0.0 installed, catalog unavailable
What happens: You try to upgrade Prometheus to v1.0.1
On Main Branch β
Why: Reconciliation stopped
Impact: Unclear why upgrade not happening, no visibility
With This PR β
Progressing=Retrying: catalog not found,Installed=v1.0.0 (True)Progressing=Retrying: catalog not found,Installed=v1.0.0 (True)Impact: