@camilamacedo86 camilamacedo86 commented Jan 11, 2026

What This Fixes

Installed extensions now keep working when their catalog becomes unavailable or is deleted.

| Scenario | Runtime | On Main | With This PR |
| --- | --- | --- | --- |
| Registry offline | Helm | ❌ Extension fails | ✅ Keeps running |
| Registry offline | Boxcutter | ❌ Extension fails | ✅ Keeps running |
| Catalog deleted + resource deleted | Helm | ❌ Resource NOT restored | ✅ Auto-restored |
| Catalog deleted + resource deleted | Boxcutter | ❌ Resource NOT restored | ✅ Auto-restored |
| Catalog unavailable + update config | Helm | ❌ Update fails | ✅ Update works |
| Catalog unavailable + update config | Boxcutter | ❌ Update fails | ✅ Update works |
| Catalog unavailable + upgrade version | Helm | ❌ Unclear error | ✅ Clear error + keeps running |
| Catalog unavailable + upgrade version | Boxcutter | ❌ Unclear error | ✅ Clear error + keeps running |

TL;DR: Detailed Scenario Examples

Scenario 1: Registry Server Goes Offline

Setup: You have Prometheus operator v1.0.0 installed and running

What happens: The container registry where the catalog image is stored becomes unreachable (network issue, registry maintenance, etc.)

On Main Branch ❌

| Runtime | What Breaks | Why |
| --- | --- | --- |
| Helm | Extension shows "Failed" | Can't contact catalog → reconciliation stops |
| Boxcutter | Extension shows "Failed" | Can't contact catalog → reconciliation stops |

Impact: Prometheus stops monitoring your cluster

With This PR ✅

| Runtime | What Works | How |
| --- | --- | --- |
| Helm | Extension stays "Installed", keeps running | Falls back to Helm release stored in cluster |
| Boxcutter | Extension stays "Installed", keeps running | Falls back to ClusterExtensionRevision in cluster |

Impact: Prometheus continues monitoring without interruption
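
The fallback described in Scenario 1 can be pictured roughly as follows. This is a minimal Go sketch, not the PR's actual code: `Bundle`, `Resolver`, `ErrCatalogUnavailable`, and `resolveWithFallback` are hypothetical names chosen for illustration, and the real resolver works with richer types.

```go
package main

import (
	"errors"
	"fmt"
)

// Bundle is a hypothetical, simplified stand-in for a resolved bundle.
type Bundle struct {
	Name    string
	Version string
}

// ErrCatalogUnavailable is a hypothetical sentinel error for a catalog that
// is unreachable or has been deleted.
var ErrCatalogUnavailable = errors.New("catalog unavailable")

// Resolver looks up the bundle to install for a package.
type Resolver func(pkg string) (*Bundle, error)

// resolveWithFallback returns the catalog's answer when it is available.
// If the catalog cannot be reached and a bundle is already installed, it
// returns the installed bundle so that reconciliation can continue.
func resolveWithFallback(resolve Resolver, pkg string, installed *Bundle) (b *Bundle, usedFallback bool, err error) {
	b, err = resolve(pkg)
	if err == nil {
		return b, false, nil
	}
	if errors.Is(err, ErrCatalogUnavailable) && installed != nil {
		return installed, true, nil
	}
	return nil, false, err
}

func main() {
	offline := func(string) (*Bundle, error) { return nil, ErrCatalogUnavailable }
	installed := &Bundle{Name: "prometheus-operator", Version: "1.0.0"}

	b, fallback, err := resolveWithFallback(offline, "prometheus-operator", installed)
	fmt.Println(b.Version, fallback, err) // prints "1.0.0 true <nil>"
}
```

Errors other than catalog unavailability, and the case where nothing is installed yet, are still returned to the caller, so a first install without a catalog keeps failing as before.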


Scenario 2: Catalog Deleted, Resource Gets Deleted

Setup: Prometheus operator installed

What happens:

1. You delete the ClusterCatalog (whatever the reason, deleting a catalog should not break existing workloads)
2. Someone accidentally runs: kubectl delete configmap prometheus-config

On Main Branch ❌

| Runtime | What Happens | Result |
| --- | --- | --- |
| Helm | ConfigMap NOT restored | Prometheus breaks, no monitoring |
| Boxcutter | ConfigMap NOT restored | Prometheus breaks, no monitoring |

Why: Catalog deleted → reconciliation stopped → Apply step never runs → resources not maintained

Impact: Manual recovery needed (recreate ConfigMap manually)

With This PR ✅

| Runtime | What Happens | How It's Fixed |
| --- | --- | --- |
| Helm | ConfigMap automatically restored | Helm.Reconcile() compares cluster to stored release, recreates ConfigMap |
| Boxcutter | ConfigMap automatically restored | Boxcutter engine SSA from ClusterExtensionRevision, recreates ConfigMap |

Impact: Self-healing! Prometheus automatically recovers
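
The self-healing step amounts to comparing what was recorded at install time with what currently exists, then recreating whatever is missing. The Go sketch below illustrates only that idea; `restoreMissing` and the "kind/name" map keys are hypothetical, and the real appliers go through Helm or server-side apply rather than plain maps.

```go
package main

import (
	"fmt"
	"sort"
)

// restoreMissing compares the desired objects recorded at install time with
// the objects that currently exist and recreates any that have gone missing.
// Both maps are keyed by a hypothetical "kind/name" string and hold manifests.
// It returns the sorted keys of the objects it restored.
func restoreMissing(desired, actual map[string]string) []string {
	var restored []string
	for key, manifest := range desired {
		if _, ok := actual[key]; !ok {
			actual[key] = manifest
			restored = append(restored, key)
		}
	}
	sort.Strings(restored)
	return restored
}

func main() {
	desired := map[string]string{
		"ConfigMap/prometheus-config": "configmap manifest",
		"Deployment/prometheus":       "deployment manifest",
	}
	// Someone deleted the ConfigMap; only the Deployment is left.
	actual := map[string]string{
		"Deployment/prometheus": "deployment manifest",
	}

	fmt.Println(restoreMissing(desired, actual)) // prints "[ConfigMap/prometheus-config]"
}
```

The important point is that `desired` comes from state stored in the cluster (the Helm release or the ClusterExtensionRevision), so no catalog lookup is involved.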


Scenario 3: Update Configuration Without Catalog

Setup: Prometheus installed, catalog unavailable

What happens: You want to change CRD upgrade safety enforcement from "Strict" to "None"

On Main Branch ❌

| Runtime | What Happens | Error Message |
| --- | --- | --- |
| Helm | Update FAILS | "reconciliation failed: catalog not found" |
| Boxcutter | Update FAILS | "reconciliation failed: catalog not found" |

Why: Reconciliation requires catalog lookup

Impact: Can't modify any settings

With This PR ✅

| Runtime | What Happens | How |
| --- | --- | --- |
| Helm | Update APPLIED | Helm applier updates existing release with new config |
| Boxcutter | Update APPLIED | Boxcutter updates existing revision with new config |

Impact: Configuration management works offline
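
One way to read Scenario 3 is that only a change of the requested version needs the catalog, while other spec changes can be applied against the installed bundle. The Go sketch below encodes that reading. `Spec` and `needsCatalog` are hypothetical, the struct is a trimmed-down stand-in for the real ClusterExtension spec, and exact version strings are assumed where the real API may accept ranges.

```go
package main

import "fmt"

// Spec is a hypothetical, trimmed-down view of a ClusterExtension spec.
type Spec struct {
	Version          string // requested version; empty means no preference
	CRDUpgradeSafety string // "Strict" or "None"
}

// needsCatalog reports whether applying the desired spec requires resolving
// a different bundle from the catalog. In this sketch only a change of the
// requested version does; every other field can be applied to the bundle
// that is already installed.
func needsCatalog(installedVersion string, desired Spec) bool {
	return desired.Version != "" && desired.Version != installedVersion
}

func main() {
	// Config-only change: the installed bundle can be re-applied offline.
	fmt.Println(needsCatalog("1.0.0", Spec{Version: "1.0.0", CRDUpgradeSafety: "None"})) // prints "false"

	// Version change: a new bundle would have to come from the catalog.
	fmt.Println(needsCatalog("1.0.0", Spec{Version: "1.0.1", CRDUpgradeSafety: "Strict"})) // prints "true"
}
```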


Scenario 4: Try to Upgrade Version Without Catalog

Setup: Prometheus v1.0.0 installed, catalog unavailable

What happens: You try to upgrade Prometheus to v1.0.1

On Main Branch ❌

| Runtime | What Happens | Status Shown |
| --- | --- | --- |
| Helm | Upgrade doesn't happen | Generic "Failed" |
| Boxcutter | Upgrade doesn't happen | Generic "Failed" |

Why: Reconciliation stopped

Impact: Unclear why the upgrade is not happening; no visibility

With This PR ✅

| Runtime | What Happens | Status Shown |
| --- | --- | --- |
| Helm | Upgrade properly blocked | Progressing=Retrying: catalog not found, Installed=v1.0.0 (True) |
| Boxcutter | Upgrade properly blocked | Progressing=Retrying: catalog not found, Installed=v1.0.0 (True) |

Impact:

  • Clear feedback: "Need catalog to upgrade"
  • Workload still runs on v1.0.0
  • Will auto-upgrade when catalog returns (via watch)

Copilot AI review requested due to automatic review settings January 11, 2026 05:18

netlify bot commented Jan 11, 2026

✅ Deploy Preview for olmv1 ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 865ac9b |
| 🔍 Latest deploy log | https://app.netlify.com/projects/olmv1/deploys/6964df6f0c1457000898a77d |
| 😎 Deploy Preview | https://deploy-preview-2439--olmv1.netlify.app |


openshift-ci bot commented Jan 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign pedjak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Pull request overview

This PR adds comprehensive end-to-end tests to verify that installed OLM extensions continue functioning correctly when their source catalog is deleted. The tests cover both standard runtime and experimental Boxcutter runtime scenarios.

Changes:

  • Added new feature file with 8 scenarios testing catalog deletion resilience
  • Implemented CatalogIsDeleted function to support catalog deletion in tests
  • Added step registrations for ClusterExtension update operations

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/e2e/steps/steps.go | Adds CatalogIsDeleted function and step registrations for testing catalog deletion and ClusterExtension updates |
| test/e2e/features/catalog-deletion-resilience.feature | Defines 8 test scenarios covering extension resilience, resource restoration, config changes, version upgrades, and revision behavior when catalog is deleted |

Copilot AI review requested due to automatic review settings January 11, 2026 05:43
@camilamacedo86 camilamacedo86 changed the title from "🌱 test: add e2e tests for workload resilience when catalog is deleted" to "WIP 🌱 test: add e2e tests for workload resilience when catalog is deleted" Jan 11, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Jan 11, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


l.Info("skipping unpack - using installed bundle content")
// imageFS will remain nil - the applier will use the existing installed content
return nil, nil
}

Contributor Author

PROBLEM: Always tries to pull image, even when using installed bundle


// If contentFS is nil, we're maintaining the current state without catalog access.
// In this case, reconcile the existing Helm release if it exists.
if contentFS == nil {

Contributor Author

Will fail if it is not able to get the content.
PROBLEM: Immediately tries to build chart from contentFS
FIX: Reconcile the existing release and watch the release objects to ensure they're maintained

Copilot AI review requested due to automatic review settings January 11, 2026 07:09
Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings January 11, 2026 07:30
@camilamacedo86 camilamacedo86 changed the title from "WIP 🌱 test: add e2e tests for workload resilience when catalog is deleted" to "WIP 🐛 Workload should still resilient when catalog is deleted" Jan 11, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


Copilot AI review requested due to automatic review settings January 11, 2026 09:00
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


@camilamacedo86 camilamacedo86 force-pushed the test-e2e-res branch 2 times, most recently from 986e945 to 95f39fb Compare January 11, 2026 09:41
Copilot AI review requested due to automatic review settings January 11, 2026 09:41
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


Copilot AI review requested due to automatic review settings January 12, 2026 09:46
Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


Copilot AI review requested due to automatic review settings January 12, 2026 11:22
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Enables installed extensions to continue working when their source
catalog becomes unavailable or is deleted. When resolution fails due
to catalog unavailability, the operator now continues reconciling with
the currently installed bundle instead of failing.

Changes:
- Resolution falls back to installed bundle when catalog unavailable
- Unpacking skipped when maintaining current installed state
- Helm and Boxcutter appliers handle nil contentFS gracefully
- Version upgrades properly blocked without catalog access

This ensures workloads remain stable and operational even when the
catalog they were installed from is temporarily unavailable or deleted,
while appropriately preventing version changes that require catalog access.