OCPBUGS-81627: *: Shift ClusterResourceQuotas CRD from 0000_03 to 0000_00_fast_...#2791

Open
wking wants to merge 1 commit into openshift:master from wking:apply-ClusterResourceQuotas-fast

Conversation

@wking
Member

@wking wking commented Apr 2, 2026

OCPBUGS-81627 describes this constraint:

... the ClusterResourceQuota CRD must be applied in the first 30s after startup (the timeout of the rbac/bootstrap-roles post-start hook). This is a problem because of a dependency in kube-apiserver:

  1. ClusterResourceQuota admission plugin tries to sync informers on startup
  2. Informer LIST fails because CRD doesn't exist yet
  3. Plugin blocks ALL namespaced resource requests for 10s waiting for sync
  4. ServiceAccounts get created BEFORE the CRD (alphabetical ordering)
  5. Requests blocked → 10s timeout → failures → liveness check fails → pod killed

and points out that recent cluster-version operator changes in the 0000_00_cluster-version-operator_... space have delayed the CRD too much, which trips that liveness-check cycle.
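
The timing in the quoted bug description can be illustrated with a toy model. The sketch below is not kube-apiserver code: `StartupModel`, its fields, and `request_outcome` are hypothetical names, and it assumes the informer syncs as soon as the CRD exists. Only the 10s and 30s figures come from the bug description.

```python
from dataclasses import dataclass


@dataclass
class StartupModel:
    """Toy model of the startup race; all names here are hypothetical."""

    crd_applied_at: float       # seconds after kube-apiserver start when the CRD exists
    sync_timeout: float = 10.0  # how long a request waits for informer sync (from the bug)
    hook_timeout: float = 30.0  # rbac/bootstrap-roles post-start hook timeout (from the bug)

    def request_outcome(self, arrival: float) -> str:
        """Outcome of a namespaced request arriving `arrival` seconds after startup."""
        if arrival >= self.crd_applied_at:
            # Assumption: the informer is synced as soon as the CRD exists.
            return "admitted"
        if self.crd_applied_at - arrival <= self.sync_timeout:
            # The CRD shows up while the request is still waiting for sync.
            return "admitted after wait"
        return "timed out"

    def crd_in_time(self) -> bool:
        """True when the CRD lands inside the post-start hook window."""
        return self.crd_applied_at <= self.hook_timeout


slow = StartupModel(crd_applied_at=45.0)
fast = StartupModel(crd_applied_at=3.0)
print(slow.request_outcome(5.0), slow.crd_in_time())
print(fast.request_outcome(5.0), fast.crd_in_time())
```

Under these assumptions, a CRD applied 45s after startup leaves early requests timing out, while a CRD applied within a few seconds stays inside both windows.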

With this pull, I'm moving the critical, bootstrap-time CRD up to 0000_00_fast_... to sort before 0000_00_cluster-version-operator_... That should reliably address the kube-apiserver liveness check.
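
As a rough illustration of the rename, the sketch below parses the run-level prefix from manifest filenames and checks that a 0000_00_... name sorts ahead of a 0000_03_... one. The filenames and the `run_level` helper are hypothetical, and the sketch does not model how the cluster-version operator orders manifests that share a run level.

```python
import re

# Matches the leading "0000_<run-level>_" prefix discussed in this pull request.
RUN_LEVEL_PREFIX = re.compile(r"^0000_(\d+)_")


def run_level(filename: str) -> int:
    """Return the numeric run level encoded in a manifest filename."""
    match = RUN_LEVEL_PREFIX.match(filename)
    if match is None:
        raise ValueError(f"no run-level prefix in {filename!r}")
    return int(match.group(1))


# Hypothetical filenames; only the prefixes mirror the ones in this pull request.
old_name = "0000_03_example_crd.yaml"
new_name = "0000_00_fast_example_crd.yaml"

print(run_level(old_name), run_level(new_name))
print(sorted([old_name, new_name]))
```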

It also means that failures reconciling the CRD will block the new CVO from rolling out during updates, which could make things sticky (e.g. requiring manual work to unstick CVO bugs in CRD application, because we'd need to successfully apply this CRD before we got to the update's new CVO). But CRD-application issues don't crop up too often, and the kube-apiserver liveness issue is cropping up today in CI.

OCPBUGS-81627: https://redhat.atlassian.net/browse/OCPBUGS-81627

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot added labels Apr 2, 2026:
  • jira/severity-critical: Referenced Jira bug's severity is critical for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
@openshift-ci-robot

@wking: This pull request references Jira Issue OCPBUGS-81627, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2026

Hello @wking! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@coderabbitai

coderabbitai bot commented Apr 2, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 93cff981-4979-4a15-a867-735596694da5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 2, 2026
@openshift-ci openshift-ci bot requested review from JoelSpeed and deads2k April 2, 2026 12:21
@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking
Member Author

wking commented Apr 2, 2026

verify-crd-schema is grumpy, but says:

This verifier checks all files that have changed. In some cases you may have changed or renamed a file that already contained api violations, but you are not introducing a new violation. In such cases it is appropriate to /override the failing CI job.

And we'll want that /override ci/prow/verify-crd-schema for that reason here.

@openshift-ci
Contributor

openshift-ci bot commented Apr 2, 2026

@wking: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify-crd-schema | 4b59baa | link | true | /test verify-crd-schema |
| ci/prow/verify | 4b59baa | link | true | /test verify |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@wking
Member Author

wking commented Apr 3, 2026

/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-agent-compact-fips

@openshift-ci
Contributor

openshift-ci bot commented Apr 3, 2026

@wking: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-nightly-4.22-e2e-agent-compact-fips

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a2faa160-2f66-11f1-98f0-46894b072ef2-0

Labels

jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

2 participants