OCPBUGS-81627: *: Shift ClusterResourceQuotas CRD from 0000_03 to 0000_00_fast_...#2791
wking wants to merge 1 commit into openshift:master
[1] describes this constraint:

> ... the ClusterResourceQuota CRD must be applied in the first 30s after startup (the timeout of the rbac/bootstrap-roles post-start hook). This is a problem because of a dependency in kube-apiserver:
>
> 1. ClusterResourceQuota admission plugin tries to sync informers on startup
> 2. Informer LIST fails because CRD doesn't exist yet
> 3. Plugin blocks ALL namespaced resource requests for 10s waiting for sync
> 4. ServiceAccounts get created BEFORE the CRD (alphabetical ordering)
> 5. Requests blocked → 10s timeout → failures → liveness check fails → pod killed

and points out that recent cluster-version operator changes in the 0000_00_cluster-version-operator_... space have delayed the CRD too much, which trips that liveness-check cycle.

With this pull, I'm moving the critical, bootstrap-time CRD up to 0000_00_fast_... to sort before 0000_00_cluster-version-operator_... That should reliably address the kube-apiserver liveness check.

It will also mean that failures reconciling the CRD will block the new CVO from rolling out during updates, which could make things sticky (e.g. requiring manual work to unstick CVO bugs in CRD application, because we'd need to successfully apply this CRD before we got to the update's new CVO). But CRD-application issues don't crop up too often, and the kube-apiserver liveness issue is cropping up today in CI.

[1]: https://redhat.atlassian.net/browse/OCPBUGS-81627
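To make the quoted failure sequence concrete, here is a toy model of steps 1 to 5. It is not kube-apiserver code: `FakeClock`, `wait_for_sync`, `admit`, and the one-second poll interval are hypothetical names and values chosen for illustration; only the 10s wait comes from the description above. It sketches why a request is admitted when the CRD shows up within the wait window and fails when it does not.

```python
from dataclasses import dataclass


@dataclass
class FakeClock:
    """Deterministic clock so the sketch runs instantly (hypothetical helper)."""
    now: float = 0.0

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def wait_for_sync(has_synced, clock, timeout=10.0, poll=1.0):
    """Poll has_synced() until it returns True or `timeout` seconds elapse."""
    deadline = clock.now + timeout
    while clock.now < deadline:
        if has_synced():
            return True
        clock.sleep(poll)
    return has_synced()


def admit(clock, crd_applied_at, timeout=10.0):
    """Model an admission check that needs the informer to have synced.

    Assumption: the informer can only sync once the CRD exists, i.e. once
    the clock has reached `crd_applied_at`.
    """
    synced = wait_for_sync(lambda: clock.now >= crd_applied_at, clock, timeout)
    return "admitted" if synced else "rejected: informer sync timed out"


# CRD applied early: the request waits briefly, then goes through.
early = FakeClock()
print(admit(early, crd_applied_at=3.0), "after", early.now, "s")

# CRD applied late: the request burns the whole 10s window and fails.
late = FakeClock()
print(admit(late, crd_applied_at=45.0), "after", late.now, "s")
```

In this model every request that arrives before the CRD is applied pays up to the full wait, which is the kind of stall the description ties to failed liveness checks.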
@wking: This pull request references Jira Issue OCPBUGS-81627, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker.
verify-crd-schema is grumpy, but says:
And we'll want that |
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-agent-compact-fips |
|
@wking: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a2faa160-2f66-11f1-98f0-46894b072ef2-0 |