
fix(ROX-31121): Monitoring deployment from Prometheus Community chart#1807

Merged
tommartensen merged 2 commits intomasterfrom
tm/fix-monitoring-deploy
Apr 17, 2026


@tommartensen tommartensen commented Apr 17, 2026

Replace Bitnami kube-prometheus with Prometheus Community kube-prometheus-stack (83.5.1), refresh requirements.lock, and rewrite monitoring-values.yaml for the new subchart: same resource limits, PVC-backed Prometheus, disabled node/kube-state/kube control-plane scrapers, Slack routing via alertmanagerConfigSelector, tuned defaultRules for disabled targets, no custom image repos, and Grafana off.
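For readers unfamiliar with the new subchart, the settings listed above might look roughly like the following in monitoring-values.yaml. This is a minimal sketch, not the file from this PR: it assumes the values are nested under the dependency name `kube-prometheus-stack`, and the storage size, memory limit, and selector label are hypothetical placeholders. The key names should be checked against the chart's own values.yaml for version 83.5.1.

```yaml
# Hypothetical sketch of monitoring-values.yaml (not the actual file from this PR).
# Assumption: values are nested under the subchart name from the chart dependencies.
kube-prometheus-stack:
  # Grafana off
  grafana:
    enabled: false

  # Disabled node, kube-state, and control-plane scrapers
  nodeExporter:
    enabled: false
  kubeStateMetrics:
    enabled: false
  kubeApiServer:
    enabled: false
  kubeControllerManager:
    enabled: false
  kubeScheduler:
    enabled: false
  kubeEtcd:
    enabled: false
  kubeProxy:
    enabled: false

  # Turn off default rule groups whose targets are no longer scraped,
  # so they do not fire for permanently absent targets.
  defaultRules:
    rules:
      etcd: false
      kubeControllerManager: false
      kubeProxy: false
      kubeStateMetrics: false

  prometheus:
    prometheusSpec:
      # PVC-backed Prometheus; the size is a placeholder.
      storageSpec:
        volumeClaimTemplate:
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi
      # Placeholder limit; the PR keeps the previous resource limits.
      resources:
        limits:
          memory: 2Gi

  alertmanager:
    alertmanagerSpec:
      # Selects AlertmanagerConfig objects that carry the Slack routing.
      # The label key and value are hypothetical.
      alertmanagerConfigSelector:
        matchLabels:
          alertmanagerConfig: slack
```

Leaving the image repositories unset, as the description notes, means the chart's default upstream images are used.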

Why: The Bitnami kube-prometheus chart is outdated, and its legacy images were deleted from Docker Hub.

Also includes a fix for #1804 if NO_MONITORING is unset.

I tested the monitoring stack in a separate GKE cluster and it produced a test alert in Slack: https://redhat-internal.slack.com/archives/C06L0LDBEGZ/p1776413056761179

@tommartensen tommartensen self-assigned this Apr 17, 2026
@rhacs-bot

A single node development cluster (infra-pr-1807) was allocated in production infra for this PR.

CI will attempt to deploy quay.io/rhacs-eng/infra-server: to it.

🔌 You can connect to this cluster with:

gcloud container clusters get-credentials infra-pr-1807 --zone us-central1-a --project acs-team-temp-dev

🛠️ And pull infractl from the deployed dev infra-server with:

nohup kubectl -n infra port-forward svc/infra-server-service 8443:8443 &
make pull-infractl-from-dev-server

🚲 You can then use the dev infra instance e.g.:

bin/infractl -k -e localhost:8443 whoami

⚠️ Any clusters that you start using your dev infra instance should have a lifespan shorter than that of the development cluster. Otherwise, they will not be destroyed, because the dev infra instance ceases to exist when the development cluster is deleted. ⚠️

Further Development

☕ If you make changes, you can commit and push and CI will take care of updating the development cluster.

🚀 If you only modify configuration (chart/infra-server/configuration) or templates (chart/infra-server/{static,templates}), you can get a faster update with:

make helm-deploy

Logs

Logs for the development infra depending on your @redhat.com authuser:

Or:

kubectl -n infra logs -l app=infra-server --tail=1 -f

@tommartensen tommartensen marked this pull request as ready for review April 17, 2026 08:11
@tommartensen tommartensen requested review from a team and rhacs-bot as code owners April 17, 2026 08:11
@tommartensen tommartensen changed the title fix: Monitoring deployment from Prometheus Community chart fix(ROX-31121): Monitoring deployment from Prometheus Community chart Apr 17, 2026
@tommartensen tommartensen enabled auto-merge (squash) April 17, 2026 08:14

@davdhacs davdhacs left a comment


🎆

@tommartensen tommartensen merged commit 5e8fc16 into master Apr 17, 2026
11 checks passed
@tommartensen tommartensen deleted the tm/fix-monitoring-deploy branch April 17, 2026 14:49

3 participants