OCPEDGE-2491: Log pcs status and etcd member list after every recovery#30949
OCPEDGE-2491: Log pcs status and etcd member list after every recovery#30949lucaconsalvi wants to merge 2 commits intoopenshift:mainfrom
Conversation
…y test Add post-test logging of `sudo pcs status` and `sudo podman exec etcd etcdctl member list -w table` via SSH through the hypervisor after every recovery test (pass or fail). This provides visibility into the final cluster state without relying on the Kubernetes API, which may be unavailable after recovery tests. The logging is registered via DeferCleanup in BeforeEach and is gated on HasHypervisorConfig(). Errors are logged but never fail the test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Skipping CI for Draft Pull Request. |
|
@lucaconsalvi: This pull request references OCPEDGE-2491 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughAdded a cleanup hook to always collect and log final cluster diagnostics after each spec in the two-node recovery test. Implemented a file-local helper that obtains PCS and etcd membership state via SSH, handling missing hypervisor/SSH prerequisites gracefully without failing the test. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: lucaconsalvi The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/two_node/tnf_recovery.go`:
- Around line 828-843: The SSH helper calls currently discard stderr (using `_`)
and only log the error object on failure, losing valuable diagnostics; update
the calls to services.PcsStatus and core.ExecuteRemoteSSHCommand so they capture
both stdout and stderr (e.g., pcsOutput, pcsErrOutput, pcsErr and etcdOutput,
etcdErrOutput, etcdErr), and when pcsErr or etcdErr is non-nil include both the
stdout and stderr variables in the framework.Logf message (alongside the error)
to preserve command output for debugging while keeping the existing success-path
logging unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 562d68bd-fc7c-43f9-8584-c2979c2c83dc
📒 Files selected for processing (1)
test/extended/two_node/tnf_recovery.go
|
Scheduling required tests: |
Include stdout and stderr in error log messages for pcs status and etcd member list commands, so diagnostic output is not lost on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Scheduling required tests: |
|
@lucaconsalvi: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Job Failure Risk Analysis for sha: d451f3c
|
Summary
-w table after every recovery test (pass or fail)
after recovery tests
configured
Details
Recovery tests currently don't log final cluster state, making it harder to diagnose
failures. This adds a logFinalClusterStatus helper registered via DeferCleanup in
BeforeEach that SSHs through the hypervisor to collect pacemaker and etcd membership
status from the cluster nodes after every test completes.
The function tries each node in order and stops after the first node where both commands
succeed (since both commands return cluster-wide state). If a node is unreachable or a
command fails, it logs the error and tries the next node.
Test plan
Fixes: OCPEDGE-2491