Add configurable database health check with automatic restart on failure #38
Open
rositsa-popova wants to merge 2 commits into cloudfoundry:main
Summary
Implements a configurable database health check for Locket that monitors database connectivity and automatically restarts the process when failures are detected. Follows the same pattern as BBS (cloudfoundry/bbs#134).
Resolves: cloudfoundry/diego-release#1105
Problem: Locket can enter a degraded state when the database becomes unresponsive, with no automatic recovery mechanism.
Solution:
Test Results
All tests passed on dev landscape with PostgreSQL backend:
Test 1 - Backward Compatibility (Health Check Disabled): ✅
enable_db_health_check: false (default)
Test 2 - Health Check Enabled and Working: ✅
locket.db-health-check-runner.health-check-succeeded
Test 3 - Database Failure Detection: ✅
"database-failure-detected-restarting-locket"
Test 4 - Timeout Protection: ✅
Test 5 - Configuration Parameters: ✅
enable_db_health_check: true
health_check_interval: 10s
health_check_timeout: 5s
health_check_failure_threshold: 3
Database Support
Backward Compatibility
Breaking Change? No
This feature is disabled by default and requires explicit operator opt-in via the enable_db_health_check BOSH property. When disabled (default), Locket behaves exactly as before with no changes to functionality or performance.
When enabled:
locket_health_check (simple 2-column table)