2 changes: 1 addition & 1 deletion prod/eks/terragrunt.hcl
@@ -25,7 +25,7 @@ inputs = {
# ARM node group configuration
arm_instance_types = ["t4g.large", "t4g.xlarge"]
arm_desired_size = 3
-arm_max_size = 10
+arm_max_size = 50

⚠️ Potential issue | 🔴 Critical

CRITICAL: This change violates the established EKS Auto-Scaling Policy and must not be merged without proper approvals.

Increasing arm_max_size from 10 to 50 directly conflicts with the documented EKS Auto-Scaling Policy found in Notion, which explicitly states that the maximum node limit of 10 is a hard constraint based on:

  1. AWS service quota limits - No verification that quotas support 50 nodes
  2. Budget cap - $15,000/month for prod compute with alerts at 75%, 90%, and 100%; raising the node ceiling 5x could push spend past these thresholds and trigger automatic scale-down
  3. IP address availability - Private subnet IP exhaustion risk

Missing Required Approvals:

  • ❌ No capacity planning request found in Linear
  • ❌ No FinOps review for budget impact (5x increase)
  • ❌ No AWS service quota increase request (requires 2-3 day lead time)

The policy requires a formal process: capacity planning request → FinOps review → AWS quota increase before any changes to max node limits.

Based on external tools context (Notion EKS Auto-Scaling Policy).

Please provide evidence of:

  1. Approved capacity planning ticket
  2. FinOps sign-off on budget impact
  3. Confirmed AWS service quota increase to support 50 nodes
  4. Verification that private subnets have sufficient IP addresses

Until these prerequisites are met, this change should be reverted to arm_max_size = 10.
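
The IP and budget checks requested above can be estimated up front with a rough calculation. The following is a minimal Python sketch, not a tool from this repository. `NodeProfile`, `usable_ips`, and `check_scale` are hypothetical names, and the subnet CIDRs, per-node IP count, and hourly price are placeholders to replace with real values for the account and region. It treats every node as reserving its full IP allotment and ignores all other prod compute spend.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Per-node assumptions (placeholders, not values from the PR or the policy)."""
    ips_per_node: int         # upper bound on private IPs one node may reserve
    hourly_price_usd: float   # on-demand hourly price; look up the real figure


def usable_ips(cidr: str) -> int:
    """Usable addresses in an AWS subnet (AWS reserves 5 addresses per subnet)."""
    return max(ipaddress.ip_network(cidr).num_addresses - 5, 0)


def check_scale(max_nodes, profile, subnet_cidrs, monthly_budget_usd,
                hours_per_month=730):
    """Compare worst-case IP demand and node cost against subnets and budget."""
    available = sum(usable_ips(c) for c in subnet_cidrs)
    needed = max_nodes * profile.ips_per_node
    cost = max_nodes * profile.hourly_price_usd * hours_per_month
    return {
        "needed_ips": needed,
        "available_ips": available,
        "ip_ok": needed <= available,
        "worst_case_monthly_usd": round(cost, 2),
        "budget_ok": cost <= monthly_budget_usd,
    }


if __name__ == "__main__":
    # Hypothetical inputs: three /24 private subnets, placeholder node profile.
    subnets = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
    profile = NodeProfile(ips_per_node=60, hourly_price_usd=0.25)
    for nodes in (10, 50):
        print(nodes, check_scale(nodes, profile, subnets, 15_000))
```

A passing result here would not replace the approvals listed above; it only indicates whether the requested ceiling is plausible before the quota and FinOps requests are filed.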

🤖 Prompt for AI Agents
In prod/eks/terragrunt.hcl around line 28, changing arm_max_size from 10 to 50
violates the EKS Auto-Scaling Policy; revert the value back to arm_max_size = 10
and do not merge this change until the following are completed and referenced in
the PR: (1) an approved capacity planning ticket in Linear, (2) documented
FinOps sign-off on the budget impact, (3) an AWS service quota increase
confirmed to support 50 nodes, and (4) verification that private subnets have
sufficient IP addresses; include links or IDs for each approval in the PR
description before attempting any future increase.

arm_min_size = 3
capacity_type = "ON_DEMAND" # Use ON_DEMAND for production
