Skip to content

Added GPU operator support for ap-northeast-2 and ca-central-1 #312

Closed
Sean1783 wants to merge 2 commits intoaws:mainfrom
Sean1783:gpu-operator-mig-public
Closed

Added GPU operator support for ap-northeast-2 and ca-central-1 #312
Sean1783 wants to merge 2 commits intoaws:mainfrom
Sean1783:gpu-operator-mig-public

Conversation

@Sean1783
Copy link
Collaborator

What's changing and why?

Adding support for (2) additional regions for the GPU operator Helm chart.

Before/After UX

Before: No support for ICN and YUL.

After:
This change adds support for ap-northeast-2 and ca-central-1 for the GPU operator.

How was this change tested?

helm upgrade dependencies helm_chart/HyperPodHelmChart \
--set gpu-operator.enabled=true \
--set nvidia-device-plugin.devicePlugin.enabled=false \
-f helm_chart/HyperPodHelmChart/ecr-values/values-{$AWS_REGION}.yaml \
-n kube-system
helm list -A
kubectl get pods -n kube-system
kubectl describe pod $GPU_OPERATOR_COMPONENT_POD -n kube-system
kubectl describe node -n kube-system

Are unit tests added?

No

Are integration tests added?

No

Reviewer Guidelines

‼️ Merge Requirements: PRs with failing integration tests cannot be merged without justification.

One of the following must be true:

  • All automated PR checks pass
  • Failed tests include local run results/screenshots proving they work
  • Changes are documentation-only

@Sean1783 Sean1783 requested a review from a team as a code owner November 18, 2025 03:28
@Sean1783 Sean1783 closed this Nov 18, 2025
aws-brianxia added a commit to aws-brianxia/sagemaker-hyperpod-cli that referenced this pull request Feb 13, 2026
* feat: Implement elastic training cli arguments (aws#273)

* feat: Implement elastic training cli arguments

* Add elastic training unified config and unit test

* Add graceful shutdown and scaling timeout to cli args

* Revert "feat: Implement elastic training cli arguments (aws#273)"

This reverts commit 18428ef.

* Remove space CLI error traceback

---------

Co-authored-by: Sophia <yungwenh@amazon.com>
Co-authored-by: Molly He <mollyhe@amazon.com>

---

Implement port forwarding for space (aws#312)

---

Implement MIG profile validation for spaces (aws#315)
mollyheamazon pushed a commit that referenced this pull request Feb 25, 2026
* feat: Implement elastic training cli arguments (#273)

* feat: Implement elastic training cli arguments

* Add elastic training unified config and unit test

* Add graceful shutdown and scaling timeout to cli args

* Revert "feat: Implement elastic training cli arguments (#273)"

This reverts commit 18428ef2b1c0562bf51a9a4b4aa2914eed441259.

* Remove space CLI error traceback

---------

Co-authored-by: Sophia <yungwenh@amazon.com>
Co-authored-by: Molly He <mollyhe@amazon.com>

---

Implement port forwarding for space (#312)

---

Implement MIG profile validation for spaces (#315)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant