Merge Documentation changes to main for Launch by rsareddy0329 · Pull Request #196 · aws/sagemaker-hyperpod-cli

rsareddy0329 · 2025-08-06T00:04:22Z

PR to merge all the documentation changes to the main branch for public launch.

PR Approval Steps

For Requester

Description
- Check the PR title and description for clarity. It should describe the changes made and the reason behind them.
- Ensure that the PR follows the contribution guidelines, if applicable.
Security requirements
- Ensure that a Pull Request (PR) does not expose passwords and other sensitive information by using git-secrets and upload relevant evidence: https://github.com/awslabs/git-secrets
- Ensure commit has GitHub Commit Signature
Manual review
1. Click on the Files changed tab to see the code changes. Review the changes thoroughly:
  - Code Quality: Check for coding standards, naming conventions, and readability.
  - Functionality: Ensure that the changes meet the requirements and that all necessary code paths are tested.
  - Security: Check for any security issues or vulnerabilities.
  - Documentation: Confirm that any necessary documentation (code comments, README updates, etc.) has been updated.
Check for Merge Conflicts:
- Verify if there are any merge conflicts with the base branch. GitHub will usually highlight this. If there are conflicts, you should resolve them.

For Reviewer

Go through the For Requester section to double-check each item.
Request Changes or Approve the PR:
1. If the PR is ready to be merged, click Review changes and select Approve.
2. If changes are required, select Request changes and provide feedback. Be constructive and clear in your feedback.
Merging the PR
1. Check the Merge Method:
  1. Decide on the appropriate merge method based on your repository's guidelines (e.g., Squash and merge, Rebase and merge, or Merge).
2. Merge the PR:
  1. Click the Merge pull request button.
  2. Confirm the merge by clicking Confirm merge.

… main (#190) * Fix training test (#184) * Fix SDK training test: Add wait time before refresh * Fix training tests in canaries * Update logging information for submitting and deleting training job (#189) Co-authored-by: pintaoz <pintaoz@amazon.com> --------- Co-authored-by: Zhaoqi <zhaoqiwang.baruch@gmail.com> Co-authored-by: pintaoz-aws <167920275+pintaoz-aws@users.noreply.github.com> Co-authored-by: pintaoz <pintaoz@amazon.com>

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* Fix training test (#184) * Fix SDK training test: Add wait time before refresh * Fix training tests in canaries * Update logging information for submitting and deleting training job (#189) Co-authored-by: pintaoz <pintaoz@amazon.com> --------- Co-authored-by: Zhaoqi <zhaoqiwang.baruch@gmail.com> Co-authored-by: pintaoz-aws <167920275+pintaoz-aws@users.noreply.github.com> Co-authored-by: pintaoz <pintaoz@amazon.com>

* Documentation Fixes * Documentation Fixes --------- Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

mollyheamazon · 2025-08-06T00:15:11Z

.gitignore

 /.mypy_cache

 /doc/_apidoc/
+doc/_build/


Does this need to be /doc/_build/ here?

This is mainly to make sure _build is ignored by Git.
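
For context on the question above: under the gitignore pattern rules, a pattern with a slash anywhere other than at the end is matched relative to the directory that holds the .gitignore file. In a root-level .gitignore, doc/_build/ and /doc/_build/ should therefore ignore the same directory. A minimal sketch of that rule follows; `is_anchored` is a hypothetical helper written for illustration, not part of Git or this repository, and it ignores negation and escaping.

```python
def is_anchored(pattern: str) -> bool:
    """Return True if a gitignore pattern is matched relative to the
    directory of the .gitignore file rather than at any depth.

    Simplified sketch: negation ("!") and backslash escapes are not handled.
    """
    # A trailing slash only restricts the match to directories;
    # it does not anchor the pattern.
    body = pattern.rstrip("/")
    # Any remaining slash (leading or in the middle) anchors the pattern.
    return "/" in body


for pat in ("doc/_build/", "/doc/_build/", "_build/"):
    print(pat, is_anchored(pat))
```

Only a pattern such as _build/, with no leading or middle slash, would match at any depth in the tree.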

mollyheamazon · 2025-08-06T00:19:50Z

doc/installation.md

+source {venv-name}/bin/activate
+```
+```{note}
+Remember to activate your virtual environment (source {venv-name}/bin/activate) each time you want to use the HyperPod CLI and SDK if you chose the virtual environment installation method.


Add code quote around source {venv-name}/bin/activate
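
As a companion to the note under review, here is a small sketch of how a user could confirm from Python that a virtual environment is active before using the CLI or SDK. It relies only on the standard library; `in_virtualenv` is a hypothetical helper name, and the check covers venv-style environments only (for example, it says nothing about conda).

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if not in_virtualenv():
    print("No virtual environment is active; run: source {venv-name}/bin/activate")
```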

mollyheamazon · 2025-08-06T00:21:05Z

doc/training.md

+    --image pytorch/pytorch:latest \
+```
+````
+````{tab-item} SDK


Is SDK code keeping parity with CLI here?

This will be a fast-follow item

mollyheamazon · 2025-08-06T00:22:32Z

doc/inference.md

+```
+````
+
+````{tab-item} SDK


Seems like SDK code here is still using some optional variables

mollyheamazon · 2025-08-06T00:22:59Z

doc/inference.md

+````
+
+````{tab-item} SDK
+```python


Need to update SDK code here too

mollyheamazon · 2025-08-06T00:23:37Z

doc/inference.md

+# Custom endpoint
+hyp list-pods hyp-custom-endpoint
+```
+````


Missing SDK code here

mollyheamazon · 2025-08-06T00:23:41Z

doc/inference.md

+# Custom endpoint
+hyp get-logs hyp-custom-endpoint --pod-name <pod-name>
+```
+````


Missing SDK code here

mollyheamazon · 2025-08-06T00:25:26Z

doc/cli_training.md

+
+List all HyperPod PyTorch jobs in a namespace.
+
+#### Syntax


Seems like the Syntax heading renders even bigger than hyp list hyp-pytorch-job; not sure why the rendering is like that.

Yup, this mainly requires CSS changes.
It would be a fast-follow item as well.

mollyheamazon · 2025-08-06T00:28:19Z

doc/index.md

+:::
+
+:::{grid-item-card} HyperPod Developer Guide
+:link: https://catalog.workshops.aws/sagemaker-hyperpod-eks/en-US


The link seems to be the same as the workshop link. Does it need an update?

yes, checking with Shweta on this.

* Documentation Fixes * Documentation Fixes * Documentation Fixes * Documentation Fixes --------- Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

nargokul · 2025-08-06T18:26:55Z

doc/inference.md

+When creating an inference endpoint, you'll need to specify:
+
+- **endpoint-name**: Unique identifier for your endpoint
+- **instance-type**: The EC2 instance type to use
+- **model-id** (JumpStart): ID of the pre-trained JumpStart model
+- **image-uri** (Custom): Docker image containing your inference code
+- **model-name** (Custom): Name of model to create on SageMaker
+- **model-source-type** (Custom): Source type: fsx or s3
+- **model-volume-mount-name** (Custom): Name of the model volume mount
+- **container-port** (Custom): Port on which the model server listens


Can we separate this into two lists?

Parameters required for JumpStart

Parameters required for Custom
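
One way to picture the suggested split is sketched below. The parameter names come from the list under review; treating endpoint-name and instance-type as common to both types is an assumption based on their lack of a qualifier, and `required_params` is a hypothetical helper, not part of the HyperPod CLI or SDK.

```python
# Parameter names are taken from the list under review; the grouping
# and the helper below are illustrative, not the CLI's actual schema.
COMMON = ["endpoint-name", "instance-type"]  # assumed to apply to both types
BY_TYPE = {
    "jumpstart": ["model-id"],
    "custom": [
        "image-uri",
        "model-name",
        "model-source-type",
        "model-volume-mount-name",
        "container-port",
    ],
}


def required_params(endpoint_type: str) -> list[str]:
    """Return the parameters to specify for the given endpoint type."""
    key = endpoint_type.lower()
    if key not in BY_TYPE:
        raise ValueError(f"unknown endpoint type: {endpoint_type!r}")
    return COMMON + BY_TYPE[key]


print(required_params("JumpStart"))
print(required_params("Custom"))
```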

nargokul · 2025-08-06T18:28:18Z

doc/installation.md

+### Supported ML Frameworks
+- PyTorch (version ≥ 1.10)


Nit: Supported ML Frameworks for Training maybe
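
A reader may want to check an installed PyTorch against the stated minimum of 1.10. The sketch below compares versions numerically, since plain string comparison would rank "1.9" above "1.10". The helper names are hypothetical, and pre-release or local suffixes (such as "+cu118") are simply ignored, which is a simplification.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: str = "1.10") -> bool:
    """Compare numerically so that 1.9 sorts below 1.10."""
    return version_tuple(version) >= version_tuple(minimum)


for candidate in ("1.9.1", "1.10", "2.1.0+cu118"):
    print(candidate, meets_minimum(candidate))
```

With PyTorch installed, the same check can be applied to `torch.__version__`.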

nargokul · 2025-08-06T18:29:07Z

test/integration_tests/training/cli/test_cli_training.py

-    def test_set_cluster_context(self, cluster_name):
-        """Test setting cluster context."""
-        result = execute_command([
-            "hyp", "set-cluster-context",
-            "--cluster-name", cluster_name
-        ])
-        assert result.returncode == 0
-        context_line = result.stdout.strip().splitlines()[-1]
-        assert any(text in context_line for text in ["Updated context", "Added new context"])
-


Is this change needed?

Looks like this change is from other commits. Can you rebase onto main to clean it up?

I merged in the latest changes from main, and this change shows up in the diff. It comes from this PR: https://github.com/aws/sagemaker-hyperpod-cli/pull/184/files

* Documentation Fixes * Documentation Fixes * Documentation Fixes * Documentation Fixes * Documentation Fixes --------- Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

* Documentation Fixes * Documentation Fixes * Documentation Fixes * Documentation Fixes * Documentation Fixes * Documentation Fixes --------- Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

rsareddy0329 and others added 4 commits August 5, 2025 16:00

Documentation Fixes (#191)

a615b61

Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

Documentation Fixes (#195)

a98202a

* Documentation Fixes * Documentation Fixes --------- Co-authored-by: Roja Reddy Sareddy <rsareddy@amazon.com>

rsareddy0329 requested a review from a team as a code owner August 6, 2025 00:04

rsareddy0329 had a problem deploying to manual-approval August 6, 2025 00:04 — with GitHub Actions Error