feat(tpu): add create/delete/get samples #9585
Here is the summary of changes. You are about to add 3 region tags.
This comment is generated by snippet-bot.
tpu/src/test/java/tpu/TpuVmIT.java (outdated)
```java
@AfterAll
public static void cleanup() throws Exception {
  DeleteTpuVm.deleteTpuVm(PROJECT_ID, ZONE, TPU_VM_NAME);
  TimeUnit.MINUTES.sleep(5);
```
Does it really take 5 minutes to delete a TPU VM?
It depends :) This is a long-running operation. I got an error, and after I added the timeout it went away.
Implemented an OperationTimedPollAlgorithm with RetrySettings for the TpuClient, which lets us call the delete method without any explicit timeouts.
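For context on the fix above: gax's `OperationTimedPollAlgorithm.create(RetrySettings)` makes a client poll a long-running operation with exponentially growing delays until a total timeout, instead of the test sleeping for a fixed period. The following is a JDK-only sketch of that idea under stated assumptions, not the gax API itself: `TimedPollSketch` and `pollUntilDone` are hypothetical names, and the delay values are illustrative. In the real client the algorithm is presumably attached to the delete operation's settings (for example via `TpuSettings.newBuilder().deleteNodeOperationSettings().setPollingAlgorithm(...)`).

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public final class TimedPollSketch {

  private TimedPollSketch() {}

  /**
   * Hypothetical helper: polls {@code isDone} with exponentially growing delays
   * (capped at {@code maxDelay}) until it returns true or {@code totalTimeout} elapses.
   * It mimics the idea behind gax's OperationTimedPollAlgorithm; it is not that API.
   *
   * @return true if the operation finished within the total timeout, false otherwise
   */
  public static boolean pollUntilDone(
      BooleanSupplier isDone,
      Duration initialDelay,
      double multiplier,
      Duration maxDelay,
      Duration totalTimeout) {
    long deadline = System.nanoTime() + totalTimeout.toNanos();
    long delayMillis = initialDelay.toMillis();
    while (true) {
      if (isDone.getAsBoolean()) {
        return true;
      }
      long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMillis <= 0) {
        return false; // total timeout reached without completion
      }
      try {
        // Never sleep past the overall deadline.
        Thread.sleep(Math.min(delayMillis, remainingMillis));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
      // Grow the delay geometrically, but never beyond maxDelay.
      delayMillis = Math.min((long) (delayMillis * multiplier), maxDelay.toMillis());
    }
  }

  public static void main(String[] args) {
    // Simulated long-running delete that reports "done" on the third poll.
    int[] polls = {0};
    boolean finished = pollUntilDone(
        () -> ++polls[0] >= 3,
        Duration.ofMillis(10), 2.0, Duration.ofMillis(50), Duration.ofSeconds(5));
    System.out.println("finished=" + finished + " after " + polls[0] + " polls");
  }
}
```

Running `main` prints `finished=true after 3 polls`. Capping each sleep at the remaining time keeps the total wait bounded by `totalTimeout`, which is what allows the test to drop a fixed `TimeUnit.MINUTES.sleep(5)`.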
tpu/src/test/java/tpu/TpuVmIT.java (outdated)
```java
static String javaVersion = System.getProperty("java.version").substring(0, 2);
private static final String TPU_VM_NAME = "test-tpu-" + javaVersion + "-"
    + UUID.randomUUID().toString().substring(0, 8);
private static final String ACCELERATOR_TYPE = "v5litepod-1";
```
Do we have access to v5lite? Can we rather use some older and cheaper TPU model for testing?
Changed accelerator type and version.
tpu/src/test/java/tpu/TpuVmIT.java (outdated)
```java
String creationTime = formatTimestamp(node.getCreateTime());
String name = node.getName().substring(node.getName().lastIndexOf("/") + 1);
if (containPrefixToDeleteAndZone(node, prefixToDelete, zone)
    && isCreatedBeforeThresholdTime(creationTime)) {
```
If the test case takes < 5 minutes, do we just not delete the VM we created? Can we guarantee there won't be a TPU VM left running after the test runs once?
Thank you! Changed the threshold to 30 minutes. Once all the TPU samples are implemented, we could raise it to, for example, 24 hours.
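The cleanup discussed above should delete only leftover test nodes that are older than a threshold. A minimal JDK-only sketch of such a filter, assuming node names of the form `projects/{project}/locations/{zone}/nodes/{node}` and a creation time already converted to `java.time.Instant` (for a protobuf `Timestamp`, for example with `Instant.ofEpochSecond(ts.getSeconds(), ts.getNanos())`). `StaleTpuCleanupSketch`, `shortName`, and `shouldDelete` are hypothetical stand-ins for the PR's `containPrefixToDeleteAndZone` and `isCreatedBeforeThresholdTime` helpers, not their actual code.

```java
import java.time.Duration;
import java.time.Instant;

public final class StaleTpuCleanupSketch {

  // Assumed threshold, matching the 30 minutes mentioned in the review.
  public static final Duration THRESHOLD = Duration.ofMinutes(30);

  private StaleTpuCleanupSketch() {}

  /** Extracts the short node ID from a full resource name (text after the last '/'). */
  public static String shortName(String fullName) {
    return fullName.substring(fullName.lastIndexOf('/') + 1);
  }

  /**
   * Hypothetical filter: true only for nodes in {@code zone} whose ID starts with
   * {@code prefix} and that were created more than THRESHOLD before {@code now}.
   */
  public static boolean shouldDelete(
      String fullName, String prefix, String zone, Instant createTime, Instant now) {
    return fullName.contains("/locations/" + zone + "/nodes/")
        && shortName(fullName).startsWith(prefix)
        && createTime.isBefore(now.minus(THRESHOLD));
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2024-01-01T12:00:00Z");
    String name = "projects/my-project/locations/us-central1-a/nodes/test-tpu-17-1a2b3c4d";
    // 45 minutes old: eligible for cleanup.
    System.out.println(shouldDelete(name, "test-tpu-", "us-central1-a",
        now.minus(Duration.ofMinutes(45)), now));
    // 10 minutes old: possibly still in use by a running test, so it is kept.
    System.out.println(shouldDelete(name, "test-tpu-", "us-central1-a",
        now.minus(Duration.ofMinutes(10)), now));
  }
}
```

Running `main` prints `true` and then `false`. Comparing `Instant` values directly avoids formatting the timestamp to a string and parsing it back before the age check.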
@TetyanaYahodska Please ping for review after @m-strzelczyk's LGTM.
Description
Sample in Python: feat: TPU *Create / *get / *create_with_script / *delete Samples by Thoughtseize1 · Pull Request #12690 · GoogleCloudPlatform/python-docs-samples
Documentation - https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm
Fixes #
Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.
Checklist
- pom.xml parent set to latest shared-configuration
- mvn clean verify (required)
- mvn -P lint checkstyle:check (required)
- mvn -P lint clean compile pmd:cpd-check spotbugs:check (advisory only)