* updated instance count environment variables
* updated message for IAM execution role creation
* added check_jq function
* removed old todos
* updated order of hyperpod cluster config message
* updated hyperpod cluster stack to conditionally disable deep health checks
* put S3 endpoint into separate cfn stack
* updated helm chart injector to use kube-system namespace
* syntax fix in lambda function
* enabled passthrough of existing resource IDs from tmp_env_vars to env_vars
* fixed execution role stack boolean variable and security group stack display
* bump k8s version to 1.31
* breaking ground on terraform support
* adding boilerplate module files
* added child modules and default values in root
* made output and variable corrections
* bug fixes on helm chart and eks auth mode
* Remove .terraform.lock.hcl file
* Added .terraform.lock.hcl to gitignore
* rename parent directory hyperpod-eks-tf
* added readme and env vars script
* code tidy after testing
* Update 1.architectures/7.sagemaker-hyperpod-eks/terraform-modules/README.md
Co-authored-by: Keita Watanabe <mlkeita@amazon.com>
The diagram below depicts the Terraform modules that have been bundled into a single project to enable you to deploy a full HyperPod cluster environment all at once.

<img src="./smhp_tf_modules.png" width="50%"/>

## Configuration
Start by reviewing the default configuration in the `terraform.tfvars` file and adjust it to suit your needs.

```bash
vim hyperpod-eks-tf/terraform.tfvars
```
For example, you may want to add or modify the HyperPod instance groups to be created:
```hcl
instance_groups = {
  group1 = {
    instance_type             = "ml.g5.8xlarge"
    instance_count            = 8
    ebs_volume_size           = 100
    threads_per_core          = 2
    enable_stress_check       = true
    enable_connectivity_check = true
    lifecycle_script          = "on_create.sh"
  }
}
```
If you wish to reuse existing cloud resources rather than create new ones, set the associated `create_*` variable to `false` and provide the ID of the corresponding resource as the value of the matching `existing_*` variable.

For example, if you want to reuse an existing VPC, set `create_vpc` to `false`, then set `existing_vpc_id` to your VPC ID, like `vpc-1234567890abcdef0`.
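
One way to apply such overrides without editing the defaults is a separate `*.auto.tfvars` file, which Terraform loads automatically from the root module directory and which takes precedence over `terraform.tfvars`. The sketch below assumes the `create_vpc` and `existing_vpc_id` variables described above; the file name `reuse.auto.tfvars` and the VPC ID are placeholders, not names used by the project.

```bash
# Directory containing the root module (matches the commands in this README).
TF_DIR="${TF_DIR:-hyperpod-eks-tf}"
mkdir -p "$TF_DIR"

# Write an override file. The file name is an arbitrary choice; only the
# .auto.tfvars suffix matters for automatic loading.
cat > "$TF_DIR/reuse.auto.tfvars" <<'EOF'
# Reuse an existing VPC instead of creating a new one.
create_vpc      = false
existing_vpc_id = "vpc-1234567890abcdef0" # placeholder: replace with your VPC ID
EOF
```

Keeping overrides in their own file leaves `terraform.tfvars` untouched, which makes it easier to compare your settings against the shipped defaults.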
## Deployment
Run `terraform init` to initialize the Terraform working directory. This installs the required provider plugins, downloads the modules, and configures the backend that stores the infrastructure state:

```bash
terraform -chdir=hyperpod-eks-tf init
```
Run `terraform plan` to generate an execution plan that shows the changes Terraform will make to your infrastructure, so you can review and validate them before applying:

```bash
terraform -chdir=hyperpod-eks-tf plan
```
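
For scripted use, `terraform plan` accepts the `-detailed-exitcode` flag, which makes the command exit with 0 when there are no changes, 1 on error, and 2 when changes are pending. A minimal sketch of how that could be wrapped, where the function name `plan_status` is a hypothetical helper and not part of this project:

```bash
# Hypothetical helper: summarize the outcome of `terraform plan`.
plan_status() {
  local dir="${1:-hyperpod-eks-tf}"
  local rc=0
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes pending.
  terraform -chdir="$dir" plan -detailed-exitcode >/dev/null || rc=$?
  case "$rc" in
    0) echo "no changes" ;;
    2) echo "changes pending" ;;
    *) echo "plan failed (exit code $rc)" >&2; return 1 ;;
  esac
}
```

Calling `plan_status` with no arguments checks the `hyperpod-eks-tf` directory; pass a different directory as the first argument if your layout differs.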
Run `terraform apply` to execute the changes outlined in the plan. Terraform creates, updates, or deletes infrastructure resources according to your configuration, then updates the state to reflect the result:

```bash
terraform -chdir=hyperpod-eks-tf apply
```
When prompted to confirm, type `yes` and press Enter.

You can also run `terraform apply` with the `-auto-approve` flag to skip the confirmation prompt, but use it with caution to avoid unintended changes to your infrastructure.
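
A more conservative alternative to `-auto-approve` is to save the plan to a file, review it, and then apply exactly that plan; `terraform apply` does not prompt for confirmation when given a saved plan file. The sketch below assumes a plan file named `tfplan` (an arbitrary choice) and uses hypothetical helper functions to keep the three steps separate:

```bash
# Both values are assumptions; override them to match your setup.
TF_DIR="${TF_DIR:-hyperpod-eks-tf}"
PLAN_FILE="${PLAN_FILE:-tfplan}" # resolved relative to TF_DIR because of -chdir

# Step 1: write the execution plan to a file.
save_plan() { terraform -chdir="$TF_DIR" plan -out="$PLAN_FILE"; }

# Step 2: print the saved plan in human-readable form for review.
show_saved_plan() { terraform -chdir="$TF_DIR" show "$PLAN_FILE"; }

# Step 3: apply exactly the saved plan (no confirmation prompt).
apply_saved_plan() { terraform -chdir="$TF_DIR" apply "$PLAN_FILE"; }
```

Run `save_plan`, inspect the output of `show_saved_plan`, and only then run `apply_saved_plan`. Terraform rejects a saved plan if the state has changed since it was created, so the applied changes match what was reviewed.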
## Environment Variables
Run the `terraform_outputs.sh` script, which populates the `env_vars.sh` script with your environment variables for future reference:
```bash
chmod +x terraform_outputs.sh
./terraform_outputs.sh
cat env_vars.sh
```
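
The contents of `terraform_outputs.sh` are not shown here. As a rough illustration of the general technique only, and not the actual script, a Terraform output can be read with `terraform output -raw <name>` and turned into an `export` line. The helper name `export_line` and the output name `eks_cluster_name` used in the note after the block are hypothetical:

```bash
# Hypothetical helper: print an `export VAR='value'` line for one Terraform output.
# Assumes the output value contains no single quotes.
export_line() {
  local var_name="$1" output_name="$2" value
  value="$(terraform -chdir="${TF_DIR:-hyperpod-eks-tf}" output -raw "$output_name")" || return 1
  printf "export %s='%s'\n" "$var_name" "$value"
}
```

For instance, `export_line EKS_CLUSTER_NAME eks_cluster_name` would print a line suitable for a file such as `env_vars.sh`, provided the root module defines an output with that name.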
Source the `env_vars.sh` script to set your environment variables:
```bash
source env_vars.sh
```
Verify that your environment variables are set:
```bash
echo $EKS_CLUSTER_NAME
echo $PRIVATE_SUBNET_ID
echo $SECURITY_GROUP_ID
```
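
To check several variables at once and fail loudly if any is missing, a small loop can help. This is a sketch under the assumption that the three variable names above are the ones you care about; the function name `check_env_vars` is a hypothetical helper, not part of the repository:

```bash
# Hypothetical helper: report every named variable that is unset or empty.
check_env_vars() {
  local name value missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    # eval is acceptable here because the names are fixed literals, not user input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After sourcing `env_vars.sh`, run `check_env_vars EKS_CLUSTER_NAME PRIVATE_SUBNET_ID SECURITY_GROUP_ID`; a non-zero exit status means at least one variable is unset or empty.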
## Clean Up
Before cleaning up, preview what will be removed by running a speculative destroy plan:

```bash
terraform -chdir=hyperpod-eks-tf plan -destroy
```
Once you've validated the changes, you can proceed to destroy the resources: