Config file driven multiple AWS Databricks workspaces deployment [Issue 187] #188
Conversation
Halfway through the review; some minor changes suggested.
We should keep this file, but make sure we update the providers to the latest versions: https://developer.hashicorp.com/terraform/language/files/dependency-lock
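For reference, a minimal sketch of refreshing the lock file to the newest provider versions the constraints allow, using the standard Terraform CLI workflow:

```sh
# Re-resolve providers to the newest versions permitted by the
# version constraints and rewrite .terraform.lock.hcl
terraform init -upgrade

# Optionally record checksums for additional platforms so the lock
# file works for teammates on other OS/architectures
terraform providers lock -platform=linux_amd64 -platform=darwin_arm64
```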
> You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
Suggested change:
- You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
+ You should separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
> Since we are using a CMK (customer-managed key) for encryption on the root S3 bucket and on Databricks managed resources, you also need to provide an AWS IAM ARN for `cmk_admin`, in the format `arn:aws:iam::123456:user/xxx`. You need to create this user and assign the KMS admin role to it.
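As an illustration only (variable and resource names are hypothetical, and the key policy is abbreviated), the `cmk_admin` ARN might be wired in like this:

```hcl
variable "cmk_admin" {
  description = "IAM ARN of the KMS admin user, e.g. arn:aws:iam::123456:user/xxx"
  type        = string
}

# Abbreviated key policy: a real deployment needs additional
# statements (e.g. allowing Databricks to use the key).
resource "aws_kms_key" "managed_services" {
  description = "CMK for Databricks managed services"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowKeyAdministration"
      Effect    = "Allow"
      Principal = { AWS = var.cmk_admin }
      Action    = "kms:*"
      Resource  = "*"
    }]
  })
}
```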
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values.
Suggested change:
- > Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values.
+ > Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down `client_id` and `client_secret` values.
workspace_3 = var.workspace_3_config
}
export TF_VAR_aws_account_id=xxxx
export AWS_ACCESS_KEY_ID=your_aws_role_access_key_id
Let's avoid this; Terraform should just use the AWS CLI profile to auth.
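A minimal sketch of profile-based auth (the profile name is a placeholder):

```hcl
# Terraform picks up credentials from the named AWS CLI profile
# (~/.aws/credentials), so no static access keys need to be exported.
provider "aws" {
  region  = var.region
  profile = "my-profile"
}
```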
> In the default setting, this template creates one VPC (with one public subnet and one private subnet for hosting VPCEs). Each incoming workspace adds 2 private subnets to this VPC. If you need to create multiple VPCs, you should copy the VPC configs and change them accordingly, or wrap the VPC configs into a module; we leave this to you.
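One possible shape for the module approach, assuming a hypothetical local module `./modules/vpc` and a `var.vpcs` map (both placeholders, not part of the current template):

```hcl
# One VPC per entry in var.vpcs; each value carries that VPC's settings.
module "databricks_vpc" {
  source   = "./modules/vpc"
  for_each = var.vpcs

  name       = each.key
  cidr_block = each.value.cidr_block
}
```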
> At this step, your workspace deployment and VPC networking infra should have been successfully deployed, and you will have `n` config JSON files for the `n` deployed workspaces under the `/artifacts` folder, to be used in another Terraform project to deploy workspace objects, including the IP Access List.
> Then in root level `variables.tf`, change the region default value to your region.
Why don't you add this to the yml config as well?
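For example, the per-workspace YAML could carry the region alongside the other settings (the key name is a suggestion, not part of the current template):

```yaml
workspace_1:
  region: "us-east-1"   # would override the variables.tf default
```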
# VPC Configuration
Should start with three dashes (the YAML document start marker).
Suggested change:
- # VPC Configuration
+ ---
+ # VPC Configuration
terraform { |
Do we still suggest customers configure log delivery? System tables should replace this now, as parsing the audit logs is quite time-consuming.
deploy_metastore: "false"
existing_metastore_id: "xxxxx-xxxx-xxxx-xxxx-xxxxxx"
We should combine these into a single flag, `existing_metastore_id`.
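A sketch of how a single flag could drive both behaviors (names assumed from the config above):

```hcl
# Deploy a new metastore only when no existing ID is supplied;
# otherwise attach the workspaces to the existing metastore.
locals {
  deploy_metastore = var.existing_metastore_id == ""
}
```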
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
Can we pin the module version?
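For illustration, pinning with the registry module's `version` argument (the constraint shown is an example, not a tested recommendation):

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"  # pin to a known-good major release
  # ...remaining VPC configuration unchanged...
}
```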
resource "aws_iam_role_policy" "cross_account" { |
I thought we already have `data.databricks_aws_crossaccount_policy`.
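If so, the hand-written policy could likely be replaced with the provider-generated document; a sketch (role and resource names are placeholders):

```hcl
# The Databricks provider exposes the current recommended
# cross-account policy as a data source.
data "databricks_aws_crossaccount_policy" "this" {}

resource "aws_iam_role_policy" "cross_account" {
  name   = "databricks-cross-account-policy"
  role   = aws_iam_role.cross_account.id
  policy = data.databricks_aws_crossaccount_policy.this.json
}
```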
Revamping this example of modular AWS Databricks workspace deployment: we can now define multiple environments with YAML files to deploy m workspaces into n VPCs and create multiple catalogs with isolated underlying infra. All workspaces use the backend PrivateLink and CMK features; the VPCs are configured with NAT for egress.