Config file driven multiple AWS Databricks workspaces deployment [Issue 187] #188

Open · wants to merge 5 commits into main

Conversation

hwang-db (Contributor)

Revamps this example of modular AWS Databricks workspace deployment: we can now define multiple environments with YAML files to deploy m workspaces into n VPCs and create multiple catalogs with isolated underlying infrastructure. All workspaces use the back-end PrivateLink and CMK features, and the VPCs are configured with NAT gateways for egress.
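
For readers of this description, here is a minimal sketch of the pattern being described: per-environment YAML files decoded and fanned out with `for_each`. The directory layout, module path, and config keys below are assumptions for illustration, not the exact names used in this PR.

```hcl
locals {
  # one YAML file per environment, e.g. environments/dev.yaml (hypothetical layout)
  environments = {
    for f in fileset("${path.root}/environments", "*.yaml") :
    trimsuffix(f, ".yaml") => yamldecode(file("${path.root}/environments/${f}"))
  }
}

module "workspace" {
  source   = "./modules/databricks_workspace" # assumed module path
  for_each = local.environments

  workspace_name = each.key
  workspace_conf = each.value # full decoded YAML config for this environment
}
```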

@hwang-db requested review from a team as code owners on April 24, 2025 11:47
@nkvuong (Collaborator) left a comment

halfway through the review - some minor changes suggested

Collaborator

should keep this file, but make sure we update the providers to the latest versions - https://developer.hashicorp.com/terraform/language/files/dependency-lock
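
For illustration, a sketch of what the updated `required_providers` block could look like; the version constraints here are assumptions, not verified latest releases:

```hcl
terraform {
  required_version = ">= 1.5"

  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "~> 1.50" # assumed pin; check the registry for the latest release
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed pin; check the registry for the latest release
    }
  }
}
```

Running `terraform init -upgrade` then refreshes the `.terraform.lock.hcl` dependency lock file described in the link above.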

You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
Collaborator

Suggested change
You can separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.
You should separate out these 2 pipelines into different projects, instead of keeping everything in the same repo folder.


Since we are using CMK (customer managed key) for encryption on the root S3 bucket and Databricks managed resources, you also need to provide an AWS IAM ARN for `cmk_admin`. The format is: `arn:aws:iam::123456:user/xxx`. You need to create this user and assign the KMS admin role to it.
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values.
Collaborator

Suggested change
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down client_id and client_secret values.
> Step 1.1: Manually create a service principal with account admin role on Account Console, generate client secret; and note down `client_id` and `client_secret` values.
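
Stepping back to the `cmk_admin` requirement quoted above, a minimal sketch of how the value might be supplied; the account id and user name below are placeholders:

```hcl
variable "cmk_admin" {
  description = "ARN of the IAM user that administers the KMS key used as the CMK"
  type        = string
}

# terraform.tfvars (placeholder value, in the format described in the README)
# cmk_admin = "arn:aws:iam::123456789012:user/kms-admin"
```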

workspace_3 = var.workspace_3_config
}
export TF_VAR_aws_account_id=xxxx
export AWS_ACCESS_KEY_ID=your_aws_role_access_key_id
Collaborator

let's avoid this, tf should just use awscli profile to auth
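
A minimal sketch of the reviewer's suggestion, assuming a named AWS CLI profile instead of exported static access keys (the profile name is a placeholder):

```hcl
provider "aws" {
  region  = var.region
  profile = "databricks-deploy" # placeholder profile from ~/.aws/config; avoids exporting static keys
}
```

Equivalently, exporting `AWS_PROFILE=databricks-deploy` before running Terraform picks up the same profile without hard-coding it in the provider block.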

In the default setting, this template creates one VPC (with one public subnet and one private subnet for hosting VPCEs). Each incoming workspace adds 2 private subnets to this VPC. If you need to create multiple VPCs, you should copy and paste the VPC configs and adjust them accordingly, or wrap the VPC configs into a module; we leave this to you.

At this step, your workspace deployments and VPC networking infra should have been successfully deployed, and you will have `n` config JSON files for the `n` deployed workspaces under the `/artifacts` folder, to be used in another Terraform project to deploy workspace objects, including the IP Access List.
Then, in the root-level `variables.tf`, change the `region` default value to your region.
Collaborator

why don't you add this to the yml config as well?
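
One way to act on this suggestion, sketched under the assumption that the environment YAML is decoded into a local; the config path and `region` key are hypothetical:

```hcl
locals {
  env_config = yamldecode(file("${path.root}/environments/dev.yaml")) # hypothetical config path

  # prefer the region declared in the YAML, fall back to the root variable default
  region = try(local.env_config.region, var.region)
}
```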

@@ -0,0 +1,43 @@
# VPC Configuration
Collaborator

should start with three dashes

Suggested change
# VPC Configuration
---
# VPC Configuration

@@ -0,0 +1,107 @@
terraform {
Collaborator

do we still suggest customers configure log delivery? system tables should replace this now, as parsing the audit logs is quite time-consuming

Comment on lines +32 to +33
deploy_metastore: "false"
existing_metastore_id: "xxxxx-xxxx-xxxx-xxxx-xxxxxx"
Collaborator

we should combine these into a single flag existing_metastore_id
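
A sketch of how the two YAML keys could collapse into one, assuming the module derives the deploy decision from whether an id is supplied; names other than `existing_metastore_id` (which comes from the config above) are illustrative:

```hcl
variable "existing_metastore_id" {
  description = "ID of an existing metastore to attach; leave null to deploy a new one"
  type        = string
  default     = null
}

locals {
  # deploy a new metastore only when no existing id is supplied
  deploy_metastore = var.existing_metastore_id == null
}
```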

}

module "vpc" {
source = "terraform-aws-modules/vpc/aws"
Collaborator

can we pin the module version?
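
For example, a pinned constraint could look like the following; the version value is an assumption, pin to whatever release the example is tested against:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed constraint; pin to the tested release

  # ... existing inputs unchanged
}
```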

}


resource "aws_iam_role_policy" "cross_account" {
Collaborator

I thought we already have data.databricks_aws_crossaccount_policy
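
A sketch of the data source the reviewer refers to, which renders the cross-account policy JSON so it does not have to be hand-maintained; the role and policy names here are illustrative, and the IAM role is assumed to be declared elsewhere in the module:

```hcl
data "databricks_aws_crossaccount_policy" "this" {}

resource "aws_iam_role_policy" "cross_account" {
  name   = "databricks-cross-account-policy" # illustrative name
  role   = aws_iam_role.cross_account.id     # assumes the role exists elsewhere in this module
  policy = data.databricks_aws_crossaccount_policy.this.json
}
```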
