Databricks on GCP data exfiltration protection workspace deployment #172
This commit contains the implementation for the workspace resource group. However, this change requires that we no longer use the local.rg_location variable, since its value is only known after apply, and this forces the replacement of all of the resources.
Most of the README files were already defined. TFDocs were updated in each of them.
@bhavink - wdyt?
@alexott I do not think we need a traditional hub/spoke-based architecture on GCP. A shared-VPC-based deployment is a common and popular architecture where one could use VPC firewall rules along with VPC SC to prevent data exfiltration. TF support for CMv1 will be available by early March 2025, so may I suggest that we wait for it to be released and then update the GCP-specific module?
I agree about waiting for the CMv1 migration.
Pull Request Overview
This PR adds documentation to support the deployment of Databricks on GCP with data exfiltration protection using a Hub & Spoke network architecture while still using the CMv2 architecture.
- Added an example README for provisioning the workspace in the examples directory.
- Introduced a module README that details resource outcomes and the network architecture for the deployment.
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
examples/gcp-with-psc-exfiltration-protection/README.md | New documentation for workspace provisioning using hub & spoke architecture |
modules/gcp-with-psc-exfiltration-protection/README.md | Detailed module documentation including architecture and resource listings |
Most of the values are related to resources managed by Databricks. Values to use be found at: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html
[nitpick] There appears to be a grammatical error. Consider rephrasing to something like 'Most values are related to resources managed by Databricks. The required values can be found at: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html'.
Suggested change:
- Most of the values are related to resources managed by Databricks. Values to use be found at: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html
+ Most values are related to resources managed by Databricks. The required values can be found at: https://docs.gcp.databricks.com/en/resources/ip-domain-region.html
**REMARK THAT** the module does not contain the VPC SC implementation. This can be added to increase the security level in the Databricks deployment, providing detailed access level for ingress and egress traffic.
[nitpick] The phrasing 'REMARK THAT' can be softened for better readability. Consider using 'Note that' instead.
Suggested change:
- **REMARK THAT** the module does not contain the VPC SC implementation. This can be added to increase the security level in the Databricks deployment, providing detailed access level for ingress and egress traffic.
+ **Note that** the module does not contain the VPC SC implementation. This can be added to increase the security level in the Databricks deployment, providing detailed access level for ingress and egress traffic.
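Since the module leaves VPC SC out, a reader who wants to add it could start from something like the sketch below. It is not part of this PR. The variables `access_policy_id` and `project_number`, the perimeter name `databricks_perimeter`, and the choice of restricted service are all hypothetical placeholders; a real deployment would also need ingress and egress policies suited to Databricks traffic.

```hcl
# Hypothetical inputs, not defined by the module in this PR.
variable "access_policy_id" {
  type        = string
  description = "Numeric ID of an existing Access Context Manager policy (assumed to exist)"
}

variable "project_number" {
  type        = string
  description = "Number of the GCP project hosting the workspace (assumption)"
}

# Minimal service perimeter around the workspace project.
resource "google_access_context_manager_service_perimeter" "databricks" {
  parent = "accessPolicies/${var.access_policy_id}"
  name   = "accessPolicies/${var.access_policy_id}/servicePerimeters/databricks_perimeter"
  title  = "databricks_perimeter"

  status {
    resources           = ["projects/${var.project_number}"]
    restricted_services = ["storage.googleapis.com"] # illustrative choice only
  }
}
```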
@micheledaddetta-databricks can you update the code to use provider >= 1.71? It includes changes for CMv1 support.
@alexott I'll update it next week.
Starting from provider version 1.71, CMv1 is supported for Databricks on GCP.
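For readers updating their own configuration, a minimal sketch of the version constraint discussed here could look like the following. The `source` value is the registry address of the Databricks provider; the exact lower bound `1.71.0` is an assumption derived from the version mentioned in this thread, not copied from the PR's code.

```hcl
terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = ">= 1.71.0" # assumed lower bound, per the comment above
    }
  }
}
```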
@alexott here you can find the updated code.
This is an initial implementation. I will enhance it in future commits to include metastore admin assignment, workspace-metastore binding, catalog owner, and catalog-workspace binding. If needed, the module can be built to be cloud agnostic.
Minor changes required, such as updating the image.
depends_on = [
  databricks_storage_credential.this,
  databricks_external_location.this
]
Technically we don't need this if we use `storage_root = databricks_external_location.this.url`.
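The reviewer's point is that referencing an attribute of the external location creates an implicit dependency, so Terraform orders the resources without an explicit `depends_on`. A minimal sketch, assuming the `depends_on` block sits in a `databricks_catalog` resource (the diff excerpt does not show which resource it belongs to) and using a hypothetical resource label and catalog name:

```hcl
# Hypothetical catalog resource; the label and name are illustrative only.
resource "databricks_catalog" "example" {
  provider = databricks.workspace
  name     = "example_catalog"

  # Referencing the external location's url makes Terraform create the
  # external location (and whatever it references) before this catalog,
  # so no explicit depends_on is needed for ordering.
  storage_root = databricks_external_location.this.url
}
```

If the external location itself references the storage credential, the credential is ordered first transitively.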
resource "databricks_external_location" "this" {
  provider = databricks.workspace
  name     = "${var.prefix}-external-location"
  url      = "gs://${google_storage_bucket.ext_bucket.name}"
As I remember, the `url` and `storage_root` should end with `/`, because the backend normalizes them; without the trailing slash this leads to permanent configuration drift.
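Applied to the excerpt under review, the suggestion would amount to appending a slash to the URL. This is a sketch of the reviewer's recollection, not a verified fix; the `credential_name` line is an assumption added to make the resource complete, since the diff excerpt is cut off before that point.

```hcl
resource "databricks_external_location" "this" {
  provider = databricks.workspace
  name     = "${var.prefix}-external-location"

  # Trailing slash added so the configured value matches what the
  # backend reportedly stores after normalization.
  url = "gs://${google_storage_bucket.ext_bucket.name}/"

  # Assumption: the credential is the one named in the depends_on excerpt.
  credential_name = databricks_storage_credential.this.id
}
```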
@@ -0,0 +1,84 @@
# Provisioning Databricks on GCP workspace with a Hub & Spoke network architecture for data exfiltration protection

This example is using the [gcp-with-psc-exfiltration-protection](../../modules/gcp-with-psc-exfiltration-protection) module.
We need to put a warning at the beginning that PSC isn't enabled by default and that users should contact the Databricks team.
@@ -0,0 +1,126 @@
# Databricks on Google Cloud with Private Service Connect and Hub-Spoke network structure (data exfiltration protection).
We need to put a warning at the beginning that PSC isn't enabled by default and that users should contact the Databricks team.
This picture still shows GKE; we need a version with GCE.
The module still uses the CMv2 architecture. Once the CMv1 architecture is released and supported by the Terraform provider, the implementation will be reviewed.