Managed.hopsworks.ai is our managed platform for running Hopsworks and the Feature Store in the cloud. It integrates seamlessly with third-party platforms such as Databricks, SageMaker and Kubeflow. This guide shows how to set up managed.hopsworks.ai with your organization's Azure account.
To follow the instructions on this page you will need the following:
- An Azure resource group in which the Hopsworks cluster will be deployed.
- The Azure CLI installed and logged in.
To run all the commands on this page the user needs to have at least the following permissions on the Azure resource group:
Microsoft.Authorization/roleDefinitions/write
Microsoft.Authorization/roleAssignments/write
Microsoft.Compute/sshPublicKeys/generateKeyPair/action
Microsoft.Compute/sshPublicKeys/read
Microsoft.Compute/sshPublicKeys/write
Microsoft.ContainerRegistry/registries/operationStatuses/read
Microsoft.ContainerRegistry/registries/read
Microsoft.ContainerRegistry/registries/write
Microsoft.ManagedIdentity/userAssignedIdentities/write
Microsoft.Resources/subscriptions/resourcegroups/read
Microsoft.Storage/storageAccounts/write
You will also need a role such as Application Administrator in Azure Active Directory to be able to create the hopsworks.ai service principal.
For managed.hopsworks.ai to deploy a cluster, the following resource providers need to be registered on your Azure subscription:
Microsoft.Network
Microsoft.Compute
Microsoft.Storage
Microsoft.ManagedIdentity
Microsoft.ContainerRegistry
This can be done by running the following commands:
!!! note To run these commands you need the register/action permission for each resource provider on your subscription, for example Microsoft.Network/register/action.
az provider register --namespace 'Microsoft.Network'
az provider register --namespace 'Microsoft.Compute'
az provider register --namespace 'Microsoft.Storage'
az provider register --namespace 'Microsoft.ManagedIdentity'
az provider register --namespace 'Microsoft.ContainerRegistry'
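The five registrations can also be scripted as a loop. The sketch below wraps them in hypothetical helpers (`for_each_hopsworks_provider`, `register_provider`, `show_provider_state`; none of these names come from managed.hopsworks.ai) and uses `az provider show` to print each provider's registration state, since registration is not necessarily complete when the register command returns.

```shell
# Hypothetical helper: run the given command once per required provider,
# passing the provider namespace as the last argument.
for_each_hopsworks_provider() {
    for ns in Microsoft.Network Microsoft.Compute Microsoft.Storage \
              Microsoft.ManagedIdentity Microsoft.ContainerRegistry; do
        "$@" "$ns" || return 1
    done
}

# Register a single provider namespace.
register_provider() {
    az provider register --namespace "$1"
}

# Print "<namespace> <registration state>" for a single provider.
show_provider_state() {
    printf '%s ' "$1"
    az provider show --namespace "$1" --query registrationState -o tsv
}

# Usage (requires a logged-in Azure CLI):
#   for_each_hopsworks_provider register_provider
#   for_each_hopsworks_provider show_provider_state
```

Keeping the namespace list in one place means the registration and the status check cannot drift apart.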
All the commands have been written for a Unix system. These commands will need to be adapted to your terminal if it is not directly compatible.
All the commands use your default location. Add the --location parameter if you want to run your cluster in another location. Make sure to create the resources in the same location as you are going to run your cluster.
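The commands on this page refer to `$RESOURCE_GROUP`, and the role assignment further down also uses `$SUBSCRIPTION_ID`. One way to avoid editing each command is to set both as shell variables once. The sketch below does that and looks up the resource group's location with `az group show`, so that it can be passed to `--location` where needed. The resource group name `my-hopsworks-rg` and the helper names are placeholders, not values from this guide.

```shell
# Placeholder: replace with the name of your resource group.
RESOURCE_GROUP="${RESOURCE_GROUP:-my-hopsworks-rg}"

# Print the ID of the subscription the Azure CLI is currently using.
current_subscription_id() {
    az account show --query id -o tsv
}

# Print the location of the resource group given as $1.
resource_group_location() {
    az group show --name "$1" --query location -o tsv
}

# Usage (requires a logged-in Azure CLI):
#   SUBSCRIPTION_ID=$(current_subscription_id)
#   LOCATION=$(resource_group_location "$RESOURCE_GROUP")
#   az storage account create --resource-group "$RESOURCE_GROUP" \
#       --name hopsworksstorage$RANDOM --location "$LOCATION"
```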
Managed.hopsworks.ai deploys Hopsworks clusters to your Azure account. To enable this, you have to create a service principal and a custom role for managed.hopsworks.ai granting access to your resource group.
<iframe title="Azure information video" style="width:700px; height: 370px;" src="https://www.youtube.com/embed/Pfx2b3UTt88" frameBorder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowFullScreen > </iframe>
In managed.hopsworks.ai click on Connect to Azure or go to Settings and click on Configure next to Azure. This will direct you to a page with the instructions needed to create the service principal and set up the connection. Follow the instructions.
!!! note It is possible to limit the permissions that are set up during this phase. For more details see restrictive-permissions.
Cloud account settings
!!! note If you prefer using Terraform, you can skip this step and the remaining steps and follow this guide instead.
The Hopsworks clusters deployed by managed.hopsworks.ai store their data in a storage container in your Azure account. To enable this you need to create a storage account. This is done by running the following command, replacing $RESOURCE_GROUP with the name of your resource group.
az storage account create --resource-group $RESOURCE_GROUP --name hopsworksstorage$RANDOM
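Because `$RANDOM` expands to a different number each time, it helps to keep the generated name in a variable: you will need it again when selecting the storage account during cluster creation. Azure storage account names must be 3 to 24 characters long and contain only lowercase letters and digits, so the sketch below checks the name before calling the CLI. The variable and function names are illustrative.

```shell
# Return success if $1 is a syntactically valid storage account name:
# 3-24 characters, lowercase letters and digits only.
is_valid_storage_name() {
    printf '%s' "$1" | grep -Eq '^[a-z0-9]{3,24}$'
}

# Build the name once and keep it for later steps.
# $RANDOM is provided by bash and zsh and is at most 5 digits long.
STORAGE_ACCOUNT="hopsworksstorage$RANDOM"

# Hypothetical helper: validate the name ($2), then create the account
# in the resource group ($1).
create_storage_account() {
    if ! is_valid_storage_name "$2"; then
        echo "invalid storage account name: $2" >&2
        return 1
    fi
    az storage account create --resource-group "$1" --name "$2"
}

# Usage (requires a logged-in Azure CLI):
#   create_storage_account "$RESOURCE_GROUP" "$STORAGE_ACCOUNT"
#   echo "Storage account: $STORAGE_ACCOUNT"
```

The check only covers the syntax of the name; it does not tell you whether the name is already in use by another Azure customer.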
The Hopsworks clusters deployed by managed.hopsworks.ai store their Docker images in a container registry in your Azure account. To create this container registry, run the following command, replacing $RESOURCE_GROUP with the name of your resource group.
az acr create --resource-group $RESOURCE_GROUP --name hopsworksecr --sku Premium
To prevent the registry from filling up with unnecessary images and artifacts you can enable a retention policy. A retention policy will automatically remove untagged manifests after a specified number of days. To enable a retention policy, run the following command, replacing $RESOURCE_GROUP with the name of your resource group.
az acr config retention update --resource-group $RESOURCE_GROUP --registry hopsworksecr --status Enabled --days 7 --type UntaggedManifests
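Container registry names are unique across Azure, so the fixed name `hopsworksecr` used above may already be taken. A minimal sketch, assuming you are free to pick another name: check availability with `az acr check-name` first, then reuse the chosen name in the commands above and in the cluster creation form. `REGISTRY_NAME` and the helper name are placeholders.

```shell
# Placeholder: pick your own registry name (alphanumeric characters only).
REGISTRY_NAME="${REGISTRY_NAME:-hopsworksecr}"

# Return success if the registry name given as $1 is still available.
registry_name_available() {
    [ "$(az acr check-name --name "$1" --query nameAvailable -o tsv)" = "true" ]
}

# Usage (requires a logged-in Azure CLI):
#   if registry_name_available "$REGISTRY_NAME"; then
#       az acr create --resource-group "$RESOURCE_GROUP" \
#           --name "$REGISTRY_NAME" --sku Premium
#       az acr config retention update --resource-group "$RESOURCE_GROUP" \
#           --registry "$REGISTRY_NAME" --status Enabled --days 7 \
#           --type UntaggedManifests
#   else
#       echo "Registry name $REGISTRY_NAME is already taken" >&2
#   fi
```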
To allow the Hopsworks cluster instances to access the storage account and the container registry, managed.hopsworks.ai assigns a managed identity to the cluster nodes. To enable this you need to:
- Create a managed identity
- Create a role with the appropriate permissions and assign it to the managed identity
You create a managed identity by running the following command, replacing $RESOURCE_GROUP with the name of your resource group.
identityId=$(az identity create --name hopsworks-instance --resource-group $RESOURCE_GROUP --query principalId -o tsv)
To create a new role for the managed identity, first create a file called instance-role.json with the following content. Replace SUBSCRIPTION_ID with your subscription ID and RESOURCE_GROUP with the name of your resource group.
{
  "Name": "hopsworks-instance",
  "IsCustom": true,
  "Description": "Allow the hopsworks instance to access the storage and the docker repository",
  "Actions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/write",
    "Microsoft.Storage/storageAccounts/blobServices/read",
    "Microsoft.Storage/storageAccounts/listKeys/action",
    "Microsoft.ContainerRegistry/registries/artifacts/delete",
    "Microsoft.ContainerRegistry/registries/pull/read",
    "Microsoft.ContainerRegistry/registries/push/write"
  ],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
  ],
  "AssignableScopes": [
    "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP"
  ]
}
Then run the following command to create the new role.
az role definition create --role-definition instance-role.json
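Instead of editing the placeholders by hand, the role file can be generated from shell variables. The sketch below is one way to do it with a here-document; `write_instance_role` is a hypothetical helper name, and the JSON body is the same as the one shown above.

```shell
# Hypothetical helper: write instance-role.json in the current directory
# for the given subscription ID ($1) and resource group ($2).
write_instance_role() {
    cat > instance-role.json <<EOF
{
  "Name": "hopsworks-instance",
  "IsCustom": true,
  "Description": "Allow the hopsworks instance to access the storage and the docker repository",
  "Actions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/write",
    "Microsoft.Storage/storageAccounts/blobServices/containers/read",
    "Microsoft.Storage/storageAccounts/blobServices/write",
    "Microsoft.Storage/storageAccounts/blobServices/read",
    "Microsoft.Storage/storageAccounts/listKeys/action",
    "Microsoft.ContainerRegistry/registries/artifacts/delete",
    "Microsoft.ContainerRegistry/registries/pull/read",
    "Microsoft.ContainerRegistry/registries/push/write"
  ],
  "NotActions": [],
  "DataActions": [
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
    "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"
  ],
  "AssignableScopes": [
    "/subscriptions/$1/resourceGroups/$2"
  ]
}
EOF
}

# Usage (requires a logged-in Azure CLI for the second command):
#   write_instance_role "$SUBSCRIPTION_ID" "$RESOURCE_GROUP"
#   az role definition create --role-definition instance-role.json
```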
Finally, assign the role to the managed identity by running the following command, replacing $SUBSCRIPTION_ID with your subscription ID and $RESOURCE_GROUP with the name of your resource group.
az role assignment create --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP --role hopsworks-instance --assignee $identityId
!!! note It can take several minutes after you create the managed identity before a role can be assigned to it. If you get an error message starting with the following, wait and retry: Cannot find user or service principal in graph database
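The wait-and-retry advice can be automated. A minimal sketch, assuming `$SUBSCRIPTION_ID`, `$RESOURCE_GROUP` and `$identityId` are set as in the commands above: retry the assignment a fixed number of times with a pause in between. The function name, the attempt count and the delay are illustrative, and the loop retries on any failure, not only on the graph database error.

```shell
# Hypothetical helper: run a command until it succeeds.
# $1 = maximum number of attempts, $2 = seconds to wait between attempts,
# remaining arguments = the command to run.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (requires a logged-in Azure CLI):
#   retry 10 30 az role assignment create \
#       --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP" \
#       --role hopsworks-instance --assignee "$identityId"
```

A fixed delay keeps the sketch short; a longer or growing delay may suit you better if the identity takes a long time to propagate.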
When deploying clusters, managed.hopsworks.ai installs an SSH key on the cluster's instances so that you can access them if necessary. For this purpose, you need to add an SSH key to your resource group.
To create an SSH key in your resource group, run the following command, replacing $RESOURCE_GROUP with the name of your resource group.
az sshkey create --resource-group $RESOURCE_GROUP --name hopsworksKey
!!! note The command returns the path to the private and public keys associated with this SSH key. You can also create a key from an existing public key, as indicated in the Azure documentation.
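For the second option mentioned in the note, here is a sketch under the assumption that your public key lives at `~/.ssh/id_rsa.pub` (adjust the path to your setup): pass the key text to `az sshkey create` through its `--public-key` parameter. The helper name is hypothetical.

```shell
# Hypothetical helper: register an existing public key in a resource group.
# $1 = resource group, $2 = key name, $3 = path to the public key file.
import_ssh_public_key() {
    if [ ! -r "$3" ]; then
        echo "cannot read public key file: $3" >&2
        return 1
    fi
    az sshkey create --resource-group "$1" --name "$2" --public-key "$(cat "$3")"
}

# Usage (requires a logged-in Azure CLI):
#   import_ssh_public_key "$RESOURCE_GROUP" hopsworksKey "$HOME/.ssh/id_rsa.pub"
```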
In managed.hopsworks.ai, select Create cluster:
Create a Hopsworks cluster
Select the Resource Group (1) in which you created your storage account and managed identity (see above).
!!! note If the Resource Group does not appear in the drop-down, make sure that the custom role you created in step 1.1 has the Microsoft.Resources/subscriptions/resourceGroups/read permission and is assigned to the hopsworks.ai user.
Name your cluster (2). Your cluster will be deployed in the Location of your Resource Group (3).
Select the Instance type (4) and Local storage (5) size for the cluster Head node.
Check Use customer-managed encryption key (6) if you want to use one.
Select the storage account (7) you created above in Azure Storage account name. The name of the container in which the data will be stored is displayed in Azure Container name (8); you can modify it if needed.
!!! note You can choose to use a container already existing in your storage account by using the name of this container, but you need to first make sure that this container is empty.
Enter the Azure container registry name (9) of the registry created in Step 3.1.
Press Next:
General configuration
Select the number of workers you want to start the cluster with (2). Select the Instance type (3) and Local storage size (4) for the worker nodes.
!!! note It is possible to add or remove workers or to enable autoscaling once the cluster is running.
Press Next:
Create a Hopsworks cluster, static workers configuration
Select the SSH key that you want to use to access cluster instances:
Choose SSH key
Select the User assigned managed identity that you created above:
Choose the User assigned managed identity
To back up the Azure blob storage data when taking a cluster backup, a retention policy must be set for the blob storage. You can deactivate the retention policy by setting this value to 0, but this will prevent you from taking any backup of your cluster. Choose the retention period in days and click on Review and submit.
Choose the backup retention policy
Review all information and select Create:
Review cluster information
!!! note We skipped the cluster creation steps that are not mandatory. You can find more details about these steps here.
The cluster will start. This will take a few minutes:
Booting Hopsworks cluster
As soon as the cluster has started, you will be able to log in to your new Hopsworks cluster. You will also be able to stop, restart or terminate the cluster.
Running Hopsworks cluster
Check out our other guides for how to get started with Hopsworks and the Feature Store:
- Make Hopsworks services accessible from outside services
- Get started with the Hopsworks Feature Store
- Follow one of our tutorials
- Follow one of our guides
- Code examples and notebooks: hops-examples