Azure Databricks and Terraform: Create a Cluster and PAT Token

[Update Feb 2021] There is now a Terraform Provider for Databricks, it’s a better route – https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs

My starting point for a recent bit of work was to try and reliably and simply deploy and manage Databricks clusters in Azure. Terraform was already in use so I set about trying to see how I could use that to also manage Databricks.

I had a look around and after trying the Terraform REST provider and a third party Datbricks provider (didn’t have much luck with either) found a Terraform Shell provider. This turned out to be exactly what I needed.

If you haven’t written a Terraform provider here’s a crash course. You basically just define a method for create, read, update and delete and the parameters they take. Then Terraform does the rest.

The Shell provider (https://github.com/scottwinkler/terraform-provider-shell) lets you do this by passing in scripts (bash, powershell, any executable that can take stdin and output stdout). In this case I wrote some powershell to wrap the databricks-cli.

It’s better (or different) to localexec with nullresources as you can store information in the Terraform State and detect drift. If a read returns different information than the current information in the state then update will be called, for example.

So I took the work of Alexandre and wrapped it into this provider and using the Shell provider have a simple, no frills Databricks provider for Terraform which makes calls to Databricks via the databricks-cli.

This is currently a simple hack and hasn’t undergone any significant testing: https://github.com/lawrencegripper/hack-databricksterraform. The flow is as follows:

Hopefully this might be useful to others as a starting point for others.


2 thoughts on “Azure Databricks and Terraform: Create a Cluster and PAT Token

  1. Upendo says:

    Hi Lawrence,

    Saw you post in github on Deploy a Nextflow genomics cluster and tried to follow your instructions but am getting two errors. 2 deployments failed the jumpboxVm and cluster and the error said is “The requested vm size not available in the current region”. Tried several attempts by changing region and size but with no lack. Can you assist . Am using the free $200 credit by azure so not sure if that is a challenge. In additional is it possible to get extra free credit cause i need to test my pipelines before opting it.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s