[Update Feb 2021] There is now a Terraform Provider for Databricks; it's a better route: https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs
My starting point for a recent piece of work was to find a simple, reliable way to deploy and manage Databricks clusters in Azure. Terraform was already in use, so I set out to see whether I could use it to manage Databricks as well.
I had a look around and, after trying the Terraform REST provider and a third-party Databricks provider (I didn't have much luck with either), found a Terraform Shell provider. This turned out to be exactly what I needed.
If you haven't written a Terraform provider, here's a crash course. You basically just define a method for create, read, update and delete, plus the parameters they take. Then Terraform does the rest.
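To make the crash course concrete, here is a minimal Python sketch of the four-method shape a provider resource takes. The Resource and InMemoryCluster names are hypothetical, and a plain dict stands in for a real API; it illustrates the idea rather than Terraform's actual plugin SDK.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Resource(ABC):
    """Hypothetical interface mirroring the four lifecycle operations."""

    @abstractmethod
    def create(self, params: dict) -> dict:
        """Create the real-world object and return the state to store."""

    @abstractmethod
    def read(self, state: dict) -> Optional[dict]:
        """Return the object's current attributes, or None if it no longer exists."""

    @abstractmethod
    def update(self, state: dict, params: dict) -> dict:
        """Change the object to match params and return the new state."""

    @abstractmethod
    def delete(self, state: dict) -> None:
        """Remove the object."""


class InMemoryCluster(Resource):
    """Toy implementation backed by a dict instead of the Databricks API."""

    def __init__(self) -> None:
        self.backend: dict = {}
        self._next_id = 1

    def create(self, params: dict) -> dict:
        cluster_id = f"cluster-{self._next_id}"
        self._next_id += 1
        self.backend[cluster_id] = dict(params)
        return {"id": cluster_id, **params}

    def read(self, state: dict) -> Optional[dict]:
        attrs = self.backend.get(state["id"])
        return None if attrs is None else {"id": state["id"], **attrs}

    def update(self, state: dict, params: dict) -> dict:
        self.backend[state["id"]] = dict(params)
        return {"id": state["id"], **params}

    def delete(self, state: dict) -> None:
        self.backend.pop(state["id"], None)
```

Terraform's job is then to decide which of these four calls to make based on the configuration and the stored state. The provider only has to implement them.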
The Shell provider (https://github.com/scottwinkler/terraform-provider-shell) lets you do this by passing in scripts (bash, PowerShell, or any executable that can read from stdin and write to stdout). In this case I wrote some PowerShell to wrap the
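As an illustration of the stdin/stdout convention, here is a hypothetical lifecycle script written in Python rather than PowerShell. It assumes the following contract: the operation name arrives as the first argument, the previous state arrives as JSON on stdin, the desired worker count arrives in a NUM_WORKERS environment variable, and the new state is printed to stdout as JSON. The real contract depends on the Shell provider version, so treat all of these as assumptions.

```python
import json
import os
import sys


def handle(operation: str, stdin_text: str, num_workers: str) -> str:
    """Map one lifecycle operation to new JSON state (assumed contract)."""
    # Previous state arrives as JSON on stdin; empty input means "no state yet"
    state = json.loads(stdin_text) if stdin_text.strip() else {}
    if operation == "create":
        # A real script would call Databricks here; the id is a placeholder
        state = {"cluster_id": "example-cluster-id", "num_workers": num_workers}
    elif operation == "read":
        # A real script would query the cluster; this stub echoes the state back
        pass
    elif operation == "update":
        state = {**state, "num_workers": num_workers}
    elif operation == "delete":
        state = {}
    else:
        raise ValueError(f"unsupported operation: {operation}")
    # Whatever is printed to stdout is taken as the new state
    return json.dumps(state)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. NUM_WORKERS=2 python lifecycle.py create < /dev/null
    print(handle(sys.argv[1], sys.stdin.read(), os.environ.get("NUM_WORKERS", "1")))
```

Keeping the logic in a pure function that takes text in and returns text out makes the script easy to test without Terraform in the loop.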
It's better than (or at least different from) null_resources, because you can store information in the Terraform state and detect drift. For example, if a read returns different information from what is currently in the state, then update will be called.
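The drift check can be modelled as a comparison between the desired configuration and what read reports. This is a simplified sketch, not Terraform's real planning logic. plan_action is a hypothetical helper that returns "create" when read finds nothing, "update" when any configured attribute differs, and "no-op" otherwise.

```python
from typing import List, Optional, Tuple


def plan_action(desired: dict, refreshed: Optional[dict]) -> Tuple[str, List[str]]:
    """Decide which lifecycle call follows a refresh (simplified model).

    desired   -- attributes from the Terraform configuration
    refreshed -- what read returned, or None if the object no longer exists
    """
    if refreshed is None:
        # read found nothing: the object was removed outside Terraform, so recreate it
        return "create", []
    # Only attributes the configuration sets are compared; extras such as ids are ignored
    drifted = sorted(k for k, v in desired.items() if refreshed.get(k) != v)
    return ("update", drifted) if drifted else ("no-op", [])
```

For instance, if someone resized a cluster by hand in the Databricks UI, the refreshed num_workers would no longer match the configuration, and the next apply would call update to put it back.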
So I took Alexandre's work and wrapped it into this provider. Using the Shell provider, I now have a simple, no-frills Databricks provider for Terraform which makes calls to Databricks via the
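Whichever client the scripts use, managing clusters ultimately means calling the Databricks REST API. The following minimal sketch builds, but does not send, such requests with the Python standard library. It assumes the 2.0 Clusters endpoints (clusters/create and clusters/get) and a personal access token sent as a Bearer header. The databricks_request helper is hypothetical, and the host, token and cluster values are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def databricks_request(host: str, token: str, endpoint: str,
                       payload: Optional[dict] = None,
                       query: Optional[dict] = None) -> urllib.request.Request:
    """Build, but do not send, a Databricks REST API 2.0 request.

    A JSON payload makes it a POST; otherwise it is a GET with optional query params.
    """
    url = f"{host.rstrip('/')}/api/2.0/{endpoint}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Placeholder workspace URL and token; replace with real values before sending
HOST = "https://adb-0000000000000000.0.azuredatabricks.net"
TOKEN = "dapi-placeholder"

create_req = databricks_request(HOST, TOKEN, "clusters/create", payload={
    "cluster_name": "terraform-demo",  # placeholder values throughout
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
})
get_req = databricks_request(HOST, TOKEN, "clusters/get",
                             query={"cluster_id": "0000-000000-example"})
```

To send a request, pass it to urllib.request.urlopen and parse the JSON response body.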
This is currently a simple hack and hasn’t undergone any significant testing: https://github.com/lawrencegripper/hack-databricksterraform. The flow is as follows:
Hopefully this is useful to others as a starting point.