Azure Databricks and Terraform: Create a Cluster and PAT Token

My starting point for a recent piece of work was to find a reliable, simple way to deploy and manage Databricks clusters in Azure. Terraform was already in use, so I set about seeing how I could use it to manage Databricks too.

I had a look around and, after trying the Terraform REST provider and a third-party Databricks provider (I didn't have much luck with either), found a Terraform Shell provider. This turned out to be exactly what I needed.

If you haven’t written a Terraform provider, here’s a crash course: you define methods for create, read, update and delete, along with the parameters they take, and Terraform does the rest.

The Shell provider (https://github.com/scottwinkler/terraform-provider-shell) lets you do this by passing in scripts (bash, PowerShell, or any executable that can take stdin and write stdout). In this case I wrote some PowerShell to wrap the databricks-cli.
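To give a flavour of the contract, here is a sketch of a "create" script, in bash rather than PowerShell to keep it short. This is not the code from my repo: the JSON field names and the databricks-cli call are assumptions for illustration. The provider pipes the resource's desired state in as JSON on stdin, and whatever JSON the script prints on stdout gets recorded in the Terraform state.

```shell
#!/bin/bash
# Hypothetical "create" script for the Shell provider. Terraform passes the
# desired state as JSON on stdin; the JSON this prints on stdout is what
# ends up stored in the Terraform state file.
create_resource() {
  local input name
  input=$(cat)  # read the JSON handed over by the provider
  # Crude field extraction to keep the sketch dependency-free; a real
  # script would use jq or PowerShell's ConvertFrom-Json.
  name=$(echo "$input" | sed -n 's/.*"cluster_name" *: *"\([^"]*\)".*/\1/p')
  # Here you would shell out to the databricks-cli, e.g.:
  #   databricks clusters create --json "$input"
  # and then report back the state the provider should record:
  echo "{\"cluster_name\": \"$name\", \"state\": \"RUNNING\"}"
}
```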

This is better than (or at least different from) using local-exec with null_resources, because the provider can store information in the Terraform state and detect drift. For example, if a read returns different information from what’s currently in the state, update will be called.
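The read side can be sketched the same way (again hypothetical, not the repo's actual script): the script receives the previously stored state on stdin and prints the real-world state on stdout, and any difference between the two is what triggers the update.

```shell
#!/bin/bash
# Hypothetical "read" script. Terraform hands the previously stored state
# to the script on stdin; the script should print the *actual* current
# state. If the two differ, the provider flags drift and an update is
# planned.
read_resource() {
  local stored
  stored=$(cat)
  # A real script would query Databricks here, e.g.:
  #   databricks clusters get --cluster-id "$cluster_id"
  # For this sketch we just report the stored state back (i.e. no drift).
  echo "$stored"
}
```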

So I took Alexandre’s work, wrapped it up using the Shell provider, and now have a simple, no-frills Databricks provider for Terraform which makes its calls to Databricks via the databricks-cli.

This is currently a simple hack and hasn’t undergone any significant testing: https://github.com/lawrencegripper/hack-databricksterraform.

Hopefully this might be useful as a starting point for others.


Generate docker images of specific size

For some testing I’m doing, I need a set of images of specific sizes to simulate pulling larger vs smaller images.

Here is a quick script I put together for generating a 200mb, 600mb, 1000mb and 2000mb image (each a tiny bit larger as the alpine base is included). It took a while to work out that it’s best to use /dev/urandom rather than /dev/zero, as with zeros the images got compressed for transfer.

set -e
set -x
# Push 200mb image
dd if=/dev/urandom of=./file.bin bs=1M count=200
docker build -t lawrencegripper/big:200mb .
docker push lawrencegripper/big:200mb
rm ./file.bin
# Push 600mb image
dd if=/dev/urandom of=./file.bin bs=1M count=600
docker build -t lawrencegripper/big:600mb .
docker push lawrencegripper/big:600mb
rm ./file.bin
# Push 1000mb image
dd if=/dev/urandom of=./file.bin bs=1M count=1000
docker build -t lawrencegripper/big:1000mb .
docker push lawrencegripper/big:1000mb
rm ./file.bin
# Push 2000mb image
dd if=/dev/urandom of=./file.bin bs=1M count=2000
docker build -t lawrencegripper/big:2000mb .
docker push lawrencegripper/big:2000mb
rm ./file.bin
The accompanying Dockerfile just copies the generated file into an alpine base image:
FROM alpine
COPY ./file.bin .
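The /dev/urandom point is easy to verify without Docker at all: zero-filled data compresses to almost nothing, so a /dev/zero layer shrinks away in transfer, while random bytes are effectively incompressible. A quick check with gzip:

```shell
#!/bin/bash
# Compare how well 10MB of zeros vs 10MB of random bytes compress.
dd if=/dev/zero of=zeros.bin bs=1M count=10 2>/dev/null
dd if=/dev/urandom of=random.bin bs=1M count=10 2>/dev/null
gzip -kf zeros.bin random.bin
# zeros.bin.gz ends up a few KB; random.bin.gz stays around 10MB.
ls -l zeros.bin.gz random.bin.gz
```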

How to build Kubernetes from source and test in Kind with VSCode & devcontainers

I found myself last week looking at a bit of code in K8s which I thought I could make better, so I set about trying to understand how to clone, change and test it.

Luckily K8s has some good docs; trust those over me, as they’re a great guide. This blog is more of a brain dump of how I got on trying with Devcontainers and VSCode. It’s my first try at this, so I’ve likely got lots of things wrong.

Roughly I knew what I needed as I’d heard about:

  1. Bazel for the Kubernetes build
  2. Kind to run a cluster locally
  3. Kubetest for running end2end tests

As the K8s build and testing cycle can use up quite a bit of machine power, I didn’t want to be doing all this on my laptop, and ideally I wanted to capture all the setup in a nice repeatable way.

Enter DevContainers for VSCode. I’ve got them set up so that my laptop actually runs them on a meaty server, and I can use them to capture all the setup requirements for building K8s.
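As a flavour of what that capture looks like, a minimal .devcontainer/devcontainer.json might be along these lines. This is a sketch, not my repo's actual config: the base image tag and extension choice are assumptions, and you would want an image with Go, Docker and Bazel on it.

```json
{
    // Hypothetical devcontainer for hacking on Kubernetes; swap the image
    // for one with Go, Docker and Bazel installed.
    "name": "kubernetes-dev",
    "image": "golang:1.13",
    "extensions": ["ms-vscode.Go"],
    "postCreateCommand": "go version"
}
```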
