
K8s Operator with dynamic CRDs using controller runtime (no structs)

Warning: This is a bit of a brain dump.

I’m working on a project at the moment which dynamically creates a set of CRDs in Kubernetes, and an operator to manage them, based on a schema provided by the user at runtime.

When the code is being built it doesn’t know the schema/shape of the CRDs. This means the standard approach used in Kubebuilder with controller-gen isn’t going to work.

Now, for those who haven’t played with Kubebuilder, it gives you a few super useful things for building a K8s operator in Go:

  1. controller-gen creates all the structs and templated controllers, and keeps all those type files in sync for you. So you change a CRD’s struct and the CRD YAML spec is updated, etc. These are all build-time tools, so we can’t use them.
  2. A nice abstraction around how to interact with K8s as a controller – the controller-runtime. As the name suggests, we can use this one at runtime.

So while we can’t use the build-time controller-gen, we can still use all the goodness of the controller-runtime. In theory.

This is where the fun came in: there aren’t any docs on interacting with a dynamic/unstructured object type using the controller-runtime, so I did a bit of playing around.

(Code samples are for illustration – if you want an end-to-end running example, skip to the bottom.)

To get started on this journey we need a helping hand. Kubernetes has an API for working with objects which don’t have a Go struct defined. This is how we can start: let’s check out the Go docs for unstructured.

"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

Ok so this gives us some nice ways to work with a CRD which doesn’t have a struct defined.

To use this meaningfully we’re going to have to tell it the type it represents – in K8s this means telling it its Group, Version and Kind. These are all wrapped up nicely in the schema.GroupVersionKind struct. Let’s look at the docs:

"k8s.io/apimachinery/pkg/runtime/schema"

Great, so hooking these two up together we can create an Unstructured instance that represents a CRD, like so:
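
Roughly like this (a sketch; the group, version and kind values are placeholders for whatever the user’s schema defines at runtime):

import (
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// newUnstructuredForCRD builds an empty Unstructured typed with the CRD's GVK.
// The group/version/kind here are example values only.
func newUnstructuredForCRD() *unstructured.Unstructured {
    gvk := schema.GroupVersionKind{
        Group:   "example.tfoperator.io",
        Version: "v1alpha1",
        Kind:    "ExampleResource",
    }

    resource := &unstructured.Unstructured{}
    resource.SetGroupVersionKind(gvk)
    return resource
}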

Cool, so what can we do from here? Well, the controller-runtime uses the runtime.Object interface for all its interactions, and guess what we have now? Yup, a runtime.Object.
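
A small wrapper method makes that obvious (a sketch, with a made-up helper name): *unstructured.Unstructured already satisfies runtime.Object, so we only need to stamp the GVK onto an empty instance.

import (
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

// runtimeObjectFromGVK returns a runtime.Object for the given GroupVersionKind.
// No generated struct is needed; the unstructured type does the work.
func runtimeObjectFromGVK(gvk schema.GroupVersionKind) runtime.Object {
    obj := &unstructured.Unstructured{}
    obj.SetGroupVersionKind(gvk)
    return obj
}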

Well now we can create an instance of the controller for our unstructured CRD.
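
Roughly like this (a sketch: ReconcileDynamicCRD and addController are made-up names, and the Reconcile method the builder needs is shown a little further down):

import (
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/builder"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/manager"
)

// ReconcileDynamicCRD reconciles instances of a CRD whose shape is only known at runtime.
type ReconcileDynamicCRD struct {
    Client client.Client
    GVK    schema.GroupVersionKind
}

// addController registers a controller with the manager that watches the
// unstructured type described by gvk.
func addController(mgr manager.Manager, gvk schema.GroupVersionKind) error {
    r := &ReconcileDynamicCRD{
        Client: mgr.GetClient(),
        GVK:    gvk,
    }

    // Give the builder an unstructured object carrying the GVK so it knows
    // which resource type to watch.
    obj := &unstructured.Unstructured{}
    obj.SetGroupVersionKind(gvk)

    return builder.ControllerManagedBy(mgr).
        For(obj).
        Complete(r)
}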

Notice that I’m passing the GroupVersionKind into the controller struct – this will be useful when we come to make changes to a CRD we’re handling.

In the same way that you can use r.Client on the controller in Kubebuilder, you can now use it with the unstructured resource. We use the GVK again here to set the type so that the client knows how to work with it.
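
For example, a Reconcile method might look something like this (a sketch; the signature shown is the newer context-aware one, older controller-runtime releases omit the ctx argument):

import (
    "context"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func (r *ReconcileDynamicCRD) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // The client has no generated struct to infer the type from, so set the
    // GVK on an empty unstructured object before fetching the resource.
    resource := &unstructured.Unstructured{}
    resource.SetGroupVersionKind(r.GVK)

    if err := r.Client.Get(ctx, req.NamespacedName, resource); err != nil {
        // The resource may have been deleted; nothing to do in that case.
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // ... read and update resource.Object here, then write it back with the client ...
    return ctrl.Result{}, nil
}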

Now you might be thinking – wow isn’t it going to be painful working without the strongly typed CRD structs?

Yes, it’s harder, but there are some helper methods in the unstructured API which make things much easier. For example, the following let you easily retrieve or set a string nested inside the object.

unstructured.NestedString(resource.Object, "status", "_tfoperator", "tfState")

unstructured.SetNestedField(resource.Object, string(serializedState), "status", "_tfoperator", "tfState")
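
Both helpers also report whether the field was found and return an error if an existing value has the wrong type, so a fuller (illustrative) usage looks something like this, where readAndWriteState and serializedState are made-up names:

import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// readAndWriteState reads the previous state string and writes a new one into
// the nested status._tfoperator.tfState field.
func readAndWriteState(resource *unstructured.Unstructured, serializedState []byte) (string, error) {
    // NestedString returns the value, whether the path existed, and an error
    // if a value is present but isn't a string.
    previous, found, err := unstructured.NestedString(resource.Object, "status", "_tfoperator", "tfState")
    if err != nil {
        return "", err
    }
    if !found {
        previous = ""
    }

    // SetNestedField creates any missing intermediate maps on the way down.
    if err := unstructured.SetNestedField(resource.Object, string(serializedState), "status", "_tfoperator", "tfState"); err != nil {
        return "", err
    }

    return previous, nil
}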

Here is the end result of hooking up the controller-runtime to a set of dynamically created and managed CRDs. It’s very much a work in progress and I’d love feedback if there are easier ways to tackle this or things that I’ve got wrong.


Terraform, Azure VPN Gateway and OpenVPN Config

Recently I needed a quick way to spin up a VPN Gateway and generate the OpenVPN config clients could use to connect.

The official docs have a good guide to generating the necessary certificates and manually editing the OpenVPN config you can download from the portal.

Being a sucker for punishment, I wondered if I could automate the process, mainly because I always forget the openssl commands. It was also going to be run by someone else (not me), so I wanted it to be as simple as possible.

Warning: Please take time to understand the limitations of this certificate generation before production use (the TF state file will contain the CA private keys) and review the limitations of the TLS provider for Terraform.

So how would this work? The first choice is Terraform for the automation. Luckily it has a provider for generating certs! So the flow goes like this:

1. Create the Root CA
2. Use that to generate a client cert
3. Output the client cert for use in the openvpn config file
4. Inject the CA into the Azure VPN configuration and create it
5. Run a script to fetch the Azure VPN OpenVPN configuration file (as this contains the pre-shared key, which we don’t set), then inject the client cert we output from the Terraform.
6. Connect

One gotcha: the TLS provider only outputs standard PEMs. Azure VPN requires the CA to be in a specific format, produced by the following command.

openssl x509 -in caCert.pem -outform der | base64 -w0 > caCert.der

As a result there is a null_resource and local_file resource to handle this translation.

So all you should need is Terraform, Bash, OpenSSL and Azure CLI – not perfect but it’s the best I could do! (This is currently a working draft – please use with caution).

resource "random_string" "random" {
  length  = 8
  special = false
  upper   = false
  number  = false
}

resource "azurerm_public_ip" "vpn_ip" {
  name                = "vpn-ip"
  location            = var.region
  resource_group_name = var.resource_group_name
  domain_name_label   = random_string.random.result
  allocation_method   = "Dynamic"
  tags                = var.tags
}

resource "tls_private_key" "example" {
  algorithm = "RSA"
  rsa_bits  = "2048"
}
# Create the root certificate
resource "tls_self_signed_cert" "ca" {
  key_algorithm   = tls_private_key.example.algorithm
  private_key_pem = tls_private_key.example.private_key_pem

  # Certificate expires after 1 year
  validity_period_hours = 8766

  # Generate a new certificate if Terraform is run within 200 hours
  # of the certificate's expiration time
  early_renewal_hours = 200

  # Allow this certificate to be used as a CA
  is_ca_certificate = true

  allowed_uses = [
    "key_encipherment",
    "digital_signature",
    "server_auth",
    "client_auth",
    "cert_signing",
  ]

  dns_names = [azurerm_public_ip.vpn_ip.domain_name_label]

  subject {
    common_name  = "CAOpenVPN"
    organization = "dev env"
  }
}
resource "local_file" "ca_pem" {
  filename = "caCert.pem"
  content  = tls_self_signed_cert.ca.cert_pem
}

resource "null_resource" "cert_encode" {
  provisioner "local-exec" {
    # Convert the CA cert from PEM to the base64-encoded DER format Azure expects
    command = "openssl x509 -in caCert.pem -outform der | base64 -w0 > caCert.der"
  }
  depends_on = [local_file.ca_pem]
}

data "local_file" "ca_der" {
  filename = "caCert.der"
  depends_on = [
    null_resource.cert_encode
  ]
}
resource "tls_private_key" "client_cert" {
  algorithm = "RSA"
  rsa_bits  = "2048"
}

resource "tls_cert_request" "client_cert" {
  key_algorithm   = tls_private_key.client_cert.algorithm
  private_key_pem = tls_private_key.client_cert.private_key_pem
  # dns_names = [ azurerm_public_ip.vpn_ip.domain_name_label ]

  subject {
    common_name  = "ClientOpenVPN"
    organization = "dev env"
  }
}
resource "tls_locally_signed_cert" "client_cert" {
  cert_request_pem = tls_cert_request.client_cert.cert_request_pem

  # Sign the client cert request with the CA's key so the client cert
  # validates against the root certificate uploaded to the gateway
  ca_key_algorithm   = tls_private_key.example.algorithm
  ca_private_key_pem = tls_private_key.example.private_key_pem
  ca_cert_pem        = tls_self_signed_cert.ca.cert_pem

  validity_period_hours = 43800

  allowed_uses = [
    "key_encipherment",
    "digital_signature",
    "server_auth",
    "client_auth",
  ]
}
resource "azurerm_virtual_network_gateway" "vpn-gateway" {
  name                = "vpn-gateway"
  location            = var.region
  resource_group_name = var.resource_group_name

  type          = "Vpn"
  active_active = false
  enable_bgp    = false
  sku           = "VpnGw1"

  ip_configuration {
    name                          = "vnetGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.vpn_ip.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.yoursubnethere.id
  }

  vpn_client_configuration {
    address_space        = ["10.1.0.0/16"]
    vpn_client_protocols = ["OpenVPN"]

    root_certificate {
      name             = "terraformselfsignedder"
      public_cert_data = data.local_file.ca_der.content
    }
  }
}

output "client_cert" {
  value = tls_locally_signed_cert.client_cert.cert_pem
}

output "client_key" {
  value = tls_private_key.client_cert.private_key_pem
}

output "vpn_id" {
  value = azurerm_virtual_network_gateway.vpn-gateway.id
}

That’s main.tf; the script below (openvpn_gen.sh) fetches the Azure OpenVPN client config and injects the client cert and key from the Terraform outputs:

#!/bin/bash
set -e
# Get vars from TF State
VPN_ID=`terraform output vpn_id`
VPN_CLIENT_CERT=`terraform output client_cert`
VPN_CLIENT_KEY=`terraform output client_key`
# Replace newlines with \n so sed doesn't break
VPN_CLIENT_CERT="${VPN_CLIENT_CERT//$'\n'/\\n}"
VPN_CLIENT_KEY="${VPN_CLIENT_KEY//$'\n'/\\n}"
CONFIG_URL=`az network vnet-gateway vpn-client generate --ids $VPN_ID -o tsv`
wget $CONFIG_URL -O "vpnconfig.zip"
# Ignore complaint about backslash in filepaths
unzip -o "vpnconfig.zip" -d "./vpnconftemp"|| true
OPENVPN_CONFIG_FILE="./vpnconftemp/OpenVPN/vpnconfig.ovpn"
echo "Updating file $OPENVPN_CONFIG_FILE"
sed -i "s~\$CLIENTCERTIFICATE~$VPN_CLIENT_CERT~" $OPENVPN_CONFIG_FILE
sed -i "s~\$PRIVATEKEY~$VPN_CLIENT_KEY~g" $OPENVPN_CONFIG_FILE
cp $OPENVPN_CONFIG_FILE openvpn.ovpn
rm -r ./vpnconftemp
rm vpnconfig.zip

Terraform, Docker, Ubuntu 20.04, Go 1.14 and MemLock: Down the rabbit hole

I recently upgraded my machine and installed the latest Ubuntu 20.04 as part of that.

Very smugly, I fired up the new install and, as I use devcontainers, looked forward to not having to install lots of devtools, as the Dockerfile in each project had all the tooling needed for VSCode to spin up and get going.

Sadly it wasn’t that smooth. After spinning up a project which uses Terraform, I found an odd message when running terraform plan:

failed to retrieve schema from provider "random": rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: EOF"

error from terraform plan

Terraform has a provider model which uses gRPC to talk between the CLI and the individual providers. Random is one of the HashiCorp-built providers, so it’s a really odd one to see a bug in.

Initially I assumed that the downloaded provider was corrupted. Nope, clearing the download and retrying didn’t help.

So assuming I’d messed something up I:

  1. Tried changing the Docker image used by the devcontainer. Nope. Same problem.
  2. Tried different versions of Terraform. Nope. Same problem.
  3. Updated the Docker version I was using. Nope. Same problem.
  4. Restarted the machine. Nope. Same problem.

Now, feeling quite frustrated, I finally remembered a trick I’d used a lot when building my own Terraform providers: I enabled debug logging on the Terraform CLI.

TF_LOG=DEBUG terraform plan

This is where it gets interesting…
