Cluster Consul using Kubernetes API

Recently we had the desire to cluster Consul (HashiCorp's K/V store) without calling out to Atlas. We deploy many clusters per day because we are constantly testing, and we want Consul to simply bring itself up without having to reach out over the internet.

So we made a few changes to our Consul setup. Here goes:

Dockerfile:

This is a typical Dockerfile for Consul running on Alpine. Nothing out of the ordinary.

FROM alpine:3.2
MAINTAINER 	Martin Devlin <martin.devlin@pearson.com>

ENV CONSUL_VERSION    0.6.3
ENV CONSUL_HTTP_PORT  8500
ENV CONSUL_HTTPS_PORT 8543
ENV CONSUL_DNS_PORT   53

RUN apk --update add openssl zip curl ca-certificates jq \
&& cat /etc/ssl/certs/*.pem > /etc/ssl/certs/ca-certificates.crt \
&& sed -i -r '/^#.+/d' /etc/ssl/certs/ca-certificates.crt \
&& rm -rf /var/cache/apk/* \
&& mkdir -p /etc/consul/ssl /ui /data \
&& wget http://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip \
&& unzip consul_${CONSUL_VERSION}_linux_amd64.zip \
&& mv consul /bin/ \
&& rm -f consul_${CONSUL_VERSION}_linux_amd64.zip \
&& cd /ui \
&& wget http://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_web_ui.zip \
&& unzip consul_${CONSUL_VERSION}_web_ui.zip \
&& rm -f consul_${CONSUL_VERSION}_web_ui.zip

COPY config.json /etc/consul/config.json

EXPOSE ${CONSUL_HTTP_PORT}
EXPOSE ${CONSUL_HTTPS_PORT}
EXPOSE ${CONSUL_DNS_PORT}

COPY run.sh /usr/bin/run.sh
RUN chmod +x /usr/bin/run.sh

ENTRYPOINT ["/usr/bin/run.sh"]
CMD []
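
For reference, building and pushing the image is a standard docker build (the tag and registry below are placeholders, not our actual registry):

docker build -t your-registry/consul:0.6.3 .
docker push your-registry/consul:0.6.3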

 

config.json:

And here is config.json referenced in the Dockerfile.

{
  "data_dir": "/data",
  "ui_dir": "/ui",
  "client_addr": "0.0.0.0",
  "ports": {
    "http"  : %%CONSUL_HTTP_PORT%%,
    "https" : %%CONSUL_HTTPS_PORT%%,
    "dns"   : %%CONSUL_DNS_PORT%%
  },
  "start_join": [
    %%LIST_PODIPS%%
  ],
  "acl_default_policy": "deny",
  "acl_datacenter": "%%ENVIRONMENT%%",
  "acl_master_token": "%%MASTER_TOKEN%%",
  "key_file" : "/etc/consul/ssl/consul.key",
  "cert_file": "/etc/consul/ssl/consul.crt",
  "recursor": "8.8.8.8",
  "disable_update_check": true,
  "encrypt" : "%%GOSSIP_KEY%%",
  "log_level": "INFO",
  "enable_syslog": false
}

If you have read my past Consul blog, you might notice we have added the following:

  "start_join": [
    %%LIST_PODIPS%%
  ],

This is important because we are going to have each Consul container query the Kubernetes API using a Kubernetes Token to pull in a list of IPs for the Consul cluster to join up.

Important note – if you are running more than just the default token per namespace, you need to explicitly grant READ access to the API for the Token associated with the container.
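
If you are on a newer cluster with RBAC enabled (our setup predates RBAC, so treat this purely as an assumption rather than what we run), granting that read access to the pods endpoint for the pod's service account might look roughly like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: kube-system
  name: consul-pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: kube-system
  name: consul-pod-reader
subjects:
  - kind: ServiceAccount
    name: default          # or whichever service account the consul pods run under
    namespace: kube-system
roleRef:
  kind: Role
  name: consul-pod-reader
  apiGroup: rbac.authorization.k8s.io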

And here is run.sh:

#!/bin/sh
KUBE_TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`
NAMESPACE=`cat /var/run/secrets/kubernetes.io/serviceaccount/namespace`

if [ -z ${CONSUL_SERVER_COUNT} ]; then
  export CONSUL_SERVER_COUNT=3
fi

if [ -z ${CONSUL_HTTP_PORT} ]; then
  export CONSUL_HTTP_PORT=8500
fi

if [ -z ${CONSUL_HTTPS_PORT} ]; then
  export CONSUL_HTTPS_PORT=8543
fi

if [ -z ${CONSUL_DNS_PORT} ]; then
  export CONSUL_DNS_PORT=53
fi

if [ -z ${CONSUL_SERVICE_HOST} ]; then
  export CONSUL_SERVICE_HOST="127.0.0.1"
fi

if [ -z ${CONSUL_WEB_UI_ENABLE} ]; then
  export CONSUL_WEB_UI_ENABLE="true"
fi

if [ -z ${CONSUL_SSL_ENABLE} ]; then
  export CONSUL_SSL_ENABLE="true"
fi

if [ "${CONSUL_SSL_ENABLE}" = "true" ]; then
  if [ ! -z ${CONSUL_SSL_KEY} ] &&  [ ! -z ${CONSUL_SSL_CRT} ]; then
    echo ${CONSUL_SSL_KEY} > /etc/consul/ssl/consul.key
    echo ${CONSUL_SSL_CRT} > /etc/consul/ssl/consul.crt
  else
    openssl req -x509 -newkey rsa:2048 -nodes -keyout /etc/consul/ssl/consul.key -out /etc/consul/ssl/consul.crt -days 365 -subj "/CN=consul.kube-system.svc.cluster.local"
  fi
fi

export CONSUL_IP=`hostname -i`

if [ -z ${ENVIRONMENT} ] || [ -z ${MASTER_TOKEN} ] || [ -z ${GOSSIP_KEY} ]; then
  echo "Error: ENVIRONMENT, MASTER_TOKEN and GOSSIP_KEY environment vars must be set"
  exit 1
fi

LIST_IPS=`curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$NAMESPACE/pods | jq '.items[] | select(.status.containerStatuses[].name=="consul") | .status .podIP'`

#basic test to see if we have ${CONSUL_SERVER_COUNT} number of containers alive
VALUE='0'

while [ "$VALUE" != "${CONSUL_SERVER_COUNT}" ]; do
  echo "waiting 10s on all the consul containers to spin up"
  sleep 10
  LIST_IPS=`curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$NAMESPACE/pods | jq '.items[] | select(.status.containerStatuses[].name=="consul") | .status .podIP'`
  echo "$LIST_IPS" | sed -e 's/$/,/' -e '$s/,//' > tester
  VALUE=`cat tester | wc -l`
done

LIST_IPS_FORMATTED=`echo "$LIST_IPS" | sed -e 's/$/,/' -e '$s/,//'`

sed -i "s,%%ENVIRONMENT%%,$ENVIRONMENT,"             /etc/consul/config.json
sed -i "s,%%MASTER_TOKEN%%,$MASTER_TOKEN,"           /etc/consul/config.json
sed -i "s,%%GOSSIP_KEY%%,$GOSSIP_KEY,"               /etc/consul/config.json
sed -i "s,%%CONSUL_HTTP_PORT%%,$CONSUL_HTTP_PORT,"   /etc/consul/config.json
sed -i "s,%%CONSUL_HTTPS_PORT%%,$CONSUL_HTTPS_PORT," /etc/consul/config.json
sed -i "s,%%CONSUL_DNS_PORT%%,$CONSUL_DNS_PORT,"     /etc/consul/config.json
sed -i "s,%%LIST_PODIPS%%,$LIST_IPS_FORMATTED,"      /etc/consul/config.json

cmd="consul agent -server -config-dir=/etc/consul -dc ${ENVIRONMENT} -bootstrap-expect ${CONSUL_SERVER_COUNT}"

if [ ! -z ${CONSUL_DEBUG} ]; then
  ls -lR /etc/consul
  cat /etc/consul/config.json
  echo "${cmd}"
  sed -i "s,INFO,DEBUG," /etc/consul/config.json
fi

consul agent -server -config-dir=/etc/consul -dc ${ENVIRONMENT} -bootstrap-expect ${CONSUL_SERVER_COUNT}

Let's go through the options here. Notice in the script we do have some defaults set, so we may or may not include them when starting up the container.

LIST_PODIPS = a list of Consul IPs for the consul node to join to

CONSUL_WEB_UI_ENABLE = true|false – if you want a web ui

CONSUL_SSL_ENABLE = SSL for cluster communication

If true expects:

CONSUL_SSL_KEY – SSL Key

CONSUL_SSL_CRT – SSL Cert

 

First we pull in the Kubernetes Token and Namespace. This is the default location for this information in every container and should work for your needs.

KUBE_TOKEN=`cat /var/run/secrets/kubernetes.io/serviceaccount/token`
NAMESPACE=`cat /var/run/secrets/kubernetes.io/serviceaccount/namespace`

And then we use those ENV VARS with some fancy jq to get a list of IPs formatted so we can shove them into config.json.

LIST_IPS=`curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$NAMESPACE/pods | jq '.items[] | select(.status.containerStatuses[].name=="consul") | .status .podIP'`
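
For reference, with three consul pods running, the raw output of that command looks something like this (the IPs are made up):

"10.20.30.11"
"10.20.30.12"
"10.20.30.13"

After the sed formatting in the script, LIST_IPS_FORMATTED becomes a comma-separated list that drops straight into the start_join array:

"10.20.30.11",
"10.20.30.12",
"10.20.30.13"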

And we wait until ${CONSUL_SERVER_COUNT} containers have started up:

#basic test to see if we have ${CONSUL_SERVER_COUNT} number of containers alive
echo "$LIST_IPS" | sed -e 's/$/,/' -e '$s/,//' > tester
VALUE=`cat tester | wc -l`

while [ "$VALUE" != "${CONSUL_SERVER_COUNT}" ]; do
  echo "waiting 10s on all the consul containers to spin up"
  sleep 10
  echo "$LIST_IPS" | sed -e 's/$/,/' -e '$s/,//' > tester
  VALUE=`cat tester | wc -l`
done

You'll notice this could certainly be cleaner, but it's working.
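
For what it's worth, a slightly tidier version of the same wait might look like this (an untested sketch with the same behavior, not what we actually run):

get_consul_ips() {
  curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" \
    "https://$KUBERNETES_SERVICE_HOST:$KUBERNETES_PORT_443_TCP_PORT/api/v1/namespaces/$NAMESPACE/pods" \
    | jq '.items[] | select(.status.containerStatuses[].name=="consul") | .status.podIP'
}

LIST_IPS=$(get_consul_ips)
until [ "$(echo "$LIST_IPS" | wc -l)" -eq "${CONSUL_SERVER_COUNT}" ]; do
  echo "waiting 10s on all the consul containers to spin up"
  sleep 10
  LIST_IPS=$(get_consul_ips)
done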

Then we inject the IPs into the config.json:

sed -i "s,%%LIST_PODIPS%%,$LIST_IPS_FORMATTED,"      /etc/consul/config.json

which simplifies our consul runtime command quite nicely:

consul agent -server -config-dir=/etc/consul -dc ${ENVIRONMENT} -bootstrap-expect ${CONSUL_SERVER_COUNT}

 

Alright so all of that is in for the Consul image.

Now let's have a look at the Kubernetes config files.

consul.yaml

apiVersion: v1
kind: ReplicationController
metadata:
  namespace: kube-system
  name: consul
spec:
  replicas: ${CONSUL_COUNT}                               # number of consul containers
  # selector identifies the set of Pods that this
  # replication controller is responsible for managing
  selector:
    app: consul
  template:
    metadata:
      labels:
        # Important: these labels need to match the selector above
        # The api server enforces this constraint.
        pool: consulpool
        app: consul
    spec:
      containers:
        - name: consul
          env:
            - name: "ENVIRONMENT"
              value: "SOME_ENVIRONMENT_NAME"             # some name
            - name: "MASTER_TOKEN"
              value: "INITIAL_MASTER_TOKEN_FOR_ACCESS"   # UUID preferable
            - name: "GOSSIP_KEY"
              value: "ENCRYPTION_KEY_FOR_GOSSIP"         # some random key for encryption
            - name: "CONSUL_DEBUG"
              value: "false"                             # to debug or not to debug
            - name: "CONSUL_SERVER_COUNT"
              value: "${CONSUL_COUNT}"                   # integer value for number of containers
          image: 'YOUR_CONSUL_IMAGE_HERE'
          resources:
            limits:
              cpu: ${CONSUL_CPU}                         # how much CPU are you giving the container
              memory: ${CONSUL_RAM}                      # how much RAM are you giving the container
          imagePullPolicy: Always
          ports:
          - containerPort: 8500
            name: ui-port
          - containerPort: 8400
            name: alt-port
          - containerPort: 53
            name: udp-port
          - containerPort: 8543
            name: https-port
          - containerPort: 8301
            name: serflan
          - containerPort: 8302
            name: serfwan
          - containerPort: 8600
            name: consuldns
          - containerPort: 8300
            name: server
#      nodeSelector:                                     # optional
#        role: minion                                    # optional

You might notice we need to move this to a Deployment/ReplicaSet instead of a ReplicationController.

These vars should look familiar by now.

CONSUL_COUNT = number of consul containers we want to run

CONSUL_HTTP_PORT = set port for http

CONSUL_HTTPS_PORT = set port for https

CONSUL_DNS_PORT = set port for dns

ENVIRONMENT = consul datacenter name

MASTER_TOKEN = the root token you want to have super admin privs to access the cluster

GOSSIP_KEY = an encryption key for cluster communication
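
We render the ${CONSUL_COUNT}, ${CONSUL_CPU} and ${CONSUL_RAM} placeholders with Terraform, but purely as an illustration (envsubst and the values below are not part of our actual pipeline), a quick local render of consul.yaml could look like:

export CONSUL_COUNT=3 CONSUL_CPU=100m CONSUL_RAM=256Mi
envsubst < consul.yaml | kubectl create -f -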

 

consul-svc.yaml

---
apiVersion: v1
kind: Service
metadata:
  name: consul
  namespace: kube-system
  labels:
    name: consul
spec:
  ports:
    # the port that this service should serve on
    - name: http
      port: 8500
    - name: https
      port: 8543
    - name: rpc
      port: 8400
    - name: serflan
      port: 8301
    - name: serfwan
      port: 8302
    - name: server
      port: 8300
    - name: consuldns
      port: 53
  # label keys and values that must match in order to receive traffic for this service
  selector:
    pool: consulpool

 

consul-ing.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: consul
  namespace: kube-system
  labels:
    ssl: "true"
    httpsBackend: "true"
    httpsOnly: "true"
spec:
  rules:
  - host: consul.%%ENVIRONMENT%%.%%DOMAIN%%
    http:
      paths:
      - backend:
          serviceName: consul
          servicePort: 8543
        path: /

We run ingress controllers, so this provisions an ingress that makes Consul externally available.
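
Once DNS points at the ingress controllers, a quick smoke test from outside the cluster might look like this (the hostname is whatever %%ENVIRONMENT%% and %%DOMAIN%% render to in your environment, and -k is only there because of the self-signed cert):

curl -k https://consul.myenv.example.com/v1/status/leader
curl -k "https://consul.myenv.example.com/v1/catalog/nodes?token=${MASTER_TOKEN}"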

 

 

 

@devoperandi

 

Kubernetes: A/B Cluster Deploys

Everything mentioned here has been POCed and proven to work so far in testing. We run an application called Pulse, and we demoed it staying up throughout this A/B migration process.

Recently the team went through an exercise on how to deploy/manage a complete cluster upgrade. There were a couple options discussed along with what it would take to accomplish both.

  • In-situ upgrade – the usual
  • A/B upgrade –  a challenge

In the end, we chose to move forward with A/B upgrades, keeping in mind that the vast majority of our containers are stateless and thus quite easy to move around. Stateful containers are a bit bigger beast, but we are working on that as well.

We fully understand A/B migrations will be difficult and in-situ upgrades will be required at some point, but what the hell. Why not stretch ourselves a bit, right?

So here is the gist:

Build a Terraform/Ansible code base that can deploy an AWS VPC with all the core components. Minus the databases in this picture, this is basically our shell: security groups, two different ELBs for live and pre-live, a bastion box, subnets and routes, our DNS hosted zone and a gateway.

[Screenshot: AWS VPC shell with core components]

This would be its own Terraform apply, allowing our Operations folks to manage security groups, some global DNS entries, any VPN connections, bastion access, etc. without touching the actual Kubernetes clusters.

We would then have a separate Terraform apply that stands up what we call paasA, which includes an Auth server, our Kubernetes nodes for load balancing (running ingress controllers), master-a, and all of our minions, with the Kubernetes ingress controllers receiving traffic through the frontend-live ELB.

[Screenshot: paasA deployed inside the VPC]

Once we decide to upgrade, we would spin up paasB, which is essentially a duplicate of paasA running within the same VPC.

[Screenshot: paasA and paasB running side by side in the VPC]

When paasB comes up, it gets added to the frontend pre-live ELB for smoke testing, end-to-end testing and the like.

Once paasB is tested to our satisfaction, we make the switch to the live ELB while preserving the ability to switch back if we find something major preventing a complete cut-over.

[Screenshot: live ELB traffic switched over to paasB]

We then bring down paasA and wwwaaaahhhllllllaaaaaa, PaaS upgrade complete.

[Screenshot: paasA decommissioned, leaving paasB serving live traffic]

Now I think it's obvious I'm way oversimplifying this, so let's get into some details.

ELBs – They stay up all the time. Our Kubernetes minions running nginx ingress controllers get labelled in AWS, so we can quickly update the ELBs, whether live or pre-live, to point at the correct servers.

S3 buckets – We store our config files and various execution scripts in S3 for configuring our minions and the master. In this A/B scenario, each cluster (paasA and paasB) has its own S3 bucket where its config files are stored.

Auth Servers – Our PaaS deploys include our Keycloak auth servers. We still need to work through how we transfer all the configurations, or whether we choose to no longer deploy auth servers as part of the cluster deploy but instead as part of the VPC.

Masters – New as part of the cluster deploy, in keeping with true A/B.

It's pretty cool to run two clusters side by side AND be able to bring them up and down individually. But where this gets really awesome is when we can take all applications running in paasA and deploy them into paasB. I'm talking about a complete migration of assets: Secrets, Jenkins, namespaces, ACLs, resource quotas and ALL applications running on paasA, minus any self-generated items.

To be clear, we are not simply copying everything over. We are recreating objects using one cluster as the source and the other cluster as the destination. We are reading JSON objects from the Kubernetes API and using those objects along with their respective configuration to create the same object in another cluster. If you read up on Ubernetes, you will find their objectives are very much in-line with this concept. We also have ZERO intent of duplicating efforts long term. The reality is, we needed this functionality before the Kubernetes project could get there. As Kubernetes federation continues to mature, we will continue to adopt and change. Even replacing our code with theirs. With this in mind, we have specifically written our code to perform some of these actions in a way that can very easily be removed.
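
To make the idea concrete, the effect is roughly equivalent to something like the following (our actual implementation is StackStorm plus the Python libraries mentioned further down; the context names and object here are made up):

# Read an object from paasA, strip cluster-specific metadata, recreate it in paasB.
kubectl --context=paasA get secret my-app-secret -n some-namespace -o json \
  | jq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.creationTimestamp, .metadata.selfLink)' \
  | kubectl --context=paasB create -f -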

Now you are thinking, why didn’t you just contribute back to the core project? We are in several ways. Just not this one because we love the approach the Kubernetes team is already taking with this. We just needed something to get us by until they can make theirs production ready.

Now with that, I will say we have some very large advantages that enable us to accomplish something like this. Let's take Jenkins for example. We run Jenkins in every namespace in our clusters. Our Jenkins machines are self-configuring and for the most part stateless. So while we have to copy infrastructure-level items like Kubernetes Secrets to paasB, we don't have to copy each application. All we have to do is spin up the Jenkins container in each namespace and let it deploy all the applications necessary for its namespace. All the code and configuration to do so exists in Git repos, so PaaS admins don't need to know how each application stack in our PaaS is configured. A really, really nice advantage.

Our other advantage is that our databases currently reside outside of Kubernetes (except some Mongo and Cassandra containers in dev) on virtual machines. So we aren't yet worried about migrating stateful data sets, which has made our work around A/B cluster migrations a much smaller stepping stone. We are, however, placing significant effort into this area. We are getting help from the guys at Jetstack.io around storage, and we are working diligently with people like @chrislovecnm to understand how we can bring database containers into production. Some of this relies on new features like PetSets, and some of it requires changes in how various databases work. Take, for example, Cassandra snitches, where Chris has managed to create a Kubernetes-native snitch. Awesome work, Chris.

So what about Consul? It's stateful, right? And it's in your cluster, yes?

Well, that's a little different animal. Consul is a stateful application in that it runs as a cluster. So we are considering two different ways to accomplish this.

  1. Externalize our flannel overlay network using aws-vpc and allow the /16s to route to one another. Then we could essentially create one Consul cluster across two Kubernetes clusters, allow the data to sync, and then decommission the Consul containers on paasA.
  2. Use some type of small application to keep two consul clusters in sync for a period of time during paas upgrade.

Both of the options above have benefits and limitations.

Option 1:

  • Benefits:
    • could use a similar method for other clustered applications like Cassandra.
    • would do a better job ensuring the data is synced.
    • could push data replication to the cluster level where it should be.
  • Limitations:
    • we could essentially bring down the whole Consul cluster with a wrong move. Thus some of the integrity imagined in a full A/B cluster migration would be negated.

Option 2:

  • Benefits:
    • keep a degree of separation between each Kubernetes cluster during upgrade so one can not impact the other.
    • pretty easy to implement
  • Limitations:
    • specific implementation
    • much more to keep track of
    • won’t scale for other stateful applications

I'm positive the gents on my team will call me out on several more, but this is what I could think of off the top of my head.

We have already implemented Option #2 in a POC of our A/B migration.

But we haven't chosen a firm direction on this yet, so if you have additional insight, please comment back.
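
For the curious, the heart of option 2 can be sketched in a few lines against Consul's KV HTTP API. The endpoints, token handling and 30 second interval below are illustrative only, not our actual implementation:

#!/bin/sh
# Copy every KV pair from cluster A to cluster B on a loop.
CONSUL_A="http://consul-a.example.com:8500"   # assumption: each cluster's API is reachable
CONSUL_B="http://consul-b.example.com:8500"

while true; do
  curl -sS "${CONSUL_A}/v1/kv/?recurse&token=${MASTER_TOKEN}" \
    | jq -r '.[] | select(.Value != null) | "\(.Key) \(.Value)"' \
    | while read -r key value; do
        # Values come back base64 encoded; decode before writing to cluster B.
        echo "${value}" | base64 -d \
          | curl -sS -X PUT --data-binary @- \
              "${CONSUL_B}/v1/kv/${key}?token=${MASTER_TOKEN}" > /dev/null
      done
  sleep 30
done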

Barring stateful applications, what are we using to migrate all this stuff between clusters? StackStorm. We already have it performing other automation tasks outside the cluster, we have Python libraries k8sv1 and k8sv1beta for the Kubernetes API endpoints, and it's quite easy to extract the data and push it into another cluster. Once we are done with the POC we'll be pushing this to our StackStorm repo here. @peteridah you rock.

In our current POC, we migrate everything. In our next POC, we’ll enable the ability to deploy specific application stacks from one cluster to another. This will also provide us the ability to deploy an application stack from one cluster into another for things like performance testing or deep breach management testing.

Lastly we are working through how to go about stateful container migrations. There are many ideas floating around but we would really enjoy hearing yours.

For future generations:

  • We will need some sort of metadata framework for those application stacks that span multiple namespaces to ensure we duplicate an environment properly.

 

To my team-

My hat is off to you: Martin, Simas, Eric, Yiwei, Peter, John, Ben and Andy, for all your work on this.

Deploying Consul in Kubernetes

Deploying many distributed clustering technologies in Kubernetes can require some finesse. Not so with Consul. It's dead simple.

We deploy Consul with Terraform as a part of our Kubernetes cluster deployment strategy. You can read more about it here.

We currently deploy Consul as a 3 node cluster with 2 Kubernetes configuration files. Technically we could narrow it down to one but we tend to keep our service configs separate.

  • consul-svc.yaml – to create a service for other applications to interact with
  • consul.yaml – to create consul servers in a replication controller

When we bring up a new Kubernetes cluster, we push a bunch of files to Amazon S3. Among those files are the two listed above. The Kubernetes master pulls these files down from S3 and places them, along with others, in the /etc/kubernetes/addons/ directory. We then execute everything in /etc/kubernetes/addons in a for loop using kubectl create -f, roughly as sketched below.
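
That loop is nothing fancy; on the master it amounts to something like this (the bucket name and paths are placeholders):

aws s3 cp s3://my-kube-bucket/addons/ /etc/kubernetes/addons/ --recursive

for f in /etc/kubernetes/addons/*.yaml; do
  kubectl create -f "$f"
done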

Let's take a look at consul-svc.yaml:

---
apiVersion: v1
kind: Service
metadata:
  name: svc-consul
  namespace: kube-system
  labels:
    name: consul-svc
spec:
  ports:
    # the port that this service should serve on
    - name: http
      port: 8500
    - name: rpc
      port: 8400
    - name: serflan
      port: 8301
    - name: serfwan
      port: 8302
    - name: server
      port: 8300
    - name: consuldns
      port: 8600
  # label keys and values that must match in order to receive traffic for this service
  selector:
    app: consul

 

Nothing special about consul-svc.yaml. Just a generic service config file.

So what about consul.yaml?

apiVersion: v1
kind: ReplicationController
metadata:
  namespace: kube-system
  name: consul
spec:
  replicas: 3
  selector:
    app: consul
  template:
    metadata:
      labels:
        app: consul
    spec:
      containers:
        - name: consul
          command: [ "/bin/start", "-server", "-bootstrap-expect", "3", "-atlas", "account_user_name/consul", "-atlas-join", "-atlas-token", "%%ATLAS_TOKEN%%" ]
          image: progrium/consul:latest
          imagePullPolicy: Always
          ports:
          - containerPort: 8500
            name: ui-port
          - containerPort: 8400
            name: alt-port
          - containerPort: 53
            name: udp-port
          - containerPort: 443
            name: https-port
          - containerPort: 8080
            name: http-port
          - containerPort: 8301
            name: serflan
          - containerPort: 8302
            name: serfwan
          - containerPort: 8600
            name: consuldns
          - containerPort: 8300
            name: server

 

Of all this code, there is one important line.

 command: [ "/bin/start", "-server", "-bootstrap-expect", "3", "-atlas", "account_user_name/consul", "-atlas-join", "-atlas-token", "%%ATLAS_TOKEN%%" ]

-bootstrap-expect – sets the number of consul servers that need to join before bootstrapping

-atlas – enables atlas integration

-atlas-join – enables auto-join

-atlas-token – sets a token you can get from your Atlas account.

Note: make sure to replace ‘account_user_name’ with your atlas account user name.

Getting an Atlas account for this purpose is free, so don't hesitate; just make sure you realize the token you generate should be kept highly secure.

So go sign up for an account. Once done, click on your username in the upper right-hand corner, then click ‘Tokens’.

You’ll see something like this:

[Screenshot: Atlas account Tokens page]

Generate a token with a description and use this token in %%ATLAS_TOKEN%% in the consul.yaml above.

In order to populate the Token at run time, we execute a small for loop to perform a sed replace before running ‘kubectl create -f’ on the yamls.

 

for f in ${dest}/*.yaml ; do

  # Consul template
  # Deprecated: we no longer use CONSUL_SERVICE_IP
  # sed -i "s,%%CONSUL_SERVICE_IP%%,${CONSUL_SERVICE_IP}," $f
  sed -i "s,%%ATLAS_TOKEN%%,${ATLAS_TOKEN}," $f

done

 

The two variables listed above get derived from a resource “template_file” in Terraform.

Note: I’ve left out volume mounts from the config. Needless to say, we create volume mounts to a default location and back those up to S3.
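
Once the three pods are running, a quick sanity check that the cluster actually bootstrapped might look like this (the label and namespace match the replication controller above; grabbing the pod name via jsonpath is just one way to do it):

POD=$(kubectl get pods -n kube-system -l app=consul -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system "$POD" -- consul members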

Good Luck and let me know if you have questions.

 

@devoperandi