Kubernetes — Application High-Availability, Part 1 (The Very-Basic Basics)
Kubernetes has a number of features that can help to ensure application high-availability.
What You Get For Free
If a Node becomes unavailable, Kubernetes will reschedule the Pods managed by Deployments or ReplicaSets onto the Nodes that remain in the cluster, so long as there is capacity and the Pod's configuration does not preclude it (for example, a nodeSelector or podAntiAffinity that is incompatible with the remaining Nodes, or a PersistentVolumeClaim backed by local storage).
If a container crashes, Kubernetes will attempt to restart the container.
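You can watch this happen for yourself: the RESTARTS column in kubectl's Pod listing increments each time a container in a Pod is restarted.
kubectl -n <namespace> get pods --watch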
Already, we’re off to a pretty good start!
Multiple Replicas
Deploying more than one instance of your Pods is probably the easiest and most effective single approach to application high availability. With a Deployment or StatefulSet, it is easy to get started — simply specify replicas > 1 in the spec of the object. For example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
spec:
  replicas: 2   # change this for more Pods
  selector:
    matchLabels:
      app.kubernetes.io/name: helloworld
  template:
    metadata:
      labels:
        app.kubernetes.io/name: helloworld
    ...
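If you need to change the replica count on a running Deployment, you can edit the manifest and re-apply it, or scale it imperatively. For example (assuming the Deployment above was created in the helloworld-ns namespace used later in this post):
kubectl -n helloworld-ns scale deployment helloworld-deployment --replicas=3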
Load Balancing
Of course, consumers of the application need a way to connect to it that is aware of these Pods and can direct requests across them. That is what a Service is for: a standard ClusterIP Service gives you a single, stable virtual IP and DNS name, and connections to it are load-balanced across the Pods matching its selector. For example:
apiVersion: v1
kind: Service
metadata:
  name: helloworld-svc
  namespace: helloworld-ns
spec:
  selector:
    app.kubernetes.io/name: helloworld
  ports:
  - name: http
    protocol: TCP
    port: 8080
    targetPort: 8080
Now other Pods inside the Kubernetes cluster can simply refer to the Service by its DNS name when they want to reach the application, for example http://helloworld-svc.helloworld-ns:8080/foo
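To confirm that the Service has actually picked up your Pods, check its endpoints; you should see one Pod IP per ready replica:
kubectl -n helloworld-ns get endpoints helloworld-svc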
Ensuring That Pods Are Distributed Across Nodes
Even with multiple replicas and load-balancing, it will be a problem if “too many” Pods are scheduled on the same Node. What number is “too many” can vary depending on the application, but certainly you don’t want all of your Pods scheduled on the same Node because if that Node experiences an outage, then all of your Pods will be impacted.
There are a number of ways to govern how Kubernetes assigns your Pods to Nodes. nodeSelector and affinity will help you put your Pods on the Nodes that you want (and keep them off the Nodes that you don't), and topologySpreadConstraints will help to ensure that Pods are distributed across those Nodes in the way that you want.
For example, the following addition to your podSpec tells the scheduler to prefer zone ca-central-1a, but still allows the Pod to start in ca-central-1b or ca-central-1d (assuming your cluster has worker Nodes in those zones) if it cannot be placed in ca-central-1a:
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - ca-central-1a
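If you are not sure which zone each of your worker Nodes is in, you can list the Nodes along with the value of the zone label:
kubectl get nodes -L topology.kubernetes.io/zone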
Assuming that you have worker Nodes in ca-central-1a, ca-central-1b and ca-central-1d, that replicas is set to 3, and that the Nodes in those zones have capacity for your Pods, the following will cause one Pod to be created in each zone:
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: helloworld   # must match your Pods' labels
You can have multiple topologySpreadConstraints, and you can combine them with nodeAffinity as well. The official Kubernetes documentation on assigning Pods to Nodes (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) and on topology spread constraints (https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/) is a great place to go deeper.
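As a rough sketch of such a combination (reusing the placeholder zone and label values from the examples above), you could spread strictly across zones, spread across individual Nodes on a best-effort basis, and keep the earlier zone preference:
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - ca-central-1a
  topologySpreadConstraints:
  # hard requirement: keep the per-zone skew at 1
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: helloworld
  # soft preference: also spread across individual Nodes when possible
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: helloworld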
Quick Verification of Your High-Availability Configuration
How can you determine whether your high-availability configurations are doing what you think? You can start with the following command:
kubectl -n <namespace> get pod -o=wide
This will show you the worker Node that each Pod has been scheduled on, allowing you to verify whether the scheduling matches your expectations.
You can see what labels a Node has by running a kubectl describe:
kubectl describe node <nodename>
You should probably also delete the Pod a few times, or in some cases temporarily increase the number of replicas (or the minReplicas on a HorizontalPodAutoscaler), to make sure the Pod wasn't scheduled onto an acceptable Node just by luck.
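For example, assuming the helloworld Deployment and namespace from earlier (substitute your own names and labels), a quick round of testing might look like:
kubectl -n helloworld-ns delete pod -l app.kubernetes.io/name=helloworld       # force the Pods to be rescheduled
kubectl -n helloworld-ns scale deployment helloworld-deployment --replicas=5   # temporarily add replicas
kubectl -n helloworld-ns get pod -o wide                                       # check which Nodes the new Pods landed on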
If you are working with a StatefulSet, I also recommend testing out nodeAffinity, nodeSelector and topologySpreadConstraints with a simple helloworld-type Deployment first. A StatefulSet's Pods will "stick" to the Node on which they are initially scheduled, and you will need to delete the Pod (and any automatically created PersistentVolumeClaims, if you are using volumeClaimTemplates) before Kubernetes will schedule it onto a different Node. With a Deployment, you simply apply your nodeAffinity, nodeSelector or topologySpreadConstraints changes, and Kubernetes should automatically terminate Pods that no longer meet what you have specified and schedule replacements on Nodes that do.
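For reference, the StatefulSet reset described above looks something like the following; the namespace, Pod and PVC names here are hypothetical and depend on your StatefulSet and volumeClaimTemplates names:
kubectl -n helloworld-ns delete pvc data-helloworld-0   # assumed PVC name; it stays in Terminating until the Pod releases it
kubectl -n helloworld-ns delete pod helloworld-0        # assumed Pod name; the StatefulSet controller recreates the Pod and a fresh PVC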
We’ve Only Just Begun
So far, we’ve only scratched the surface of the features that Kubernetes provides for application high-availability. We’ll continue with Part 2.