Kubernetes — Using Gatekeeper For Operational Guardrails — Part 1 (Validation)
Most organizations find it necessary to enforce policies and governance of various kinds on their information technology assets and processes. Some of these policies are formal, dictated by security directives or audit controls; others are more operational in nature.
For example, from an operational perspective, you might want to enforce default minimum ratios of request to limit on Pod resources (and not just the maximum ratios that a LimitRange can impose). Or you may want to restrict certain worker nodes in your cluster to only be able to run Pods deployed to certain Namespaces.
From a security perspective, you might require that no Pods be permitted in certain Namespaces (e.g. “default”), allow images only from certain registries, or look for a consistent way to implement all of your security and non-security policies in light of the upcoming deprecation of PodSecurityPolicy (deprecated in v1.21, planned for removal in v1.25).
Gatekeeper can do all of this and more! Gatekeeper is “a validating (mutating in beta) webhook that enforces CRD-based policies executed by Open Policy Agent”. Your Kubernetes distribution may have Gatekeeper included already. If not, you can install Gatekeeper using these instructions.
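For reference, the upstream installation typically amounts to applying a single manifest. The commands below are a sketch; substitute the current Gatekeeper release branch for release-3.x and confirm the exact steps against the Gatekeeper documentation:

# Deploy Gatekeeper (replace release-3.x with the current release branch)
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.x/deploy/gatekeeper.yaml

# Confirm the Gatekeeper Pods are running
kubectl get pods -n gatekeeper-system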
An example scenario
Now that you have Gatekeeper installed, let’s consider a sample use-case. Imagine a multi-tenant Kubernetes cluster where applications developed by different business units are deployed into discrete Namespaces.
Two business units, one responsible for “widgets” and the other for “doodads”, are each specifically funding 2 worker Nodes. Each of these 4 worker Nodes has 8 vCPU and 64GB RAM. There are also 2 “shared” Nodes, each with 4 vCPU and 32GB RAM, whose capacity is already substantially consumed by random Pods owned by other business units. Some capacity on each worker needs to be reserved for overhead processes, so the allocatable CPU on each worker is 100m less than the actual CPU:

Node       Business unit   vCPU   Allocatable CPU   RAM
server01   widget          8      7900m             64GB
server02   widget          8      7900m             64GB
server03   doodad          8      7900m             64GB
server04   doodad          8      7900m             64GB
server05   shared          4      3900m             32GB
server06   shared          4      3900m             32GB
One of the application developers for the doodad department deploys a Deployment similar to the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: doodad-deployment
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: doodad-deployment
  replicas: 4
  template:
    metadata:
      labels:
        app.kubernetes.io/name: doodad-deployment
    spec:
      containers:
        - name: doodad-app
          image: doodad-app:1.0.0
          resources:
            requests:
              cpu: 3500m
              memory: 32Mi
            limits:
              cpu: 4000m
              memory: 64Mi
The requested replicas fit within the ResourceQuota on their Namespace, so Kubernetes dutifully schedules the 4 Pods, placing one Pod on each of the 4 dedicated workers that still have capacity: the two doodad workers and the two widget workers.
It is certainly possible that Kubernetes could have put 2 of the doodad Pods on each of the doodad workers, since there is nothing in the Deployment that tells Kubernetes one way or the other how to schedule them. To some extent, however, this is irrelevant: you want certainty, and relying on Kubernetes’ default scheduling behaviour won’t give you that.
A widget team developer subsequently tries to deploy the widget application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: widget-deployment
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: widget-deployment
  replicas: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: widget-deployment
    spec:
      containers:
        - name: widget-app
          image: widget-app:1.0.0
          resources:
            requests:
              cpu: 7500m
              memory: 32Mi
            limits:
              cpu: 8000m
              memory: 64Mi
Unfortunately, no Pods start because no Node has sufficient free CPU: each 8 vCPU worker has 7900m allocatable CPU, of which 3500m is now claimed by a doodad Pod, leaving only 4400m for a Pod requesting 7500m, and the smaller shared Nodes never had that much to begin with.
The widget application team is understandably irritated. You aren’t sure, but you think you heard one of the developers muttering “Kubernetes sucks… Let’s just go back to deploying directly to our own dedicated VMs…”
What everyone was hoping for is a guaranteed placement: doodad Pods confined to the doodad workers, widget Pods confined to the widget workers, and everything else running on the shared Nodes.
Good fences make good neighbours… What can we do to turn our hopes into guarantees?
Gatekeeper can do this (and much more)
In this case, we can use a Gatekeeper Constraint to mandate that Pods specify nodeSelectors based on the labels of the Namespace that they are deploying to.
If we labelled each worker Node with the business unit it is provisioned for (and labelled the remaining worker Nodes as shared), we could achieve our goal by requiring doodad Pods to look like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: doodad-deployment
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: doodad-deployment
  replicas: 4
  template:
    metadata:
      labels:
        app.kubernetes.io/name: doodad-deployment
    spec:
      containers:
        - name: doodad-app
          image: doodad-app:1.0.0
          resources:
            requests:
              cpu: 3500m
              memory: 32Mi
            limits:
              cpu: 4000m
              memory: 64Mi
      nodeSelector:
        businessunit: doodad
Widget pods would specify a similar nodeSelector:
...
      nodeSelector:
        businessunit: widget
Finally, all other pods would specify:
...
      nodeSelector:
        businessunit: shared
In part 2 of this article, we’ll look at having Gatekeeper automatically add the required nodeSelectors — but for now, we’ll simply reject Pods that are not compliant and leave it up to the application developers to correct their pod spec and redeploy.
Preliminary steps
First, let’s take the step of labelling all of our worker Nodes with a “businessunit” label. Nodes that are not dedicated to a specific business unit would be labelled with the value “shared”. Nodes dedicated to a specific business unit would be labelled with the value reflecting that business unit. For example:
kubectl label nodes server01 --overwrite businessunit="widget"
kubectl label nodes server02 --overwrite businessunit="widget"
kubectl label nodes server03 --overwrite businessunit="doodad"
kubectl label nodes server04 --overwrite businessunit="doodad"
kubectl label nodes server05 --overwrite businessunit="shared"
kubectl label nodes server06 --overwrite businessunit="shared"
Only one label with a particular key can be present. If we wanted to have Nodes that could be shared by multiple specified business units, we could perhaps specify the Node labels something like this:
kubectl label nodes server03 --overwrite businessunit.doodad="allow"
kubectl label nodes server03 --overwrite businessunit.widget="allow"
kubectl label nodes server04 --overwrite businessunit.doodad="allow"
kubectl label nodes server04 --overwrite businessunit.widget="allow"
This would require some slight modifications to the sample code shown below, but it wouldn’t be very difficult. This potential enhancement, however, is left as an exercise to the reader…
Now, let’s label the namespaces to indicate which servers the Pods deployed within them should run on:
kubectl on Windows:
kubectl patch ns local-demo-widget-ns -p "{\"metadata\":{\"labels\":{\"businessunit\":\"widget\"}}}"
kubectl patch ns local-demo-doodad-ns -p "{\"metadata\":{\"labels\":{\"businessunit\":\"doodad\"}}}"
kubectl patch ns local-demo-other-ns -p "{\"metadata\":{\"labels\":{\"businessunit\":\"shared\"}}}"
kubectl on other platforms:
kubectl patch ns local-demo-widget-ns -p '{"metadata":{"labels":{"businessunit":"widget"}}}'
kubectl patch ns local-demo-doodad-ns -p '{"metadata":{"labels":{"businessunit":"doodad"}}}'
kubectl patch ns local-demo-other-ns -p '{"metadata":{"labels":{"businessunit":"shared"}}}'
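Before moving on, it is worth confirming that both sets of labels landed where we expect; the -L flag adds the label value as a column in the output:

kubectl get nodes -L businessunit
kubectl get namespaces -L businessunit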
We’re off to a great start! Now we need to put in place the enforcement mechanism.
Configure Gatekeeper’s Policy Controller for referential constraints
In our example, our validation will not be based exclusively on the Pod’s attributes. We need to compare the Pod’s nodeSelector with the Namespace labels, so we need to use something called a referential constraint.
Gatekeeper, by default, doesn’t enable referential constraints — no Kubernetes configuration data is available to Gatekeeper out of the box. This is very easy to change via Gatekeeper’s configuration:
apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
The Config object must be named config, so if you are working with a pre-existing Gatekeeper installation be sure you aren’t overwriting an existing object that might have modified other configurations. If you needed to reference other types of objects in your ConstraintTemplates (for example, Ingresses, Services, etc.) you would simply add additional items to the syncOnly list.
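For example, a Config that also makes Services and Ingresses available for referential constraints might look like the following (illustrative; only sync the kinds your ConstraintTemplates actually reference, since everything listed here is cached by Gatekeeper):

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: ""
        version: "v1"
        kind: "Service"
      - group: "networking.k8s.io"
        version: "v1"
        kind: "Ingress"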
Creating a ConstraintTemplate
Next we need to create a ConstraintTemplate, which contains the rules that describe whether the object should be allowed or not:
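(The version shown here is a condensed, illustrative sketch; the full ConstraintTemplate is in the source repository linked at the end of this article.)

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequirenodeselector
spec:
  crd:
    spec:
      names:
        kind: K8sRequireNodeSelector
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirenodeselector

        # Violation 1: the Pod spec has no businessunit nodeSelector at all
        violation[{"msg": msg}] {
          not input.review.object.spec.nodeSelector.businessunit
          msg := "A businessunit nodeSelector must be present on all Pod specs. If your Namespace does not require that Pods run on a specific set of Nodes, specify a nodeSelector of businessunit: shared. See http://example.com/businessunit_nodeselector for further information"
        }

        # Violation 2: the businessunit nodeSelector does not match the
        # businessunit label on the target Namespace (looked up from the
        # Namespace data that Gatekeeper syncs into data.inventory)
        violation[{"msg": msg}] {
          selector := input.review.object.spec.nodeSelector.businessunit
          ns := input.review.object.metadata.namespace
          nslabel := data.inventory.cluster["v1"]["Namespace"][ns].metadata.labels.businessunit
          selector != nslabel
          msg := sprintf("The businessunit nodeSelector value of %v does not match the namespace businessunit label value of %v for namespace %v. Please modify your nodeSelector value accordingly or deploy to a different Namespace. See http://example.com/businessunit_nodeselector for further information", [selector, nslabel, ns])
        }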
In the ConstraintTemplate above, there are 2 violation blocks. The first block checks to make sure that there is a businessunit nodeSelector. The second block checks to make sure that the businessunit nodeSelector value matches the businessunit label on the Namespace that the Pod will be deployed to. Based on the implementation shown above, all Namespaces must have a businessunit label.
Creating a Constraint
Next you need to create a Constraint — which matches your ConstraintTemplate to the types of objects that you want it to apply to:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireNodeSelector
metadata:
  name: require-node-selector-for-pods
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
In our example, we’re specifying objects of type Pod even though the applications might be deploying objects of type Deployment (or StatefulSet, or CronJob), because all of these objects eventually end up creating Pods.
A Constraint can also be used to govern the kind of action that Gatekeeper takes when a violation occurs.
Warnings
Sometimes you may want to simply inform the user that something about their object is not ideal rather than reject it outright, because there are cases where it might still be acceptable. You can have Gatekeeper issue warnings instead of denials by adding enforcementAction: warn to the Constraint spec:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireNodeSelector
metadata:
  name: require-node-selector-for-pods-warning
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
Verifying Constraint/ConstraintTemplate changes
Having implemented a Gatekeeper Constraint, you will invariably need to modify it at some point. You may want to roll out those changes in a read-only mode before activating them, so that you can confirm there will be no unexpected behaviour.
This can be accomplished by deploying an additional ConstraintTemplate and Constraint under a different name, and specifying enforcementAction: dryrun on the Constraint spec:
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireNodeSelector
metadata:
  name: require-node-selector-for-pods-dryrun
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
Exempting/Including Namespaces
You may not want to apply any Constraints to certain Namespaces (e.g. kube-system, gatekeeper-system, etc.). You can use the Config object to exempt Namespaces from Gatekeeper entirely, or you can specify excludedNamespaces in the match field of an individual Constraint; conversely, you can restrict a Constraint so that it applies only to specific Namespaces.
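For instance, to keep the original Constraint away from the cluster’s system Namespaces, the match block could be extended along these lines (the Namespace names are just examples):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequireNodeSelector
metadata:
  name: require-node-selector-for-pods
spec:
  match:
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]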
The results
Assuming that we do not specify an enforcementAction of warn or dryrun, if a widget or doodad developer deploys a Deployment without a businessunit nodeSelector, the result will be that ReplicaSets created by the Deployments are not able to create Pods. Describing the ReplicaSet reveals an event similar to the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 0s (x12 over 8s) replicaset-controller Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [require-node-selector-for-pods] A businessunit nodeSelector must be present on all Pod specs. If your Namespace does not require that Pods run on a specific set of Nodes, specify a nodeSelector of businessunit: shared. See http://example.com/businessunit_nodeselector for further information
If a doodad developer tries to specify a businessunit nodeSelector value of widget (or vice-versa), the Pod will also not be created — and describing the ReplicaSet will reveal the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 3m34s (x20 over 42m) replicaset-controller Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [require-node-selector-for-pods] The businessunit nodeSelector value of widget does not match the namespace businessunit label value of doodad for namespace local-demo-doodad-ns. Please modify your nodeSelector value accordingly or deploy to a different Namespace. See http://example.com/businessunit_nodeselector for further information
We’ve built a good fence, and the widget and doodad teams will hopefully now be good neighbours! Great work!
Part 2 of this article explores using Gatekeeper to automatically inject the required nodeSelector rather than requiring developers to always include the nodeSelector in the objects they deploy. This technique can obviously be used for many other use-cases as well, so I recommend giving part 2 a read.
The source code used while creating this article can be found on GitHub.