Kubernetes — Debugging NetworkPolicy (Part 3)

Paul Dally
8 min read · Feb 10, 2022

--

The first part of this series can be found here. It discusses features that are not available (at least yet) with NetworkPolicy, how to determine if your network plugin supports NetworkPolicy and other general debugging/optimization steps that will help you successfully use NetworkPolicy.

The second part of this series can be found here. It discusses which side of the conversation to debug from (if you can), describes a few common scenarios where NetworkPolicy may be blocking traffic, and provides examples of NetworkPolicy objects that can be implemented to allow that traffic.

This third and final part of the series discusses a potential approach for determining what egress traffic is being blocked for scenarios where the application configuration and/or logs don’t make that clear.

What should you do if you don’t know exactly what egress traffic is being blocked?

Unfortunately, sometimes it isn’t as easy as shown above to determine what destinations and/or ports are being blocked. The ports used by the destination service might be dynamic, so inspecting the application configuration might not provide sufficient insight. Sometimes application error messages are… let’s call them “less than helpful”. And as we saw previously, vanilla NetworkPolicy does not provide the ability to log dropped traffic.

In a non-containerized on-prem world, we might have used a sniffer (or engaged our network services team so that they could use a packet sniffer/analyzer) to look for problems. In containers, there are a number of potential tools you could use. For the purposes of this article, I’ll use tcpdump, a popular free/open-source command-line packet sniffer for Linux.

A word of advice: don’t install debug tools in your application image or application containers

You probably don’t have debugging tools like tcpdump in your application image — and you probably don’t want them there. Including them makes your application images larger, potentially slows down image builds and Pod startup, makes your images more difficult to maintain, and increases the chance that a security vulnerability might impact your application.

You might think to simply exec into a running container and install the desired tools within it, but that brings up other potential issues:

  • Running application containers as root is not recommended
  • You’ll need to reinstall every time the container restarts
  • Depending on the configuration of your NetworkPolicy or cluster, your container may not even be able to access the Kubernetes DNS server, package repositories, etc.

Keep in mind that Pods share network resources amongst all containers in the Pod. This means that tcpdump in “container A” can see the traffic produced by “container B”. As a result, you shouldn’t need to modify your application containers at all — just run your debug tools from another container in the same Pod.

Example

Let’s create two Dockerfiles — one that will represent our application, and one that will be used for our debugging tools.

Dockerfile.app

FROM alpine:latest

RUN apk update && \
    apk --no-cache add \
    bash \
    curl

Dockerfile.debug

FROM alpine:latest

RUN apk update && \
    apk --no-cache add \
    bash \
    tcpdump

CMD exec /bin/bash -c "trap : TERM INT; sleep infinity & wait"

and build them:

> docker build --no-cache --progress=plain -f docker\Dockerfile.app -t do-wget:1.0.0 docker\
> docker build --no-cache --progress=plain -f docker\Dockerfile.debug -t debug-tools:1.0.0 docker\

and then deploy a Deployment that looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  namespace: local-demo-debugnetworkpolicy-ns
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-deployment
    spec:
      containers:
      - name: do-wget
        image: do-wget:1.0.0
        imagePullPolicy: IfNotPresent
        command:
        - bash
        - -c
        - |
          while true; do
            curl example.com --connect-timeout 1 > file.txt
            sleep 10
          done
        resources:
          requests:
            cpu: 10m
            memory: 32Mi
          limits:
            cpu: 50m
            memory: 64Mi

We are now ready to start introducing tcpdump!

You might be able to use debugging tools in an ephemeral container

Ephemeral container support was introduced as an alpha feature in Kubernetes v1.16 and is in beta as of Kubernetes v1.23. As such, your cluster may not yet support it, or you may not be comfortable using pre-GA features (especially if your cluster is handling critical workloads).

I haven’t tried this yet, but the documentation looks pretty straightforward. Your mileage may vary.

https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/#ephemeral-container
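
Based on that documentation, the invocation should look roughly like the following (untested on my part, so treat it as a sketch; the pod name is a placeholder, and --container simply names the ephemeral container):

> kubectl -n local-demo-debugnetworkpolicy-ns debug -it <pod-name> --image=debug-tools:1.0.0 --container=debug-tools -- bash
bash-5.1# tcpdump -i eth0

Because all containers in a Pod share the same network namespace, tcpdump run from the ephemeral container should see the application container’s traffic, just like the sidecar approach below.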

I’ll update this document with more details once I’ve had a chance to give this a try — until then, feel free to let me know if you’ve had either positive or negative experiences with using ephemeral containers.

You may be able to use a temporary sidecar container

To use a temporary sidecar, you’ll need to add another container to the podSpec of your application. We would modify our configuration so that it looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
  namespace: local-demo-debugnetworkpolicy-ns
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-deployment
    spec:
      containers:
      - command:
        - bash
        - -c
        - |
          while true; do
            curl example.com --connect-timeout 1 > file.txt
            sleep 10
          done
        image: do-wget:1.0.0
        imagePullPolicy: IfNotPresent
        name: do-wget
        resources:
          limits:
            cpu: 50m
            memory: 64Mi
          requests:
            cpu: 10m
            memory: 32Mi
      - image: debug-tools:1.0.0
        imagePullPolicy: IfNotPresent
        name: debugtools-sidecar
        resources:
          limits:
            cpu: 50m
            memory: 64Mi
          requests:
            cpu: 10m
            memory: 32Mi

If you have access to patch Deployments (for example, you are working locally on something like Minikube or Docker Desktop), you could potentially use kubectl edit to update the Deployment. If you are using helm, you might be able to edit your values.yaml — but it depends on the specific implementation of your helm chart.

You could also potentially use the helm template command to generate the yaml that Helm would have deployed, and then edit that and deploy. If you are using Kustomize, you could simply use an overlay and patch the sidecar in. Or you could use helm template and then Kustomize the output… your available choices will vary depending on your exact configuration and CI/CD pipelines, etc.
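
For illustration (the release name, chart path and output file below are placeholders rather than anything your chart prescribes), the helm template workflow might look like:

> helm template my-release my-chart -f values.yaml > rendered.yaml
(edit rendered.yaml to add the debugtools-sidecar container, then)
> kubectl apply -f rendered.yaml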

Let’s assume we use Kustomize. In a new “debug” overlay, we might have a kustomization.yaml that patches the Deployment as follows:

bases:
- ../local
patchesJson6902:
- target:
    version: v1
    group: apps
    kind: Deployment
    name: my-deployment
  path: inject-debugtools-sidecar-patch.yaml

The patch (in inject-debugtools-sidecar-patch.yaml) might look like this:

- op: add
  path: "/spec/template/spec/containers/-"
  value:
    name: debugtools-sidecar
    image: debug-tools:1.0.0
    imagePullPolicy: IfNotPresent
    resources:
      limits:
        cpu: 50m
        memory: 64Mi
      requests:
        cpu: 10m
        memory: 32Mi

We can then deploy the kustomization, and exec into the newly added debugtools-sidecar container and run tcpdump:

> kubectl kustomize k8s\overlays\debug | kubectl apply -f -
namespace/local-demo-debugnetworkpolicy-ns unchanged
deployment.apps/my-deployment configured
networkpolicy.networking.k8s.io/allow-allpods-to-dns unchanged
networkpolicy.networking.k8s.io/deny-all unchanged
> kubectl -n local-demo-debugnetworkpolicy-ns get pod
NAME                             READY   STATUS    RESTARTS   AGE
my-deployment-5cccff8466-h4bws   2/2     Running   0          29s
> kubectl -n local-demo-debugnetworkpolicy-ns exec -it my-deployment-5cccff8466-h4bws -c debugtools-sidecar -- bash
bash-5.1# tcpdump -i eth0

Interpreting tcpdump output

You can use something like Wireshark to more easily parse/filter/analyze the output of tcpdump.

This article isn’t intended to be a full primer on how to use tcpdump. However, be aware that -i eth0 will not capture traffic to localhost; use -i lo for that instead. Wireshark specifically requires that you execute the tcpdump command with a -w <filename> argument, and then use something like kubectl cp to retrieve the resulting capture file from the container’s filesystem.
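
For example (the capture path and file name here are simply my choices), capturing to a file and pulling it back to your workstation might look like this:

bash-5.1# tcpdump -i eth0 -w /tmp/capture.pcap
(reproduce the problem, then stop tcpdump with Ctrl-C)
> kubectl cp local-demo-debugnetworkpolicy-ns/my-deployment-5cccff8466-h4bws:/tmp/capture.pcap capture.pcap -c debugtools-sidecar

You can then open capture.pcap in Wireshark.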

The “tcpdump -i eth0” command that we ran above will result in a lot of output. First we see some DNS lookups (I’ve snipped out a bunch to make this more readable):

tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:36:41.508244 IP my-deployment-5cccff8466-h4bws.59813 > kube-dns.kube-system.svc.cluster.local.53: 47110+ A? example.com.local-demo-debugnetworkpolicy-ns.svc.cluster.local. (80)
19:36:41.508427 IP my-deployment-5cccff8466-h4bws.59813 > kube-dns.kube-system.svc.cluster.local.53: 47910+ AAAA? example.com.local-demo-debugnetworkpolicy-ns.svc.cluster.local. (80)
<snip>

You can identify the DNS lookups because their destination is kube-dns.kube-system.svc.cluster.local.53. There may also be ARP traffic, which can be identified by the traffic type (ARP instead of IP after the timestamp). We aren’t typically that interested in these lines.
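
If you want to suppress that noise at capture time, a standard tcpdump/BPF filter expression (nothing Kubernetes-specific) will do it, for example:

bash-5.1# tcpdump -i eth0 -nn 'not port 53 and not arp'

For the rest of this walkthrough, though, we’ll stick with the unfiltered capture.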

Soon, though, we see lines like this:

19:36:41.510322 IP my-deployment-5cccff8466-h4bws.49736 > 93.184.216.34.80: Flags [S], seq 3928263212, win 64800, options [mss 1440,sackOK,TS val 1896154571 ecr 0,nop,wscale 7], length 0
19:36:41.518392 IP my-deployment-5cccff8466-h4bws.43159 > kube-dns.kube-system.svc.cluster.local.53: 42685+ PTR? 10.0.96.10.in-addr.arpa. (41)
19:36:41.525286 IP kube-dns.kube-system.svc.cluster.local.53 > my-deployment-5cccff8466-h4bws.43159: 42685*- 1/0/0 PTR kube-dns.kube-system.svc.cluster.local. (116)
19:36:41.526170 IP my-deployment-5cccff8466-h4bws.55320 > kube-dns.kube-system.svc.cluster.local.53: 18320+ PTR? 34.216.184.93.in-addr.arpa. (44)
19:36:41.673054 IP kube-dns.kube-system.svc.cluster.local.53 > my-deployment-5cccff8466-h4bws.55320: 18320 NXDomain 0/1/0 (138)
19:36:46.517367 ARP, Request who-has my-deployment-5cccff8466-h4bws tell ip-10-244-120-64.eu-west-2.compute.internal, length 28

The [S] flag tells us that this packet is a “SYN” packet from the Pod to example.com (93.184.216.34:80), which is the first part of the TCP 3-way handshake. A “SYN/ACK” packet from example.com back to the Pod should follow, but none appears. The absence of that “SYN/ACK” packet is quite meaningful.

If the conversation with example.com were not blocked, the output would look something like this instead:

19:54:30.534146 IP my-deployment-5cccff8466-x5kh9.46734 > 93.184.216.34.80: Flags [S], seq 4131307425, win 64800, options [mss 1440,sackOK,TS val 1393289043 ecr 0,nop,wscale 7], length 0
19:54:30.627479 IP my-deployment-5cccff8466-x5kh9.51298 > kube-dns.kube-system.svc.cluster.local.53: 13910+ PTR? 34.216.184.93.in-addr.arpa. (44)
19:54:30.636325 IP 93.184.216.34.80 > my-deployment-5cccff8466-x5kh9.46734: Flags [S.], seq 452881824, ack 4131307426, win 65535, options [mss 1460,wscale 2,eol], length 0
19:54:30.636388 IP my-deployment-5cccff8466-x5kh9.46734 > 93.184.216.34.80: Flags [.], ack 1, win 507, length 0

Notice that after the packet from my-deployment-5cccff8466-x5kh9.46734 to 93.184.216.34.80 with Flags [S], there is a packet in the opposite direction (i.e. from 93.184.216.34.80 to my-deployment-5cccff8466-x5kh9.46734) with Flags [S.] and an ack value (that is to say, a “SYN/ACK” packet).

For TCP, every SYN packet should be followed by a SYN/ACK packet. It may not be the very next packet in the tcpdump output if other network traffic is occurring in your Pod, but based on the timestamps at the beginning of each line, it shouldn’t appear very long afterwards. If we don’t see the SYN/ACK, then something (NetworkPolicy or some other firewall) may well be dropping that request.
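
One way to make a missing SYN/ACK stand out (again, just a standard tcpdump filter expression) is to capture only packets that have the SYN bit set, so each outgoing SYN and its corresponding SYN/ACK appear together and any SYN without a reply is obvious:

bash-5.1# tcpdump -i eth0 -nn 'tcp[tcpflags] & (tcp-syn) != 0'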

In our case, because we read part 1 of this series and had already tested the application without any NetworkPolicies in place (and saw that it worked), we now know that the problem is that our NetworkPolicies are not allowing requests to IP address 93.184.216.34 on port 80, and we can add a NetworkPolicy to allow that traffic!
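
As a sketch (the policy name is mine, the podSelector matches the labels from our Deployment above, and part 2 of this series covers constructing NetworkPolicy in more detail), that additional policy might look something like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-my-deployment-to-example-com
  namespace: local-demo-debugnetworkpolicy-ns
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 93.184.216.34/32
    ports:
    - protocol: TCP
      port: 80

Keep in mind that IP-based rules like this are brittle if the destination’s IP address ever changes.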

Security settings on some clusters may by default prevent tcpdump from running

Be advised that the security settings of some Kubernetes clusters will not allow tcpdump to function in a container by default. For example, OpenShift documentation suggests that this may not work and that you should follow these instructions instead: https://access.redhat.com/solutions/4569211

Good luck, and may v2 of NetworkPolicy arrive soon and usher in a new golden age!

--

Paul Dally

AVP, IT Foundation Platforms Architecture at Sun Life Financial. Views & opinions expressed are my own, not necessarily those of Sun Life