Kubernetes — Debugging NetworkPolicy (Part 2)

The first part of this series can be found here. It discusses features that are not available (at least yet) with NetworkPolicy, how to determine if your network plugin supports NetworkPolicy, and other general debugging/optimization steps that will help you successfully use NetworkPolicy.

In this 2nd part of the series, we’ll discuss which side of the conversation to debug from (if you have a choice), describe a few common scenarios where NetworkPolicy can be blocking traffic, and provide examples of NetworkPolicy objects that can be implemented to allow that traffic.

If you are restricted to vanilla NetworkPolicies then, as we’ve noted previously, there is no logging. Since blocked ingress traffic is dropped before it reaches your container(s), it is best to debug from the source of the traffic where that is practical. At the source, you’ll at least have configuration and/or logs to look at.

If you don’t have any control over the source, then your task may be daunting. Make sure that you have triple-checked the NetworkPolicy, and consider whether the protocol connecting to your Pod might have unexpected ports or traffic directionality.

For example, active mode FTP uses the connection opened by the client as the command channel, but the server then attempts to open a connection back to the client on a random port for the data channel. If your Pod is the FTP server, active mode would require both ingress and egress to be allowed (I recommend passive mode as the easier solution, rather than trying to make active mode and your NetworkPolicy work together).
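To make the directionality concrete, here is a minimal sketch of what an FTP server Pod would need for active mode. The app.kubernetes.io/name: my-ftp-server label and the 203.0.113.0/24 client range are hypothetical placeholders, and the egress port range relies on the endPort field, which needs a reasonably recent Kubernetes version and a network plugin that supports it.

```yaml
# Sketch only: the Pod label and the client CIDR are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ftp-active-mode
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-ftp-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Command channel: clients connect to the server on TCP 21.
  - from:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 21
  egress:
  # Data channel: the server connects back to a port chosen by the client,
  # so a wide destination port range has to be allowed.
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 1024
      endPort: 65535
```

Having to open such a wide egress range is a good illustration of why passive mode tends to be the easier option.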

SQL Server can sometimes use dynamic port ranges, and other applications and protocols may have a variety of similar behaviors. Depending on your specific application, you may need to change your NetworkPolicies, modify the configuration of your client, your server or both, or do some combination of these.
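If you do need to accommodate a dynamic port range from the NetworkPolicy side, a port range can be expressed with endPort. The following is only a sketch: the server address 10.20.30.40/32 and the range 49152-65535 are assumptions for illustration, so check which ports your server actually uses (or pin it to a static port instead), and confirm that your cluster and network plugin support endPort.

```yaml
# Sketch only: the server IP and the dynamic port range are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-deployment-to-sqlserver
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.20.30.40/32
    ports:
    # The well-known static SQL Server port.
    - protocol: TCP
      port: 1433
    # An assumed dynamic range; port..endPort is inclusive.
    - protocol: TCP
      port: 49152
      endPort: 65535
```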

Your container logs will often tell you what the issue is. For example, when DNS is blocked, you might see an error message along the lines of “DNS resolution failed” or “Resolving timed out”, as we do with curl in the example below:

>kubectl -n local-demo-debugnetworkpolicy-ns logs my-deployment-5cccff8466-zmthp -c do-wget 
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0
curl: (28) Resolving timed out after 1000 milliseconds

You can add a NetworkPolicy to allow all Pods to access DNS as follows:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-allpods-to-dns
spec:
  policyTypes:
  - Egress
  # An empty podSelector selects every Pod in the Namespace.
  podSelector: {}
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    # DNS uses UDP by default and falls back to TCP for large responses.
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP

Note: If you are using Kubernetes <v1.21, you may need to apply the kubernetes.io/metadata.name: kube-system label to the kube-system Namespace. You can do that like this:

kubectl label namespaces kube-system kubernetes.io/metadata.name=kube-system
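If allowing port 53 to everything in kube-system is broader than you would like, a narrower variant can also select the DNS Pods themselves. This is a sketch that assumes your cluster’s DNS Pods carry the conventional k8s-app: kube-dns label; that label and the policy name are assumptions, so check the labels on your own DNS Pods first (for example with kubectl -n kube-system get pods --show-labels).

```yaml
# Sketch only: assumes the cluster DNS Pods are labelled k8s-app: kube-dns.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-allpods-to-kube-dns
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    # Both selectors in a single "to" entry means
    # "Pods with this label in that Namespace".
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
```

Note the indentation of podSelector: if it were written as a separate list item (with its own leading dash), the rule would instead mean "any Pod in kube-system, or any Pod with that label in this Namespace".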

When application traffic is blocked, you might see a “Failed to connect” message. For example, again using curl:

>kubectl -n local-demo-debugnetworkpolicy-ns logs my-deployment-5cccff8466-zmthp -c do-wget
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (28) Failed to connect to example.com port 80 after 703 ms: Operation timed out

In this particular example, since example.com is external to Kubernetes, you would have to use an ipBlock to specify a CIDR block, similar to the following:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-deployment-to-examplecom
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 93.184.216.34/32
    ports:
    - protocol: TCP
      port: 80

This comes with at least one major challenge for services that are outside your control — the DNS name example.com may have more than 1 IP address that it can point to, and it may not be easily possible to determine what they are (or will be). For example, it might be hosted behind a CDN or have a presence in multiple regions and be globally load-balanced. The provider of the service may change their configuration at any time, potentially without notice to you.

Specifying IP addresses or ranges of IP addresses for services that you do not control will often result in an application that is prone to failure. If the provider of the service is not publishing their IP ranges and making commitments about when those ranges might change and how much lead time you will be given, you might want to avoid using egress ipBlock. Instead, consider using firewalls, security groups and/or proxy servers outside of Kubernetes, which have more capabilities for implementing the restrictions you require.

If you do go down the pathway of using firewalls or security groups or a similar construct restricting access to external services, you may wish to configure NetworkPolicy so that all external egress is allowed (since the firewall or whatever you are using will handle that) but still restrict traffic to private IPs with NetworkPolicy. Here’s an example of a NetworkPolicy that could be used for that:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-block-private-networks
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.0.0/8
        - 172.16.0.0/12
        - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 80

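The 3 ranges listed are the private IPv4 address ranges defined in RFC 1918. Note that, as written, the ports section limits the allowed egress to TCP port 80; list additional ports, or omit ports entirely, if you want the external firewall to make those decisions. Please treat this as an example, however, as your network topology and/or application requirements may be different.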

Additional NetworkPolicies would be required to open up access to specific ipBlocks within the 3 private IP ranges as necessary, but these NetworkPolicies may be significantly easier to manage than the firewalls or security groups external to Kubernetes.
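As a sketch of what such an additional policy might look like, the following opens egress from the same Pods to a single private subnet. The 10.20.30.0/24 range, TCP port 5432 and the policy name are hypothetical placeholders. Because NetworkPolicies are additive, this can sit alongside the default-block-private-networks example above without modifying it.

```yaml
# Sketch only: the CIDR, port and policy name are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-deployment-to-internal-db
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-deployment
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        # A specific subnet inside the otherwise excluded 10.0.0.0/8 range.
        cidr: 10.20.30.0/24
    ports:
    - protocol: TCP
      port: 5432
```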

I hope this has provided you with some assistance in how to approach your debugging task and that the examples are useful. Part 3 of this series is now available, looking at using tcpdump to identify egress traffic that is being blocked, for those scenarios when your application logs and/or configuration do not provide enough information!

AVP, IT Foundations Platform Architecture at Sun Life Financial. Views & opinions expressed are my own, not necessarily those of Sun Life

Paul Dally
