Session Affinity and Kubernetes — Proceed With Caution!

Paul Dally
May 5, 2023

Sometimes you may find that your Kubernetes application doesn’t work properly the first time you run more than one replica.

In my experience, this is most frequently because the application is “Pod-stateful” — for example, it may store session state in memory or write data that subsequent requests depend on to local storage. An initial request is sent to Pod A, which initializes the state; a subsequent request is sent to Pod B, which doesn’t have access to that state, and as a result the application doesn’t behave as desired…

Often, the initial reaction is “OK, how can I make all requests for this session go back to the Pod that handled the initial request?” Kubernetes Services support affinity by ClientIP. If you are accessing your application via an Ingress, most Ingress controllers support a feature called session affinity (example 1, example 2, example 3), usually implemented with cookies.

Please don’t think, however, that these options are silver bullets! It’s worth noting that one of the Principles of Container-Based Application Design is Process Disposability. Containers should be as “ephemeral as possible and ready to be replaced by another container instance at any point in time”. When — for example — your containers are storing session state in memory, they can’t really be replaced by another container instance (at least not without impacting some portion of your active users). There are some really good reasons for this principle.

Let’s look at an example

Consider the following trivial Java servlet:

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

@WebServlet("/helloworld")
public class HelloWorldServlet extends HttpServlet {

    public static final String LANG_PARAM_NAME = "lang";

    protected void doPost(HttpServletRequest request,
                          HttpServletResponse response)
            throws ServletException, IOException {

        // Store the "lang" request parameter in the (in-memory) HTTP session
        HttpSession session = request.getSession();
        session.setAttribute(LANG_PARAM_NAME,
                request.getParameter(LANG_PARAM_NAME));
    }

    protected void doGet(HttpServletRequest request,
                         HttpServletResponse response)
            throws ServletException, IOException {

        // Read back the value stored in the session by the earlier POST
        HttpSession session = request.getSession();

        PrintWriter writer = response.getWriter();
        writer.println("Language: " + session.getAttribute(LANG_PARAM_NAME));
    }
}

On the first (POST) request, the application retrieves the language parameter from the request and stores its value in session state. Subsequent (GET) requests associated with that session expect the language value to be available from session state, but it will only be there if all requests get routed to the same Pod — which by default won’t be the case when replicas > 1 (and if you care at all about the availability of your application, you probably do want more than one replica).
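
To reproduce the problem, you might package the servlet into an image and run it with more than one replica. The following Deployment is only an illustration; the image name is hypothetical, and the labels are chosen to match the Service shown in the next section:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
spec:
  replicas: 2 # more than one replica exposes the problem
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp # matches the Service selector below
    spec:
      containers:
      - name: helloworld
        image: registry.example.com/helloworld:1.0 # hypothetical image
        ports:
        - containerPort: 80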

Service Session Affinity

One option is to implement session affinity on the Service:

kind: Service
apiVersion: v1
metadata:
  name: myservice
spec:
  selector:
    app: myapp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  # The following adds session affinity
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 600

This works when accessing a ClusterIP Service directly, and when using a LoadBalancer-type Service. Unfortunately, many highly-available NodePort Service and Ingress configurations probably won’t work by default, because the client IP that the Service sees will be that of an intermediate load balancer or of the Ingress controller Pods, not that of the actual client.
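
If you control how traffic enters the cluster, one partial mitigation is externalTrafficPolicy: Local on a NodePort or LoadBalancer Service, which preserves the original client source IP (at the cost of only routing to nodes that host a ready Pod). Whether this is enough depends on your load balancer and network plugin, so treat the following as a sketch rather than a recipe:

kind: Service
apiVersion: v1
metadata:
  name: myservice
spec:
  type: LoadBalancer
  # Preserve the client source IP so that ClientIP affinity has something
  # meaningful to work with; traffic only reaches nodes with a ready Pod
  externalTrafficPolicy: Local
  selector:
    app: myapp
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 600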

Ingress Session Affinity

Another option, if your request is coming through an Ingress, is to use cookie session affinity provided by the Ingress. There are a number of different Ingress implementations, but let’s take a look at just one — Ingress-NGINX Controller for Kubernetes — as an example:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: helloworld-deployment-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-path: "/"
spec:
  ingressClassName: myingressclass
  rules:
  - http:
      paths:
      - path: /helloworld/(.*)
        pathType: Prefix
        backend:
          service:
            name: helloworld-deployment-svc
            port:
              number: 8080
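
Ingress-NGINX also exposes additional cookie-affinity annotations that you may want to set explicitly. These would sit alongside the annotations in the metadata.annotations block above; the values are illustrative and the cookie name is made up:

    nginx.ingress.kubernetes.io/affinity: "cookie"
    # "balanced" (the default) may re-balance sessions when replicas are added;
    # "persistent" keeps a session pinned to its Pod
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    nginx.ingress.kubernetes.io/session-cookie-name: "HELLOWORLD_AFFINITY"
    nginx.ingress.kubernetes.io/session-cookie-path: "/"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "600"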

Problem Solved… Right???

After implementing one of the above “solutions”, you then try hitting the application, and… success! Right…?

Not so fast, please!

This has made things somewhat better, for sure... However, there are lots of scenarios where your in-memory session state will not be available, and these scenarios will often not be scenarios that you are testing explicitly.

Worker Node Restart

Your worker nodes will sometimes restart, and sometimes this will be outside of your control. Planned patching and upgrades and other maintenance may occur, but hardware failure (even with the major cloud providers) can and does happen without warning. Unfortunately, when a worker node restarts, the state that exists in any Pod running on that worker will be gone and users will be impacted.

Container Restart

If a container fails its livenessProbe, it will be restarted. The application may attempt to consume more memory than its configured limit, and the container may be OOMKilled. The container process might simply crash. Unfortunately, when the container restarts, the state is gone and users will be impacted.
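
For context, the restarts described above are typically driven by container configuration along these lines (a fragment of a container spec; the probe path and limit values are illustrative):

        livenessProbe:
          httpGet:
            path: /helloworld/healthz # hypothetical health endpoint
            port: 80
          periodSeconds: 10
          failureThreshold: 3 # after 3 consecutive failures the container is restarted
        resources:
          limits:
            memory: 256Mi # exceeding this results in an OOMKill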

Autoscaling

The HorizontalPodAutoscaler (HPA) automatically updates workload resources (e.g. Deployments or StatefulSets) with the aim of scaling them to match demand. This means adding new Pods when the application is busy and removing Pods when it is less busy. Unfortunately, when Pods are removed by the HPA, users will be impacted.
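
For example, an HPA like the following (the thresholds and replica counts are illustrative) will add and remove Pods as CPU utilization changes, and every scale-down discards whatever in-memory session state those Pods held:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment # the hypothetical Deployment from earlier
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70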

The VerticalPodAutoscaler (VPA) will also cause containers to restart at perhaps unexpected times. It automatically updates container resource requests based on observed usage, so that containers have access to the resources they need and Pods can be scheduled onto nodes with appropriate capacity. Every time a request is updated, however, the Pod is recreated and users will be impacted.
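
For example, if the VPA components are installed in your cluster, a policy like this (illustrative) running in “Auto” mode will evict and recreate Pods whenever it decides their requests should change:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: helloworld-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment # the hypothetical Deployment from earlier
  updatePolicy:
    updateMode: "Auto" # Pods are evicted and recreated with updated requests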

Worker-Node Resource Starvation

Unfortunately, compute resources are not infinite — and sometimes a worker node might find itself with insufficient resources for the scheduled Pods. This can lead to node-pressure eviction, where Kubernetes proactively terminates Pods to reclaim resources on the Node and ideally distribute the workloads in a more optimal fashion based on available capacity. Unfortunately, when the Pods are terminated and rescheduled on other worker nodes, users will be impacted.

Rollout of Application Changes

When application changes modify the Pod spec, the existing Pods will be terminated and new Pods scheduled so that the changes can take effect (or, depending on the rollout strategy, a new Pod will be scheduled first and then the existing Pod terminated; either way, each old Pod goes away and a replacement takes its place). Unfortunately, again, users will be impacted.
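
Which of those two orderings you get is controlled by the workload’s rollout strategy; for a Deployment it might look like this (illustrative values):

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1 # start a replacement Pod before terminating the old one
      maxUnavailable: 0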

Uneven Load Distribution

Because clients are stuck to a particular replica, it can be extremely difficult to ensure that load is evenly distributed amongst your replicas. In extreme cases, this can lead to reliability and performance problems if an excessive number of connections get stuck to some replicas. Adding additional replicas may not resolve these issues. For example, imagine a StatefulSet with 8 replicas that, for some reason, you have deleted and are now re-deploying…

When the first Pod starts, 100% of the workload hits that first (and only) Pod. The other 7 Pods then start, but they only receive new traffic, and most of the workload remains stuck to the first Pod.

Better Solutions

Adversely impacting application users is something we should usually try to avoid.

Consider using an external cache (example 1, example 2, example 3) or database (example 1, example 2, example 3) for your session state. If the state consists of files, consider using a shared file system such as NFS or CIFS/SMB, or perhaps even use the Kubernetes APIs to update a ConfigMap or Secret. If a vendor requires you to configure session affinity for a containerized application, you may want to suggest that they improve their application!
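
As a very rough sketch, an in-cluster key-value store for externalized session state could be as simple as the following (the names and image tag are assumptions, and for production you would more likely use an operator or a managed service, plus persistence and replication):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: session-store
spec:
  replicas: 1
  selector:
    matchLabels:
      app: session-store
  template:
    metadata:
      labels:
        app: session-store
    spec:
      containers:
      - name: redis
        image: redis:7 # assumed image tag
        ports:
        - containerPort: 6379
---
kind: Service
apiVersion: v1
metadata:
  name: session-store
spec:
  selector:
    app: session-store
  ports:
  - name: redis
    protocol: TCP
    port: 6379
    targetPort: 6379

The application would then read and write session state through this Service (for example via a session-management library pointed at session-store:6379) instead of keeping it in Pod memory, so any replica can serve any request.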

Whichever approach you choose, resist the urge to configure session affinity simply because you can — perhaps you should be removing the “requirement” for session affinity instead.

