HorizontalPodAutoscaler uses request (not limit) to determine when to scale by percent

Jun 9, 2022

A HorizontalPodAutoscaler can be used to increase and decrease the number of Pods for your application based on changes in average resource utilization of your Pods. That’s really useful!

For example, an HPA can create more Pods when CPU utilization exceeds your configured threshold. When utilization drops far enough that fewer Pods could handle the load while staying below the threshold, the HPA removes Pods. This threshold can be configured as an absolute value or as a percentage, which raises the question: what should utilization be compared to in order to determine that percentage?

According to the Kubernetes documentation, “When you specify the resource request for containers in a Pod, the kube-scheduler uses this information to decide which node to place the Pod on. When you specify a resource limit for a container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set. The kubelet also reserves at least the request amount of that system resource specifically for that container to use.”

In short, the limit is the maximum amount that the Pod can use, so people often assume that the target utilization percentage applies to the limit. That assumption is wrong. For HPAs defined on resource metrics, the percentage applies to the request, which is the amount of the resource reserved for the Pod (for custom metrics, this may not be relevant). The documentation states: “Utilization is the ratio between the current usage of resource to the requested resources of the pod.”

Suppose you’ve specified a Deployment similar to the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
spec:
  template:
    spec:
      containers:
        - name: hello-world
          image: helloworld-webserver:v1.0.0
          resources:
            requests:
              cpu: 10m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi
              ...

and an HPA similar to the following:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-deployment
spec:
  maxReplicas: 4
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  targetCPUUtilizationPercentage: 80

Your HPA will start scaling the replicas up when average CPU usage per Pod exceeds 8m (80% of the 10m request), not 80m (80% of the 100m limit).
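To make the arithmetic concrete, here is a worked sketch. The single running replica and the 20m average usage (200% of the 10m request) are assumed figures for illustration only. The replica formula is the one given in the Kubernetes HPA documentation, and the controller's tolerance band is ignored for simplicity.

```latex
\begin{align*}
\text{utilization} &= \frac{\text{average CPU usage per Pod}}{\text{CPU request}} \times 100\% \\
\text{scale-up threshold} &= 80\% \times 10\text{m} = 8\text{m}
  \quad (\text{not } 80\% \times 100\text{m} = 80\text{m}) \\
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times
  \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
  = \left\lceil 1 \times \frac{200\%}{80\%} \right\rceil = \lceil 2.5 \rceil = 3
\end{align*}
```

Under these assumed numbers the HPA would ask for 3 replicas, which is within the configured maxReplicas of 4.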

If you want your HPA to allow CPU usage above the request before scaling up, you can specify a target utilization percentage greater than 100. In this example, to allow usage to reach 60m before scaling up, you could specify a target utilization percentage of 600 (600% of the 10m request).
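As a sketch of what this might look like for the example above, the first manifest keeps autoscaling/v1 and raises the percentage to 600. The second is an alternative that assumes your cluster serves the autoscaling/v2 API: it expresses the same 60m threshold as an absolute per-Pod value, which does not depend on the request at all. You would apply one or the other, not both.

```yaml
# Option 1: percentage of the request (600% of the 10m request = 60m).
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-deployment
spec:
  maxReplicas: 4
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  targetCPUUtilizationPercentage: 600
---
# Option 2 (assumes autoscaling/v2 is available): absolute average usage per Pod.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-deployment
spec:
  maxReplicas: 4
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: AverageValue   # compared against raw usage, not the request
          averageValue: 60m
```

With the absolute form, changing the container's CPU request later does not silently move the scaling threshold, whereas with the percentage form it does.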

Kubernetes — Preventing HorizontalPodAutoscalers from Running Wild

When using Kubernetes, you might sometimes find that the number of Pods actually running isn’t the same as the number…

pauldally.medium.com


Written by Paul Dally
