HorizontalPodAutoscaler uses request (not limit) to determine when to scale by percent

Paul Dally
2 min read · Jun 9, 2022

A HorizontalPodAutoscaler can be used to increase and decrease the number of Pods for your application based on changes in average resource utilization of your Pods. That’s really useful!

For example, an HPA can create more Pods when CPU utilization exceeds your configured threshold. When utilization drops far enough that fewer Pods could handle the load while staying below the configured threshold, the HPA removes Pods. This threshold can be configured as an absolute value or as a percentage. This raises the question: what should utilization be compared to in order to determine that percentage?

According to the Kubernetes documentation, “When you specify the resource request for containers in a Pod, the kube-scheduler uses this information to decide which node to place the Pod on. When you specify a resource limit for a container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set. The kubelet also reserves at least the request amount of that system resource specifically for that container to use.”

In short, the limit is the maximum amount that the Pod can use, so people often assume that the target utilization percentage is measured against the limit. It is not. For HPAs defined on resource metrics, utilization is measured against the request, which is the amount of the resource reserved for the Pod (for custom metrics, this may not be relevant). The documentation states: “Utilization is the ratio between the current usage of resource to the requested resources of the pod.”

Suppose you’ve specified a Deployment similar to the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloworld-deployment
spec:
  template:
    spec:
      containers:
      - name: hello-world
        image: helloworld-webserver:v1.0.0
        resources:
          requests:
            cpu: 10m
            memory: 32Mi
          limits:
            cpu: 100m
            memory: 64Mi
...

and an HPA similar to the following:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-deployment
spec:
  maxReplicas: 4
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  targetCPUUtilizationPercentage: 80
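
If your cluster serves the autoscaling/v2 API, the same HPA can be written with an explicit metrics list, which makes it more visible that the target is a utilization percentage of a resource metric. This is only a sketch of an equivalent manifest, assuming the same names and replica bounds as above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-deployment
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        # Utilization is a percentage of the CPU request (10m here), not the limit
        type: Utilization
        averageUtilization: 80
```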

Your HPA will start scaling the replicas up when average CPU utilization exceeds 8m (80% of the 10m request), not 80m (80% of the 100m limit).
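
To make the arithmetic concrete, here is a worked example. The 12m average usage and the single running replica are assumed numbers for illustration only. The replica formula is the one given in the Kubernetes HPA documentation.

```latex
\[
\text{utilization}
  = \frac{\text{average CPU usage per Pod}}{\text{CPU request per Pod}} \times 100\%
  = \frac{12\text{m}}{10\text{m}} \times 100\% = 120\%
\]
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 1 \times \frac{120}{80} \right\rceil = 2
\]
```

Measured against the 100m limit, the same 12m of usage would be only 12%, which is why assuming the limit is the baseline leads people to expect scaling much later than it actually happens.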

If you want your HPA to allow CPU utilization greater than the request before scaling up, you can simply specify a target utilization percentage greater than 100. In this example, to allow utilization to reach 60m before scaling up, you could specify a target utilization percentage of 600.
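
Since the threshold can also be an absolute value, another way to express the 60m target is an AverageValue target in the autoscaling/v2 API, which does not depend on the request at all. The manifest below is a sketch under that assumption, not a drop-in recommendation:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: helloworld-deployment
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: helloworld-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        # Absolute target: average CPU usage across the Pods, as a quantity
        type: AverageValue
        averageValue: 60m
```

With this form, changing the CPU request on the Deployment later does not silently move the scaling threshold, whereas a percentage target of 600 would track whatever the request becomes.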


Paul Dally

AVP, IT Foundation Platforms Architecture at Sun Life Financial. Views & opinions expressed are my own, not necessarily those of Sun Life