Why you should care about Kubernetes container requests and limits

Paul Dally
Nov 14, 2023

The Kubernetes documentation covers resource management for Pods and containers pretty thoroughly. Unfortunately, it seems that many developers have never looked (and have no interest in looking) at this documentation. But they (and you) should…

To quote the Kubernetes documentation “when you specify a resource limit for a container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set. The kubelet also reserves at least the request amount of that system resource specifically for that container to use.”

If you are working in a “multi-tenant” cluster, you’ll probably want ResourceQuotas to ensure that everyone plays nice…

…and in order to enforce a ResourceQuota on your Namespaces for compute resources like cpu and memory, users must specify requests and/or limits for those resources in their Pod specs.
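As a rough sketch (the Namespace name and the quota values here are purely illustrative, not recommendations), a compute ResourceQuota might look something like this:

    # Hypothetical ResourceQuota capping total compute requests and limits in a Namespace
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota      # illustrative name
      namespace: team-a        # illustrative Namespace
    spec:
      hard:
        requests.cpu: "10"     # sum of all containers' CPU requests in the Namespace
        requests.memory: 20Gi  # sum of all containers' memory requests
        limits.cpu: "20"
        limits.memory: 40Gi

With a quota like this in place, Pods in that Namespace that omit cpu and memory requests/limits will be rejected (unless a LimitRange supplies defaults).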

However, you need to make sure that you are specifying reasonable values for the requests and limits…

If you set your requests too low, the performance of your Pods may suffer

A request is a reservation. If your container regularly needs a certain amount of cpu or ram to run successfully, then your request should likely be somewhat larger than this amount. This is because the Kubernetes scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled containers is less than the capacity of the node. If the request is unreasonably low, the node may become over-burdened and performance may suffer.

For example, if your container is running a Java application and you are setting the minimum heap size (-Xms) to 512M, it isn’t ideal (and by that I mean extremely ill-advised and thoughtless, please don’t do this) to set your memory request to 256Mi. Java applications use more memory than just heap, non-heap memory requirements vary by application, and operating systems will sometimes share some memory between containers, so predicting an optimal memory request is not an exact science… but you probably shouldn’t be starting your memory request at significantly less than your heap size.
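To make that concrete (the image, names and numbers below are illustrative assumptions, not recommendations), a container spec along these lines keeps the memory request comfortably above the heap settings:

    # Hypothetical Pod running a Java app started with -Xms512M
    apiVersion: v1
    kind: Pod
    metadata:
      name: java-app                              # illustrative name
    spec:
      containers:
      - name: app
        image: registry.example.com/java-app:1.0  # illustrative image
        args: ["-Xms512M", "-Xmx768M", "-jar", "app.jar"]   # illustrative JVM settings
        resources:
          requests:
            memory: 1Gi      # maximum heap plus headroom for metaspace, threads, etc.
            cpu: 250m
          limits:
            memory: 1536Mi   # a ceiling for clearly unreasonable usage (e.g. a leak)
            cpu: "1"

Whether 1Gi is the right request for your application is something only measurement can tell you; the point is simply that it isn’t dramatically below the heap size.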

If you set your requests too high, costs will be higher than they should be

As stated above, a request is a reservation. If you set the request too high you will waste worker node capacity and inflate your cost, since any resources reserved by your Pod are unavailable for other Pods to reserve.

Exactly how high is too high may vary, of course, but if you reserve far more CPU and memory than your Pod could reasonably need simply because you haven’t put any thought into it, you prevent other Pods from using those otherwise idle resources, and they may go to waste.

If you set your limits too low, your container may not be able to handle periods of peak demand

In contrast to a request, the resources specified by your limit (to the extent that they exceed your request) are not reserved. The purpose of the limit is to allow your container to occasionally exceed its reserved resource usage if resources happen to be available, while still restricting the container’s resource usage to a “reasonable” amount.

If, for example, your container were to experience a memory leak, at some point you will likely want to say “this amount of usage is not reasonable” and prevent the container from continuing to consume more and more (and more…). However, if your limit is too low, then your container will not be able to access additional resources for occasional legitimate spikes in load, and performance may suffer or your container may even be restarted (e.g. OOMKilled). A restart is probably not a horrible thing in the case of a memory leak, but it is not what you want when you experience a non-sustained, “reasonable blip” in your traffic.

If you set your limits too high, a misbehaving container might use unreasonable amounts of resources

If your limits are too high, misbehaving Pods can consume or waste excessive amounts of resources. If, for example, the cpu limit is set to 2000m but the container only rarely (under exceptional circumstances) consumes even 100m, the risk is that an application that misbehaves and gets into a “non-yielding” or infinite loop, hungrily consuming CPU and doing nothing of value, could tie up as much as 1900m that would otherwise be available for more productive use by other containers. If you know that usage above a particular amount would be unreasonable for this container, then that amount is probably a pretty good initial value for your limit.
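Continuing that purely hypothetical CPU example, a resources block sized to the “reasonable ceiling” rather than an arbitrary 2000m might look something like this (memory omitted for brevity):

    # Hypothetical resources block for a container that rarely exceeds 100m of CPU
    resources:
      requests:
        cpu: 100m    # covers the occasional legitimate peak
      limits:
        cpu: 300m    # caps a runaway or non-yielding loop well below 2000m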

How do you determine what your requests and limits should be?

Ideally you should be conducting load tests (regularly) at both “expected” peak values and “exceptional” peak values. Also, you should be looking at the actual usage and metrics of your running containers, to make sure that your load tests are reasonable (if the load test, when running at levels that are close to actual production usage, doesn’t produce utilization metrics that are close to actual production metrics, then you are doing something wrong).

Ideally you’ll have robust tools for performing load tests and reporting on usage metrics. However, even without additional tools, you’ll usually have at least metrics-server (or it can usually be installed easily if it isn’t already present), and you can run kubectl top pods --containers repeatedly against your production environment.

Set your request somewhat above the “expected” peak, and your limit somewhat above the “exceptional” peak, and make sure that you are regularly examining your resource usage both in production and from your load testing and adjusting the requests and limits accordingly.
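As a purely hypothetical worked example: if load testing and kubectl top suggest an “expected” peak of roughly 300m CPU / 400Mi memory and an “exceptional” peak of roughly 600m / 600Mi, a reasonable starting point might be:

    # Hypothetical starting values derived from observed peaks
    resources:
      requests:
        cpu: 350m        # somewhat above the ~300m "expected" peak
        memory: 500Mi    # somewhat above the ~400Mi "expected" peak
      limits:
        cpu: 750m        # somewhat above the ~600m "exceptional" peak
        memory: 700Mi    # somewhat above the ~600Mi "exceptional" peak

Treat numbers like these as a starting point rather than a final answer, and revisit them as production metrics accumulate.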

References:

Resource Management for Pods and Containers

Node-pressure Eviction

Kubernetes best practices: Resource requests and limits
