Avoid running out of ephemeral storage space on your Kubernetes worker Nodes
Writing files to an emptyDir will consume disk somewhere, and the capacity of that somewhere is probably not infinite! Just as you don’t want your home or business to have room after room filled with boxes of junk you’ll never use, you don’t want your Kubernetes cluster filling up with endless amounts of junk files.
In my experience, that somewhere has generally been /var/lib/kubelet/pods/<poduid>/volumes/kubernetes.io~empty-dir/<volumename>/… on the underlying worker node, though that may vary depending on your specific configuration. Depending on your Kubernetes platform, you may not be able to easily determine where these files are being written, but rest assured that disk is being consumed somewhere (or worse, memory — depending on the specific configuration of your emptyDir and/or Kubernetes platform).
It doesn’t really matter exactly where the files are written to — any filesystem can fill up, and when filesystems get full bad things usually happen.
Most Important — Clean Up After Yourself
In the case of a log-handling sidecar, “cleaning up” might mean making sure logs are rotated, compressing logs older than x days, and copying logs older than y days to a secondary location like an S3 bucket before deleting them.
What cleaning up looks like will vary depending on your specific use-case. But don’t neglect this part of the job. If you don’t clean up after yourself, you will likely be wasting your company’s money and your application may be unstable as a result.
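As one illustration, the cleanup loop could live in a small sidecar container that shares the emptyDir with the main container. This is only a sketch — the container name, image, mount path, and retention periods are all hypothetical, and a real setup would also upload to S3 before deleting:

```yaml
- name: log-cleaner
  image: busybox
  command: ["sh", "-c"]
  args:
  - |
    while true; do
      # compress logs older than 24 hours
      find /logs -name '*.log' -mtime +0 -exec gzip {} \;
      # delete compressed logs older than 7 days
      # (copy them to S3 or similar first in a real deployment)
      find /logs -name '*.log.gz' -mtime +7 -delete
      sleep 3600
    done
  volumeMounts:
  - name: logs          # the emptyDir shared with the main container
    mountPath: /logs
```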
Configure a sizeLimit on your emptyDirs
You should also strongly consider specifying a sizeLimit on emptyDirs. If you specify your emptyDir like this:
- name: www-content
  emptyDir:
    sizeLimit: 2Mi
then Kubernetes will evict your Pod if its actual usage exceeds the configured sizeLimit. Obviously, you may need a larger value than 2Mi. You should select a value that makes sense for your application and environment. But please put some thought into what an appropriate value might be, and configure your emptyDir accordingly.
Yes… if the sizeLimit is breached your Pod will be evicted (Kubernetes will terminate your containers, and if the Pod is managed by a controller a replacement will be scheduled). But at least your Pod(s) will not greedily consume all the local ephemeral storage available to your Namespace (potentially causing problems for every other Pod in the Namespace) or on the worker node (potentially causing problems for every Pod on that node). And while Kubernetes will eventually garbage collect evicted Pods, depending on your ResourceQuotas you may need to clean up evicted Pods yourself in a more timely and controlled fashion. Even so, this is better than having Pods run out of disk space.
I’ve created a walkthrough of how you can easily and automatically clean up evicted Pods that you may want to check out.
Put ResourceQuotas on all Namespaces
If you are responsible for creating Namespaces, you should consider configuring a default ResourceQuota for local ephemeral storage on each Namespace you create.
Across all Pods in the Namespace, the sum of the configured ephemeral-storage requests cannot exceed the request value on the ResourceQuota, and the sum of the configured limits cannot exceed its limit value. Note that this constrains the configured requests and limits, not the actual usage of storage.
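A minimal sketch of such a ResourceQuota (the object name, Namespace, and quota values here are illustrative assumptions — pick values that fit your environment):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota
  namespace: my-namespace
spec:
  hard:
    requests.ephemeral-storage: 10Gi   # cap on the sum of all Pod requests in the Namespace
    limits.ephemeral-storage: 20Gi     # cap on the sum of all Pod limits in the Namespace
```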
Specify ephemeral-storage in resources section of Pod spec
By adding an ephemeral-storage request/limit to the resources section of your Pod spec, you accomplish two things:
- Your Pod will be subject to any ResourceQuotas on local ephemeral storage present in the Namespace. Your Pod will not deploy if its ephemeral-storage request or limit would cause the ResourceQuota to be exceeded
- Your actual usage of local ephemeral storage will be policed by Kubernetes: if your Pod exceeds its configured limit it will be evicted (and therefore eventually subject to garbage collection, which will reclaim the storage)
apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  volumes:
  - name: www-content
    emptyDir:
      sizeLimit: 2Mi
  containers:
  - name: hello-world
    image: nginx                   # illustrative image
    resources:
      requests:
        ephemeral-storage: 2Gi    # illustrative values — size for your workload
      limits:
        ephemeral-storage: 4Gi
    volumeMounts:
    - mountPath: /www
      name: www-content
Optionally, provide a default ephemeral-storage request and limit on your LimitRange
A LimitRange is a policy to constrain resource allocations (to Pods or Containers) in a Namespace. Amongst other things, it allows you to specify default values for the resources section of containers that don’t specify these settings explicitly themselves:
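A minimal sketch of such a LimitRange (the object name, Namespace, and values are illustrative assumptions):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: ephemeral-storage-defaults
  namespace: my-namespace
spec:
  limits:
  - type: Container
    default:
      ephemeral-storage: 1Gi      # default limit for containers that don't set one
    defaultRequest:
      ephemeral-storage: 500Mi    # default request for containers that don't set one
```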