How does Kubernetes determine which Pods to terminate when it runs out of resources?

There are two ways to set resource limits and requests in Kubernetes. The first is with a LimitRange resource at the namespace level, and the second is to set them directly on the workloads you are running: pods, deployments, or stateful sets.

The first approach is better if you want to make sure that all containers within a specific namespace get default CPU and memory limits and requests when none are specified. Additionally, with this approach you can configure minimum and maximum resource constraints on a namespace, ensuring that workloads running within it can only set limits and requests within a certain range.
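As a minimal sketch (the name and namespace below are illustrative), a LimitRange that sets defaults and bounds for every container in a namespace could look like this:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: resource-defaults   # illustrative name
  namespace: demo           # illustrative namespace
spec:
  limits:
    - type: Container
      default:              # limits applied when a container specifies none
        cpu: 500m
        memory: 256Mi
      defaultRequest:       # requests applied when a container specifies none
        cpu: 250m
        memory: 128Mi
      max:                  # no container may go above these values
        cpu: "2"
        memory: 1Gi
      min:                  # no container may go below these values
        cpu: 100m
        memory: 64Mi
```

Once applied, any Pod created in that namespace without its own resources section picks up the defaults automatically.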

The second approach is to set them directly on the workload itself (pod, deployment, or stateful set), under the spec.containers[].resources path. This approach gives you more control over each workload, and it works well in combination with the first approach: set defaults, maximums, and minimums for CPU and memory at the namespace level, and put application-specific resource configuration on the workload itself (pod, deployment, or stateful set).
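A sketch of the second approach, with an illustrative Pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                 # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25     # illustrative image
      resources:
        requests:
          cpu: 250m         # used for scheduling and as a CPU weight
          memory: 128Mi
        limits:
          cpu: 500m         # hard CPU ceiling
          memory: 256Mi     # exceeding this risks an OOM kill
```

In a deployment or stateful set, the same resources block sits under spec.template.spec.containers[].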

Now, this article is not about how to configure either of these approaches; the instructions for that are great and can be found in the official Kubernetes documentation.

Here, I want to discuss why setting these requests and limits is important, and what Kubernetes does to rank your applications for termination in case it runs out of available memory and CPU.

How do CPU limits work?

The CPU resource is measured in CPU units. In Kubernetes, one CPU is equivalent to one hyperthread on a bare-metal processor with Hyperthreading, one AWS vCPU, one Azure vCore, or one GCP Core.[1] CPU limits and requests help make adequate use of the CPU resources available on a Kubernetes cluster.
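CPU quantities can be written either as fractions of a CPU or in millicpu ("millicores"), where 1000m equals one CPU. A small fragment to illustrate both spellings:

```yaml
resources:
  requests:
    cpu: 500m   # 500 millicpu, i.e. half a CPU; "0.5" is an equivalent spelling
  limits:
    cpu: "1"    # one full CPU (one hyperthread / vCPU / vCore)
```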

When you set CPU limits and requests, you define a hard ceiling on how much CPU that container can use (a limit) and a relative weighting that determines the container's share of the CPU (a request).

And this is really interesting: if a container exceeds its CPU limit, it might or might not be allowed to do so for extended periods of time. However, container runtimes don't terminate Pods or containers for excessive CPU usage.[2]

How do memory limits work?

Memory limits and requests are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. The important thing to note here is the suffixes. If you request 400m of memory, that is a request for 0.4 bytes. Someone who types that probably meant to ask for 400 mebibytes (400Mi) or 400 megabytes (400M).[3]
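The pitfall above is worth spelling out in a fragment, since all three spellings are syntactically valid:

```yaml
resources:
  requests:
    memory: 400Mi   # 400 mebibytes: almost certainly what you meant
    # memory: 400M  # 400 megabytes (400,000,000 bytes)
    # memory: 400m  # 0.4 bytes! valid syntax, but surely a typo
```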

Why do we need these requests and limits? Because we want to ensure that the resources of our Kubernetes cluster are used in the most efficient way.

If a container tries to allocate more memory than its limit, the Linux kernel out-of-memory subsystem activates and, typically, intervenes by stopping one of the processes in the container that tried to allocate memory. If that process is the container's PID 1, and the container is marked as "restartable", Kubernetes restarts the container.[4]

So, to summarize: if a container exceeds its CPU limit, it might or might not be allowed to do so, but in general it will not be restarted by the container runtime. If it exceeds its memory limit, it will, at some point, be restarted by the container runtime.

How do QoS classes fit into the picture?

Kubernetes Quality of Service (QoS) classes are used to control the scheduling and eviction of Pods, based on the class they belong to. And to determine which class a Pod belongs to, Kubernetes uses its CPU and memory requests and limits! In other words, Pods that belong to different classes are treated differently when Kubernetes worker nodes run out of resources. This is okay, as long as we are talking about stateless applications that can tolerate restarts.

So, those classes are the following:

  • Guaranteed - this class ensures that Pods get top priority, and they keep running until they exceed their limits. To be classified as Guaranteed, every container in a Pod needs to have both memory and CPU limits and requests, and those resource-specific limits and requests need to match each other. In other words: memory.limit == memory.request and cpu.limit == cpu.request.
  • Burstable - Pods in this class have some minimal resource guarantee, but can use more resources when available. A class with middle priority. To be classified as Burstable, the Pod must not meet the criteria for Guaranteed, and at least one container must have a CPU or memory limit or request.
  • BestEffort - this class has the lowest priority, and Pods in this class will be the first ones to go. To be part of this class, the containers in the Pod must not have any CPU or memory requests or limits.
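The three classes can be sketched purely through the resources block of a container (values are illustrative):

```yaml
# Guaranteed: every container has requests and limits, and they match
resources:
  requests: { cpu: 500m, memory: 256Mi }
  limits:   { cpu: 500m, memory: 256Mi }
---
# Burstable: some requests or limits are set, but they don't all match
resources:
  requests: { cpu: 100m, memory: 64Mi }
  limits:   { cpu: 500m, memory: 256Mi }
---
# BestEffort: no container in the Pod sets any requests or limits
resources: {}
```

You can check which class Kubernetes actually assigned to a running Pod with kubectl get pod <name> -o jsonpath='{.status.qosClass}'.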

Why am I mentioning all of this?

Well, to start with, it's good to know the Kubernetes-specific priority of your applications within a cluster. If some applications got restarted or evicted because worker nodes ran out of resources, and others didn't, it's good to know why.

The difference between CPU and memory limits is also good to know: when a container exceeds its CPU limit, it might or might not be allowed to do so, but it won't be restarted; when it exceeds its memory limit, it will get restarted by the runtime at some point.

Last, but not least, it's good to configure resource limits and requests on your Kubernetes cluster, so its resources are used in the most efficient way possible.

More information about Kubernetes QoS classes can be found in the official Kubernetes documentation.

