Manjusaka

Summary of Container CPU and Memory Limitations

This is an article aimed at readers with little experience with containers, mainly discussing how containers limit CPU and memory.

CPU Limit#

First of all, when Mac or Windows users run Docker Desktop, they can set a CPU limit for Docker Desktop itself (it defaults to 1, meaning Docker Desktop can only use 1 CPU). This is because Docker Desktop wraps a virtual machine (WSL2/Hyper-V on Windows, and possibly QEMU on Mac), so it is equivalent to running Docker on a host machine with that specific number of CPUs.

First, let's talk about CPU limits. Essentially, a CPU limit restricts how much CPU time a process gets. In Linux, a process runs under one of three common scheduling policies:

  1. SCHED_NORMAL
  2. SCHED_FIFO
  3. SCHED_RR

The first one, SCHED_NORMAL, is handled by the CFS scheduler in Linux, and ordinary processes usually run as SCHED_NORMAL. Okay, that's it for the prerequisite knowledge.
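To make that concrete, here is a minimal Python sketch (Linux only) that prints the scheduling policy of the current process; note that Python's os module exposes SCHED_NORMAL under the name SCHED_OTHER.

```python
import os

# Map the policy constants from Python's os module (Linux only) to names.
# SCHED_NORMAL shows up as SCHED_OTHER here.
POLICIES = {
    os.SCHED_OTHER: "SCHED_NORMAL (SCHED_OTHER)",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}

policy = os.sched_getscheduler(0)  # 0 means "the calling process"
print(POLICIES.get(policy, f"other policy: {policy}"))
```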

Speaking of CPU limits in containers: in the mainstream context, "containers" refers to the family of isolation technologies built on Linux cgroups and namespaces, with Docker as the representative. In this context, CPU limits are implemented with the CPU-related cgroup subsystems. The four parameters we mainly care about are as follows:

  1. cpu.cfs_period_us
  2. cpu.cfs_quota_us
  3. cpu.shares
  4. cpuset.cpus

Let's talk about them one by one.

First, let's talk about cpu.shares. The corresponding Docker parameter is --cpu-shares. It is essentially a soft limit, a relative weight for CPU time rather than a hard cap, and the default value is 1024. A relative value may feel a bit abstract, so let's look at an example. Suppose a 1-core host runs 3 containers, one with cpu-shares set to 1024 and the other two set to 512 each. When the processes in all 3 containers try to use 100% CPU (shares only matter under full contention, which is when the configured values show their effect), the container set to 1024 gets 50% of the CPU time and the other two get 25% each. Now consider the same scenario again: if the other two containers are mostly idle, the remaining CPU time can still be used by the first container with 1024.
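The arithmetic behind that example is just a weight ratio: under full contention each container gets its cpu-shares value divided by the sum of all competing values. A toy sketch (the container names are made up for illustration):

```python
# Hypothetical containers and their --cpu-shares values from the example above.
shares = {"container-a": 1024, "container-b": 512, "container-c": 512}
total = sum(shares.values())

for name, value in shares.items():
    # Under full contention each container gets shares / total of the CPU time.
    print(f"{name}: {value / total:.0%}")
# container-a: 50%
# container-b: 25%
# container-c: 25%
```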

Next, let's talk about cpu.cfs_quota_us and cpu.cfs_period_us. These two parameters only take effect when used together: within each period of cpu.cfs_period_us, the processes in the cgroup can use at most cpu.cfs_quota_us of CPU time (both values are in microseconds). Once the quota is exhausted, the processes are throttled by the kernel until the next period. In Docker you can set them separately with --cpu-period and --cpu-quota, or set them together with --cpus: with --cpus=2, Docker makes sure cpu.cfs_quota_us is twice cpu.cfs_period_us, and so on (Docker's default cpu.cfs_period_us is 100ms, i.e. 100000us).
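From inside a container you can read the two values back from the cgroup filesystem and work out the effective --cpus value. This is a rough sketch that assumes cgroup v1 mounted at /sys/fs/cgroup; on cgroup v2 the same information lives in a single cpu.max file.

```python
def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

# cgroup v1 paths; adjust if your cgroup hierarchy is mounted elsewhere.
quota = read_int("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
period = read_int("/sys/fs/cgroup/cpu/cpu.cfs_period_us")

if quota == -1:
    print("no CFS quota set; the container is not limited this way")
else:
    print(f"effective limit: {quota / period:g} CPUs "
          f"({quota}us of CPU time every {period}us)")
```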

Now that we have covered three parameters, when should we use which one? Generally speaking, processes that are particularly performance-sensitive can use cpu.shares so they can grab as much CPU as possible, while ordinary business processes can use cpu.cfs_quota_us and cpu.cfs_period_us for a relatively fair split. However, this also brings a problem: for applications with heavy traffic, response time (RT) and other metrics may spike because of frequent throttling. Linux 5.14 introduced a new feature, cpu.cfs_burst_us, which lets a cgroup accumulate a certain amount of unused quota during quiet periods and spend it as a burst buffer during busy ones, giving less throttling and higher CPU utilization (of course, this feature is not yet fully supported by mainstream container runtimes).
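As a hedged sketch only: on a kernel with burstable CFS support, the burst budget is just another cgroup file you can write (cpu.cfs_burst_us on cgroup v1, cpu.max.burst on cgroup v2). The cgroup path below is hypothetical, and you need root plus a new enough kernel for the file to exist.

```python
# Hypothetical cgroup v1 path; requires root and a kernel with CFS burst support.
cgroup = "/sys/fs/cgroup/cpu/mygroup"

with open(f"{cgroup}/cpu.cfs_burst_us", "w") as f:
    f.write("100000")  # allow bursting with up to 100ms of accumulated quota
```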

Now a new question arises: with cpu.shares as well as cpu.cfs_quota_us/cpu.cfs_period_us, the chance of being throttled is still not low. What if we want the process to make better use of the CPU? The answer is cpuset.cpus. The corresponding Docker parameter is --cpuset-cpus, which binds the container's processes to specific cores.
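From inside a container you can check which cores you ended up pinned to. A minimal sketch (Linux only) using Python's os.sched_getaffinity, which reflects the cpuset restriction:

```python
import os

# The set of CPUs the calling process may run on, as restricted by
# cpuset.cpus / --cpuset-cpus (and by any explicit affinity settings).
allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
print(f"allowed to run on {len(allowed)} CPU(s): {sorted(allowed)}")
```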

Well, that's about CPU.

Memory Limit#

Let's start with some background information again.

First of all, when Mac or Windows users run Docker Desktop, they set a memory limit for Docker Desktop itself, which is equivalent to running Docker on a host machine with that specific amount of memory.

Then, in our context today, memory limits still rely on the cgroup memory subsystem. There are many parameters, but for now we only need to pay attention to:

  1. memory.limit_in_bytes

This parameter represents the maximum memory limit of the container. If set to -1, it means there is no memory limit. In Docker, the parameter is --memory.
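From inside a container you can read the limit back from the cgroup filesystem. A rough sketch assuming cgroup v1 mounted at /sys/fs/cgroup (on cgroup v2 the file is memory.max and an unlimited cgroup reads back as the string "max"); note that when no limit is set, cgroup v1 reports a huge number rather than -1:

```python
# cgroup v1 path; adjust if your cgroup hierarchy is mounted elsewhere.
with open("/sys/fs/cgroup/memory/memory.limit_in_bytes") as f:
    limit = int(f.read().strip())

# When no limit is set, the file reads back as a value close to 2**63.
if limit >= 2**60:
    print("no practical memory limit set")
else:
    print(f"memory limit: {limit / 2**20:.0f} MiB")
```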

As for the behavior, there are two cases:

  1. If the system still has free memory but the container exceeds its own limit, a process in the container will be killed by the OOM killer.
  2. If the whole system runs out of memory before the container hits its limit, the OOM killer computes a score for every process (based mainly on memory usage, adjusted by oom_score_adj), ranks them from high to low, and performs OOM kills across the entire system starting from the top.

Of course, there is actually one more special case. You can set the oom_kill_disable flag in memory.oom_control via Docker's --oom-kill-disable parameter. If it is 1, processes in the container are not OOM-killed when memory exceeds the limit; they are paused instead. If it is 0, they are OOM-killed as usual.
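You can also read memory.oom_control back to see whether the OOM killer is disabled for the cgroup and whether it is currently paused under OOM. A minimal sketch assuming cgroup v1 mounted at /sys/fs/cgroup:

```python
with open("/sys/fs/cgroup/memory/memory.oom_control") as f:
    print(f.read())
# Typical output:
#   oom_kill_disable 0
#   under_oom 0
# (newer kernels also report an oom_kill counter)
```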

Well, that's about it for the behavior of memory.

Summary#

That's about it. This is an article for beginners, just a casual piece of writing, so please don't mind the rough edges.
