ruk·si

☸️ Kubernetes
Resource Limits

Updated at 2021-09-14 21:06

If someone tells you that you can use a shared service without limits, they are either lying or the system will eventually collapse.

Efficient resource utilization is about properly sharing resources. Most applications are designed to use as many resources as they can get, which doesn't play well in a distributed cluster environment. You need to build some virtual fences.

Namespace Quotas

The most global resource configuration in Kubernetes is the namespace quota. A quota restricts how many resources the pods in a namespace can take in total.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-example
spec:
  hard:
    requests.cpu: 2
    requests.memory: 2Gi
    limits.cpu: 3
    limits.memory: 4Gi

Setting a namespace quota forces pods to specify requests/limits; otherwise they would be "unlimited" and will never get scheduled. Because of this, I find it a good practice to always specify namespace quotas so you are forced to specify something.
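If you still want pods without explicit requests/limits to schedule in a quota-constrained namespace, a LimitRange can fill in defaults per container. A minimal sketch (the name and values here are just examples):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-cpu-defaults
spec:
  limits:
    - type: Container
      defaultRequest:   # applied when a container has no requests
        cpu: 100m
        memory: 128Mi
      default:          # applied when a container has no limits
        cpu: 500m
        memory: 256Mi
```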

Pod Container Quotas

You specify how many resources each container may take under the resources key. Remember that these limitations are per container, not per pod; a pod's limitations are the sum of its containers'.

# this is somewhere under the `containers:` definition...
resources:
  requests:
    memory: 300Mi  # <- the container requires 300 MB of memory or it won't be scheduled
    cpu: 500m      # <- the container requires 500 millicores (0.5 CPU core) or it won't be scheduled
  limits:
    memory: 600Mi  # <- the container will be killed if it tries to use more than 600 MB of memory
    cpu: 1         # <- the container will be throttled if it tries to use more than 1 CPU core

  • If you haven't made your program multi-core, use limits.cpu: 1 or below.
  • If limits.memory is not set, the container takes "as much as it can".
  • If limits.cpu is not set, the container takes "as much as it can".

It is imperative that you set both CPU and memory requests and limits. Otherwise the cluster will not work properly in the long run, and in the worst case can even crash the virtual instance nodes.
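To show where the resources key sits in practice, here is a complete single-container pod manifest using the values from the snippet above (the pod name and image are just placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-limited-example
spec:
  containers:
    - name: app
      image: nginx:1.21
      resources:
        requests:
          memory: 300Mi
          cpu: 500m
        limits:
          memory: 600Mi
          cpu: 1
```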

It can be hard to figure out what your requests and limits should be. A decent way to find the values:

  • Start a single container pod.
  • Start as much processing as you can.
  • Record the peak vCPU and memory usage when processing.
  • Stop the processing.
  • Record usage after cooling down and when idle.
  • The idle values are your requests.
  • The processing values are your limits, maybe add +15% memory to avoid OOM.
  • If your memory usage can climb rapidly e.g. up by 1 GB in a second for a while, consider setting requests.memory and limits.memory to the same value because sudden memory bursts can strangle the host virtual machine. This is called "no memory overcommitment".
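Assuming metrics-server is installed in the cluster, the readings in the steps above can be taken with kubectl top (the pod name is a placeholder):

```shell
# record peak usage while the load test is running
kubectl top pod my-app-pod --containers

# record idle usage again after the workload has cooled down
kubectl top pod my-app-pod --containers
```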

If a node runs out of memory, Kubernetes will start looking for pods to kill. The kill order is something along these lines:

  1. find candidate pods: pods with no requests.memory set, or pods using more than their requests.memory but still under their limits.memory (which can be infinite if not defined)
  2. if there are multiple candidates, rank them by priority and terminate the lowest priority pod
  3. if multiple share the same lowest priority, terminate the one that is most over its requested memory
  4. if none of the previous steps terminated anything, start killing any workloads so that system components don't get stuck
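Step 2 above ranks pods by priority, which you can influence with a PriorityClass (the name, value, and description here are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-service
value: 1000000        # higher value = terminated later under memory pressure
globalDefault: false
description: "For workloads that should survive node memory pressure."
---
# then reference it in the pod spec:
# spec:
#   priorityClassName: critical-service
```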

kubectl describe pod lets you know if a container was out-of-memory killed. It will look something like this:

    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137           # <- tried to use more memory than its `limits`

You should monitor your node CPU and memory usage. If memory occasionally jumps close to 100%, some processes will die midway. If CPU occasionally jumps close to 100%, some processes will be slowed down. When investigating issues, remember to inspect CPU and memory usage per container, not per pod.

Watch whether your total CPU/memory limits go over 100% of the node capacity; going over 100% of memory capacity is called "overcommit". With memory overcommitted, a sudden surge in usage can even crash the node and bring down all the pods on it. It's safer to specify limits so that this can't happen, e.g. by making the memory request and limit equal.
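The "no memory overcommitment" setup mentioned above looks like this in a container spec (the values are illustrative):

```yaml
resources:
  requests:
    memory: 1Gi   # request and limit are equal...
    cpu: 500m
  limits:
    memory: 1Gi   # ...so the scheduler reserves the full peak amount up front
    cpu: 1
```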

Sources