For many years my thinking has been that it's "best practice" to set both requests and limits for all resources in Kubernetes. I had occasionally seen arguments for why one shouldn't set a limit on a pod, but I never gave them much thought. Like they say, experience is the best teacher.
I had been troubleshooting sudden slowness on a certain service. The pod would suddenly start restarting after failing its liveness probes. My first thought was that the pod was only failing the probe because its CPU was overloaded. Could we move the probes to a different thread, so the service stays up and health checks can be served from that thread? After doing this the issue still persisted several times, but then we noticed something strange, as seen in the picture below.
How can a pod that uses far less than its limit be throttled? It barely goes above its request at certain times. So I started researching. You see, when you set CPU values on pods, what you are really setting is the amount of CPU time allocated to a process. The limit is stored in a cgroup, which passes the value to cpu.cfs_quota_us. For CPU, the request and the limit are handled by two separate control systems: the request is always guaranteed, while the limit acts as a hard enforcer. That means whenever you use more than your requested CPU, you are already under the second control system, and that system doesn't take your request into account at all; it only cares about your limit relative to the other processes on the system. A longer explanation exists here.
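You can actually see these two control systems from inside a container by reading its cgroup files. Below is a minimal sketch, assuming a cgroup v1 layout mounted at /sys/fs/cgroup (on cgroup v2 the request maps to cpu.weight and the limit to cpu.max instead); the exact paths may differ on your node, so treat it as an illustration rather than a drop-in tool.

```python
# Minimal sketch: inspect the CPU cgroup knobs a container runs under.
# Assumes cgroup v1 mounted at /sys/fs/cgroup; paths differ on cgroup v2.
from pathlib import Path

CPU_CGROUP = Path("/sys/fs/cgroup/cpu")  # assumed cgroup v1 mount point

def read_value(name: str) -> str:
    return (CPU_CGROUP / name).read_text().strip()

if __name__ == "__main__":
    # Derived from the CPU *request*: a relative weight used by the scheduler.
    print("cpu.shares        :", read_value("cpu.shares"))
    # Derived from the CPU *limit*: the hard cap enforced by CFS bandwidth control.
    print("cpu.cfs_quota_us  :", read_value("cpu.cfs_quota_us"))   # -1 means no limit set
    print("cpu.cfs_period_us :", read_value("cpu.cfs_period_us"))  # usually 100000 (100ms)
```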
When you set a CPU limit, you define a period and how much CPU time you can use within that period. Say, for example, you set a CPU limit of 0.5 CPU, which equals 500 millicores:
Calculation
CPU Request/Limit: 0.5 vCPU = 500 millicores
CFS Period: 100ms (default for most systems)
CPU Quota Calculation
The CFS quota for a pod is calculated as:
CFS Quota (ms) = CPU Limit (millicores) × (CFS Period (ms) / 1000)
Substituting the values:
CFS Quota=500×(100/1000)=50ms
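A small script makes this arithmetic concrete. The function below simply encodes the formula above (the names are mine for illustration, not from any Kubernetes API):

```python
# Illustration of the CFS quota formula above; names are illustrative only.

def cfs_quota_ms(limit_millicores: float, period_ms: float = 100.0) -> float:
    """CFS Quota (ms) = CPU Limit (millicores) x (CFS Period (ms) / 1000)."""
    return limit_millicores * (period_ms / 1000.0)

if __name__ == "__main__":
    # 0.5 vCPU = 500 millicores with the default 100ms period -> 50ms of CPU time per period.
    print(cfs_quota_ms(500))                    # 50.0
    # The kernel stores this in microseconds: cpu.cfs_quota_us=50000, cpu.cfs_period_us=100000.
    print(int(cfs_quota_ms(500) * 1000), "us")  # 50000 us
```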
Now borrowing this picture from https://x.com/danielepolencic/status/1267746412281819137/photo/1
Your container can only run its process for 50ms out of every 100ms, even if there is still plenty of resource available on the node to service it. It then has to wait for the next period to get another 50ms of the 100ms of CPU time, and so on, until it finishes processing a given request. Because monitoring typically averages CPU usage over windows far longer than 100ms, a container can appear to sit well below its limit while still exhausting its quota in many individual periods, which is exactly how a pod that "uses far less than its limits" ends up throttled.
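If you want to confirm that this is what is happening to your own pods, the kernel keeps throttling counters in cpu.stat. Here is a rough sketch, again assuming a cgroup v1 mount at /sys/fs/cgroup/cpu (on cgroup v2 the file lives at /sys/fs/cgroup/cpu.stat and the time field is throttled_usec):

```python
# Rough sketch: read the CFS throttling counters for a container's cgroup.
# Assumes cgroup v1 at /sys/fs/cgroup/cpu; adjust the path for cgroup v2.
from pathlib import Path

CPU_STAT = Path("/sys/fs/cgroup/cpu/cpu.stat")  # assumed cgroup v1 location

def throttle_stats() -> dict:
    stats = {}
    for line in CPU_STAT.read_text().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

if __name__ == "__main__":
    s = throttle_stats()
    periods = s.get("nr_periods", 0)
    throttled = s.get("nr_throttled", 0)
    if periods:
        # Fraction of CFS periods in which this cgroup hit its quota and was paused.
        print(f"throttled in {throttled}/{periods} periods ({100.0 * throttled / periods:.1f}%)")
    # cgroup v1 reports throttled_time in nanoseconds.
    print("total time spent throttled (s):", s.get("throttled_time", 0) / 1e9)
```

The same counters are also exposed by cAdvisor as Prometheus metrics (container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total), which is usually the easier way to watch this across a whole cluster.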
My research led me to explore more services in my environment and to multiple people discussing this subject: setting CPU limits in Kubernetes is likely the greatest cause of CPU throttling. Memory is quite different, because memory cannot be deallocated once it has been given out, and it uses the same control system as the request. For CPU limits, unless you know your application's traffic and processing pattern that deeply (which is often not the case), it's best not to set one.