System administrators and users working with Unix/Linux servers have probably noticed the "Load Average" figure but may never have put much thought into how this number is generated. So let us discuss this parameter in detail.
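Load average is often assumed to be a blend of many metrics such as disk load, CPU usage and memory usage. It is not: it is the average number of processes that are running or waiting in the run queue, and on Linux it also counts processes in uninterruptible sleep (typically blocked on disk I/O). Three values are reported, averaged over the last 1, 5 and 15 minutes.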
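On Linux the kernel publishes these figures in /proc/loadavg, and they are the same numbers that top and uptime display. The snippet below is a minimal sketch for reading that file; the function name show_loadavg is made up for illustration, and the field layout assumed is the one documented in the proc(5) manual page.

```shell
# Hypothetical helper: label the fields of a /proc/loadavg-style line.
# Assumed layout (proc(5)): 1-min 5-min 15-min runnable/total last-pid
show_loadavg() {
    printf '%s\n' "$1" |
        awk '{ printf "1min=%s 5min=%s 15min=%s runnable/total=%s\n", $1, $2, $3, $4 }'
}

# Only meaningful on Linux, where /proc/loadavg exists.
if [ -r /proc/loadavg ]; then
    show_loadavg "$(cat /proc/loadavg)"
fi
```

The fourth field is a quick sanity check: it shows how many scheduling entities are runnable right now against the total number that exist.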
One way to view the load average is to run the "top" command, which gives insight into the system’s general health. It shows the following main health statistics:
Uptime line (current time, uptime in days, hours and minutes, the user count and the load average)
The total number of processes along with the number of running processes and sleeping processes
Memory usage including total memory, used and free memory
Swap memory usage (useful for troubleshooting slow systems)
The top command looks like this:
By default top sorts by CPU usage, so the process at the head of the list is the one using the highest percentage of CPU. The top command is available on most Unix and Linux variants.
As we’ll learn, CPU usage is not directly related to load average. Load average is an overall view of the system. A high load average generally has one of the following causes:
1. The CPU itself is busy or overloaded.
2. Processes (typically called blocking processes) are waiting for I/O. On Linux these count toward the load even though they are using no CPU.
In the Cpu(s) line of the top output, if the first two figures, %us (user) and %sy (system), together approach 90%, the CPU is overloaded and you need either to reduce the work or to upgrade the CPU. If the fifth figure in the same line, %wa, is high, there are jobs waiting for I/O (for example, trying to read data from a mounted disk), and that is where to look.
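The counters behind %us, %sy and %wa are also exposed in the first line of /proc/stat, which makes them easy to script. The following is only a sketch: cpu_breakdown is a hypothetical helper, and because the /proc/stat counters are cumulative since boot, it reports a long-run average and not the per-interval figures that top prints.

```shell
# Hypothetical helper: turn the aggregate "cpu" line of /proc/stat into
# user, system and I/O-wait percentages (averaged since boot).
cpu_breakdown() {
    printf '%s\n' "$1" | awk '{
        # Fields 2..9: user nice system idle iowait irq softirq steal
        total = 0
        for (i = 2; i <= 9; i++) total += $i
        if (total == 0) exit 1
        printf "us=%.1f sy=%.1f wa=%.1f\n", 100*$2/total, 100*$4/total, 100*$6/total
    }'
}

# Only meaningful on Linux, where /proc/stat exists.
if [ -r /proc/stat ]; then
    cpu_breakdown "$(grep '^cpu ' /proc/stat)"
fi
```

A high wa value alongside a low us and sy points at storage or network waits and not at a busy CPU.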
To find out which processes are causing this, run the command below and look for ‘D’ in the eighth column (STAT). There will usually be plenty of R and S entries as well:
# ps faux
The symbols D, R and S mean the following:
D -> Uninterruptible sleep (usually waiting for disk or network I/O)
R -> Running or runnable (on the run queue)
S -> Interruptible sleep (waiting for an event to complete)
You can also use the command below to list only the processes in state D (anchoring the match at the start of the line keeps the "PID" header from matching):
# ps axo stat,pid,comm | grep '^D'
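To see at a glance whether blocked processes or runnable ones dominate, the states can be tallied. This is a rough sketch: count_states is a hypothetical helper that reads one STAT value per line and counts by the first character, which is the main state code.

```shell
# Hypothetical helper: count processes per main state (D, R, S, ...).
# Reads STAT values on stdin, one per line, and keeps only the first letter.
count_states() {
    cut -c1 | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# "stat=" asks ps for the STAT column with an empty header.
if command -v ps >/dev/null 2>&1; then
    ps axo stat= | count_states
fi
```

A large D count relative to R suggests the load is I/O-bound; a large R count suggests CPU contention.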
One quick rule of thumb I use to keep systems free of latency (slow processes, slow page loads, slow queries and so on) is to keep the load average under the total number of processors in the machine. The load average represents the average number of processes that were running or had to wait for resources over the last 1, 5 and 15 minutes.
To check the number of processors recognized by the Linux kernel, run the following command:
# grep -c '^processor' /proc/cpuinfo
Keep in mind that this command returns the total number of recognized (logical) processors. If you have a hyper-threaded Pentium 4 you’ll see two processors when really you only have one core. The load average rule of thumb uses this recognized count, so remember that a hyper-threaded processor is not a full core.
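The rule of thumb can be turned into a small check. This is a sketch under the assumptions already stated: load_status is a hypothetical helper, the processor count is the logical count from /proc/cpuinfo, and a load equal to the count is treated as high because the rule asks for the load to stay under it.

```shell
# Hypothetical helper: compare a load average with a processor count.
# $1 = load average (may be fractional), $2 = number of processors
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (load + 0 < cpus + 0) print "ok"; else print "high"
    }'
}

# Only meaningful on Linux, where both /proc files exist.
if [ -r /proc/loadavg ] && [ -r /proc/cpuinfo ]; then
    cpus=$(grep -c '^processor' /proc/cpuinfo)
    read -r one_min rest < /proc/loadavg
    echo "1-minute load $one_min on $cpus processors: $(load_status "$one_min" "$cpus")"
fi
```

awk does the comparison because the load average is fractional and plain shell arithmetic only handles integers.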
Remember, keeping the load average under the total processor count will make for a healthy and fast-responding system.