Created on 2015-Feb-18
Updated on 2015-Feb-18
If your server goes down, and you get a message like:
Feb 16 03:00:12 server kernel: INFO: task httpd:16101 blocked for more than 120 seconds.
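To see how often this has happened, you can search the system log for the same warning. This is a minimal sketch: the helper name `hung_tasks` is made up for illustration, and the default path `/var/log/messages` is an assumption that varies by distribution.

```shell
# Print hung-task warnings from a kernel log file.
# The default path is an assumption; pass your own log file as the first argument.
hung_tasks() {
    grep -E 'blocked for more than [0-9]+ seconds' "${1:-/var/log/messages}"
}
```

Call it as `hung_tasks /path/to/log`. Reading the system log usually requires root.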
Verify memory usage:
sar -r
Verify CPU usage:
sar -u
For me, the sign of trouble was CPU %idle climbing to 99.18 shortly before the restart:
01:40:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
...
06:30:01 PM     all     42.52      0.00      3.52      0.32      0.00     53.65
06:40:01 PM     all     43.72      0.00      3.60      0.32      0.00     52.36
06:50:01 PM     all     42.32      0.00      3.47      0.38      0.00     53.82
07:00:01 PM     all     41.38      0.00      3.47      0.25      0.00     54.91
07:10:01 PM     all     44.30      0.00      3.50      0.65      0.00     51.54
07:20:01 PM     all     36.22      0.00      2.89      0.38      0.00     60.50
07:30:01 PM     all     31.56      0.00      2.72      0.26      0.00     65.47
07:40:01 PM     all     23.77      0.00      2.03      0.41      0.00     73.79
07:50:01 PM     all      2.01      0.00      0.20      0.14      0.00     97.65
08:00:01 PM     all      0.59      0.00      0.07      0.15      0.00     99.18
Average:        all     36.88      0.10      3.22      0.39      0.00     59.41

08:06:39 PM       LINUX RESTART

08:10:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:20:01 PM     all     50.79      0.00      3.62      0.08      0.00     45.51
Average:        all     50.79      0.00      3.62      0.08      0.00     45.51
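Scanning a long sar report by eye is tedious, so a small filter can pick out the suspicious samples. This is only a sketch: `high_idle` is a hypothetical helper, the default threshold of 95 is arbitrary, and the field positions assume the 12-hour time format shown in the output above (time, AM/PM, then the CPU column).

```shell
# Print the timestamp and %idle of sar -u samples whose %idle exceeds a limit.
# Assumes 12-hour timestamps, so the CPU column is field 3 and %idle is the last field.
high_idle() {
    awk -v limit="${1:-95}" '
        # Skip header, Average and RESTART rows; keep only per-interval "all" rows.
        $1 != "Average:" && $3 == "all" && $NF + 0 > limit + 0 { print $1, $2, $NF }
    '
}
```

For example, `sar -u | high_idle 95` would list only the 07:50:01 PM and 08:00:01 PM samples from the report above.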
Temporarily lower the kernel's dirty-page writeback thresholds and run with the new values for a couple of days. See here for a detailed explanation.
sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.dirty_background_ratio=5
sudo sysctl -p
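While the temporary settings are in place, it can help to watch how much dirty memory is waiting to be written back. The sketch below reads the `Dirty` and `Writeback` counters from `/proc/meminfo`; the function name `dirty_pages` is an illustrative choice, not a standard tool.

```shell
# Print the Dirty and Writeback counters (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; a different file can be passed for testing.
dirty_pages() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }' "${1:-/proc/meminfo}"
}
```

Running `sysctl vm.dirty_ratio vm.dirty_background_ratio` without `-w` prints the values currently in effect, which is a quick way to confirm the change took hold.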
If everything runs smoothly, make the changes permanent:
sudo vi /etc/sysctl.conf
Enter the following:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
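Before rebooting, it is worth double-checking that both lines actually made it into the file and are not commented out. A minimal sketch, with `dirty_settings` as a hypothetical helper name:

```shell
# Print the active (uncommented) vm.dirty_ratio and vm.dirty_background_ratio
# lines from a sysctl config file. Defaults to /etc/sysctl.conf.
dirty_settings() {
    grep -E '^[[:space:]]*vm\.dirty_(background_)?ratio[[:space:]]*=' "${1:-/etc/sysctl.conf}"
}
```

If `dirty_settings` prints both lines with the values 5 and 10, the file is ready.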
Now reboot.
reboot
Thanks to blackMORE Ops for the detailed tutorial.