NEW: Rule of Thumb Series
Linux Server Performance Monitoring for Windows Administrators


Looking at the Big Picture of Linux Server Performance

Last month we took a look at the big picture of Windows server performance monitoring – what to monitor, how often, and basic rules of thumb. This month we jump platforms to Linux and discuss key Linux performance metrics that system administrators newly transplanted from Windows administration – or anyone new to Linux administration – will want to know about. Here’s what we recommend you look at when investigating performance issues on Linux servers:

As Always: Check Memory First

If free memory on your system stays below 15% for a significant period of time (15 minutes or more), it may be suffering from a memory bottleneck. However, to confirm this on Linux you need to dig a little deeper. If you see significant page outs (more than 300-500 per second), swap outs (more than 1 per second), or excessive swap space used (more than 80%-90%), you can be reasonably sure that your system is memory constrained.
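As a rough illustration, the checks above could be scripted along the following lines. This is a minimal sketch, not a monitoring tool: it assumes the text comes from /proc/meminfo and /proc/vmstat, treats the pswpout counter as the "swap outs" figure, and uses 80% as the swap threshold. The function names and sample values are made up for the example. A page-out check would follow the same pattern with whichever counter you decide represents page outs.

```python
def parse_counters(text):
    """Parse 'name value' or 'name: value kB' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0].rstrip(":")] = int(fields[1])
    return counters


def memory_warnings(meminfo, vmstat_before, vmstat_after, interval_s):
    """Apply the rules of thumb to two vmstat samples taken interval_s apart."""
    warnings = []

    free_pct = 100.0 * meminfo["MemFree"] / meminfo["MemTotal"]
    if free_pct < 15.0:
        warnings.append(f"free memory {free_pct:.1f}% is below 15%")

    swap_total = meminfo.get("SwapTotal", 0)
    if swap_total > 0:  # skip the check when no swap is configured
        swap_used_pct = 100.0 * (swap_total - meminfo["SwapFree"]) / swap_total
        if swap_used_pct > 80.0:
            warnings.append(f"swap used {swap_used_pct:.1f}% is above 80%")

    # Assumption: pswpout (pages swapped out since boot) stands in for "swap outs".
    swap_outs = (vmstat_after["pswpout"] - vmstat_before["pswpout"]) / interval_s
    if swap_outs > 1.0:
        warnings.append(f"{swap_outs:.1f} swap outs per second is above 1")

    return warnings


# Illustrative sample data. On a live system the text would come from
# open("/proc/meminfo").read() and open("/proc/vmstat").read().
MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          800000 kB
SwapTotal:       2000000 kB
SwapFree:         200000 kB
"""
VMSTAT_T0 = "pswpin 10\npswpout 1000\n"
VMSTAT_T1 = "pswpin 10\npswpout 1050\n"

for warning in memory_warnings(parse_counters(MEMINFO),
                               parse_counters(VMSTAT_T0),
                               parse_counters(VMSTAT_T1),
                               interval_s=10):
    print(warning)
```

A single pair of samples only shows a moment in time; to match the "15 minutes or more" guidance you would repeat the check and act only when the warnings persist.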

If your system is regularly experiencing any of these symptoms, first examine processes consuming large amounts of memory to rule out design flaws or algorithm problems (for example, an application could be continually allocating memory but failing to release it when it is no longer needed). If process behavior appears normal, you may need to consider shifting application workloads or increasing memory capacity. And remember that low free memory by itself, with no other symptoms, is not necessarily a problem – it could simply mean that resources are being utilized effectively!
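To find the processes consuming the most memory, one option is to rank them by resident set size. The sketch below assumes the input is the text produced by `ps -eo pid,rss,comm` (RSS is reported in kilobytes); the function name and the sample listing are hypothetical.

```python
def top_memory_processes(ps_output, count=3):
    """Return the `count` largest (rss_kb, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) == 3:
            pid, rss_kb, command = fields
            rows.append((int(rss_kb), int(pid), command))
    return sorted(rows, reverse=True)[:count]


# Illustrative sample in the layout of `ps -eo pid,rss,comm`.
PS_SAMPLE = """\
  PID   RSS COMMAND
    1  9000 systemd
  812 52000 sshd
 1500 910000 java
 1622 240000 postgres
"""

for rss_kb, pid, command in top_memory_processes(PS_SAMPLE, count=2):
    print(f"{command} (pid {pid}): {rss_kb} kB resident")
```

Running the same ranking at intervals and comparing the results is one way to spot a process whose memory use only ever grows, which is the allocate-but-never-release pattern described above.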

Next Stop: I/O

Once you have ruled out memory problems (which can contribute to high I/O through paging and swapping), the next place to look is I/O. You will want to check the performance specifications of the disk(s) in question, but a good general rule of thumb is to investigate any disk experiencing a sustained I/O rate greater than 100 KB/second.
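The I/O rate for each disk can be estimated from two snapshots of /proc/diskstats. The following is a sketch under stated assumptions: it adds sectors read and sectors written (the sixth and tenth fields of each line), converts 512-byte sectors to kilobytes, and compares the result with the 100 KB/second rule of thumb. The function names and the sample snapshots are invented for illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_by_device(diskstats_text):
    """Map device name -> sectors read plus sectors written since boot."""
    totals = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10:
            # fields[2] is the device name, fields[5] is sectors read,
            # fields[9] is sectors written.
            totals[fields[2]] = int(fields[5]) + int(fields[9])
    return totals


def io_rates_kb_per_s(before_text, after_text, interval_s):
    """Return {device: KB/second} between two snapshots interval_s apart."""
    before = sectors_by_device(before_text)
    after = sectors_by_device(after_text)
    return {
        name: (after[name] - before[name]) * SECTOR_BYTES / 1024 / interval_s
        for name in after
        if name in before
    }


# Illustrative snapshots taken 10 seconds apart.
DISKSTATS_T0 = """\
   8       0 sda 1000 0 20000 500 2000 0 40000 900 0 1000 1400
   8      16 sdb 50 0 400 20 10 0 80 5 0 20 25
"""
DISKSTATS_T1 = """\
   8       0 sda 1100 0 22000 550 2300 0 44000 1000 0 1100 1550
   8      16 sdb 50 0 400 20 10 0 80 5 0 20 25
"""

rates = io_rates_kb_per_s(DISKSTATS_T0, DISKSTATS_T1, interval_s=10)
busy_disks = [name for name, rate in rates.items() if rate > 100.0]
print(rates, busy_disks)
```

The threshold is the article's general figure; substitute a number taken from your own disk's specifications where you have one.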

As with memory investigations, you will want to check the application performing the I/O for design or algorithmic issues, and then consider whether your application load is reasonable for the server and disk(s) currently in use. An easy way to do this on Linux is to look at which processes are consuming large amounts of CPU time; chances are, those processes are contributing to your I/O load.

Last but Not Least: CPU

It’s normal to see occasional spikes in CPU usage, but a CPU Busy% consistently over 60% (for 15 minutes or more) may indicate a problem, and a value consistently over 80% indicates a likely CPU bottleneck.
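One way to compute a CPU Busy% figure is from two samples of the aggregate "cpu" line in /proc/stat. The sketch below is an assumption about how to define "busy": it counts both idle and iowait ticks as not busy, and the function names, labels, and sample lines are hypothetical.

```python
def cpu_times(stat_text):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal. The guest
            # fields that may follow are already included in user and nice.
            ticks = [int(value) for value in fields[1:9]]
            idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
            return idle, sum(ticks)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_busy_pct(before_text, after_text):
    """Percentage of non-idle ticks between two /proc/stat samples."""
    idle0, total0 = cpu_times(before_text)
    idle1, total1 = cpu_times(after_text)
    delta_total = total1 - total0
    if delta_total <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * (delta_total - (idle1 - idle0)) / delta_total


def classify(busy_pct):
    """Map a busy percentage onto the 60% and 80% rules of thumb."""
    if busy_pct > 80.0:
        return "likely CPU bottleneck"
    if busy_pct > 60.0:
        return "worth investigating"
    return "normal"


# Illustrative samples; on a live system read /proc/stat twice.
STAT_T0 = "cpu  1000 0 500 8000 500 0 0 0 0 0\n"
STAT_T1 = "cpu  1700 0 700 8100 500 0 0 0 0 0\n"

busy = cpu_busy_pct(STAT_T0, STAT_T1)
print(f"{busy:.1f}% busy: {classify(busy)}")
```

The labels only mean something when the same result shows up across many intervals; a single high reading is just one of the occasional spikes mentioned above.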

As with a Windows CPU investigation, you’ll want to check for individual processes using large amounts of CPU time. Any single process consuming more than 50% of the CPU on a consistent basis could indicate bugs or efficiency problems with the application; or, it’s possible that high CPU utilization is simply a signature of this particular application.

A good auxiliary metric to look at is the number of runnable processes – that is, the number of processes that could use the CPU if it were available – or, in other words, the size of the CPU queue. If you see high CPU usage and more than one runnable process per CPU over time, that means processes are consistently having to wait for the CPU – a sure sign of a bottleneck. On the other hand, a high overall CPU Busy% with no runnable processes likely means that the CPU is simply being utilized effectively.
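The runnable count is available in the fourth field of /proc/loadavg, which has the form runnable/total. The sketch below checks whether every sample in a series shows more runnable entities than CPUs. The cpu_count parameter is our own addition for multi-processor machines (leave it at 1 for a single CPU), the function names and sample lines are hypothetical, and note that the kernel counts threads as well as processes here and may include the process doing the sampling.

```python
def runnable_and_total(loadavg_text):
    """Parse the 'runnable/total' field of /proc/loadavg-style text."""
    runnable, total = loadavg_text.split()[3].split("/")
    return int(runnable), int(total)


def cpu_queue_backed_up(samples, cpu_count=1):
    """True if every sample shows more runnable entities than CPUs."""
    return bool(samples) and all(
        runnable_and_total(sample)[0] > cpu_count for sample in samples
    )


# Illustrative samples; on a live system read /proc/loadavg at intervals.
LOADAVG_SAMPLES = [
    "2.10 1.95 1.80 4/310 4021",
    "2.30 2.00 1.82 3/312 4077",
    "2.25 2.01 1.83 5/311 4102",
]

print(cpu_queue_backed_up(LOADAVG_SAMPLES, cpu_count=1))
```

Requiring every sample to exceed the limit is a deliberately strict reading of "over time"; a looser rule, such as most samples, may suit a bursty workload better.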

Two more indicators to help track down CPU issues: the total number of processes on the system (to see if your application workload is approaching or exceeding the system’s capacity – your specific mileage will vary), and the presence of zombie processes. A zombie is a process that has exited but still occupies a slot in the process table because its parent has not yet collected its exit status. It no longer uses CPU or memory, but a growing number of zombies could indicate an application bug or design flaw.
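Both indicators can be read from a process listing. The sketch below assumes the input is the text produced by `ps -eo pid,stat,comm`, where a state code beginning with Z marks a zombie; the function name and the sample listing are hypothetical.

```python
def process_summary(ps_output):
    """Return (total_processes, zombie_pids) from a pid/stat/comm listing."""
    total = 0
    zombie_pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) >= 2:
            total += 1
            if fields[1].startswith("Z"):
                zombie_pids.append(int(fields[0]))
    return total, zombie_pids


# Illustrative sample in the layout of `ps -eo pid,stat,comm`.
PS_STATES = """\
  PID STAT COMMAND
    1 Ss   systemd
  812 Ss   sshd
 1500 Sl   java
 1733 Z    worker <defunct>
 1734 Z    worker <defunct>
 1900 R+   ps
"""

total, zombies = process_summary(PS_STATES)
print(f"{total} processes, {len(zombies)} zombies: {zombies}")
```

Tracking the total over days or weeks gives you the baseline you need to judge whether a given process count is high for your system.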

As Always: Your Mileage May Vary

In any performance monitoring exercise, remember that these are rules of thumb, and you will need to make adjustments based on the characteristics of your various expected workloads; however, if you monitor these metrics consistently, you will develop a clear picture of exactly what constitutes normal behavior on your system, and you will be better equipped to detect and deal with the subtle symptoms that spell trouble ahead.

Next month: Tips for SQL Server Monitoring