Storage and Windows systems monitoring criteria

Evaluating the following Windows performance counters and cross-matching them with storage system performance statistics can help identify the application workload pattern. The performance criteria to evaluate are broken down into two main sections: Storage, and Windows systems "Workload" performance counters.

Storage performance monitoring should make up the bulk of the information we review to identify storage-related bottlenecks or performance problems. The latency and queue length counters can be used by the system owners and the application development team to self-diagnose potential storage issues. You could also roll out the performance logging to the servers via SCOM and alert on the thresholds defined below for further investigation.

Storage

Storage Processor

Processor % utilisation
Processor Bandwidth MB/s
Processor Throughput IOPS
Processor Cache forced flushes

The above statistics and counters should be recorded and reviewed on the storage arrays regularly using the appropriate storage vendor tool sets (ECC and analyser). There are significantly more performance counters to track at the storage level, but as a rule these will help set a baseline.

Latency

Windows Performance Counters

\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Write
\PhysicalDisk(*)\Avg. Disk sec/Transfer (combination of the above two counters)

These counters return a value in seconds, so 0.010 is 10ms.

Disk latency should fall within the following acceptance criteria:

• Less than 5ms is considered excellent
• Less than 10ms is considered good
• Less than 15ms is considered acceptable
• Less than 20ms is fair
• More than 20ms and less than 50ms is poor
• More than 50ms is substandard
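The acceptance bands above can be applied mechanically to a counter sample. The following is a minimal sketch, assuming the input is an Avg. Disk sec/Transfer value in seconds as perfmon reports it. The function name classify_latency is hypothetical, and since the list does not say which band a value of exactly 20ms or 50ms belongs to, placing each boundary in the worse band is an assumption.

```python
def classify_latency(seconds: float) -> str:
    """Rate a disk latency sample given in seconds (perfmon's native unit)."""
    ms = seconds * 1000.0  # 0.010 s is 10 ms
    # Upper bounds in ms, checked in order; a boundary value such as exactly
    # 20 ms falls into the next (worse) band, which is an assumption.
    bands = [
        (5, "excellent"),
        (10, "good"),
        (15, "acceptable"),
        (20, "fair"),
        (50, "poor"),
    ]
    for upper_ms, rating in bands:
        if ms < upper_ms:
            return rating
    return "substandard"


if __name__ == "__main__":
    for sample in (0.004, 0.012, 0.030, 0.075):
        print(f"{sample * 1000:.0f} ms -> {classify_latency(sample)}")
```

Feeding the same function with latency figures taken at the hypervisor or the array (converted to seconds) makes it easier to compare the three measurement points on one scale.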

Latency should be measured in the following locations to isolate the source of the latency when it is identified:

1. Within the Operating System (perfmon or top)
2. Within the Hypervisor (esxtop or resxtop)
3. At the Storage subsystem, LUN response times (latency)

Disk queue length

Disk queue length does not accurately reflect performance but can be used to assist in the diagnosis of performance issues. Queue lengths that grow significantly and exceed the expected performance of the underlying storage highlight bottlenecks, but those bottlenecks can exist in several locations.

Windows Performance Counters

\PhysicalDisk(*)\Avg. Disk Write Queue Length
\PhysicalDisk(*)\Avg. Disk Read Queue Length
\PhysicalDisk(*)\Current Disk Queue Length
\PhysicalDisk(*)\Avg. Disk Queue Length

Acceptance criteria:

• No more than 2 to 3 times the number of disk spindles used to create the volume (assuming they are dedicated)
• A sustained queue length of 30 or more is a reason for investigation
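As a rough illustration of the two criteria, the sketch below checks an averaged queue length against both the spindle-based guideline and the fixed figure of 30. The function name queue_length_findings is hypothetical, and it assumes the value passed in has already been averaged over a sustained period and that the spindles are dedicated to the volume.

```python
def queue_length_findings(avg_queue_length: float, spindles: int) -> list:
    """Return a list of findings for a sustained average disk queue length."""
    findings = []
    # Spindle guideline: 2x to 3x the number of dedicated spindles.
    if avg_queue_length > 3 * spindles:
        findings.append("exceeds 3x spindle count")
    elif avg_queue_length > 2 * spindles:
        findings.append("between 2x and 3x spindle count")
    # Fixed guideline: a sustained queue of 30 or more warrants investigation.
    if avg_queue_length >= 30:
        findings.append("sustained 30 plus, investigate")
    return findings


if __name__ == "__main__":
    print(queue_length_findings(5, spindles=4))
    print(queue_length_findings(10, spindles=4))
    print(queue_length_findings(35, spindles=4))
```

The two checks are kept independent because on a volume with many spindles the 3x figure can be higher than 30, so either one may trigger first.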

Queue length should be measured at all of the following locations:

1. Disk queue length within the Operating System
2. Front end port queue length on the storage arrays
3. LUN queue length

Workload

The following counters can assist in identifying the type of workload within the Windows operating system:

Windows performance counters

\PhysicalDisk(*)\Avg. Disk Bytes/Read
\PhysicalDisk(*)\Avg. Disk Bytes/Transfer
\PhysicalDisk(*)\Avg. Disk Bytes/Write
\PhysicalDisk(*)\Avg. Disk sec/Read
\PhysicalDisk(*)\Avg. Disk sec/Transfer
\PhysicalDisk(*)\Avg. Disk sec/Write
\PhysicalDisk(*)\Disk Read Bytes/sec
\PhysicalDisk(*)\Disk Write Bytes/sec
\PhysicalDisk(*)\Split IO/Sec (as few as possible)
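To show how these counters combine into a workload pattern, here is a minimal sketch that derives approximate read and write IOPS, the read percentage and the average IO size from four of the counters above. It assumes all values were sampled over the same interval. The names WorkloadProfile and profile_workload are hypothetical, and dividing bytes per second by bytes per operation only approximates the Disk Reads/sec and Disk Writes/sec counters.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    read_iops: float
    write_iops: float
    read_pct: float
    avg_read_kib: float
    avg_write_kib: float


def profile_workload(read_bytes_per_sec: float, write_bytes_per_sec: float,
                     avg_bytes_per_read: float,
                     avg_bytes_per_write: float) -> WorkloadProfile:
    """Derive a simple workload profile from PhysicalDisk counter samples."""
    # bytes/sec divided by bytes/operation gives operations/sec.
    read_iops = read_bytes_per_sec / avg_bytes_per_read if avg_bytes_per_read else 0.0
    write_iops = write_bytes_per_sec / avg_bytes_per_write if avg_bytes_per_write else 0.0
    total = read_iops + write_iops
    read_pct = 100.0 * read_iops / total if total else 0.0
    return WorkloadProfile(read_iops, write_iops, read_pct,
                           avg_bytes_per_read / 1024.0,
                           avg_bytes_per_write / 1024.0)


if __name__ == "__main__":
    # Example input: 8 MiB/s of reads and 2 MiB/s of writes, all 8 KiB IOs.
    print(profile_workload(8_388_608, 2_097_152, 8192, 8192))
```

A profile dominated by small IOs suggests a transactional pattern, while large IOs point towards sequential or streaming activity. The latency counters in the same list then show how well the storage is coping with that pattern.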

Windows Systems

Additional Windows Performance Monitor counters should be collected to ensure there are no processor, memory, paging or other operations occurring within the operating system that could negatively impact the system and produce additional storage workload.

Windows perfmon counters

\Paging File(*)\*
\Processor(*)\% Processor Time
\Processor(*)\% User Time
\Processor(*)\% Privileged Time
\System\Processor Queue Length
\Memory\Available MBytes
\Memory\Pages Input/sec
\Memory\Pages/sec
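One way to collect this counter set is with the built-in typeperf command. The sketch below only builds the counter file contents and the command line; it does not run anything. The helper names, the file names, the 15 second interval and the 240 sample count are all illustrative assumptions, not values from this post.

```python
# Counter paths exactly as listed above.
COUNTERS = [
    r"\Paging File(*)\*",
    r"\Processor(*)\% Processor Time",
    r"\Processor(*)\% User Time",
    r"\Processor(*)\% Privileged Time",
    r"\System\Processor Queue Length",
    r"\Memory\Available MBytes",
    r"\Memory\Pages Input/sec",
    r"\Memory\Pages/sec",
]


def counter_file_text(counters=COUNTERS) -> str:
    """Return counter file contents: one counter path per line."""
    return "\n".join(counters) + "\n"


def build_typeperf_command(counter_file: str, output_csv: str,
                           interval_s: int = 15, samples: int = 240) -> list:
    """Build a typeperf argument list that logs the counters to a CSV file."""
    return [
        "typeperf",
        "-cf", counter_file,    # read counter paths from a file
        "-si", str(interval_s),  # seconds between samples
        "-sc", str(samples),     # number of samples to collect
        "-f", "CSV",             # output file format
        "-o", output_csv,        # output file path
    ]


if __name__ == "__main__":
    print(counter_file_text(), end="")
    print(" ".join(build_typeperf_command("counters.txt", "baseline.csv")))
```

The returned list can be passed to subprocess.run on a Windows host once the counter file has been written to disk.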

Memory

Available MBytes, should be greater than 100MB
Pages Input/sec, should be less than 10
Pages/sec, investigate if greater than 100 on a slow disk subsystem, or greater than 600 on a fast disk subsystem
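The memory thresholds can be expressed as a simple check. This is a minimal sketch with a hypothetical function name, check_memory. It assumes the Pages/sec figures of 100 and 600 are the points above which a slow or fast disk subsystem respectively deserves a closer look.

```python
def check_memory(available_mbytes: float, pages_input_per_sec: float,
                 pages_per_sec: float, fast_disk: bool = False) -> list:
    """Return a list of memory counters that fall outside the guidelines."""
    issues = []
    if available_mbytes <= 100:
        issues.append("Available MBytes is not above 100")
    if pages_input_per_sec >= 10:
        issues.append("Pages Input/sec is not below 10")
    # Assumed reading of the guideline: 100 for slow disks, 600 for fast.
    pages_limit = 600 if fast_disk else 100
    if pages_per_sec > pages_limit:
        issues.append(f"Pages/sec is above {pages_limit}")
    return issues


if __name__ == "__main__":
    print(check_memory(2048, 2, 50))
    print(check_memory(80, 25, 300))
    print(check_memory(80, 25, 300, fast_disk=True))
```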

Memory Manager

Memory Grants Pending, should be at or close to zero; a value that stays above zero or keeps growing indicates an issue
Page Life Expectancy, should be greater than 300 seconds; a lower or declining value indicates memory pressure

Paging

% Usage
The amount of the page file currently in use, by percentage, should be less than 70%
% Usage Peak
The peak amount of page file used since the server was last booted, by percentage, should be less than 70%

CPU Activity

Processor: % Processor Time, 80% or less is ideal
Processor: % Privileged Time, should be less than 30% of the total % Processor Time
Processor: % User Time, should be about 70% or more of the total % Processor Time
Process (sqlservr): % Processor Time, should be less than 80%
System: Processor Queue Length, less than 4 is ideal, 5 to 8 is good, over 8 and up to 12 is fair
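The CPU guidelines can be sketched in the same way. The function name check_cpu is hypothetical. The list above does not rate a queue length of exactly 4 or anything above 12, so treating 4 as good and anything above 12 as high is an assumption.

```python
def check_cpu(total_pct: float, privileged_pct: float, user_pct: float,
              queue_length: float):
    """Return (issues, queue_rating) for a set of processor counter samples."""
    issues = []
    if total_pct > 80:
        issues.append("% Processor Time is above 80%")
    if total_pct > 0:
        # Privileged and user time are judged as shares of total processor time.
        if privileged_pct / total_pct >= 0.30:
            issues.append("% Privileged Time is 30% or more of total")
        if user_pct / total_pct < 0.70:
            issues.append("% User Time is below 70% of total")
    # Queue bands; the ratings for exactly 4 and for above 12 are assumptions.
    if queue_length < 4:
        rating = "ideal"
    elif queue_length <= 8:
        rating = "good"
    elif queue_length <= 12:
        rating = "fair"
    else:
        rating = "high"
    return issues, rating


if __name__ == "__main__":
    print(check_cpu(50, 10, 40, 2))
    print(check_cpu(90, 40, 50, 10))
```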