Database+Disk+Performance™: June 2006

Monday, June 26, 2006

Disk Performance Problem Isolation Techniques: Simple Tools

Introduction: “I told you SQL Server couldn’t scale…”

The usual conversation goes like this… “I told you SQL Server couldn’t scale…” and so the finger pointing begins. As a student of philosophy in college, I learned about rhetoric and baiting techniques. As a DBA, I had to learn to fight back and not be on the defensive from such attacks. From personal experience, I have managed many databases with tens of thousands of users that were multi-terabyte and mission critical… so I knew that these statements were spurious.

To me, the issue wasn’t whether the database could scale, but whether the customer had architected a system that could handle the load and meet the business SLA.

In a database server running an enterprise application, there are many moving parts: server, network, OS, app servers, web servers, batch servers, software, HBAs, fabric switches, application customization, etc. All can lead to issues. Regrettably, it’s usually the database that gets the blame.

This note will outline tools and techniques to eliminate the database software as the bottleneck and help the team start looking for the real issue, which for database scalability is usually the disk subsystem.

Tools & Usage

The first objective is to test outside the SQL Server environment. The reason for this is to eliminate the database as an issue. If you get poor performance outside of SQL Server, then it’s not the database, it’s something else… usually the disk configuration.

Disk Copy

The easiest test to run is a file copy. Simply find a large file, for example 2G in size, and copy it from one disk to another. You can create such a test file by making a test database with a 2G data file, detaching the database files using sp_detach_db, and then xcopying the files from drive F: to drive G:.



CREATE DATABASE test
ON
( NAME = test_dat,

FILENAME = 'c:\tmp\test_dat.mdf',

SIZE = 2000MB )

LOG ON
( NAME = test_log,

FILENAME = 'c:\tmp\test_log.ldf',

SIZE = 1MB )
GO

EXEC sp_detach_db 'test'


Another way to create a large file is to use the Windows NT Resource Kit utility "creatfil.exe", which can create a fixed-size file of any length.

Then go to a DOS command prompt and run the following batch file:

time /t

xcopy F:\temp\test_dat.mdf G:\temp\test_dat.mdf

time /t

At this point there are a few things to remember:

  1. Make sure you are copying data between physically different drives. Please remember that different LUNs can still share the same physical drives and controller.
  2. Run a PERFMON trace and look at the logical disk transfer times. This is outlined in the previous note on latency. If you are seeing times greater than 10 ms, you probably have a configuration issue on your SAN. You may need more spindles.
  3. Look at how many MB/sec you are pushing through. Take the size of the data file (ex. 100MB) and divide it by the number of seconds the copy took. To get an idea of throughput, the following table gives a broad range to compare your results with:


up to 34 MB/sec: Poor SAN

35 to 180 MB/sec: Good SAN

180 to 412 MB/sec: Great SAN

Lastly, run the same test on another server that is running “well”. Compare the results.
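The throughput arithmetic in step 3 can be sketched as follows (a Python illustration; the file size and elapsed time are hypothetical numbers, not measurements from the text):

```python
def copy_throughput_mb_per_sec(file_size_mb, elapsed_seconds):
    """Throughput of a file copy: file size divided by wall-clock copy time."""
    return file_size_mb / elapsed_seconds

# Hypothetical run: the 2000 MB test file copied in 25 seconds.
print(copy_throughput_mb_per_sec(2000, 25))  # 80.0 MB/sec
```

Time the copy with the `time /t` lines in the batch file above, then plug the numbers in.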


SQLIOSTRESS is a free tool from Microsoft that creates a disk load simulating SQL Server activity. Please note, you do not need SQL Server installed on the server to run it. Hence, it makes a good tool to run before creating your database and installing a lot of software that you might have to redo if you have to flatten and rebuild your SAN to get adequate numbers.

SQLIOSTRESS can be obtained at this location:


The tool will give the DBA plenty of information. What I like to do is run the tool and then monitor the IO patterns with PERFMON.


IOMETER is another tool that creates a large workload on the disk subsystem. The tool is not database specific and lets the user configure the type of workload and IO patterns to run against the disk. This tool is also free. It can be downloaded from:


Again, use PERFMON to watch while the tool is running. Then look at the output reports from IOMETER.


Lastly, another Microsoft tool that can be downloaded is SQLIO. The tool is located at:


This tool will generate data and give an output report. A sample run on a small file from my laptop looks like this:

C:\Program Files\SQLIO>sqlio -kR -s360 -fsequential -o8 -b64 -LS -dE -Fparameter.txt
sqlio v1.5.SG
using system counter for latency timings, -1302407296 counts per second
parameter file used: parameter.txt
file C:\TestFile\TestFile.dat with 16 threads (0-15) using mask 0x0 (0)
16 threads reading for 360 secs from file C
using 64KB sequential IOs
enabling multiple I/Os per thread with 8 outstanding
using specified size: 100 MB for file: C:\testdir\test.dat
initialization done
throughput metrics:
IOs/sec: 6755.07
MBs/sec: 422.19
latency metrics:
Min_Latency(ms): 2
Avg_Latency(ms): 18
Max_Latency(ms): 47


Secure a good baseline by running some of the above tools and processes. Then compare the numbers to other “good” systems you have running and to other published data. Work with your drive vendor if the throughput numbers are not what you would expect. Keep in mind, you get what you pay for. If you spend little, don’t expect 500 MB/sec throughput!

After running the above types of tests, I’ll often run a BACKUP to a NUL: device and then compare a database operation to a non-database run. For example, what is the MB/sec throughput of a BACKUP to NUL: versus an XCOPY? If they are the same, then you know the database is not the issue and the disk is probably the bottleneck.
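To make the BACKUP number comparable to the XCOPY number, convert it to MB/sec. BACKUP reports how many pages it processed and how long it took, and each SQL Server page is 8 KB. A small sketch (Python; the page count and timing below are made-up illustration values):

```python
def backup_throughput_mb_per_sec(pages_processed, elapsed_seconds):
    """Convert BACKUP's 'pages processed' figure to MB/sec.
    Each SQL Server page is 8192 bytes."""
    bytes_total = pages_processed * 8192
    return bytes_total / (1024 * 1024) / elapsed_seconds

# Hypothetical BACKUP output: 256000 pages processed in 20 seconds.
print(backup_throughput_mb_per_sec(256000, 20))  # 100.0 MB/sec
```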

The key idea in problem isolation and resolution is to figure out which part of the equation is causing the problem. To do that, you need to break up the variables and test them independently. The above tools give you mechanisms to do this.


Microsoft TechNet Article: Ruling Out Disk-Bound Problems

It focuses on perfmon counters and covers Exchange as well as some database workloads.


IMPORTANT NOTE: When running these tests, make sure that the data used is greater than the size of the SAN cache. The reason for this is the SAN cache will buffer the file and distort the results. For example, if the SAN cache is 1G in size, then make the file used in XCOPY or SQLIO to be 2G in size.
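The sizing rule in the note above can be captured as a simple multiple of the cache size. A tiny sketch (Python; the helper name and 2x factor follow the 1G-cache/2G-file example in the text):

```python
def min_test_file_mb(san_cache_mb, factor=2):
    """Size the test file at least `factor` times the SAN cache so the
    copy or SQLIO run cannot be satisfied from cache alone."""
    return san_cache_mb * factor

print(min_test_file_mb(1024))  # 2048: a 1G cache calls for a 2G test file
```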

Powered By Qumana

Finding the Top 10 Long Running Queries


When having a performance issue, the first thing the DBA needs to do is define the problem. When someone says, “it’s running slow…”, I respond, “can you please give me a list of the top 10 worst queries?” Usually, the response is, “I don’t know exactly what they are…”

This note will explain how to isolate the queries by letting the computer tell you where the problems are.

The process is simple, it encompasses the following methodology:

  1. Turn on SQL Server Profiler
  2. Run it for a few hours filtering on long duration or high reads
  3. Save the profiler trace into a temporary table
  4. Run a few queries against the data
  5. Prioritize them as a working list to attack

The key concept: Long running queries hammer the disk and cause poor cache hit ratios. If too many users run them, the disk subsystem can suffer because a few users are monopolizing all the resources.

Collecting the Data

Typically, I’ll start up Profiler and run it for 2 or 3 hours to capture a representative sample of data. Then I’ll use this information to make my decisions. The data collected will also serve as a baseline for judging whether things get better or worse as I tune.

  1. Start up SQL Server Profiler. Collect on these two events, which show queries that have completed:
    1. RPC:Completed
    2. SQL:BatchCompleted
  2. Filter on these columns:
    1. Duration, and/or
      1. The criteria should start off with 30,000. The unit of measure is milliseconds, hence 30,000 = 30 seconds.
    2. Reads
      1. The criteria should start with 10,000. The unit of measure is an 8K page, so 10,000 reads = 81,920,000 bytes of IO. If you are doing 81M of IO, you probably have a query that needs investigating!
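The Reads arithmetic behind that filter threshold is a straight multiply by the 8192-byte page size. A one-function sketch (Python):

```python
PAGE_BYTES = 8192  # one SQL Server page, the unit behind Profiler's Reads column

def reads_to_bytes(reads):
    """Convert a Profiler Reads count to bytes of IO."""
    return reads * PAGE_BYTES

print(reads_to_bytes(10_000))  # 81920000 bytes, the 81M figure above
```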

Let the trace run for a while. Then stop it and “Save As” a profiler trace file. Once it’s in a file, the DBA can start analyzing the data.

Rolling Up Queries

Usually, the easiest way to analyze the information is from within SQL Server. Import the trace file and then run queries against it to find the problems.

The trace file itself has the issues in it. We’ve already filtered for long running queries. Now, we just need to organize the data a bit.

First import the trace file using the following SQL Server function call:

use tempdb


SELECT IDENTITY(int, 1, 1) AS RowNumber, * INTO profiler_analysis

FROM ::fn_trace_gettable('c:\tmp\profilerdata.trc', default)


Next, get an idea of what you are looking at. For example, how much IO occurred for the monitoring run? What was the overall duration for all the long running queries?

select sum(Reads)*8192. 'Bytes Read' from profiler_analysis where Reads is not NULL;


Bytes Read



(1 row(s) affected)

select sum(Duration)/1000. 'Number of Seconds' from profiler_analysis where Duration is not NULL;


Number of Seconds



(1 row(s) affected)

select sum(Duration)/1000./3600.'Number of Hours' from profiler_analysis where Duration is not NULL;


Number of Hours



(1 row(s) affected)

The following query shows the total amount of Reads by user:

select convert(char(20),LoginName)'User Name', sum(Reads)*8192. 'Total Bytes Read'

from profiler_analysis

where Reads is not NULL

group by LoginName

order by sum(Reads) desc


User Name Total Bytes Read

-------------------- ---------------------------------------

jde 178276974592

sa 53321981952

usera 20445822976

userb 10917101568

userc 5227069440

userd 2638151680

usere 2081947648

userf 2063392768

userg 1147445248

userh 670384128

useri 406921216

userj 316260352

userk 169639936

userl 55287808

userm 43941888

usern 19152896

usero 9584640

userp 4866048

userq 2252800

(19 row(s) affected)

The following query shows the total amount of seconds by user:

select convert(char(20),LoginName)'User Name', sum(Duration)/1000. 'Seconds Run'

from profiler_analysis

where Duration is not NULL

group by LoginName

order by sum(Duration) desc


User Name Seconds Run

-------------------- ---------------------------------------

jde 5456.860000

JDEService 1999.540000

sa 313.579000

usera 240.462000

userb 176.452000

userc 135.483000

userd 115.636000

usere 100.881000

userf 90.918000

userg 76.247000

userh 52.656000

useri 40.941000

userj 37.466000

userk 28.084000

userl 19.438000

userm 11.656000

usern 11.329000

usero 4.673000

userp 2.640000

(19 row(s) affected)

Finally, these two queries show the DBA the top 10 queries for Reads and Duration:

select top 10 RowNumber, Duration, Reads, LoginName

from profiler_analysis

order by Reads desc


RowNumber Duration Reads LoginName

----------- -------------------- -------------------- -----------

485 257230 3886609 sa

239 87690 1370174 usera

853 101810 1264835 userb

629 142370 1264577 jde

7 118890 1264197 JDE

747 8596 801035 sa

289 13970 740066 sa

264 7063 661617 sa

665 8576 356531 jde

193 3483 313031 userb

(10 row(s) affected)

select top 10 RowNumber, Duration, Reads, LoginName

from profiler_analysis

order by Duration desc


RowNumber Duration Reads LoginName

----------- -------------------- -------------------- -----------

503 335213 23 JDEService

502 333026 631 JDEService

485 257230 3886609 sa

528 224200 108896 jde

831 203590 2 JDEService

347 184183 103651 jde

532 181400 14 JDEService

627 175056 77320 jde

411 153933 307751 JDE

823 152746 23 JDEService

(10 row(s) affected)

To find the actual query for RowNumber 485, run a select statement and retrieve the TextData column, which contains the statement. The following analysis showed that the high IO and duration were due to an index being built:

select RowNumber, Duration, Reads,TextData

from profiler_analysis

where RowNumber = 485


RowNumber Duration Reads TextData

----------- -------------------- -------------------- ---------------

485 257230 3886609


The query for RowNumber 155 shows that it did 162,386 Reads for 1,330,266,112 bytes, probably because the user put a wildcard on the front and back of the criteria (F4220 WHERE (F4220.SWLOTN LIKE @P1) with @P1 = '%208547%') and forced a table scan.

select RowNumber, Duration, Reads,TextData

from profiler_analysis

where RowNumber =155


RowNumber Duration Reads

----------- -------------------- --------------------

155 10186 162386

(1 row(s) affected)

declare @P1 int set @P1=180151263 declare @P2 int set @P2=1 declare @P3 int set @P3=1 declare @P4 int set @P4=5 exec sp_cursoropen @P1 output, N'SELECT F4220.SWSHPJ, F4220.SWLITM, F4220.SWDCTO, F4220.SWSRL2, F4220.SWDSC2, F4220.SWSRL1, F4220.SWLOTN, F4220.SWLOCN, F4220.SWAITM, F4220.SWSFXO, F4220.SWDOCO, F4220.SWAN8, F4220.SWITM, F4220.SWMCU, F4220.SWDSC1, F4220.SWLNID, F4220.SWORDJ, F4220.SWKCOO, F4220.SWVEND, F4220.SWSHAN FROM PRODDTA.F4220 F4220 WHERE (F4220.SWLOTN LIKE @P1) ORDER BY F4220.SWLITM ASC, F4220.SWLOTN ASC, F4220.SWSRL1 ASC ', @P2 output, @P3 output, @P4 output, N'@P1 varchar(8000) ', '%208547%' select @P1, @P2, @P3, @P4


The above queries collect data and roll it up into a digestible format instead of a pile of random bits. Once the DBA has the performance data, they can prioritize the issues and figure out a plan of attack. For example, change the configuration of the application so users cannot put in leading wildcards that force a table scan.

Knowing your system and where its issues are forms the backbone of being able to isolate and resolve performance problems. The good news is that it’s doable with a small toolbox.

The key concept is to let SQL Server collect the information, and using a few simple queries shown above, show you where the exact problems exist.


Thursday, June 22, 2006

Disk Queue Length vs. Disk Latency Times: Which is Best for Measuring Database Performance?


There are several common metrics that people use to look for performance bottlenecks. They are usually:

- Disk queue length

- Disk latency/transfer times

- MB/sec

- IO/sec

For the rest of this note, I’ll focus on the top two.

Disk Queue Length

The most common performance monitoring metric people quote is disk queue length. While it is important, nowadays it is almost impossible to use as an accurate metric, because the SAN used in an enterprise database application abstracts the DBA from the physical characteristics needed to make a reasonable judgment about what is good or bad.

The usual “rule of thumb” for “bad performance” is a queue length greater than 2 for a disk drive. But when you have a SAN with 100 drives in the cabinet, you have no idea how many are being used for your “drive F:” for your data file.

For example, the DBA watches perfmon and sees the disk queue length is 10 on drive F:. While this is a big number, it’s relative. If the F: is one physical disk, yes, you have a problem. If the F: is a logical disk composed of 8 physical disks, then it’s not an issue. Why? Because 8 disks x 2 queue length = 16… and the DBA sees 10… so 10 is less than 16 (our threshold for pain)… hence there is not an issue.
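The rule-of-thumb arithmetic in that example can be sketched as follows (Python; the 2-per-spindle limit is the rule of thumb from the text):

```python
def queue_length_acceptable(observed_queue, spindles, per_disk_limit=2):
    """Rule of thumb: trouble starts above ~2 queued IOs per physical
    spindle behind the logical drive."""
    return observed_queue <= spindles * per_disk_limit

print(queue_length_acceptable(10, 8))  # True: 10 <= 8 x 2 = 16
print(queue_length_acceptable(10, 1))  # False: a single physical disk
```

The catch, as the next paragraph explains, is knowing the real spindle count at all.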

The main problem with queue length is this: very few people can ever provide me with a comprehensive disk map of their SAN showing all the logical volumes. For example, how many disks are participating in an EMC Sym hyper volume.

The following is a very small example of what a mapping would look like. Each “73” in the original diagram is a 73G disk drive, with multiple drives participating in each LUN. (Note, this is RAID 0+1, so you see two 73G drives making up a 59.39G formatted capacity.)

RG0, R1, 60.39 GB: LUN 0 (1 GB), Unallocated (59.39 GB)

RG1, R1, 60.39 GB: LUN 10 (1 GB), Unallocated (59.39 GB)

RG2, R1, 60.39 GB: LUN 20 (1 GB), Unallocated (59.39 GB)

RG3, R1, 66.61 GB: LUN 30 (1 GB), Unallocated (65.61 GB)

Private System Partition

Without a map like this, disk queue length is meaningless to me. I have nothing to calculate what the queue length should be less than.

With regard to SQL Server, it will not let the transaction log volume have a queue length of more than 3. Thus, the DBA may never see a “problem”. The reason for this is that SQL Server is trying not to swamp the disk drives and will throttle back writes to the log. Since the database follows the Write-Ahead Logging (WAL) protocol, if it can’t write to the log fast enough, it will not be able to write to the data volumes either… and it will slow down the whole system’s throughput.

Per the previous posting on throughput testing, if you don’t have enough spindles to support the volume of data being pushed through the system, the whole box will slow down. This is especially important in enterprise apps such as SAP or PeopleSoft that run large batch jobs and write to the transaction log constantly.

Disk Transfer Time

As you can read from above, disk queue length can be deceptive. You need a lot of information to make a good judgment.

Disk transfer time is what the disk subsystem reports to the operating system. It abstracts the DBA from having to know the absolute physical characteristics of the underlying disk.

The key point is that it keeps the DBA from rat holing on a lot of very specific information that most people don’t have… namely, a comprehensive disk mapping of their SAN. It also distills a lot of other complicated measurements down to a few simple ones that mean the most.

For example, if I have a disk queue length of 20 on my F: drive, but my response time is less than 5ms… then everything is working great. What this probably means is that the F: drive has 20 (or more) spindles behind it or that there is massive caching taking place.

In the context of disk latency, the following is an example of modern performance characteristics of a SCSI disk drive:

Maxtor Atlas 15K II

Ultra320 SCSI

Average 3ms, Max 8ms

Max Sustained 98 MB/sec

Burst 320 MB/sec

8M Cache


Source Maxtor: http://www.maxtor.com/_files/maxtor/en_us/documentation/data_sheets/atlas_15kii_data_sheet.pdf

To monitor the times:

  1. Start Windows perfmon.
  2. Use the “LogicalDisk” counter object.
  3. Select “Avg. Disk sec/Read”, “Avg. Disk sec/Write”, or “Avg. Disk sec/Transfer”, depending on what you want to monitor. I typically choose Transfer.

Note, when you read the perfmon data and see a number like “.010”, it means 10 milliseconds.
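That conversion is a straight multiply from seconds to milliseconds; a one-line sketch (Python):

```python
def perfmon_latency_ms(avg_disk_sec_per_transfer):
    """Perfmon reports Avg. Disk sec/Transfer in seconds; convert to ms."""
    return avg_disk_sec_per_transfer * 1000

print(perfmon_latency_ms(0.010))  # 10.0 ms
```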

Additionally, when monitoring a SAN, note that SANs often have their own perfmon counters you need to use. For example, EMC and Veritas provide their own.

There is plenty of room for intelligent people to argue over the following data, but I use this table to determine the meaning of the numbers:

Less than 10 ms: very good

Between 10 and 20 ms: okay

Between 20 and 50 ms: slow, needs attention

Greater than 50 ms: serious IO bottleneck


I have a good friend who will credibly argue that anything more than 1 ms response time indicates a poorly configured SAN. He has customers that get response times in the 250 to 500 microsecond range.

Note, many times the performance problems are tied to firmware revisions, HBA configuration, and/or BIOS issues.


When you start focusing on response time, which is what really matters most, the queue length starts to become irrelevant. If you can get some kind of super drive that can handle everything, terrific! (Think: Solid State RAM Drives.)

What the latency metric does is cut through a lot of the red tape that architects usually have. It gives the DBA the ability to simply tell the SAN architect laying out the LUNs: “Slice the disk in a way where I get 5 ms disk access times. The rest is up to you.”

If the database can get sustained 5 ms disk access times under a heavy stress test (ex. SQLIOSTRESS), then it can probably handle the load of the user base with the app running… which is what you ultimately want for success.
