Database+Disk+Performance™: Disk Queue Length vs. Disk Latency Times: Which is Best for Measuring Database Performance

Thursday, June 22, 2006

Disk Queue Length vs. Disk Latency Times: Which is Best for Measuring Database Performance

Introduction

There are several common metrics that people use to look for performance bottlenecks. They are usually:

- Disk queue length

- Disk latency/transfer times

- MB/sec

- IO/sec

For the rest of this note, I’ll focus on the top two.

Disk Queue Length

The most common performance monitoring metric people quote is usually disk queue length. While this is important, now days it’s almost impossible to use as an accurate metric because the SAN used in an enterprise database application abstracts the DBA from a lot of the physical characteristics needed to make a reasonable judgment about what is good or bad.

The usual “rule of thumb” for “bad performance” is a queue length greater than 2 for a disk drive. But when you have a SAN with 100 drives in the cabinet, you have no idea how many are being used for your “drive F:” for your data file.

For example, the DBA watches perfmon and sees the disk queue length is 10 on drive F:. While this is a big number, it’s relative. If the F: is one physical disk, yes, you have a problem. If the F: is a logical disk composed of 8 physical disks, then it’s not an issue. Why? Because 8 disks x 2 queue length = 16… and the DBA sees 10… so 10 is less than 16 (our threshold for pain)… hence there is not an issue.

The main problem with queue length is this: very few people can ever provide me with a comprehensive disk map of their SAN showing all the logical volumes. For example, how many disks are participating in an EMC Sym hyper volume.

The following is a very small example of what a mapping would look like. Each of the “73” are 73G disk drives. You can see multiple drives participating in the LUN (Note, this is RAID 0+1 so you see 2 73G drives making up a 59.39G formatted capacity):

RG0, R1, 60.39 GB		RG1, R1, 60.39 GB		RG2, R1, 60.39 GB		RG3, R1, 66.61 GB
73	73	73	73	73	73	73	73
00_00	00_01	00_02	00_03	00_04	00_05	00_06	00_07
Private System Partition						LUN 30 (1 GB), Unallocated (65.61 GB)
LUN 0 (1 GB), Unallocated (59.39 GB)		LUN 10 (1 GB), Unallocated (59.39 GB)		LUN 20 (1 GB), Unallocated (59.39 GB)
LUN 0 (1 GB), Unallocated (59.39 GB)		LUN 10 (1 GB), Unallocated (59.39 GB)		LUN 20 (1 GB), Unallocated (59.39 GB)

Without a map like this, disk queue length is meaningless to me. I have nothing to calculate what the queue length should be less than.

With regards to SQL Server, it will not let the transaction log volume have a queue length of more than 3. Thus, the DBA may never see a “problem”. The reason for this is SQL Server is trying not to swarm the disk drives and will throttle back the work to the log. Since the database follows the Write Ahead Log (WAL) protocol, if it can’t write to the log fast enough, it will not be able to write to the data volumes either… and it will slow down the whole system’s throughput.

Per the previous posting on throughput testing, if you don’t have enough spindles to support the volume of data being pushed through the system, the whole box will slow down. This would be especially important in an enterprise app such as SAP or PeopleSoft that run large batch jobs and are writing to the transaction log constantly.

Disk Transfer Time

As you can read from above, disk queue length can be deceptive. You need a lot of information to make a good judgment.

Disk transfer time is what the disk sub system is reporting to the operating system. It abstracts the DBA from having to know the absolute physical characteristics of the underlying disk.

The key point is that it keeps the DBA from rat holing on a lot of very specific information that most people don’t have… namely, a comprehensive disk mapping of their SAN. It also distills a lot of other complicated measurements down to a few simple ones that mean the most.

For example, if I have a disk queue length of 20 on my F: drive, but my response time is less than 5ms… then everything is working great. What this probably means is that the F: drive has 20 (or more) spindles behind it or that there is massive caching taking place.

In the context of disk latency, the following is an example of modern performance characteristics of a SCSI disk drive:

• Maxtor Atlas 15K II

– Ultra320 SCSI

– Average 3ms, Max 8ms

– Max Sustained 98 MB/sec

– Burst 320 MB/sec

– 8M Cache

– 15K RPM

Source Maxtor: http://www.maxtor.com/_files/maxtor/en_us/documentation/data_sheets/atlas_15kii_data_sheet.pdf

To monitor the times:

Start Windows perfmon.
Use counter “Logical Disk”
Select “Avg Disk/Sec” and choose Read, Write or Transfer depending on what you want to monitor. I typically choose Transfer.

Note, when you read the perfmon data and see a number like “.010” this means 10 milliseconds.

Additionally, when monitoring a SAN, they often have their own perfmon counter you need to use. For example, EMC and Veritas have their own.

There is plenty of room for intelligent people to argue on the following data, but I use the following table to determine the meaning of the data:

10 ms		Good
10 ms	20 ms	Reasonable
20 ms	50 ms	Busy
50 ms +		Bad

I have a good friend that will credibly argue that anything more than 1 ms response times is a poorly configured SAN. He has customers that get in the 250 to 500 microsecond responds times.

Note, many times the performance problems are tied to firmware revisions, HBA configuration, and/or BIOS issues.

Summary

When you start focusing on response time, which is what really matters most, the queue length starts to become irrelevant. If you can get some kind of super drive that can handle everything, terrific! (Think: Solid State RAM Drives.)

What the latency metric does is cut through a lot of the red tape that architects usually have. This metric gives the DBA the ability to just tell the SAN architect when they lay out the LUNs is, “Slice the disk in a way where I get 5 ms disk access times. The rest is up to you.”

If the database can get sustained 5ms disk access times with a heavy stress test (ex. SQLIOSTRESS) then it can probably handle the load of the user base with the app running… which is what you ultimately want for success.

Powered By Qumana

9 Comments:

E said...: One question - I have not seen EMC-specific counters before. Can you give an example and how it might differ from the ones described here?; 5:43 PM
Anonymous said...: La gran ventaja de los sistemas raid, pasa por la redundancia manteniendo los tiempos de transferencia, por eso los niveles 0 de raid no son los mas efectivos. Pero ojo, que a veces los raid fallan y la recuperación de sus datos se puede convertir en una pesadilla. Si en un momento dado necesitais recuperar datos de varios discos duros en raid os recomiendo las siguiente web : http://www.lineared.com/es/recuperar/raid-discos-duros.htm; 2:27 PM
Unknown said...: Hi this is one of the clearest descriptions i've read. I wanted to show it to a DBA in a large organisation (the NHS) unfortunately they have a policy of disallowing social networking sites. (which is becoming more and more prevelant). Would you consider us posting this article with any apporpirate acknowledgements on our site (which is not blocked) please?; 11:50 AM
Unknown said...: Hi this is one of the clearest descriptions i've read. I wanted to show it to a DBA in a large organisation (the NHS) unfortunately they have a policy of disallowing social networking sites. (which is becoming more and more prevelant). Would you consider us posting this article with any apporpirate acknowledgements on our site (which is not blocked) please?; 11:51 AM
Anonymous said...: 静岡　一戸建て静岡注文住宅

 ブランド品　買取
 インプラント　
 家具付　賃貸　
 東京インプラント　
 パーティー　
 矯正歯科名古屋　; 9:50 PM
Anonymous said...: アメリカンホームダイレクト: Estimates easily auto insurance risk-segmentation. Support for compensating the content on the website. Benefits are also available with special rates for hotel and leisure facilities, offering various services.; 9:43 PM
Anonymous said...: （法新社倫敦四日電）英國情色大亨芮孟的a片下載公司昨天AV片說，芮孟日成人影片前去成人網站世，sex享壽八十二歲；色情這位身av價上億的房地產開發情色電影商，曾經在倫敦推成人網站出第一場脫衣舞表av演。

色情影片
芮孟的財產成人影片估計成人達六億五千萬英鎊（台幣將a片近四百億），由於他名下事業大多分布在倫敦夜生活區蘇活區色情，成人因此擁有「蘇活情色視訊之王」日本av的稱號。
部落格

他的成人電影公司「保羅芮孟集團」旗成人網站下發行多a片種情色雜av誌，包括「Razavzav女優leavdvd」、「男性世界」以及「Mayfai情色電影r」。色情a片
a片下載
色情
芮孟情色本名傑福瑞．安東尼．奎恩，父av女優親為搬運承a片包商。芮孟十五歲離開學校，矢言要在表演事部落格業留名，起先表演讀心術，後來成為巡迴歌舞雜耍表演av女優的製作情色人。

許多評論a片家成人電影認為，他把情色表演帶進主流社會，一九五九部落格年主持破天荒的脫衣舞表演，後來成人影片更靠著在蘇活區與成人光碟倫敦西區開發房地產賺得大筆財富。

有人形容芮孟是英國的海夫納，地位等同美國的「花花公子」創辦人海夫納。; 1:13 AM
Anonymous said...: you tube中文版|
skype中文版下載
 kmplayer繁體中文版
 迅雷5下載
 bt程式下載
 cs1.6主程式下載
 winrar免費下載
 nds遊戲下載
 KMPlayer下載
 無蝦米正式版xp
海賊王漫畫連載
 emule繁體中文版
 directx9.0免費下載
 office 2007下載
 無蝦米vista正式版
 爆爆王無敵程式
 rmvb播放程式下載
 遊戲天堂楓之谷
 生份證產生器
 最新線上遊戲排行榜
 photoimpact下載
 kmplayer繁體中文版
 戰鎚online官方網
 sao突襲online =========================================; 11:13 PM
Anonymous said...: パーティー　広島市賃貸　静岡注文住宅　家族葬　名古屋　新宿区探偵さいたま市探偵　船橋市探偵　神奈川県興信所　家出人調査　探偵開業探偵起業　ブランド品　買取　家具付賃貸　探偵独立開業　愛知県　葬儀　１戸建て査定１戸建て買取　１戸建て売却　静岡　一戸建て　; 1:02 AM

Database+Disk+Performance™

Thursday, June 22, 2006

Disk Queue Length vs. Disk Latency Times: Which is Best for Measuring Database Performance

9 Comments:

About Me

Previous Posts