Ok, recap:
- vSphere 4 introduced a new multipathing option for block devices (NMP Round Robin).
- This option works on both Active/Active storage platforms and Active/Passive storage platforms that support ALUA (Asymmetric Logical Unit Access).
- NMP RR distributes IO for a device down all active paths (in the case of ALUA, it doesn’t use active non-optimized paths), sending IO down a single path for a given number of IO operations before switching, governed by a parameter (IOOperationLimit).
- This value defaults to 1000, but can be changed (see the example commands after this list). If you change it, there is a known issue in vSphere 4 where, on an ESX reboot, the value changes to a seemingly random value (this was discussed here). This bug is expected to be fixed in vSphere 4 Update 2.
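For reference, here is roughly what checking and changing that limit looks like with the vSphere 4-era esxcli syntax. This is a sketch only – the naa device ID below is just a placeholder for your own device, and later vSphere releases move these commands under the esxcli storage nmp namespace:

  # Show the current round robin settings for a device
  # (handy for spotting the post-reboot value change described above)
  esxcli nmp roundrobin getconfig --device naa.600601601234567890

  # Make sure the device is using round robin, then change the IO operation limit
  # from the default of 1000 down to 1
  esxcli nmp device setpolicy --device naa.600601601234567890 --psp VMW_PSP_RR
  esxcli nmp roundrobin setconfig --device naa.600601601234567890 --type iops --iops 1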
So – should you change the value? In my opinion, no. Duncan did a great post on this topic here, which I commented on. He mentioned he was curious to see testing data from storage platform folks showing the results.
The conclusion (before I show all the testing) – there isn’t enough benefit in changing the IOOperationLimit from the default to make it worth doing. I’m not extending that statement to other vendors’ platforms – you should defer to your particular storage vendor.
So… Here are the EMC testing results around this specific question. Read on if you’re interested. The testing was done with both FC and IP storage.
So… What did we test? Answers below. BTW – this is still (IMO) a “non-ideal” test, as it didn’t show even further scaling in terms of datastores and VMs (which would be expected to make the IOOperationLimit values even more neutral in a comparison), or behavior under network/port congestion (which would be expected to benefit the adaptive/predictive PP/VE model more in a comparison) – but it is a useful set of data. The really weird IOOperationLimit value in the test matrix is simply what the parameter changed to on an ESX reboot (included for completeness).
BTW – for folks that are interested in seeing the full whitepaper (which also added the iSCSI results), you can find it on Powerlink here.
Storage: CLARiiON CX4‐480 @ FLARE R29
- 80 x 450GB FC disks – 16 x 5‐disk RAID Groups – 1 x 125GB RAID‐5 LUN per RG – Oracle Orion
- 120 x 450GB FC disks – 4 x 4‐disk R1/0 RAID Groups & 1 x 8‐disk R1/0 RAID Group – MSFT JetStress
- Page Size = 8k
- Read Cache = 500MB per SP
- Write Cache = 3.6GB
- High Watermark: 70
- Low Watermark: 50
- Fibre Channel attach to SPA0, SPA1, SPB0, SPB1
- Connectivity Status/Failover Mode of ESX HBAs: 4 (i.e., ALUA – see the verification sketch after the configuration listing below)
Host:
- 16 x 3GHz CPU, 132GB RAM – VMware vSphere v4.0.0 (164009)
Virtual Machines:
- 2 x Oracle Orion (Win2008 x64) – 4 CPU, 16GB RAM
- 6 x Microsoft JetStress (Win2008 x64) – 4 CPU, 24GB RAM
SAN:
- 2 x LPe11000 4Gb HBA
- Zoning: HBA1 – SPA0, HBA1 – SPB1, HBA2 – SPB0, HBA2 – SPA1
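If you want to double-check that your own environment matches this kind of setup (CLARiiON in ALUA failover mode, claimed by the ALUA SATP, using round robin), here is a rough sketch of the vSphere 4-era commands. The SATP/PSP names are the standard in-box ones for this configuration; whether you want RR as the default for everything the SATP claims is your call:

  # List devices with their claiming SATP and current path selection policy
  # (a CX4 in failover mode 4 should show up under VMW_SATP_ALUA_CX)
  esxcli nmp device list

  # Optionally make round robin the default PSP for everything that SATP claims
  esxcli nmp satp setdefaultpsp --satp VMW_SATP_ALUA_CX --psp VMW_PSP_RR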
Phase I Test
- 1 VM with 4 x 125GB LUNs (20 disks)
- PowerPath/VE @ Test1: 8K – 32K Random IO 40% Writes
- PowerPath/VE @ Test2: 8K – 1024K Random IO 40% Writes
- VMware NMP RR iops=1000 @ Test1: 8K – 32K Random IO 40% Writes
- VMware NMP RR iops=1000 @ Test2: 8K – 1024K Random IO 40% Writes
- VMware NMP RR iops=1 @ Test1: 8K – 32K Random IO 40% Writes
- VMware NMP RR iops=1 @ Test2: 8K – 1024K Random IO 40% Writes
BTW – we then tested under a whole pile of other circumstances – listed below.
Phase II Test
- 1 VM with 8 x 125GB LUNs (40 disks)
- PowerPath/VE @ Test3: 8K – 32K Random IO 40% Writes
- PowerPath/VE @ Test4: 8K – 1024K Random IO 40% Writes
- VMware NMP RR iops=1 @ Test3: 8K – 32K Random IO 40% Writes
- VMware NMP RR iops=1 @ Test4: 8K – 1024K Random IO 40% Writes
Phase III Test
- 2 VMs with 8 x 125GB LUNs each (80 disks)
- PowerPath/VE @ Test3: 8K – 32K Random IO 40% Writes
- VMware NMP RR iops=1 @ Test3: 8K – 32K Random IO 40% Writes
Phase IV Test
- 2 VMs with 8 x 125GB LUNs each (40 disks per VM – 80 in total)
- PowerPath/VE @ Tests 5 – 13: Constant 8K random IO at varying loads
- VMware NMP RR iops=1 @ Tests 5 – 13: Constant 8K random IO at varying loads
Phase V Test
- 2 VMs with 8 x 125GB LUNs each (40 disks per VM – 80 in total) – Oracle Orion
- 1 VM with 32 LUNs (16 x 180GB & 16 x 40GB) ‐ 4000 Very Heavy Exchange 2007 users ‐ MSFT JetStress
- PowerPath/VE ‐ Tests 14 – 17 (Constant Orion & varying JetStress loads)
- VMware NMP RR iops=1 ‐ Tests 14 – 17 (Constant Orion & varying JetStress loads)
- VMware NMP RR iops=1000 ‐ Tests 14 – 17 (Constant Orion & varying JetStress loads)
Phase VI Test
- 1 VM with 32 LUNs (16 x 180GB & 16 x 40GB) ‐ 4000 Very Heavy Exchange 2007 users ‐ MSFT JetStress at varying JetStress loads
- PowerPath/VE ‐ Tests 18 – 22
- VMware NMP RR iops=1 ‐ Tests 18 – 22
- VMware NMP RR iops=1000 ‐ Tests 18 – 22
- VMware NMP RR iops=1496702496 ‐ Test 22
- Test 23 – 1 path failure (1 of 4) with PP/VE, RR iops=1 & RR iops=1000
Phase VII Test
- JetStress with 4000 Very Heavy Users (1VM) to 24000 Very Heavy Users (6 VMs)
Chad - Great data, thanks for sharing. I would share that these results closely resemble what we have seen with our in-house testing, which is to say the difference in the EMC-reported performance results is 8.4%.
Looking at the results in relative terms, with the highest value representing 100% of obtainable I/O:
PP/VE = 100%
NMP RR default = 91.6%
NMP RR 1 IOP = 96.6%
NMP RR 1496702496 = 98.3%
I would suggest that vSphere customers should feel very confident that NMP can address their most demanding workloads, wouldn't you agree?
Posted by: Vaughn Stewart | April 11, 2010 at 10:32 AM
@Vaughn - I would say that NMP RR is excellent, and a great choice. PP/VE is better.
This workload didn’t drive network or HBA congestion, didn’t have a high degree of random variation in the workloads and IO sizes from the guests, and didn’t create array port congestion - those are all the things where adaptive/predictive queuing is better. Creating that kind of test harness is difficult, but not impossible. Conversely, that’s exactly the day-to-day reality at large-scale customers all over the world.
Lab tests tend to be relatively "clean". The real world is messy.
Also, the key is that PP/VE doesn’t just change the PSP (Path Selection Plugin) behavior, but the SATP (Storage Array Type Plugin) behavior as well. Things like automated path discovery and proactive path testing (even in periods of no IO) - the bigger you are, the more important those operational/management things become.
Look, that’s not to pooh-pooh NMP RR - EMC supports it, embraces it, and it’s free. If you look, there are boatloads of "free in the box" optimizations in the native, free SATPs for EMC platforms (along with others) - the results of work between the engineering teams.
Like I’ve said: "NMP in the past = not so good; NMP in vSphere with RR = better; PP/VE = best." PP/VE is also not free.
Thanks for the comment!
Posted by: Chad Sakac | April 11, 2010 at 11:41 AM
Any testing already done with VNX and vSphere 5 on this?
Also, the full whitepaper you refer to on Powerlink does not include results for the other NMP tests - it would be great to see graphs for those as well.
One other thing I was wondering about: the test in this post refers to 4 x FC HBA ports, but the VMware esxtop output in the whitepaper only shows 2 x FC HBAs in use... Are you speaking about the same tests?
Posted by: Jan | March 27, 2012 at 02:58 AM