
September 21, 2009



Vaughn Stewart

Chad – thanks for the effort in making this happen. I trust that VMware customers, regardless of whether they are Dell, EMC, HP, or NetApp customers, really appreciate the content.

What’s next – a thorough discussion of Multipathing with vSphere?



Tip for the readers:

If you'd like to see the iSCSI session and connection information, try the following command on ESX.

cat /proc/scsi/iscsi_vmk/5
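As a small extension of that tip (the instance number varies per host and adapter, so this is just a sketch), you can dump session state for every software iSCSI instance rather than guessing the number:

```shell
# Each software iSCSI adapter instance gets a numbered entry under
# /proc/scsi/iscsi_vmk/ on ESX 4; loop over them all and dump each one.
for f in /proc/scsi/iscsi_vmk/[0-9]*; do
  echo "=== $f ==="
  cat "$f"
done
```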

Chad Sakac

Vaughn, my pleasure. I'm always up for a collaborative effort. Much more fulfilling than the silly negative selling I see out there so often.

Cameron Smith

I'm having trouble with MPIO on LeftHand ...

My LH VIP : .100
My LH storage node IPs : .110 , .111, .112, .113, etc
My VM iscsi IPs : .210, .211, .212, .213, etc

When I rescan my iSCSI adapter I get 4 paths with a default setting of "Fixed". This host has 4 NICs configured per the article with jumbo frames enabled. Each of the paths seems to be attaching directly to one of the nodes (.112).

With a "Fixed" setting I get roughly 90MB/sec writes.
With a "Round Robin" setting I get roughly 50MB/sec writes.

I have 3 LeftHand nodes with 15x 750GB SATA.

1) Any thoughts on worse performance with RR?
2) Should I expect better than 90MB/sec writes with RR working correctly?
3) I expected to see 4 different target IPs - is that the case, or is this what you were referring to above when you say that LeftHand systems use a single target? I assumed you were talking about the VIP with that statement and not the storage node addresses.

BTW: thanks for the article - excellent information to have all in the same place!!


Rob D.


Thanks for updating this topic to vSphere. There are so many changes it really did warrant a new article. I initially built our vSphere hosts with two vmkernel iSCSI pNICs and two dedicated pNICs for guest iSCSI connections. I find I'm not really using the two guest iSCSI NICs. At this point I think I'll add those NICs to the vmkernel connections, as it looks like ESX can take advantage of them now. Any additional thoughts on that subject in vSphere?


Awesome article, thanks for posting!

Any word on when iSCSI multipath will be supported on a DVS?

Chad Sakac

@Cameron - are you testing with one LUN, or many (across many ESX hosts)? This behavior (RR being worse than fixed/MRU) is often triggered by the IOOperationLimit (default is 1000) when the LUN count is small (1000 IO operations before changing paths).
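For anyone who wants to experiment with this, a sketch of the relevant commands on ESX 4 (the naa identifier below is a placeholder; substitute your own device's identifier):

```shell
# Check the current round robin settings for a device
# (naa.60xxxxxxxxxxxxxxxx is a placeholder device identifier)
esxcli nmp roundrobin getconfig --device naa.60xxxxxxxxxxxxxxxx

# Switch paths after every iSCSI I/O instead of every 1000
esxcli nmp roundrobin setconfig --device naa.60xxxxxxxxxxxxxxxx \
  --type iops --iops 1
```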

@Rob - glad you liked the post - it was a labor of love for my fellow partners and me. You got it on the configuration! Some of the details vary depending on your iSCSI target. What array are you using? I'll be happy to help further...

@Dominic - this is a case of qual more than anything (VERY important, but a different class of issue than a known technical issue) - working on it hard. Will get the target date and get back to you.

Cameron Smith


I'm testing this with the VSA and realized that the VSA doesn't support jumbo frames (although the CMC thinks it does ...)

After turning off jumbos, RR performs evenly with Fixed (~80MB/sec) and increases to ~90MB/sec when I fiddle with the iops threshold using values between 1-100.

My expectation was that bandwidth for any single LUN would scale linearly with the number of NICs (4 adapters == 4 x the amount of bandwidth), and that we would be limited by the bandwidth consumed by all VMs or by the bandwidth available at the SAN. If I'm way off on that assumption can you point me in the right direction?

Thanks so much!



Thanks for the great post; this helped me a lot in chasing performance on our MSA2012. I set up NMP round robin and jumbo frames. Now I have a weird problem. Before I changed the iops parameter I got the following readings in my performance test (just used dd on a VM):

BS=8192k, Count= 1000 Write 109MB/s Read 102MB/s
BS=512k, Count=10000 Write 121 MB/s Read 104MB/s
BS=64k, Count=100000 Write 111MB/s Read 105MB/s

Then I started to play with the following:

esxcli nmp roundrobin setconfig --device naa.something --iops 3 --type iops

BS=8192k, Count=1000 Write 86.7MB/s Read 90.9MB/s
BS=512k, Count=10000 Write 92.4MB/s Read 86.1MB/s
BS=64k, Count=100000 Write 87.9MB/s Read 96.2MB/s

So I decided to switch back to the original 1000 setting. But now I cannot get the same write performance anymore! Is there something that I have missed here? Should I somehow change something else as well in addition to the iops parameter?

Best regards,


Cameron Smith


I experienced the same and the only thing that seemed to work for me was to set to "Fixed" in the GUI and then back to round robin ... that seemed to get me back to my baseline.
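For what it's worth, an explicit command-line reset (a sketch; the device name is a placeholder) may accomplish the same thing as toggling the policy in the GUI:

```shell
# Put the device back on round robin with the default threshold
# (naa.60xxxxxxxxxxxxxxxx is a placeholder device identifier)
esxcli nmp device setpolicy --device naa.60xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
esxcli nmp roundrobin setconfig --device naa.60xxxxxxxxxxxxxxxx \
  --type iops --iops 1000
```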





Thanks for the info. I tried that already but still no effect :) I have to check out my configuration once more. Does changing to Fixed and back reset the NMP settings somehow?

What kind of performance are you getting btw?


Cameron Smith


That's strange, it seemed to work for me.

I'm getting roughly 120-130MB/sec writes with the LeftHand nodes - when I enable the network RAID level 2 I get right around 75-80MB/sec.

Reads are currently peaking around 105MB/sec.

I think I'm using similar methods as you: dd if=/dev/zero and hdparm -t for reads.

I've also run bonnie++ with a similar result.

One thing I'm starting to realize is that (at least with my SAN), this post leads to better SAN performance and redundancy and not necessarily better individual VM performance. Anyone, please feel free to correct me if I'm wrong here.


Hi Chad!

Thanks for all your hard work on this blog. Your recent iSCSI posts have definitely stirred MUCH discussion among my peers. :)

One thing we can't really find a definitive/good answer to is this...

Is there really any benefit to having more than a 1:1 ratio of vmkernel ports to physical NICs? I've seen a few documents (mostly from EQL) that actually suggest binding 2 or even 3 vmkernel ports to a single NIC, but I just don't understand what is to be gained by that. Sure, it allows multiple paths to the same volume, but it's still all going through the same physical NIC.

If you could explain that, or cite some references elsewhere, I (we) would greatly appreciate it.





Thanks a lot for this post. Very useful, and a great resource for many diverse bits of information in one spot.

Jeff Byers


Great job on this post about vSphere and iSCSI multi-path!

It all works much as you show, except that I cannot get the
NMP 'policy=' settings to work properly after an ESX host
reboot.
When I set the policy 'type' to 'iops', with an '--iops'
value of '10', it works, but after a reboot, the '--iops'
value gets reset to a crazy large value of '1449662136',
which is much worse than the default of '1000'.

OK, so I decided to try the policy 'type' set to 'bytes',
with a '--bytes' value of '11'.

Unfortunately, although this policy value does stick after a
reboot, when the policy 'type' is set to 'bytes', no round-
robin multipathing seems to occur at all, even before the
reboot.

What am I doing wrong?

I remember reading somewhere that there was a similar
problem with the '--iops' value not sticking in ESX 3.5, but
it now seems to be worse.


~ Jeff Byers ~

# esxcli nmp device setpolicy -d naa.600174d001000000010f003048318438 --psp VMW_PSP_RR

# esxcli nmp roundrobin setconfig -d naa.600174d001000000010f003048318438 --iops 10 --type iops

# esxcli nmp device list -d naa.600174d001000000010f003048318438
Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)
Storage Array Type: VMW_SATP_DEFAULT_AA
Storage Array Type Device Config:
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=iops,iops=10,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

# sync;sync;reboot

# esxcli nmp device list -d naa.600174d001000000010f003048318438
Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)
Storage Array Type: VMW_SATP_DEFAULT_AA
Storage Array Type Device Config:
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=iops,iops=1449662136,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0


# esxcli nmp roundrobin setconfig -d naa.600174d001000000010f003048318438 --iops 10 --type iops

# esxcli nmp roundrobin setconfig -d naa.600174d001000000010f003048318438 --bytes 11 --type bytes

# esxcli nmp device list -d naa.600174d001000000010f003048318438
Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)
Storage Array Type: VMW_SATP_DEFAULT_AA
Storage Array Type Device Config:
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=bytes,iops=10,bytes=11,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}
Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

[root@esx-223 ~]# sync;sync;reboot

[root@esx-223 ~]# esxcli nmp device list -d naa.600174d001000000010f003048318438
Device Display Name: StoneFly iSCSI Disk (naa.600174d001000000010f003048318438)
Storage Array Type: VMW_SATP_DEFAULT_AA
Storage Array Type Device Config:
Path Selection Policy: VMW_PSP_RR
Path Selection Policy Device Config: {policy=bytes,iops=1000,bytes=11,useANO=0;lastPathIndex=0: NumIOsPending=0,numBytesPending=0}
Working Paths: vmhba34:C1:T0:L0, vmhba34:C2:T0:L0

Tim Tyndall

Thanks for both this and the previous multivendor iSCSI post. I found both very informative and focused on solutions for those deploying iSCSI in their VMware environments, regardless of the storage vendor.

We are currently running ESX 3.5 in our data center with redundant 10GbE NIC connections (Neterion) with jumbo frames (I know it's not officially supported, but we're not seeing any issues with it) and a LeftHand SAN. With the LeftHand SAN providing a different target for each LUN and our use of link aggregation, I'm seeing very consistent load balancing across both NICs in our current 3.5 deployment. Given that link aggregation will provide faster convergence in the event of a NIC/link failure, are there other compelling reasons for me to use iSCSI MPIO instead of my current setup as we migrate to vSphere?



Thanks for a great post.

However, for the jumbo frame section, I would add a line about configuring the vSwitch MTU. It seems that every article I could find about multipathing, iSCSI, and jumbo frames includes configuration for the NIC and HBA but not the vSwitch.

esxcfg-vswitch -m 9000
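To flesh that out a bit (a hedged sketch; the vSwitch name, port group name, and addresses below are placeholders): on ESX 4 the vmkernel port itself also needs an MTU of 9000, which generally means creating it from the command line, since the vSphere Client creates vmknics with the default MTU:

```shell
# Set the vSwitch MTU (vSwitch1 is a placeholder name)
esxcfg-vswitch -m 9000 vSwitch1

# Create the iSCSI vmkernel NIC with a jumbo MTU from the start
# ("iSCSI-1" and the addresses are placeholders)
esxcfg-vmknic -a -i 10.0.0.21 -n 255.255.255.0 -m 9000 iSCSI-1

# Verify end to end with a do-not-fragment jumbo ping to the array
vmkping -d -s 8972 10.0.0.100
```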

Remy Heuvink

I'm struggling with setting the routing correct.
We use different VLAN for ESX console, iSCSI network and for storage network.
iSCSI connections with NetApp are successful and I see all my LUNs until I add the iSCSI kernel ports to the software HBA adapter.
(esxcli --server swiscsi nic add -n -d )
After using this command my LUNs disappear after a rediscover.
I can still ping my iSCSI kernel ports, but the NetApp does not see any successful connections. I think it has to do with setting up the routing correctly. (third topic in this post) Where/how can I add the route?
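One hedged pointer for the routing question (the addresses below are placeholders): the vmkernel stack has a single routing table, which you can inspect and change from the console. Keep in mind that with port binding the iSCSI vmknics generally need to be on the same subnet as the targets, so that no route is involved at all:

```shell
# List the current vmkernel routes and default gateway
esxcfg-route -l

# Set the vmkernel default gateway (placeholder address)
esxcfg-route 10.10.1.254

# Test reachability of the target from the vmkernel stack
vmkping 10.10.2.50
```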


Note that iSCSI Multipathing is not currently supported with Distributed Virtual Switches, either the VMware offering or the Cisco Nexus 1000V. Changes are underway to fix this and allow any virtual switch to be supported.

Any indication when this will be supported?

Arian van der Pijl

Superb post!
Can anyone comment on the guest side of the iSCSI vSphere story?
Should the guest continue to use their own iSCSI initiator (like Windows 2008 Server R2) for performance reasons?


We are set up with CX4 / ALUA / RR. Whenever we reboot a host all our LUNs trespass from SPA to SPB. Anyone else experiencing this?


The solution is to edit the rc.local file (which is basically a script that runs at the end of boot-up) to set the IOPS limit on all LUNs.

Enter this command; the only variable you will need to change is the "naa.600", which pertains to the identifier on your array:

for i in `ls /vmfs/devices/disks/ | grep naa.600` ; do esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i; done




Please can I have instructions on how to add/edit the script you have mentioned? Also, with the "naa.600", do I need the full identifier for all LUNs, or is that a catch-all? Thanks.

Ed Swindelles

Regarding using Round Robin MPIO in conjunction with Microsoft Cluster Services...

Can the boot volume of the clustered server reside on an MPIO RR-enabled LUN? Then, have the shared volume(s) reside on either RDM without RR MPIO or use the Microsoft iSCSI initiator inside the guest?



Some major bug fixes to iSCSI were released in the past week; I recommend people upgrade to ESX 4.0 build 244038, specifically if you have multiple vmknics to your iSCSI storage. Go read the VMware Knowledge Base article about it...


Love this article. Refer to it frequently. :)

Just as an aside, Cisco introduced iSCSI multipath support for the Nexus 1000v DVS in their most recent release, with some additional improvements (one of them addressing an issue with jumbo frames) coming hopefully later this month or early next.

Cheers, and thanks for the great blogs.


The prescription posted by 'stuart' does no checking and attempts to apply RR to any and all devices even when it doesn't apply, or to devices that are pseudo-bogus. This is what I use instead.

# 6090a058 is my Equallogic's prefix. You can use naa.60* and the 'getconfig' test will skip an entry if not suitable. I also end up with '*:1' paths which can't be set.

# fix broken RR path selection
for i in `ls /vmfs/devices/disks/naa.6090a058* | grep -v ':[0-9]$'`; do
esxcli nmp roundrobin getconfig --device ${i##*/} 2>/dev/null && \
esxcli nmp roundrobin setconfig --type "iops" --iops=64 --device ${i##*/}
done

Steve M

I am very grateful for this post. If I could hug you I would.

Let me just say that I have ESXi 4 running with a 4-port iSCSI SAN and was not seeing very good performance. My maximum speed was about 120 MB/s no matter whether I was using 2 or 4 GbE ports. So I spoke to my SAN vendor and they sent me some information which was kinda correct, but didn't increase performance.

Then I found this post. With some time, testing and patience I have gotten my speeds up to 250-300 MB/s on the 4-port GbE cards (with other hosts connected to the SAN in the background). And that is without jumbo frames enabled.

With jumbo frames enabled I would expect to see a 10-15% increase in speeds.

This is entirely due to this post and your information. I thank you for your time and patience in posting this as it helped me tremendously.


Chad Sakac

@Steve M - you are giving me a virtual hug :-) I'm really glad that the post helped you. In the end, your comment was the best thing I heard all day.

Happy to help!


We are facing a number of problems with configuring iSCSI on our EqualLogic and have a posting at: http://communities.vmware.com/message/1529506#1529506. The conclusion is that there are some serious bugs in VMware that have to be patched with: http://kb.vmware.com/kb/1019492A

Hopefully these problems will be resolved after we schedule a downtime for this patching.


Thank you very much for that outstanding post and for the effort in investigation and documentation; it will help many desperate people like me understand the complexity of iSCSI connections between ESX(i) hosts and storage devices and improve their performance significantly.

I still haven't tried all these recommendations yet, but I will do that within the next days being very confident that they will solve the majority of my performance problems.

While reading the post I was wondering why the harmonization between the iSCSI settings in /etc/vmware/vmkiscsid/iscsid.conf and the corresponding settings on the storage array side was not mentioned at all. I always believed that appropriate, well-adjusted settings on both sides are the basis of good performance. Was I completely wrong in my assumption?

Thanks in advance for your reply


Thanks for these posts.

I am still confused about the port aggregation (EtherChannel) advice. I get that load balancing can only be done by the storage, and I understand why iSCSI boxes don't support aggregation, but what about connection redundancy for the ESX host? Particularly with a single-portal storage solution, where the storage doesn't create a full mesh of paths, switch failures could make the storage unreachable. As I understand it, I would need 6 pNICs and 6 vmknics per ESX host to get fully redundant connections. I can get that using 2 pNICs bonded together into one EtherChannel group and 3 vmknics. So if bandwidth isn't an issue, why shouldn't I? There's less overhead on the ESX host and fewer iSCSI connections, so performance could be better. What am I missing?

IT Consultants

This is a very important change, and for the better.


Fantastic post; thorough and informative. Well done.


First of all, sorry if my questions are silly; I may not have understood the whole post, and I am a complete beginner with network storage.

We are using 2 ESXi hypervisors and plan an iSCSI SAN deployment for a shared datastore between them, with the goal of eventually using HA/vMotion.

We are pretty confident 1 GbE per host is enough from a performance point of view, since we may not host that many VMs per host.

We may have a very simple configuration with 1 iSCSI target shared between the two hosts. The idea is to have a shared datastore for HA. So: 1 session per host, 1 shared target.

Not thinking of path redundancy, what if we configure LACP only on the SAN side? Will the network load-balance between the two sessions/two hosts, since they are distinct iSCSI initiators? Should we have 2 different targets for the same LUN/datastore (does this make sense in any case: 1 LUN, 1 IP, 2 targets for 2 different initiators?), one for each host? Or should we use 2 IPs on the SAN, and 2 targets, to make use of the two SAN NICs? Or any other suggestion? :D

And finally, for network path redundancy, why not have: 1 vSwitch with 2 NICs, one of them standby (we stay at 1 GbE throughput), connected to two different switches in a stack (like a Cisco 3750 stack) for redundancy on the ESXi host side; and LACP aggregation on the SAN side distributed across the two switches, giving 2 Gbit of links that downgrade to one gigabit in case of a switch failure?

Lots of questions, and I understand the goal of the post was to provide more than 1 GbE for one ESX host, which is not our case. I am again sorry for being a noob; I am sure there are answers in your posts I could not catch. :)

thanks a lot for your post and further help :)


Thanks for the article... very good work.
I'm confused about multipathing to iSCSI storage...
If I don't run esxcli --server swiscsi nic add -n -d, should I be unable to vmkping all iSCSI targets?
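To the question above, a hedged note: port binding does not change what vmkping can reach (vmkping uses the vmkernel routing table, not the iSCSI bindings). You can verify which vmkernel NICs are bound to the software iSCSI adapter with something like the following (the adapter and vmknic names are placeholders):

```shell
# List the vmknics currently bound to the software iSCSI adapter
# (vmhba33 is a placeholder; check your adapter name first)
esxcli swiscsi nic list -d vmhba33

# Bind an additional vmkernel port (vmk1 is a placeholder)
esxcli swiscsi nic add -n vmk1 -d vmhba33
```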

Domenico Viggiani

Chad, you said that with CLARiiON you can balance LUNs among targets.
I configured 4 iSCSI interfaces (= targets) on my NS480 (CLARiiON side, not Celerra!) with 4 IPs, but I'm not able to see where a LUN is "tied" to a target.
Perhaps I am wrong?

Rickard Nobel

A very good good article on this subject. A small technical note from this part:

"This means the storage, Ethernet switches, routers and host NIC all must be capable of supporting jumbo frames – and Jumbo frames must be correctly configured end-to-end on the network. If you miss a single Ethernet device, you will get a significant number of Ethernet layer errors (which are essentially fragmented Ethernet frames that aren’t correctly reassembled)."

You will get Ethernet errors if the jumbo frame configuration is missing on, for example, a switch, but those will be either "CRC errors" or possibly "giant" errors. However, there is no mechanism for fragmentation in Ethernet as stated above, so to a device in the middle the larger frame will just look corrupt.



We had an issue with iSCSI.
The iSCSI network was lost - one of the guys deleted the VLAN, and all the nodes (configured with iSCSI) failed to respond. They were running, but we could not connect to them with vSphere, so effectively they could not be vMotioned etc. during the outage.

Have you come across this before? It seems that if iSCSI goes offline, your servers disconnect from admin tasks.


Dan Pritts

When's the vsphere 5 version of this paper coming? :)

Scott Smith

Great write up! Helped a lot in explaining *real* MPIO with iSCSI to a customer.
Anyone have an idea why I would see three paths to a single Lefthand LUN (on a single VSA) with only two vmkernel ports (vmk1 and vmk2)? ESXi 4.1 U1, Lefthand VSA v9.0

I see: vmhba33:C0:T1:L0, vmhba33:C1:T1:L0 and vmhba33:C2:T1:L0



I second the "When's the vsphere 5 version of this paper coming?"

I'm curious to see the changes for vSphere 5, if any.


Thank you for a great article.
@Scott Smith - I'm just configuring this on ESXi 4.1, build 348481, with HP 4500s running SAN/iQ 9.5, and see the same as you (3 paths to a single VSA). Did you work out an explanation for this?

..Also supporting a vSphere5 version of this paper..


In relation to my previous post (@Scott Smith), I found this info on Yellow Bricks:

Mordock says:
Wednesday, July 8, 2009 at 15:59
I did some playing. If the multipathing is set up after the targets have been added, then the original path is still in the system (pre-multipath) as well as the 2 new multipath paths. If you set up the switches and the esxcli commands after enabling software iSCSI but before adding the targets, then only 2 paths are created.

Reply Mordock says:
Wednesday, July 8, 2009 at 16:01
Also, if you remove the targets after setting up multipathing and then re-add them, the extra third path will go away.

- I deleted the targets on my system and got two paths.

Faisal Farooqui

I would like to see a vSphere 5 version of this post as well.


I am also asking for a vSphere 5 update to this documentation!

Rickard Nobel

As for the "extra path" after enabling Multipath I have seen it goes away with a reboot of the host too.


2013 ... vSphere 5.1 ...anyone?


I don't understand the advantage of having multiple iSCSI targets and/or more than two iSCSI initiators when using 10GbE networks. The way I understood this article, all performance gains come from being able to use multiple physical paths between initiator and target. In 2013, on 10GbE networks there are only 2x 10GbE paths between each ESX server and the iSCSI array. Though ESX supports up to 8 VMkernel ports for iSCSI, I don't understand the point of enabling 8 VMkernel ports and binding them to 2 physical 10GbE NICs.

The same applies to the number of targets: if there are only 2 paths to the volume (in the case of EqualLogic, each volume is a target), which go through 2 physical connections to the array, there is no performance gain in having more than one large target. This is actually borne out by the Dell articles "EqualLogic iSCSI Volume Connection Count Maximum Characterization" and http://i.dell.com/sites/content/shared-content/data-sheets/en/Documents/SAN-best-practices-for-deploying-Microsoft-Exchange-with-VMware-vSphere.pdf

Am I missing anything? Does the actual number of established iSCSI connections have any impact on I/O performance when they all share the same 10GbE pipe?




  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.