UPDATE (May 22nd, 2010): At EMC World 2010, FLARE 30 was announced, which, amongst many (MANY!) new features, also has some fixes – one of which fixes this underlying behavior. You can read about it at this post here.
I got a little tired of a couple of lightweight (aka less technical) posts (important, but still…) – so here’s one that’s an important technical “gotta know” if you’re using the combination of EMC CLARiiON (any FLARE rev), iSCSI and vSphere.
So – there is a core issue here that is not as well known as it should be. BTW – if you don’t want to read it here (though you should if you’re in that group – and many of you are) – this is being covered in one of my sessions (TA2467 – Wed 11-12) and in a great session by Andy Banta and John Hall from VMware (TA3264 – Wednesday at 4-5:30pm).
If you’re not at VMworld, or want to understand this immediately (you should if you’re using CLARiiON/iSCSI/vSphere) – read on…
Lots of people are scratching their heads over this statement on page 31 of the (always excellent!) VMware iSCSI SAN Configuration Guide:
What’s that all about? And, since configuring iSCSI multipathing with MPIO starts with explicitly binding vmkernel NICs to the iSCSI software initiator – does this mean you can’t do it?
You can ABSOLUTELY drive simultaneous interfaces against a single target when using NMP Round Robin or PowerPath/VE with an EMC CLARiiON and the vSphere 4 software initiator. BUT there is one CLARiiON issue (this is really a bug, IMHO – and one that we’re fixing, so the below is a workaround – but a workaround that you could leave in place for as long as you want – there’s not really a general downside).
Ok – here it goes….
Issue:
- EMC CLARiiON records an iSCSI initiator for each iSCSI session by IQN, not by the full SID (IQN + IP address) – btw, this is the bug (see the sketch after this list).
- If the same initiator is seen logging in to a target again, the existing session is logged off (that part is the right behavior – but if the above were fixed, the sessions would appear as separate initiators).
- This didn’t occur in ESX 3.5 (there was only ever one session), but in vSphere – where NMP Round Robin or EMC PowerPath/VE drive multiple sessions to a single target – it can be an issue.
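If it helps to see the record-keeping problem spelled out, here’s a minimal sketch (plain Python, purely illustrative – this is NOT CLARiiON code, and the IQN/IPs are made up) of what keying initiator records by IQN alone does versus keying them by the full SID:

```python
# Illustrative model only - not array code. Two vSphere software-initiator
# sessions share one IQN (one per bound vmknic), but have different source IPs.
sessions = [
    {"iqn": "iqn.1998-01.com.vmware:esx01", "ip": "10.1.1.11"},
    {"iqn": "iqn.1998-01.com.vmware:esx01", "ip": "10.1.2.11"},
]

def login_all(sessions, key_fields):
    """Log each session in to a single target, keying initiator records by key_fields."""
    records = {}
    for s in sessions:
        key = tuple(s[f] for f in key_fields)
        if key in records:
            print("kicking off existing session:", records[key])
        records[key] = s  # a new login replaces whatever held that key
    return records

# Current behavior: records keyed by IQN only -> the logins evict each other.
print(len(login_all(sessions, ["iqn"])))        # 1 surviving session
# Fixed behavior: records keyed by the full SID (IQN + IP) -> both coexist.
print(len(login_all(sessions, ["iqn", "ip"])))  # 2 surviving sessions
```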
Effect:
- If the iSCSI initiator tries to log in twice, the first session gets kicked off, then logs back in, kicking off the second.
- Symptom = very slow iSCSI performance (a race condition where the sessions log off, log on, and rinse, lather, repeat)
Workaround:
- Put the iSCSI vmkernel NICs on separate subnets (see the sketch below). This works because vSphere doesn’t route vmkernel traffic and the CLARiiON iSCSI model is one target per physical port. This means the iSCSI initiator will only try logging in once to each target – the attempt to a target on the other subnet gets “network unreachable” back from the vmkernel, so it doesn’t try to log in (and there is no error).
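A quick way to reason about why the workaround works (a sketch only – the IPs and subnet sizes here are hypothetical): because vmkernel iSCSI traffic isn’t routed, each bound vmknic will only ever attempt a login to the target ports that sit on its own subnet:

```python
import ipaddress

# Hypothetical addressing: two bound vmknics, each on its own subnet,
# and one iSCSI target per CLARiiON front-end port (the CLARiiON model).
vmknics = {
    "vmk1": ipaddress.ip_interface("10.1.1.11/24"),
    "vmk2": ipaddress.ip_interface("10.1.2.11/24"),
}
targets = {
    "SP-A port 0": ipaddress.ip_address("10.1.1.20"),
    "SP-B port 0": ipaddress.ip_address("10.1.1.21"),
    "SP-A port 1": ipaddress.ip_address("10.1.2.20"),
    "SP-B port 1": ipaddress.ip_address("10.1.2.21"),
}

# vmkernel traffic isn't routed, so a vmknic only attempts targets on its
# own subnet; anything else is "network unreachable" (and no error is logged).
for nic, iface in vmknics.items():
    reachable = [name for name, ip in targets.items() if ip in iface.network]
    print(nic, "logs in to:", reachable)

# vmk1 logs in to: ['SP-A port 0', 'SP-B port 0']
# vmk2 logs in to: ['SP-A port 1', 'SP-B port 1']
# Net effect: one session per target, so nothing gets kicked off.
```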
Result when properly configured:
- No reduction in availability
- No reduction in performance – so long as you have at least 2x as many target ports on the array as any single ESX host has initiator ports (i.e. a fan-in ratio of less than 2:1 will not be able to saturate the host vmknics, because there will be fewer active target ports than initiator ports)
- ½-meshed configuration – in vCenter you should see 2x as many paths as you have iSCSI vmkernel NICs (or more generally, “number of fully-meshed paths ÷ number of vmknics bound to the iSCSI initiator”)
Sometimes a diagram helps, so look at this (note – the wrong configuration would show 8 paths for the LUN – and be very slow; the right one would show 4 paths in vCenter – and be nice and fast):
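If you prefer numbers to pictures, here’s the same math as a quick calculation (using the hypothetical layout from the sketch above: 2 bound vmknics and 4 CLARiiON iSCSI target ports):

```python
vmknics = 2       # bound iSCSI vmkernel NICs per ESX host
target_ports = 4  # CLARiiON iSCSI front-end ports (2 per SP)
subnets = 2       # one subnet per vmknic (the workaround)

# Fully meshed (everything on one subnet) - what triggers the session thrashing:
fully_meshed = vmknics * target_ports              # 8 paths, very slow
# Half meshed (one subnet per vmknic) - the workaround:
half_meshed = vmknics * (target_ports // subnets)  # 4 paths, nice and fast

print(fully_meshed, half_meshed)  # 8 4
# Note: half_meshed == fully_meshed / vmknics == 2x the number of vmknics,
# which is the "2x as many paths as vmknics" rule of thumb from the list above.
```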
Hope this helps!!! Will update when the bug is fixed – but it is still present in the most recent FLARE rev (FLARE 29).
See you at VMworld!
Ahhar - thank you :-)
For what it's worth, I'd agree this is a bug. Merging this in with other iSCSI boxes that rely on single-subnet iSCSI redirection (e.g. EQL, LeftHand) will be fun. ;-)
Posted by: David Barker | August 27, 2009 at 10:05 AM
David - agreed on all counts. This is a workaround at best, but one that will work sustainably for MANY customers. Most (though not all) don't have both CLARiiON and Dell/EqualLogic or LeftHand.
But - just a workaround. I'm applying tons of pressure to get it fixed.
I did, however - want to get the info out there to the world ASAP.
Posted by: Chad Sakac | August 27, 2009 at 10:24 AM
Hi Chad,
Is this also required for hardware adapters? I'm using QLogic 4062C dual port iSCSI adaptors.
Thanks,
David
Posted by: David | August 27, 2009 at 11:48 PM
Hi Chad,
we are using a CLARiiON AX4-5i with ESX 3.5 (SW iSCSI, Path Policy: VMware Native MRU).
Please look at this vmtn thread:
"http://communities.vmware.com/message/1320834"
We are using exactly the same configuration as "oberon1973".
It seems that the workaround you described doesn't work in this case (I haven't verified this in our environment yet because we don't have the time or money for an extra vSphere test environment just for iSCSI failover testing - we're an SMB environment).
I don't feel confident migrating to vSphere if even basic MRU failover doesn't work properly.
I'm a bit disappointed (EMC very often tells everyone about its very good storage integration with VMware, so I wonder about the QA testing processes inside EMC with vSphere - vSphere has been out for 3 months now).
By the way, I can't find any information about this case on Powerlink (is it not important to all CLARiiON customers? Not every customer reads your blog, Chad *g).
Please consider this as constructive criticism.
Bj
Greetings from Germany
Posted by: Bj | August 29, 2009 at 02:07 PM
BJ - thank you very much, it is indeed constructive criticism.
I've commented on the VMware communities post, thanks for pointing it out - and I can confirm that that configuration should indeed work, and you can go to vSphere with your AX4-5i and use NMP with confidence.
This issue is well known - and was known prior to vSphere GA - the core issue is a basic CLARiiON one, not a vSphere one. It affects any case where an iSCSI initiator logs in more than once (for example, the MS iSCSI initiator in a guest or on a physical host does this too). The resolution is underway in the FLARE release train, and I'm tracking it.
The EMC Primus Knowledgebase article number is emc156408. If you call into customer support, they should know EXACTLY all about it, and find the case and workaround immediately. I might in fact blind-test this tomorrow :-)
If you wanted to test it, you could use the evaluation version of vSphere ESX/ESXi (free) on almost any server (including home-brew hardware) at almost no cost.
But - I agree that we could do more to make it well known (it belongs in the ESX/vSphere guide for CLARiiON, for example - and I am working to get it clearly covered there as well).
Thank you for being an EMC and VMware customer!
Posted by: Chad Sakac | September 10, 2009 at 11:56 PM
Hi chad,
We are using VMware ESX 3.5 on an EMC Celerra NS40. We are planning to upgrade VMware from 3.5 to vSphere. Do we need to reconfigure the vmkernel NICs on separate subnets?
Thanks ,
Posted by: cemal dur | September 23, 2009 at 09:16 AM
@Cemal: On the Celerra, you don't need to have the vmknics on separate subnets.
The iSCSI stack (including the target) on the Celerra sits above the Celerra filesystem, and is different from the iSCSI stack on the CLARiiON (getting the best of both).
Over time, these will merge, but move forward with confidence. You can put them on the same or different subnets on a Celerra, and the issue noted in this article doesn't apply.
On the Celerra, you configure an iSCSI target with multiple logical Ethernet interfaces in a multiple-network-portal configuration. Unlike on a CLARiiON, a LUN is only behind a SINGLE iSCSI target, so by configuring multiple network interfaces/portals as part of that target, you can multipath. Ignore the "non-redundant" message you will see in the vCenter datastores and storage views panes - this is a bug (it looks for multiple targets logged into for a single LUN as its cue that multipathing is in place).
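If it helps, here's a tiny sketch of why that message is a false alarm (illustrative Python only, with made-up numbers - the point is just what vCenter counts versus what actually provides the redundancy):

```python
# Illustrative only. vCenter's cue for "redundant" is MULTIPLE TARGETS for a LUN,
# but redundancy really comes from the number of paths.
celerra_lun  = {"targets": 1, "paths": 4}  # one target, multiple portals -> many paths
clariion_lun = {"targets": 4, "paths": 4}  # one target per port -> same path count

for name, lun in (("Celerra", celerra_lun), ("CLARiiON", clariion_lun)):
    flagged_non_redundant = lun["targets"] < 2
    print(name, "- paths:", lun["paths"], "| flagged non-redundant:", flagged_non_redundant)

# The Celerra LUN gets flagged even though it has just as many paths.
```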
Thanks again for being an EMC customer!!!
Posted by: Chad Sakac | October 23, 2009 at 04:20 PM
Hi Chad,
Thanks for the article. What I am not clear on is: does the AX4 support only one initiator login (IQN) per SP, or per storage system?
Should VMkernel1 be in the same VLAN as SP0A and SP1A, and VMkernel2 with SP0B and SP1B?
In the iSCSI configuration, should I configure the 4 SP IPs?
Thanks again
Posted by: Enrique | December 09, 2009 at 01:15 PM
Chad,
Any update on where this fix is in the FLARE code update cycle?
Thanks,
Dan
Posted by: Dan Lah | December 14, 2009 at 11:51 PM
Chad,
Thanks a lot for the article. Do you know if there has been any further progress on this? I was also curious whether this affects a hardware iSCSI initiator...
Thanks,
Mike
Posted by: Mike Bruss | February 03, 2010 at 05:00 PM
Chad, you're my hero.
Was tearing my hair out trying to figure out why this was happening. This is apparently NOT resolved in the latest FLARE code. EMC seems to hint at it in their CLARiiON/vSphere integration manual (page 20, I think it was), but doesn't outright say why.
Thanks again, saved me some grey hairs.
Posted by: Ryan | March 17, 2010 at 10:29 AM
The blog was written in August of 2009. Is this still an issue with the CLARiiON?
We are/will be using an NS120, which I understand is based on the CLARiiON line.
Bart
Posted by: Bart Perrier | April 23, 2010 at 09:41 AM
@Bart - the behavior will change shortly (very early Q3). The FLARE update that will change this (very much for the better) is now in Beta.
Posted by: Chad Sakac | April 23, 2010 at 02:36 PM
Thanks for the reply, Chad. Our initial environment will only have one datamover (we have an additional DM planned) with two iSCSI ports for each ESX host (pre-production). Should we expect to see the degradation in iSCSI traffic when we add the second iSCSI port?
Posted by: Bart Perrier | April 26, 2010 at 01:53 PM
@Bart - are you using iSCSI to the Celerra (connecting to the datamover) or to the CLARiiON backend behind the Celerra (connecting to the storage processor)?
The Celerra doesn't have this same issue (requiring the subnet workaround), and scaling is linear as you add ports.
The Celerra and CLARiiON iSCSI (target) stacks are merging, and the fact that it works on the Celerra and not on the CLARiiON will be resolved in a CLARiiON update VERY soon (EMC World starts tomorrow :-)
Posted by: Chad Sakac | May 08, 2010 at 02:09 PM
@Chad -- we are connecting to the datamover. Glad to hear it doesn't exist on the Celerra. Thanks again, Chad.
Posted by: Bart Perrier | May 12, 2010 at 10:21 PM
For anyone following this thread, note:
UPDATE (May 22nd, 2010): At EMC World 2010, FLARE 30 was announced, which, amongst many (MANY!) new features, also has some fixes – one of which fixes this underlying behavior. You can read about it at this post here:
http://virtualgeek.typepad.com/virtual_geek/2010/05/iscsi-clariion-and-vsphere-nice-fix.html
Posted by: Chad Sakac | May 22, 2010 at 10:45 AM
What about a fix for the AX4-5i? I spoke with a support person who said this hasn't been applied to the relevant software for it.
Posted by: Mike | June 07, 2010 at 01:52 PM
Per David's question above about whether this is also required for hardware adapters (the QLogic 4062C dual port):
This bug does not apply to the QLogic dual port card - each QLogic port has a different IQN, so you are good.
For software iSCSI on ESX, each vmknic logs in using the same IQN, with different IPs.
Regards,
- Kun
Posted by: Kun Huang | August 28, 2010 at 01:03 AM
Chad, we are on FLARE 29 and having some disk latency issues using iSCSI. You mentioned this does not log an error - how do we know if it is affecting us?
Posted by: Owen | December 17, 2010 at 10:27 AM
I'm interested in having this applied to my AX4-5i - any news? We're at the end of May 2011... :)
Posted by: Hussain | May 30, 2011 at 09:01 AM
@Hussain - I'm sorry, but the AX4-5i isn't going to be getting more major software updates. That means the way initiator records are stored isn't changing, which means that on an AX4-5i you need to follow the workaround (separate subnets).
The workaround doesn't lower your performance or availability, but it is a little more complex.
Sorry!
Posted by: Chad Sakac | June 02, 2011 at 08:48 AM