
January 26, 2009

Comments


Stephen Foskett

This post is insanely great! Nice job, guys, and thanks for getting together on the topic.

Looking forward to ESX 4!

Vaughn Stewart

Thanks for reaching out on this initiative. Looking forward to doing more of these types of posts in the future.

Mike La Spina

Excellent article!
Thanks for sharing your knowledge.

Ian Beyer

And with ESX 3.5, there's probably not much you can do with an AX150i array. I hope multipathing with 4.x will help immensely.

No idea what EMC was thinking with that one, should never have left the design floor.

andrew

A couple of very important aspects your article does not discuss:

1. You cannot use VMware VCB if using the iSCSI guest software initiator.
2. You cannot place pagefiles on iSCSI volumes
3. You cannot use SRM out of the box
4. Consider lack of snapshots on iSCSI volumes
5. Windows 2000 problems with network shares

Anyone else care to add to the list?

Val Bercovici

Nice work here guys! This is the kind of collaboration and transparency I hope to see more of in the storage blogosphere.

Chad Sakac

Andrew - thanks for the comments - none of us has all the answers, each only has parts, so contributions from all are good - thank you!

Answers for your 5 points:

1) We should have pointed out that native ESX snapshots, and all the things that depend on them (VCB, svmotion, Lab Manager), don't work with guest iSCSI initiators. I will point out that the things requiring high throughput (which, again, can't be achieved via any other means in the VI 3.5 model without using another storage protocol type, which we put aside in this post) are usually things like large Exchange databases, or databases of another type.

VCB 1.5 doesn't support application-integrated backup for guest applications like these (including log handling and recovery modes) - so the loss of VCB isn't a big one.

These also happen to be the cases where snapshot-based backup tools using array replication can work very well with our tools (keeping in the spirit of the post - EMC's Replication Manager, the NetApp SnapManager family, or Dell/EqualLogic's similar tools).

2) Can you point to something confirming that pagefiles can't be on iSCSI? I'm not familiar with that guideline, and would like to see it. It doesn't make sense, and is likely grounded in error.

3) We did point out that SRM support for the case of the iSCSI SW initiator in the guest is an "x" in the column: "It is notable that this option means that SRM is not supported (which depends on LUNs presented to ESX, not to guests)."

4) This is the same point as 2).

5) Like 2), can you please be more explicit or link to supporting docs? My apologies - I'm unfamiliar with anything along those lines.

I will reinforce our overall comment (and this was the consensus view).

- Start with the answer to Question 1. Keep it simple, and just accept the ~160MBps per iSCSI target in ESX 3.5. For many this is enough - period.

- Use the guest initiator model selectively when needed, for specific VM use cases where high bandwidth is needed within that larger set of VMs, recognizing the restrictions we pointed out and the one you added (no native ESX snapshots, no VMware function that is built off ESX snapshots, and no SRM support for that particular VM in a recovery plan).

Chad Sakac

Val, thanks for your comment. Trying always to be above the fray - focus on customer, respect competitors.

"Be the change you want to see in the world" - Gandhi

David

Chad,
I have a Celerra (NS20). When you say to have multiple iSCSI targets, does each target on the Celerra require a separate IP address (currently I have 3 targets all using the one IP address)? I've always wondered if adding more IP addresses to the targets would help with throughput. BTW, the Celerra is connected to the switch via 2 x 1Gb Ethernet ports using EtherChannel.
Great article. I'm trying to fix/improve I/O performance at my work and blogs like these are great.

Justin Grote

Terrific post, guys! I've had to explain most of this over and over and over again to my iSCSI clients who didn't have the knowledge of iSCSI, LACP, etc. and complained about their slow LUNs (thus perpetuating the "iSCSI is slow" myth) - and after about a day of reconfiguration, they dramatically increased their performance with the same SAN equipment.

I do have one point of contention: the discussion of there being no quantifiable benefit to LACP over just using MPIO. You stated in the article that MPIO can take up to 60 seconds to fail over. However, in a properly configured LACP environment, failover is much quicker (on the order of 30ms-5sec) and transparent to VMware (because it doesn't have to MPIO to another IP address), so there is no need to reconfigure guests with higher iSCSI timers. Of course, you can then have multiple LACP groups with multiple IP addresses and load balancing and really ramp it up. So in that sense, LACP does have a quantifiable advantage over MPIO at this time, and is a relatively KISS-principle-compliant solution assuming your switch and SAN support it in a painless way - although if the MPIO timers were tweakable, I suppose you could possibly get the same result.
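
For reference, the guest-side timer people usually mean here is the Windows disk timeout. A rough sketch of the commonly cited tweak (60 seconds is the value VMware guidance generally recommends - verify against the current KB for your OS before relying on it):

> (Windows CMD, inside the guest) reg add HKLM\SYSTEM\CurrentControlSet\Services\Disk /v TimeOutValue /t REG_DWORD /d 60 /f

With LACP-style failover in the 30ms-5sec range you can generally leave that value alone.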

Another important point is that VMware actually doesn't support LACP, which is the negotiation protocol for creating aggregate links. Instead, it only supports 802.3ad static mode. Hopefully we'll get LACP support in ESX 4, as that will help with both the setup learning curve (removing misconfigured ports from the trunk) and failover time.

My current favorite config-du-jour is either the stackable Cisco 3750s or the stackable Dell PowerConnect 6248s (which is a surprisingly good high-performance, feature-laden, and cheap L3 switch, believe it or not) and 802.3ad cross-stack trunks from both the SAN targets (assuming it supports it) and the VMware infrastructure.

Thanks for the excellent article guys! This definitely goes into my "read this first" pile for clients.

-Justin Grote
Senior Systems Engineer
En Pointe Global Services

Stu

I would _love_ to see a guest vs host s/w initiator comparison...

Chad Sakac

Thanks for the comments, all!

David - Thank you for being an EMC/VMware customer! Hope you're enjoying your Celerra!

Each iSCSI target maps to one or more network portals (IP addresses). Now, unless you have more than one iSCSI target, all traffic will follow one network link from ESX - period (for the reasons discussed above). BTW - in the next VMware release, you can have multiple iSCSI sessions for a single target, and there is round-robin multipathing plus the more advanced EMC PowerPath for VMware (which integrates into the vmkernel - very cool!)
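
For those who like to peek ahead, a minimal sketch of how that shapes up in the next release (syntax may change before GA, the vmhba number is just an example, and the naa device ID below is a placeholder):

# bind two VMkernel ports to the software iSCSI adapter
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33
# set the path selection policy for a given LUN to round robin
esxcli nmp device setpolicy --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR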

But, for 3.5, you will see more throughput if you configure differently.

Your NS20 has 4 front end GbE ports, so you have a couple of simple easy choices that will dramatically improve your performance.

It depends on how you have configured your ESX server - are you using link aggregation from the ESX host to the switch, or multiple vSwitches? (This is something we need to add to the post.) Let me know, and I'll respond...

UPDATE (1/31/09). David, I haven't heard from you, so will give the answer here for all, and also reach out to you directly.

Long and short - with 1 iSCSI target configured, you will never get more than 1 GbE connection's worth of throughput. You need to configure multiple iSCSI targets.

Now, the Celerra is really flexible about how to configure an iSCSI target. You can have many of them, and each of them can have many network portals (IPs). BUT, since the ESX iSCSI software initiator cannot do multiple sessions per target or multiple connections per target, in this case create multiple iSCSI targets - at least as many as you have GbE interfaces used for vmkernel traffic on your ESX cluster. Each needs a separate IP address by definition.

By balancing the LUNs behind the iSCSI targets you will distribute the load.

You have used 2 of the 4 GbE interfaces on your Celerra (there are 4 per datamover, and the NS20 can have two datamovers - the Celerra family as a whole can scale to many datamovers).

SO, your choice is either to plug in the other two, assign IP addresses, and assign iSCSI targets (just use the simple iSCSI target wizard)

OR

The Celerra can have many logical interfaces attached to each device (where a device is either a physical NIC or aggregated/failover logical device). You could alternatively just create another logical IP for the existing 2 linked interfaces, and assign the IP address to that.

Now, you also need to consider how you will load balance from the ESX servers to the switch.

You can either:

a) use link aggregation (which will do some load balancing, since there will be more than one TCP session, since you have more than one iSCSI target) - make sure to set the policy to "IP hash"

b) use the ESX vmkernel TCP/IP routing to load balance - here you have two vSwitches, each with their own VMkernel ports on separate subnets, and then you need to have the iSCSI target IP addresses on separate subnets. This ensures even load balancing.
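
To make option b) concrete, a minimal sketch from the ESX 3.5 service console as I recall the syntax (the vSwitch, portgroup and IP names, and the vmhba number, are just examples - adapt them to your environment and double-check the flags against your build):

# second vSwitch with its own uplink and VMkernel port on a second subnet
esxcfg-vswitch -a vSwitch2
esxcfg-vswitch -L vmnic3 vSwitch2
esxcfg-vswitch -A iSCSI-B vSwitch2
esxcfg-vmknic -a -i 10.0.2.11 -n 255.255.255.0 iSCSI-B
# add the second iSCSI target IP as a send-targets discovery address, then rescan
vmkiscsi-tool -D -a 10.0.2.50 vmhba32
esxcfg-rescan vmhba32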

Let me know if this helps!!!

Gregory Perry

In Answer #3 above you make mention of using hardware initiators only if iSCSI boot is required - what about the fact that hardware initiators support 9K jumbo frames whereas jumbo frames are not yet a supported configuration for the 3.5 vmkernel?

Wouldn't the performance benefit of jumbo frames alone merit going with a hardware initiator?

daniel baird

All the talk seems to be about throughput performance. I'm not seeing anything on latency. For example, in a laptop I have, changing to an SSD drive has improved my boot and app load times immensely. Throughput hasn't changed - I still get the same MB/sec as with the old disk - but the latency is way down.

For applications where the usage is like this - I/O, CPU, I/O, CPU, I/O, etc. - latency is a huge issue. For this kind of use, I'm not seeing iSCSI being a good choice. Fibre Channel still has the benefit of lower latency, and when that's not enough, InfiniBand.

What do people think?

eric

Daniel Baird,

I'm not negating what you are saying, just complementing it: what you did to your laptop is a classic
"I have re-designed and moved a bottleneck somewhere else" thing. It's a never-ending story, mate...

The above applies to any design exercise.

Cheers,
Eric Barlier

Steve Chambers

Great post, Chad - can we syndicate (i.e. promote) this on VIOPS?

Chad Sakac

Steve - you can ABSOLUTELY promote on VIOPS (which is awesome BTW).

Chad Sakac

Daniel/Eric - latency DOES matter.

EMC has a lot of experience with EFD - which we've been shipping for a year.

To understand Enterprise Flash Disk: think of these as the SSD you see in your laptop but on steroids. They are designed for many, many more read/write cycles (and have lots more extra cells). They have dual-ported interfaces, and a lot of extra firmware/SRAM between the interfaces and the actual flash.

EMC's view of the future here is that soon there will be only two types of disks - huge slow SATA, and hyper-fast EFDs. There will be all sorts of host interfaces (SAS/FC/iSCSI/NAS/FCoE...), but that will be the stuff that stores the data. All our arrays support EFDs now.

OK - back to the point - LATENCY does matter!

The reason EFDs rock (even in high-end cached arrays, and even at VERY high cache hit rates like 95%) is their ability to deliver 30x the IOPS (IOs per second) of a traditional disk. This actually means that while they cost about 9x more than an FC disk (commercial SSDs costing about 3x more today than a SATA disk), they are in the end cheaper. They also save a TON of power/cooling/space.

Exciting stuff, and our customers LOVE IT.

It was, however, out of the scope of the post. In talking with the guys, we decided to put aside FC (which of course deals with the same multipathing issues in 3.5, but has lower latency and much higher effective bandwidth with large-block workloads), since that would become potentially political and exclude the iSCSI-only vendors, negating the point of the joint exercise.

Speaking as the EMC person, our view is the following:

When VMware is used for the "100% virtualized datacenter" there is no single pat answer about protocol choice, backend choice, or connectivity choice - because you have some "craplication" VMs, some "important" VMs, and some "mission critical" VMs.

Each one of those has differing IO requirements which are "orthogonal" - i.e. they have no correlation to the "importance".

In fact, my recommendation (personally) is that every ESX cluster should have block (and the choice of type varies - and in some cases is "several") **AND** NFS - as each has VMware "super powers" and also some features which work only on one or the other.

We pride ourselves at EMC on covering all those bases.

It makes for a more complicated discussion. It's simplistic to say "iSCSI is the ONLY way" or "NFS is the ONLY way" or "FC is the only way". Those are the answers of someone with an agenda, a bias, or a cult :-)

The "it depends" answer is correct and the answer of a pragmatist. It needs to be followed by a "let's talk about what you're trying to do, and how we can design a solution that meets those requirements and is the simplest we can make it at the same time".

Chad Sakac

Greg - re jumbo frame support. It works, and works fine, but as you point out it's not supported (and I try my darnedest to never recommend something in a production environment that isn't supported).

I was surprised that the jumbo frames didn't help more when we did testing. We did a lot of testing around that, and I posted on it here: http://virtualgeek.typepad.com/virtual_geek/2008/06/answers-to-a-bu.html

I'm not saying it's not good - it does make a difference, but only with large-block IOs (64K or larger - common during backup/restore or database DSS workloads).
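
If anyone wants to experiment in a lab anyway (again, NOT supported on the 3.5 vmkernel), here is a minimal sketch of what that looks like from the service console - the flags may vary by build, and the addresses are just examples, so verify before using:

# set the vSwitch MTU, recreate the VMkernel NIC with a 9000-byte MTU, then test end to end
esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vmknic -a -i 10.0.1.11 -n 255.255.255.0 -m 9000 iSCSI-A
# 8972 = 9000 minus the IP and ICMP headers
vmkping -s 8972 10.0.1.50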

In my opinion, this doesn't warrant moving to an iSCSI HBA. For the same dough, you can get an FC HBA, and all our arrays (with the exception of the AX series) support FC and iSCSI together at no extra cost. Yes, there is the price of the FC switch and ports, but they are a lot cheaper these days.

So, here's the logic:
- I **LOVE** iSCSI. If you are set on using iSCSI, use the guidance in the doc. The SW initiator is the focus of most qualification work and testing, and is the most widely deployed.

BUT

- If you have to drive high throughput to a SINGLE target, wait until ESX 4, which will support jumbo frames, multipathing, AND multiple sessions.

CAN'T WAIT?

Rather than spend on iSCSI HBAs, go FC. It will cost marginally more.

WANT TO BUILD FOR THE FUTURE?

Go with the 10GbE converged adapters, which look like two 10GbE NICs and two 8Gbps FC HBAs to ESX.

These are supported with ESX 3.5 already, and EMC e-Lab (a multibillion dollar interop effort) has qualified all the gear (FCoE switches, CNAs).

Like the earlier comment - no single answer is right for all customers, but there is an answer for EVERY customer.

daniel baird

Thanks for the replies Chad & Eric.

Seeing this article in the context of "when it makes sense to use iSCSI..." makes all the difference.

Eric, I agree - I just changed where the bottleneck is. On my laptop, I get much higher average CPU utilisation now; that's now the bottleneck. I didn't explain myself very well - I think my point was more that reducing latency was what helped my performance issues. A disk with higher throughput but the same latency wouldn't have done much for me.

Chad, thanks for the lengthy reply! I'm new to virtualisation - I'm from a telco core engineering background. Enterprise is a new area for me, so I'm having to learn a new solution flowchart/methodology. There's a lot to learn.

E.g. with FC, you have the FC HBA and FC switch costs; with iSCSI, it's NIC and Ethernet switch costs. You also have the cost of the extra CPU you're using for the TCP/IP overhead (less if you use a TOE card, but they cost a similar amount to an FC HBA). Old-style servers with the OS directly on hardware usually have more CPU headroom which can be given over to iSCSI processing, but with ESX/Xen/etc. you're trying for high average CPU utilisation, so there's not the "spare" CPU available.

Lots of pros and cons. :) It would be cool to see a cost/performance comparison of FC, iSCSI and NFS with a set pair of storage and server boxes - for example, several HP DL380s and an EMC Clariion running a number of VMs, with the VMs running key apps like MS Exchange or Oracle. I find there's not enough data on how apps use storage to help you pick the type that best fits. In the past I've seen too many over-engineered solutions where expensive kit is thrown in because the app's behaviour is not well understood. I'm sure we'd all love to have the time and budget to lab test all the hardware combinations to see what's most efficient. Apps that move big chunks of data around and are less affected by latency would seem to be prime candidates for iSCSI.

Also, there's FCoE. That really kicks the ants' nest, but perhaps it can happily exist alongside iSCSI. FCoE may kick out the existing full-stack FC. But I digress...

Martijn Jansen BT

Guys, from a networker that just stumbled across this very good article: be aware of the networking side - if you use EtherChannels between switches (servers not all connected to one switch), you need to take the load-balancing algorithm of the switch into account (yes...)

These can work on source/destination MAC or IP. Example:

cisco.com: Use the option that provides the greatest variety in your configuration. For example, if the traffic on a channel is going only to a single MAC address, using the destination-MAC address always chooses the same link in the channel. Using source addresses or IP addresses might result in better load balancing.
> (IOS CLI) port-channel load-balance {dst-ip | dst-mac | src-dst-ip | src-dst-mac | src-ip | src-mac}
http://www.cisco.com/en/US/tech/tk389/tk213/technologies_configuration_example09186a008089a821.shtml

Storage Solution

Hi,

Guys, a “Multivendor Post” to help our mutual iSCSI customers using VMware is a good idea for sharing knowledge. We are working with Stonefly and DNF products. If somebody has a question or query on integrating servers (FC, iSCSI, IP SAN) with Stonefly or DNF products, please forward it to me.

Stonefly and DNF products can be integrated with almost any solution - Microsoft, VMware ESX, Solaris, Linux, etc.

Regds.
StorageSolutionGroup

VISE

Virtual Iron, which is an inexpensive server virtualization product, also integrates with iSCSI in a similar fashion.

Ben

Should flow control be enabled for NFS as well?

Dan Israel

I have to agree - this is exactly the type of collaboration that produces high quality work.

A couple of questions regarding ESXi and Clariion:

We've done a few tests that have conflicting results. Now I somewhat understand why. Could you confirm or expand on these 2 concepts:

1) Essentially, because of the current limitation to a target, Exchange and SQL should use the guest OS iSCSI initiator (and PowerPath in my case) to provide greater throughput. Is that accurate, and will that change when PowerPath for ESX is released?

2) We use a fully redundant fabric of two subnets, one for each processor of the AX4. In our case, it would be better to use two vSwitches, each with their own VMkernel ports on a separate subnet, rather than using a single vSwitch with two NICs committed.

Thanks again for this outstanding information.

Albert Widjaja

Hi,

What a wonderful article. After reading it, I realised that by using the following deployment http://img11.imageshack.us/my.php?image=deploymenti.jpg it limits the bandwidth to just one link (failover), rather than 2Gbps which could boost performance.

I wonder if I can use trunking with directly attached 2 x 1Gb Ethernet cables as iSCSI from the Dell MD3000i SAN into two ESXi servers - would that be a faster solution than using a switch in between?

cemal dur

Hi Chad,

First of all, I would like to thank you for your valuable information.

I have some questions about Celerra and VMware ESX.
We are going to start a new VMware project soon and we plan to use the methods you mentioned above. We are going to use VMware ESX 3.5.

As you say, I know that I need to use multiple iSCSI targets on the IP storage side in order to get max performance with the iSCSI software initiator.

We will use a virtual switch composed of 4 NICs for the iSCSI network and “IP hash” as the NIC teaming load balance policy.

We can use a few methods related to it.
1. We can create two iSCSI LUNs on every target by creating 4 iSCSI targets.
1.a Is it possible to have a different iSCSI session on each NIC (virtual switch with 4 NICs)?

2. We can create one iSCSI LUN on every target by creating 8 iSCSI targets.
2.a Is it possible to have two different iSCSI sessions towards 2 different iSCSI targets on each NIC (virtual switch with 4 NICs)?

Do you have any suggestions for other tricks to get the best iSCSI performance?

Does vSphere support multiple connections per session (MC/S)?

We are planning to upgrade VMware from 3.5 to vSphere at the end of the year. Do you think we need to change the structure above when upgrading to vSphere (taking into consideration the iSCSI target and LUN numbers)?

Finally, there are 60 disks on the NS40. Do you have any suggestions for max performance on the storage side? Is using AVM the best method?


We are looking forward to receiving your answers.
Thank you for your attention.
Best regards,

Steve

Max throughput 160MBps? I always thought it was 1000/8 = 125MBps. How are you getting 160MBps across a 1Gbps link?

Chad Sakac

@Steve - remember that it's 1Gbps unidirectionally, and generally Ethernet is configured full-duplex.

125MBps is really an unachievable goal - even if there were no overhead, and there of course is overhead (Ethernet frame, IP header, TCP header, iSCSI PDU header), plus re-transmits and control traffic.

So - 80MBps is a more achievable throughput with a 100% write or 100% read workload (which results in predominantly unidirectional iSCSI traffic), and about 160MBps with a mixed read/write workload.
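
Rough arithmetic, for anyone checking the numbers: 1Gbps / 8 = 125MBps of raw line rate in EACH direction. Take off the Ethernet/IP/TCP/iSCSI header overhead and real-world behaviour, and ~80MBps one-way is what you typically see. A mixed read/write workload uses both directions at once, so roughly 2 x 80MBps = ~160MBps through a single 1GbE link.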

Steve

I'm looking at the 160MBps number and am curious how you are getting 160MBps throughput across a 1Gb link? 1000/8 = 125 - please show how you are reaching 160MBps.

Dan J

Great article. I learned the hard way that ESX will always use a single NIC when you only have one iSCSI target.

Question... I can set up multiple IPs on my SAN and set up multiple iSCSI targets, but do you use only 1 vSwitch for the vmkernel and service console, or do you create multiple vSwitches for this?

Sean

Chad,
I also have a Celerra (NS20). 4 ports are active for the server_2 Datamover, and the server_3 Datamover is in standby... right? That's what was explained to me.

So I carved out 5 LUNs @ 500GB each. I also have 12 NICs on each of my 3 ESX servers.

Initially I set up the Celerra for LACP on all 4 ports as a single target going to my 3750 switch. All my LUNs were behind the single target IP. After reading this, I broke them up over 4 target IPs, which really made my vMotions slower.

What are my options for the best speed plus fault tolerance with vSphere?


Kevin

Thank you for the great post.

What indicators should we be looking for to identify that we have maxed out an iSCSI session? We are using LeftHand equipment and we have link aggregation in place.

Obviously we can look for bandwidth bottlenecks on switch interfaces. But from a Windows Virtual Server, would you start seeing disk queue length counters climbing? Are there other perfmon counters that we would notice?

We are looking to place Exchange 2007 in a VM for 2500 users. Currently our Exchange environment lives on an FC Clariion. I am a bit concerned after reading this that iSCSI may not have enough throughput for our Exchange environment.

Thank you

Chad Sakac

Kevin, as an EMCer, if this response doesn't buy me "well, at least he's honest" I don't know what will :-)

I wouldn't worry about that user load and Exchange and iSCSI.

Exchange is actually (in steady state) IOPS bound - not bandwidth (MBps) bound. For example - assuming 0.5 IOPS/user and the 8KB IO size of Exchange 2007, your 2500 users = 10MBps, which is well under these limits.
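
Working that out: 2,500 users x 0.5 IOPS/user = 1,250 IOPS, and 1,250 IOPS x 8KB per IO = ~10MBps of steady-state bandwidth.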

Now - during a backup (if you're doing a traditional backup) it will be bandwidth bound (it drives as much as you've got)

There's an easy way to check before you migrate. Just use perfmon and measure the PhysicalDisk stats for a week (capture the backup periods). If they are in those bounds, you're good. If not, you need to look at the workarounds we listed.

More important with Exchange is generally the number of spindles - just make sure you have enough in your LeftHand configuration.

The key things to watch with iSCSI from a VMware standpoint are the network bandwidth statistics and the vscsi stats on latency (bad latency = unhappy apps). If the backend storage is happy but latency is not good, look at QUED (queue depth) in esxtop - and check to make sure the queues aren't overflowing.
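
A quick way to watch that from the ESX side (a sketch - field names as I recall them from the 3.5 esxtop):

# interactive esxtop; press 'd' for the disk/adapter view
esxtop
# watch DAVG/cmd (device latency), KAVG/cmd (kernel latency), GAVG/cmd (what the guest sees) and QUED (queued commands)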

Good luck - and let me know if I can help further!

John Doyle

Wow, thanks for this really good explanation of the link aggregation/ESX relationship. This cleared up a whole load of questions for me.
Many thanks

www.facebook.com/profile.php?id=1734056550

Can anyone comment on the Openfiler iSCSI SAN? To the best of my knowledge, iSCSI targets in Openfiler do not support multiple connections per initiator. I am not even sure if you can assign a unique IP per LUN. So does this mean that I am limited to 1.0Gb for the entire Openfiler?

shannon

I love that quote and would be honored for Chad to know me!

It's hard prepping for interviews when I know who I am most interested in and who I can earn a career with, not just a job. One night I was supposed to be studying switches and instead wrote my elementary understandings of VDI. Don’t get me wrong, I did study switches as well. Today I am supposed to be prepping for other interviews and I want to study this blog because it's good and my high level friends tell me to. Again I will probably do both, with pneumonia, because that's who I am! I can’t help but notice my interviews are teaching me the entire network, maybe to understand some of the intricacies of consolidation? As I said in my profile, my writings display my passion and self motivation to ramp, not what I wish to know.

Shannon, ISR/Lead Generation

shannon

Oops, I was commenting on, "If you are passionate about these technologies, good in front of people, like working hard when it’s something you believe in, and feel like we’re at the cusp of a wave of technological change – I want to know you.”

What I will say about your above writing is that it's not our place to tout competitors on our marketing blogs, but I love how you do this when it's required to serve the client. I love how you stay professional in this competitive marketplace, and use facts. Kudos to the customer-centric way in which you do business.

Shannon

Alex

Dear Chad! Thank you a lot for sharing. This is very relevant information for me, especially about Celerra and VMware ESX. I will bookmark your blog and will use this information in my custom paper writing. Wish you good luck.

Arthur Gressick

Chad, nicely done. I posted something similar about building a Linux machine with iSCSI. I would love to share that with the group - it takes less than 1 hour to build everything from ISO to working machine. Using your configuration, you're done in 30 more minutes. Great job!

Domenico Viggiani

Chad, I'm still using ESX 3.5 and I'd like to understand how to configure the iSCSI storage infrastructure to get redundancy (and not to increase throughput). The disk array is an EMC Celerra.
Could you point me to a good reference?

Chad Sakac

@Domenico - yes, please see the Celerra Techbook, here: http://www.emc.com/collateral/software/technical-documentation/h5536-vmware-esx-srvr-using-emc-celerra-stor-sys-wp.pdf

Basically - make sure you have the iSCSI target exposed via multiple virtual/physical interfaces. Those should go into a redundant switch fabric. You'll see the iSCSI LUN visible on multiple targets. In the LUN properties in vCenter, you'll see an active path and a failover path.
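
A quick sanity check from the ESX 3.5 service console (a sketch):

esxcfg-mpath -l
# each iSCSI LUN should list more than one path - one active/preferred and at least one standby/failover path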

Domenico Viggiani

Chad, thank you very much.
I heard you and Vaughn Stewart at VMworld speaking about storage best practices with VMware (the best session I attended!)

With an EMC NS480, do I still need to set up different targets/interfaces in different subnets?

David H

Hi all - since there are several people on here with first-hand EMC experience, I figured I'd pose a question. We've just deployed a Cisco UCS cluster with ten gig out and an EMC Clariion CX4-480 with ten gig. The storage is on its own VLAN configured for a 9000-byte MTU (jumbo frames) on dedicated static vNICs. I've verified with ping and the do-not-fragment bit that jumbo is working end to end. The server blades in the UCS are running ESXi 4.1 and EMC PowerPath, with four targets for each LUN (EMC SPA/0 & 1, SPB/0 & 1). Our RAID groups in the EMC are metaLUNs, each striping two 5+1 RAID 5 groups.

So, that brings me to the question: the best throughput we've been able to achieve from a guest to the EMC, running disk benchmarks against an idle CX4, has been 249 MB/sec block writes and 165 MB/sec block reads. As far as I can tell, there is no limit being hit anywhere; the drives in the CX4 are barely lit, the switches show no errors and have not bursted higher than ~1700 Mbit/sec, there is no exhaustion of buffers, and CPU load on the server side looks fine. What I don't know is whether ESXi 4.1 on Cisco UCS is using hardware for the iSCSI. If not, is this perhaps a VMware CPU issue?

Chad Sakac

@Domenico - first, thanks for being a customer! On an NS480 there are two ways to do iSCSI: 1) via the Datamover (which we are going to phase out over time - for example, there won't be VAAI support for it, or provisioning via the vCenter plugin); 2) via the storage processor (this will be the "winner"). BTW - you can non-disruptively add the second type if you're currently not using it. This was a difficult decision, but it didn't make sense to have two iSCSI target implementations in the same array.

I ask because the best practices differ depending on the target. Let me know which type you are using and I will help.

@David H - also, thanks for being a customer! Like a lot of folks in the industry (this has hit pretty well every vendor supporting 10GbE), EMC has been bitten by this - it is very likely (but not absolutely certainly) due to the Broadcom chipset. First, are you running the F29 patch for 10GbE customers? This is also fixed in FLARE 30.5 (the most current rev).

I talked about that here: http://virtualgeek.typepad.com/virtual_geek/2010/10/nice-updated-emc-unified-iscsifcoe-tidbits.html (download the VMworld session and have a look-see)

Setting flow control on the switches is also important.

With all those set, the only question I would have would be the spindle config. As described (why not use Storage Pools rather than Metas? Storage pools are the future!), you MIGHT not have enough backend to saturate the interfaces (if it's a small block size) - though I DOUBT this is the case (based on your description).

Let me know if the suggested fixes help you, otherwise, please open a case, and we'll work it.

David H

Hey Chad, we're running what's showing as 4.30.0.5.508 - is that the "F29" patch? Or better? I'm very new to the EMC side of things, so I'm just jumping in feet first. I did read some of your threads before we even got started and made sure we'd be taken up to FLARE 30 before we deployed, since I knew it had the multi-path iSCSI initiator fix. As far as I can tell, we're not experiencing any logout issues with our four paths (via PowerPath) to the targets, which I think was the issue pre-FLARE 30.

I have absolutely no problem blowing away our metaLUNs and RAID groups and re-doing them as storage pools if it would be advantageous; we only have a few virtual machines booted, so I could easily move them to one metaLUN, rebuild the rest of the storage, and then hot-migrate them to complete the final rebuild. Would that be worth doing? I got the feeling we only went that route because the install vendor was more used to doing it that way.

On the Cisco side, I'm showing:

MTU 9000 bytes, BW 10000000 Kbit, DLY 10 usec,
input flow-control is on, output flow-control is off

David H

Hi Chris, just an update: what I've done this morning is reconfigure my CX4 back down to 1500-byte MTUs and delete/recreate the iSCSI vNICs on the ESXi side to go back to 1500-byte MTUs, and now my read speed on an 8-drive (2 x 3+1 RAID 5) metaLUN jumped from a best-ever 165 MB/sec to 297 MB/sec.

On the Cisco switch side I'm not showing any buffer overruns or drops, since they have a pretty large 175 MB buffer on the 4900M switches. Plus, I can tell PowerPath is correctly splitting the load across two switches, since SPB owns the metaLUN in question and its ports 0 and 1 are on different switches. Also on the Cisco side I can see a lot of pause frames coming back from the CX4, which makes me think that maybe at 9000-byte MTUs the buffers on whatever NIC hardware the CX4 10gig cards use are not sized in a way that works well with 9000-byte frames.

Do you know if there's any internal EMC testing data that shows what the ideal MTU is for the CX4's 10gig cards? I can step through the 11 pre-defined options between 1500 and 9000 if not, but that's slow going thanks to the vSphere side of deleting/recreating the vNICs. :-)

David H

Sorry for messing up your name in the last post; I've been playing with storage more than sleeping, lol. I deleted a few metaLUNs/RAID groups and created a pool of 10 drives, but did not see much change in performance, unfortunately.

Domenico Viggiani

@Chad, I understand what you say, thanks.
I'm evaluating the pros and cons of all the methods to "attach" storage to VMware (and not only to it... Linux and Windows boxes will also share the same "fabric"):
If possible, I prefer FC, which works at its best without much configuration effort.
As an alternative to FC, I'm looking at iSCSI (with MPIO as the failover/load-sharing option) and NFS (with network-level solutions for failover/load-sharing), as you suggest in your posts.
I'm trying to avoid any prejudiced position.
I know that the NS480 has iSCSI on the front-end; perhaps it's the best solution (sincerely, I already have a few old Celerras and their Datamover architecture is not my love!). If you have some spare time and can point me in the right direction, I will surely avoid a lot of mistakes! Thanks in advance.

Regards from a long-time EMC customer

J

I am trying to understand the difference between iSCSI multipathing vs. NIC teaming (Active/Active) - what are the differences?

The comments to this entry are closed.


Disclaimer

  • The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.