I got a lot of questions on the 10GbE post, which is great. Some of the answers involve graphics, so rather than replying in the comments, I'm just going to do a post.
Here are the original questions: http://virtualgeek.typepad.com/virtual_geek/2008/06/10-gigabit-ethe.html#comments
OK, one at a time (apologies for the delays; I've been swamped):
Jumbo frames don't hurt, but they don't help that much either. Below is a table from some pure research with an Exchange 2007 ESRP workload. It's handy because it exhibits both small-block (8KB) random I/O during the normal workload and large-block (64KB and larger) sequential I/O during the checksum and backup phases of the test. I also like it because it's an application workload, not an artificial one. This test was with VMware ESX 3.0 and the MS iSCSI initiator in the guest, but similar, more recent tests show the same behavior using the VMware iSCSI software initiator in ESX 3.5.
J1 = jumbo frames on; J0 = jumbo frames off.
Takeaways: for the IOPS-driven small-block workload, the difference is a fat nada. For the throughput-driven large-block workload, there's some difference: in a 4-hour backup job, it shaved off 12 minutes.
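To put those 12 minutes in perspective, here is the arithmetic as a trivial Python sketch (the function name is just for illustration):

```python
def percent_saved(baseline_minutes: float, saved_minutes: float) -> float:
    """Time saved as a percentage of the baseline run time."""
    return 100.0 * saved_minutes / baseline_minutes

# 4-hour backup job (240 minutes) that finished 12 minutes sooner with jumbo frames
print(f"{percent_saved(4 * 60, 12):.1f}% faster")  # 5.0% faster
```

So jumbo frames bought roughly a 5% improvement on the large-block phase, and essentially nothing on the small-block phase.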
I'm going to do an IP best practices post (I get a lot of questions re: multipathing configs), and I'll post the data we found. Jumbo frames **CAN** hurt you, though, in the sense that you need the discipline to make sure you enable them everywhere from the source to the target; otherwise you get fragmentation, and those nice iSCSI PDUs get sliced up incorrectly. What's more important is something basic: design the IP network like you would an FC SAN. Isolate it (physically or via VLANs), and build in redundancy. Don't route it. Also, disable the Spanning Tree Protocol on the ports used for iSCSI (or on the switch as a whole). This Cisco doc says it best:
"For the purposes of creating the iSCSI test environment, it is recommended to keep the configuration simple and assign an IP subnet that is to be used for both the iSCSI initiator (s) and the Cisco MDS 9000 Series IP services module's Gigabit Ethernet port"...
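The "source to target" discipline for jumbo frames boils down to one rule: the effective MTU of a path is the smallest MTU of any hop on it. Here is a minimal Python sketch of that check, assuming a 9000-byte jumbo MTU; the hop names and MTU values are hypothetical, not pulled from any real switch or ESX API:

```python
from typing import Dict, List

JUMBO_MTU = 9000  # assumption: the common jumbo-frame size; check what your gear supports

def path_mtu(hop_mtus: Dict[str, int]) -> int:
    """The largest frame that crosses the whole path intact is set by the smallest hop."""
    return min(hop_mtus.values())

def misconfigured_hops(hop_mtus: Dict[str, int], required: int = JUMBO_MTU) -> List[str]:
    """Hops that cannot carry jumbo frames intact."""
    return [name for name, mtu in hop_mtus.items() if mtu < required]

# Hypothetical path from the ESX VMkernel port to the array's iSCSI port
path = {
    "esx-vmknic": 9000,
    "esx-vswitch": 9000,
    "access-switch-port": 1500,  # someone forgot to enable jumbo frames here
    "array-iscsi-port": 9000,
}
print(path_mtu(path))            # 1500
print(misconfigured_hops(path))  # ['access-switch-port']
```

In practice you'd collect the real values from each device and confirm with a don't-fragment ping at the jumbo size; the point is that a single 1500-byte hop quietly defeats the whole effort.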
Ah, KISS. Now, where I have seen iSCSI problems is in high-throughput (usually streaming backup) workloads. For these, as I said earlier, multipathing (VMware ESX, or for that matter any OS) is important. You should have multiple iSCSI targets configured. VMware doesn't yet support multiple connections per session with its software iSCSI initiator, so it depends on multiple target IPs for network-level load balancing, or on multiple targets for storage-level multipathing. EqualLogic does this by default (in their case every LUN is an iSCSI target). NetApp, Celerra, and CLARiiON (which have LUNs behind iSCSI targets) all require a bit of thinking by the admin (but not a lot). The Celerra is particularly sweet: you can configure up to 1000 iSCSI targets.
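Why multiple targets matter: without multiple connections per session (MC/S), each target gets one session, and each session rides one network path. This Python sketch is purely illustrative (the target IQNs and NIC names are made up, and the round-robin placement is an assumption, not how ESX actually selects paths), but it shows that a single target leaves the second NIC idle:

```python
from typing import Dict, List

def assign_sessions(targets: List[str], nics: List[str]) -> Dict[str, List[str]]:
    """Pin one iSCSI session per target to a NIC, round-robin (no MC/S)."""
    placement: Dict[str, List[str]] = {nic: [] for nic in nics}
    for i, target in enumerate(targets):
        # Each session is a single TCP connection, so it lives on exactly one NIC.
        placement[nics[i % len(nics)]].append(target)
    return placement

nics = ["vmnic1", "vmnic2"]  # hypothetical uplinks dedicated to iSCSI

# One target: one session, so vmnic2 carries no storage traffic at all.
print(assign_sessions(["iqn.example:target0"], nics))

# Two targets: two sessions, one per uplink.
print(assign_sessions(["iqn.example:target0", "iqn.example:target1"], nics))
```

More targets means more sessions, and more sessions means more links can actually carry load.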
I hear you loud and clear, Michael, and IB is an engineer's dream of a transport. To tell you the truth, I don't KNOW why it isn't more compelling, but it seems to have been relegated to the sidelines, not by anything I post, but by the market as a whole. For example, there's a startup called Xsigo that has a really cool IB switch and has made an IB CNA focused specifically on this VMware use case. Yet in spite of all that goodness, I haven't seen it go far, and I have been following closely. My theory is that it has more to do with the cable plant, which is why I suppose (and this is 100% supposition) that 10GbE won't take off either (i.e. capture the mass market and a rapid adoption curve) so long as the physical layer is optical cable. That train of thought is what made me ask my last question: does it have to be CAT6, or would something similar (like SFP+ Twinax) be good enough? Or is this a solution in search of a problem? (I don't think so, which is the basis of the post.) Why do you think IB hasn't taken off?
Charlie, you know me: I will never lose my iSCSI passion :-) FCoE hasn't even tempered my thinking on it. I'm still (and will always be) Mr. iSCSI. Like I said in the post, **iSCSI WILL BE THE MARKET MAJORITY OF BLOCK IN THE FUTURE**. Heck, I even have a standing bet with Chuck and others on the exact degree of market share dominance by 2012 (and I say that by any measure: capacity, ports, or revenue). What I'm saying is that for some smaller (but very important) portion of the market, the need for lossless transport is a deal-breaker, so something will need to coexist with iSCSI. I see your argument that convergence means the full stack, but that's not what I hear from customers; to most of them, it means CapEx. But let's be clear: Ethernet as the physical/link layer is a given, and iSCSI immediately makes that a converged solution, period.
Ole, of course we've tested it a million ways. Long and short: iSCSI works well with IOPS-focused workloads with normal latency requirements. At the 8KB I/O size without jumbo frames, it adds a few percentage points of overhead (from iSCSI PDUs being split across standard-size Ethernet frames, plus the associated TCP/IP overhead). A few percentage points is not a big deal. It's higher on large I/O sizes (64KB and larger), but still small enough that most people don't care except in the academic sense.
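For a rough feel of where "a few percentage points" comes from, this Python sketch counts only per-frame header bytes for a single I/O. The assumptions are mine, not from our test data: each I/O travels in one iSCSI PDU with a 48-byte Basic Header Segment and no digests, IPv4/TCP headers carry no options, and standard Ethernet framing costs apply. It ignores CPU cost and everything else the measured tests capture, so on its own it does not explain why measured overhead is higher for large I/Os:

```python
import math

BHS_BYTES = 48                # iSCSI Basic Header Segment (no AHS, no digests)
TCP_IP_BYTES = 20 + 20        # IPv4 + TCP headers, no options
ETH_BYTES = 14 + 4 + 8 + 12   # Ethernet header + FCS + preamble/SFD + inter-frame gap

def wire_overhead_pct(io_bytes: int, mtu: int) -> float:
    """Header bytes as a percent of everything on the wire for one I/O (rough estimate)."""
    pdu_bytes = io_bytes + BHS_BYTES      # assumption: one PDU carries the whole I/O
    mss = mtu - TCP_IP_BYTES              # TCP payload that fits in one frame
    frames = math.ceil(pdu_bytes / mss)   # the PDU is split across this many frames
    overhead = BHS_BYTES + frames * (TCP_IP_BYTES + ETH_BYTES)
    return 100.0 * overhead / (io_bytes + overhead)

for size_kb in (8, 64):
    for mtu in (1500, 9000):
        pct = wire_overhead_pct(size_kb * 1024, mtu)
        print(f"{size_kb}KB I/O, MTU {mtu}: ~{pct:.1f}% overhead")
```

Under these assumptions the 8KB case lands near 6% at MTU 1500 and under 2% with jumbo frames: real, but not the kind of number that decides a design.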
But you're right: the point isn't that iSCSI is routable. Routing iSCSI is a BAD idea. Storage expects latency measured in milliseconds for block, and hundreds of milliseconds for NAS (of course NAS can have millisecond latencies too, but as a protocol it's designed to expect and operate with more). Ethernet switches add latency measured in microseconds, which makes them effectively invisible from a performance standpoint. Routers, of course, add latency in milliseconds. iSCSI is a block protocol, so you do the math and come to the conclusion you did: routability isn't why iSCSI is successful.
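"You do the math" can be made literal. This Python sketch uses assumed round numbers (a 5 ms block I/O service time, about 5 microseconds per switch hop, about 1 ms per router hop; none of these are measured values) to compare what two switch hops versus two router hops add to each I/O:

```python
from typing import List

def added_latency_pct(base_ms: float, hop_latencies_ms: List[float]) -> float:
    """Extra latency from network hops, as a percent of the storage service time."""
    return 100.0 * sum(hop_latencies_ms) / base_ms

DISK_MS = 5.0      # assumed block I/O service time
SWITCH_MS = 0.005  # assumed ~5 microseconds per switch hop
ROUTER_MS = 1.0    # assumed ~1 ms per routed hop

print(f"two switch hops: +{added_latency_pct(DISK_MS, [SWITCH_MS] * 2):.1f}%")  # +0.2%
print(f"two router hops: +{added_latency_pct(DISK_MS, [ROUTER_MS] * 2):.1f}%")  # +40.0%
```

Switches disappear into the noise; routers add a large fraction of the entire budget a block protocol expects.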
We've done versions of the table I posted earlier. Here is its big brother from a recent joint EMC/VMware solutions testing effort, at 8 times the scale:
Here was the ESX view of the storage subsystem. Note that the I/O reaches around 16K IOPS but maxes out at 100MBps, a piece of cake for iSCSI:
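A quick sanity check on those two numbers, as a Python sketch that assumes MB means 10^6 bytes and ignores protocol overhead: 100MBps at 16K IOPS implies small I/Os of roughly 6KB, and 100MBps is about 80% of one 1GbE link's raw bit rate.

```python
def avg_io_size_kib(throughput_mb_s: float, iops: float) -> float:
    """Average I/O size implied by a throughput and an IOPS figure (MB = 10**6 bytes)."""
    return throughput_mb_s * 1_000_000 / iops / 1024

def link_utilization_pct(throughput_mb_s: float, link_gbps: float) -> float:
    """Throughput as a percent of a link's raw bit rate (no framing overhead)."""
    return 100.0 * throughput_mb_s * 8 / (link_gbps * 1000)

print(f"{avg_io_size_kib(100, 16_000):.1f} KiB per I/O")      # 6.1 KiB per I/O
print(f"{link_utilization_pct(100, 1):.0f}% of one 1GbE link")  # 80% of one 1GbE link
```

In other words, this is an IOPS-driven, small-block workload, exactly the kind iSCSI handles comfortably.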
Here was the CLARiiON's view of the world. The SPs were pretty busy, and I can tell you that with this workload read cache is totally useless: the host-generated I/O is almost exactly the back-end I/O:
So iSCSI, if you design the iSCSI network like an FC network, performs fine: the encapsulation and Ethernet frames add a few percentage points of overhead and one or two milliseconds of latency. Not bad. The processing power? We've found that saturating two full NICs with iSCSI burns about one core on a modern multicore CPU. Those of us who grew up in the era of processor-bound systems initially look at that as a big deal, then think about it and go "so what? Buying a quad core vs. a dual core costs me $100."
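Here is that CPU rule of thumb (about one core per two saturated NICs, per the observation above) turned into a share of the host, as a Python sketch; the core counts are just examples:

```python
def iscsi_cpu_share_pct(saturated_nics: float, host_cores: int,
                        cores_per_two_nics: float = 1.0) -> float:
    """Percent of host CPU spent on software iSCSI, using the ~1 core per 2 NICs rule."""
    cores_used = (saturated_nics / 2) * cores_per_two_nics
    return 100.0 * cores_used / host_cores

for cores in (2, 4, 8):
    print(f"{cores} cores: {iscsi_cpu_share_pct(2, cores):.1f}% of the host")
# 2 cores: 50.0%, 4 cores: 25.0%, 8 cores: 12.5%
```

The cost is fixed in cores, so every step up in core count shrinks it as a fraction of the box.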
Now, what about a throughput-dominated workload like Business Intelligence, or an aggregate guest-based backup from an ESX server with 20+ VMs on it (hence Avamar)? In that case throughput matters, and while it's eminently possible on iSCSI with 1GbE, it's a bit unwieldy. I mean, who is REALLY going to run a bundle of 8-10 cables per ESX box just for IP storage? Think about the cabling mess as you scale that up. Now, 10GbE is another story entirely.
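Here is the cabling arithmetic as a Python sketch. The 1000MBps target and the 90% usable-throughput factor per link are assumptions for illustration, not measured numbers, and the count excludes any extra links for redundancy:

```python
import math

def links_needed(target_mb_s: float, link_gbps: float, efficiency: float = 0.9) -> int:
    """Links required to carry a target throughput (efficiency is an assumed usable fraction)."""
    usable_mb_s = link_gbps * 1000 / 8 * efficiency  # raw Gbps -> MBps, minus overhead
    return math.ceil(target_mb_s / usable_mb_s)

TARGET_MB_S = 1000  # hypothetical aggregate backup/BI throughput per ESX host
print(links_needed(TARGET_MB_S, 1))   # 9 x 1GbE
print(links_needed(TARGET_MB_S, 10))  # 1 x 10GbE
```

Nine cables versus one (before doubling for redundancy) is the whole 10GbE argument in two lines.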
BUT you are right in the sense that if you have an existing FC infrastructure (both hardware and people versed in WWNs), iSCSI solves a non-existent problem for you. That's why, in spite of its incredible growth (36% CAGR or more), there is very little iSCSI in larger enterprises. Not because it's not good, but because it solves a problem they already solved.
It brings me back to my original view: if you don't have the FC infrastructure (hardware and knowledge), don't underestimate the power of being able to ping your storage target. If you do, what will be the impetus for the next change (change being inevitable)? I think it's that VMware and mass consolidation, in their second wave, are going to make 10GbE mandatory just for networking, which then leads to "ah, just converge the IP and storage networks, dammit!"
Brian, thanks for the input. I'm curious what others say... I'm surprised there hasn't already been more of a move towards an optical plant. But most enterprises are still very much using Cat 5e/Cat 6, even for core uplinks, aggregating and trunking like mad. Heck, Verizon runs FiOS to the home :-) It just makes me wonder whether there is a cost threshold that must be crossed, or whether, as I think, it's more basic: networking people are comfortable with twisted pair, and it's less logical than it is emotional (we are all human beings after all :-)