By now, it should be obvious to everyone (but surprisingly, it's not) that VMware is driving four large-scale IT trends from an infrastructure standpoint.
- A raison d'être for multi-core CPUs - it's not uncommon to see 20:1 consolidation ratios with today's dual-core and quad-core processors in dual-socket platforms (blade or rack-mount - that's for another post :-) ). Heck, with some workloads (VDI is a good example), 8 cores, lots of RAM, and ESX Server's memory dedupe, much higher numbers are possible even today - 100:1 even. Look at the very near future: Nehalem will launch with 4-core dies but will scale to 8 cores, and each core will run two parallel execution threads (Anandtech does a great job of covering this, as always: http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3326&p=4).
- Consolidated I/O workloads - even before point 1, when you move from 1:1 to 20:1 consolidation, I/O becomes a core design bottleneck. It's always shocking to me that people don't realize this. Let's say you're a Dell shop and standardize on PowerEdge 2970s. How many NICs do you have in a standard physical server? Usually 2-4 (let's say 3 for the sake of being even-handed). When you consolidate that onto a honkin' R800 with tons of RAM running ESXi, how many NICs do you have? Perhaps 12 (4 LOM, plus 2 quad-port PCIe NICs on the riser card). Two go for service console redundancy (you ARE using a redundant service console, right?!?!), then you lose two (maybe one) for VMotion - so you have 8 left. If you use vmkernel IP-based storage (iSCSI or NFS), some of those go to IP storage, and the rest become pNICs sitting off vSwitches supporting virtual machine vNICs. Quick math: 60 old physical NICs are now consolidated onto 12 (with even fewer left for VM traffic if IP storage takes its share).
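The back-of-the-envelope math above can be sketched in a few lines. All the figures here (20:1 ratio, 3 NICs per physical box, a 12-NIC consolidated host) are the illustrative assumptions from this post, not measured data:

```python
# NIC consolidation arithmetic - illustrative numbers from the post,
# not a sizing tool.

physical_servers = 20          # 20:1 consolidation onto one ESX host
nics_per_physical = 3          # typical servers have 2-4; split the difference
old_nic_count = physical_servers * nics_per_physical

host_nics = 12                 # 4 LOM + 2 quad-port PCIe NICs
service_console = 2            # redundant service console
vmotion = 2                    # VMotion interfaces
remaining = host_nics - service_console - vmotion

print(f"{old_nic_count} physical NICs -> {host_nics} on one ESX host")
print(f"{remaining} NICs left for VM traffic and IP storage")
# -> 60 physical NICs -> 12 on one ESX host
# -> 8 NICs left for VM traffic and IP storage
```

The point the numbers make: every one of those 12 ports now carries the aggregate traffic of several former physical servers, which is exactly the consolidated I/O bottleneck.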
- Shared storage becomes critical - this is true in a good way and a bad way. Good, because you're moving workloads from lower-availability infrastructure onto something more solid and consolidated, and you're gaining the massive business flexibility of VMotion/DRS, Storage VMotion, VM HA, Site Recovery Manager, etc. Bad, in the sense that shared storage becomes a cost prerequisite: many things that used to be "C:\ on physicaldisk0" are now happily living as a VMDK on a VMFS or NFS datastore. This is forcing all the storage vendors (EMC included!) to think about our infrastructure in a new way.
- Management tools need to adapt - it just takes one example to make this sink in. Let's say you set up an SRM recovery plan. Life is good. Then you svmotion one of the VMs out of the container being replicated - how valid is that recovery plan now? Management tools need to integrate with VirtualCenter and have, as a core premise, that the app container is a virtual entity. This has impacts that aren't immediately obvious - but it's perhaps more important than all the other stuff put together (topic for another post).
Today, I'm focusing on #2 - and sharing why I think 2009 will go down as the year 10GbE takes off, and why VMware will be the thing that makes it happen. Interested? Read on...
Back when I was in the valley (so, about 4 years ago), a buddy of mine worked at a tiny IP (not Internet Protocol - Intellectual Property) company focused on high-end IP blocks for networking and storage (going after Hi/fn and the others in that space). He showed me their A0 spin and told me: "This chip will do 10GBase-T over Cat6 cables, full TCP offload including segmentation offload, and full iSCSI offload." COOL. "Oh, and we think we can mass-produce it for $25 per chip." DROOL.
Well, fast forward 4 years, and they are out of business :-)
The beautiful thing in Silicon valley is that people aren't afraid of failing, they are afraid of failing to try. And you know what - they were right. They were only early.
At the time, I was at an iSCSI storage startup, so of course it was only natural that I thought it was cool.
OK - so what's the point? VMware drives 10 Gigabit Ethernet demand - for the simple reason of point #2: consolidated network workload. (It's also why our general recommended backup solution for customers very focused on VMware is Avamar, which does deduplication before the data leaves the ESX server.)
In ESX 3.5, VMware added support for a series of 10GbE NICs (NetXen, Neterion, Intel's 10G XR) - http://www.vmware.com/pdf/vi35_io_guide.pdf (check out starting on page 24). ESX performance with 10GbE is fantastic (a very efficient networking stack). This is covered at every VMworld - I'm sure this year will be no different - here's the graph from VMworld 2007 (a great session: "TA43 High Performance Virtualized I/O in 10 Gigabit Ethernet Era," presented by Howie Xu from VMware).
Funny story: we were on a call with the VMware team as we were working on qualifying the ESX 3.5 release, and asked:
EMC: "iSCSI over the 10GbE interfaces - is it supported?"
VMware: "no - why would anyone do that?"
EMC: "oh, trust us, they'll do that. What about NFS datastores then?",
VMware: "no - no vmknic (BTW - this is what carries VMotion and IP storage) over the 10GbE interfaces is supported at the 3.5 release."
Now, a few quick things:
- Don't take this to mean that VMware isn't an IP storage supporter - they absolutely are. They are just resource-constrained, like we all are. They are also some of the smartest engineering folks I've ever met.
- iSCSI and NFS absolutely work with the 10GbE interfaces. The support model is clear (at least to me): if the SW iSCSI initiator is on the HCL with an iSCSI target, or if the NFS server is on the HCL, then you should be good with any network interface that's on the HCL.
So why did VMware say that? Answer below.
This is the data from this year's IDC "Server Virtualization 2007 Study." This is an annual study (published at the end of each year - so this one is from Dec 2007) covering every topic in server virtualization; they survey a broad set of customers across a broad, broad set of questions (it has great info like what people are virtualizing, on what platforms, uptake rates of Hyper-V and VI3, how customers are justifying it, and what they're seeing). IDC also tends (at least to my eyes) to be very independent.
One thing - it's not market share, but rather "a market study" - a survey of 410 customers, no more, no less. There is FASCINATING info in there - one day I might find time to dig out other nuggets. It's also done annually, so you can see year-over-year changes too, which is nice.
"Chad, what's your point?" - 30 percent of customers have an I/O consolidation issue with IP storage, but 100% of customers have an I/O consolidation issue with straight-up networking - that's why VMware focused there first.
So - what's it going to be on the storage side? There's no disagreement that the future is an Ethernet-connected future.
Note: for customers currently investing in FC - it ain't going away anytime fast, and you're basically investing in something that solves the consolidated I/O workload for you today. IT isn't about the latest shiny toy; it's about things working - if FC works for you, FANTASTIC.
I am so not into protocol and transport wars - BUT that still doesn't change the fact that the future is Ethernet-connected. So, what about protocol? iSCSI, NFS, or FCoE? Well, NFS will continue to do well - it works, there's nothing wrong with it, and it will always have the strengths it has in the VMware context (so easy to create massive datastores that span ESX clusters or even sites). iSCSI will continue to grow wildly (it is the fastest-growing protocol in the market at large, and in EMC's portfolio) and is (IMHO - I'm still in love) the future of the block storage market en masse. BUT I'm starting to come around on FCoE. There are three reasons:
- In working with the largest customers, there are some workloads that demand "lossless" transport (look up per-priority pause) and ultra-low latency (where literally a few ms is make-or-break). I'm not claiming they are everywhere - that's the iSCSI market - but where they exist, they are very specific, and those customers need an answer. BTW - when people try to apply that "lossless," "ultra-low latency" DCE (Data Center Ethernet) argument to storage workloads as a whole (i.e., claim that iSCSI is the wrong way and FCoE is for everyone), my answer is simple:
"iSCSI works great for many customers today. It does that as is. Don't underestimate the power of being able to ping your storage target"
- The vendors aren't introducing "HBAs that have Ethernet" - they are all releasing "converged network adapters" - a single device that is a NIC (with all the fancy offloads) and an FCoE HBA at the same time. If you can have both, and the incremental cost is zero - why not? You can always run an iSCSI stack on top of the NIC!
- Here's the Qlogic example: http://www.qlogic.com/Products/Datnetworking_products_landingpage.aspx
- Here's the Emulex example: http://www.emulex.com/products/fcoe/index.jsp
- Here's the Intel example: http://download.intel.com/design/network/prodbrf/317796.pdf (I'm assuming Intel will go the way they generally do, which is a software stack - seems crazy at 10Gbps, but it's not. Most customers I talk to are using the ESX native SW initiator or the Microsoft iSCSI initiator and getting great results at 1Gbps - which everyone said would be crazy too. In many cases, it performs the same as the hardware implementations.)
- All the players are supporting it - there is a writing-on-the-wall factor - and no, it's not a conspiracy (that's a Chad theme - don't trust conspiracy theories; the simple, obvious answer is the right one - Occam's Razor applies). It's simple: it's the only way you hit all the use cases at once.
EMC has been selling a 10GbE target for a while (the NS X-Blade 65), but there are very, very few customers. The ones that exist are in very specialized vertical markets - there hasn't been a broad-based reason for 10GbE, particularly at the historical price points. Now, 10GbE LOM is close, and there is a new compelling reason. VMware is that reason. We've done performance testing in our RTP facility with ESX 3.5 and the NS X-Blade and got killer performance results.
10GbE also solves the consolidated workload problem in one fell swoop - important today, CRITICAL in the massively multicore future of 100+:1 consolidation.
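To see why that's CRITICAL and not just important, consider the per-VM bandwidth arithmetic at high consolidation ratios. The numbers below (100 VMs per host, eight 1GbE uplinks vs. two 10GbE uplinks) are purely illustrative assumptions, not benchmarks:

```python
# Rough average bandwidth per VM under contention - illustrative only.

vms_per_host = 100             # the "massively multicore" scenario

gige_capacity = 8 * 1.0        # Gbps across eight 1GbE uplinks
tengig_capacity = 2 * 10.0     # Gbps across two 10GbE uplinks

per_vm_gige = gige_capacity / vms_per_host * 1000      # Mbps per VM
per_vm_tengig = tengig_capacity / vms_per_host * 1000  # Mbps per VM

print(f"8 x 1GbE:  ~{per_vm_gige:.0f} Mbps per VM on average")
print(f"2 x 10GbE: ~{per_vm_tengig:.0f} Mbps per VM on average")
# -> 8 x 1GbE:  ~80 Mbps per VM on average
# -> 2 x 10GbE: ~200 Mbps per VM on average
```

And the 10GbE case does it with a quarter of the ports, cables, and switch hops - which is the whole consolidated-I/O argument in one line.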
One thing I wonder about is whether it HAS to be Cat6 (or some other form of twisted pair) to get mass acceptance. Historically, that has been the case - 1GbE didn't get adopted until 1000Base-T, and the next thing you knew, your laptop had it built in. I'm not sure that's going to happen this time. Twisted pair is getting hard at these really high frequencies (man - looking back, my university thesis was a free-space optical 10Mbps link using hot-off-the-presses laser diodes that cost a fortune - amazing how fast things move). The other issue is power - very high frequencies with high loss mean very high transmit/receive power.
This is a big question for me - what's the "how much do I need to change?" factor? As has been well covered in "The Innovator's Dilemma," disruption comes bottom-up, not top-down, and eventually good enough is good enough.
It's not going to be InfiniBand, that's for sure (again, with notable exceptions - and EMC will support every protocol, trust me), and I don't think it's optical. But whether it can be twisted pair is yet to be determined - and it's taking way too long for that to be a good sign. I dunno. I think Cisco might be on to something with their new SFP+ and Twinax: http://www.cisco.com/en/US/prod/collateral/modules/ps5455/data_sheet_c78-455693.html. It feels more like twisted pair, and it passes my Chad test: "I like grey, cheap, flexible cables, not orange, expensive ones."
So - if you could have a converged network supporting your ESX servers, with a truckload of bandwidth to each host, the ability to carve the pipe up between networking and IP storage (NFS, iSCSI, or FCoE - and in some cases several at once), and QoS applied to VM-specific channels carrying all the way through the host, the adapter, and the fabric to the array - why wouldn't you do it?
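What "carving the pipe up" means in practice is proportional-shares QoS: each traffic class gets a share value, and under contention the scheduler divides the link in proportion to those shares. The classes and share values below are hypothetical, just to show the arithmetic:

```python
# Share-based carving of a 10Gbps converged pipe - hypothetical traffic
# classes and share values, showing only the proportional arithmetic a
# QoS scheduler applies when the link is fully contended.

pipe_gbps = 10.0
shares = {
    "service console": 5,
    "VMotion": 20,
    "IP storage (iSCSI/NFS)": 35,
    "VM networks": 40,
}

total_shares = sum(shares.values())
for traffic_class, share in shares.items():
    gbps = pipe_gbps * share / total_shares
    print(f"{traffic_class}: {gbps:.1f} Gbps under full contention")
```

The nice property of shares (versus hard caps) is that when VMotion is idle, its bandwidth is available to the other classes; the ratios only bite when the pipe is actually full.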
Now - here are my questions for the intrepid readers:
- Do you agree with my core premise: i) VMware's consolidated I/O demands a converged, but virtualized, I/O fabric; ii) that fabric will be 10GbE; and iii) 2009 is the inflection-point year for 10GbE?
- What are your thoughts on the cable plant question?
- What do YOU run today - and what will you be using in 2009, 2010 and beyond?