This is the sister post to the one on the new IOps record – here.
At EMC’s mega-launch in January, we commented that 2011 would be the year of EMC breaking records. We weren’t kidding. See here, here, here, here, here, here, here, and so on…
So… With every major vSphere release, the EMC and VMware performance engineering teams get together and brainstorm: “what would be a ridiculous, over-the-top test to see where the performance envelope is today?”
This time – the gang at VMware (Chethan Kumar and others) and EMC (Radha Manga and others) said “let’s see how much bandwidth we can drive through a reasonable config”. So they got cracking in the lab in Santa Clara. A few weeks later – here you have it:
The new world-record benchmark for bandwidth through a reasonable vSphere configuration is 10 GBps (that’s gigabytes per second).
That’s around 4x the previous record. This is the story behind the story :-) Read on for more.
To pull this off, we figured a single VNX7500 could easily do 3x the bandwidth of the previous-generation CX4-960, so we got a VNX7500 with 320 spindles.
For the host, we could have gone for a monster host (see the 1,000,000 IOps test as an example), but fast back-of-the-napkin math showed we would likely (? – see below, more work to do) be bottlenecked by PCIe and the host adapters even on beefy hardware.
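If you want to see that napkin math spelled out, here’s roughly what it looks like – a sketch using generic, rounded numbers for PCIe Gen2 slots and dual-port 10GbE adapters, not figures pulled from the actual lab gear:

```python
# Rough sizing sketch - generic, rounded numbers, not measurements from the lab hardware.
target_gb_per_s = 10.0          # aggregate bandwidth goal (GB/s)
spindles = 320                  # drives in the VNX7500 config

# Array side: sequential read rate each spindle has to sustain to hit the target.
per_spindle_mb_per_s = target_gb_per_s * 1000 / spindles   # ~31 MB/s - easy for sequential reads

# Host side: the per-host ceiling is set by the slot and the adapter.
pcie_gen2_x8_gb_per_s = 8 * 0.5          # ~4 GB/s usable per Gen2 x8 slot
dual_port_10gbe_gb_per_s = 2 * 10 / 8    # ~2.5 GB/s of line rate per dual-port CNA
per_host_ceiling = min(pcie_gen2_x8_gb_per_s, dual_port_10gbe_gb_per_s)

hosts_needed = target_gb_per_s / per_host_ceiling   # ~4 reasonably-sized hosts

print(f"~{per_spindle_mb_per_s:.0f} MB/s per spindle, "
      f"~{per_host_ceiling:.1f} GB/s ceiling per host, "
      f"~{hosts_needed:.0f} hosts to reach {target_gb_per_s:.0f} GB/s")
```

In other words, one host alone tops out around 2.5–4 GB/s no matter how much CPU and memory you throw at it, which is why a handful of reasonable 2U boxes made more sense than one monster.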
We reached out to Intel, who kindly sent us a series of Urbana 2U hosts (Nehalem-generation hardware). We also reached out to Cisco, who kindly sent some Cisco Nexus 5548 switches. So – we had the setup below:
Here’s an interesting twist.
Intel mentioned they would love to see testing with the Intel x520 Network Interface cards.
Now – for those of you that don’t stay super-close to this space, Intel’s approach is to use commodity adapters with commodity silicon and then apply a software FCoE stack. This is in contrast to other models on the market, which follow a more “traditional hardware FCoE” implementation. vSphere 5 also supports the software-based FCoE initiator on the Intel x520 hardware – so we said “hey, what the heck – let’s see what we can find?!”
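For the curious, here’s a minimal sketch of what lighting up the software FCoE initiator on an x520 port looks like. This isn’t the team’s actual runbook – it just shells out to the esxcli fcoe namespace that ships with ESXi 5, the vmnic name is a placeholder, and any remote-connection arguments (server, credentials) are omitted:

```python
# Minimal sketch: activate the vSphere 5 software FCoE initiator on an x520 port.
# Assumes esxcli is reachable on the PATH; vmnic4 is a placeholder port name.
import subprocess

def esxcli(*args):
    """Run an esxcli command and return its output as text."""
    return subprocess.check_output(["esxcli", *args]).decode()

# Ports capable of software FCoE (the x520 ports show up here).
print(esxcli("fcoe", "nic", "list"))

# Activate the software FCoE initiator on one of those ports.
print(esxcli("fcoe", "nic", "discover", "--nic-name=vmnic4"))

# The resulting FCoE adapter then shows up alongside the other storage adapters.
print(esxcli("fcoe", "adapter", "list"))
```

The same thing can be done from the vSphere Client by adding a software FCoE adapter under the host’s storage adapters.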
Here’s how the VNX was cabled (basically the default):
For the IOmeter workload, here’s how it was configured (see the quick sanity check after the list):
- 1 VM per Server, 2 x vCPUs, 4 GB Memory
- 100% Read
- 100% Sequential
- IO Size – 1MB
- IO Offset – 1MB
- Number of outstanding IOs – 12
- Number of Managers – 4, 1 from each VM
- Number of workers per manager – 4
- Number of LUNs per manager – 4, 1 to 1 mapping
- Ramp-up – 5 minutes
- Measurement duration – 20 minutes
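Before we get to the result, here’s a quick Little’s-law sanity check on that profile (assuming the 12 outstanding IOs are per worker – the list above doesn’t spell that out):

```python
# Little's-law sanity check on the IOmeter profile above.
# Assumption: "outstanding IOs = 12" is per worker, with one 4-worker manager per host.
hosts = 4
workers_per_host = 4
outstanding_per_worker = 12
io_size_mb = 1.0

in_flight_mb = hosts * workers_per_host * outstanding_per_worker * io_size_mb   # 192 MB in flight
aggregate_gb_per_s = 10.0    # the headline result

# Little's law: concurrency = throughput x latency  =>  latency = concurrency / throughput
avg_io_latency_ms = in_flight_mb / (aggregate_gb_per_s * 1000) * 1000
print(f"{in_flight_mb:.0f} MB in flight -> ~{avg_io_latency_ms:.1f} ms average per 1 MB read "
      f"at {aggregate_gb_per_s:.0f} GB/s")
```

That works out to roughly 19 ms per 1 MB read at those queue depths – deep, streaming IO rather than a latency-sensitive profile.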
Net?
Wow. That’s 80+ Gbps. BTW – we still had headroom on the VNX in that config. For another perspective, that’s the equivalent of the max spec’ed bandwidth of a first-gen UCS 6100, and about half the max spec’ed bandwidth of the more recent UCS 6248UP.
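As a quick unit check – and assuming one dual-port x520 per host, which isn’t stated above – the per-host numbers line up with dual-port 10GbE line rate:

```python
# Unit check on the headline number.
# Assumption: each of the 4 hosts has a single dual-port 10GbE x520.
aggregate_gb_per_s = 10                         # gigabytes per second, as measured
aggregate_gbit_per_s = aggregate_gb_per_s * 8   # ~80 Gbps on the wire

hosts = 4
per_host_gbit_per_s = aggregate_gbit_per_s / hosts   # ~20 Gbps per host
dual_port_line_rate_gbit_per_s = 2 * 10              # two 10GbE ports

print(f"{aggregate_gbit_per_s} Gbps aggregate, ~{per_host_gbit_per_s:.0f} Gbps per host, "
      f"vs {dual_port_line_rate_gbit_per_s} Gbps of dual-port line rate per host")
```

If that assumption holds, each host was pushing right up against the line rate of its adapter.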
DISCLAIMER: Remember – this isn’t intended to be a “realistic workload”; it’s intended to find the edge of the performance envelope. Realistically – there are few workloads that generate this kind of sustained bandwidth on 4 hosts (yes, yes, they exist of course – but let’s be honest, they are rare). It’s not an irrational workload either (for that we would have picked much larger IO sizes). It’s the kind of workload that stresses vSphere 5 (including the IO stack and the Intel Open FCoE driver), the network, and the EMC VNX.
Why do we do this? Well, first of all, because it’s fun, and kinda cool :-) Second of all – what we’re highlighting is that when people say “I can’t virtualize workload ____ because of IO”, the actual limits are so far beyond the realm of mortal workloads that people should virtualize the things that matter with confidence.
Now – all kudos here go to Chethan and team, and Radha and team. Lots of follow-up work is planned. The monster server from the 1,000,000 IOps work is en route back to the west coast for post-VMworld testing, so we’ll push the envelope of what you can drive through a single host – what else would you like to see?