Ok – once the marketing panache settles (view it as good, as bad, as fun, as evil – your call – but it does what marketing does, and does it well) – what’s the TECH behind what was announced today?
There are 4 big parts:
- Part 1 – Next Generation VNX.
- Part 2 – Important “Halo” releases around VNX – Appsync, XtremSF
- Part 3 – ViPR General Availability
- Part 4 – “Project Nile”
I’m going to break this into 4 blog posts and then link them. Let’s start with the one that will get the most attention/argument/debate/competitive response (and by definition the most important tactically – this is a $5B product in a $15B TAM), and I’ll end with the one I think is the most important long term, which is “Project Nile”.
What’s new in the next generation VNX?
- There is a ground-up rewrite of the core block stack – the foundation of any storage platform – called “MCx”. This is a foundational chunk of software that will carry EMC’s implementation of “scale-up + multipurpose storage” for the next decade – including the next major revs at the extreme low end, revs of the NAS stack, revs of software-only variants, and revs that run other workloads co-resident. There are 43 patents pending on it. It’s a HUGE change – and something new and unique in this market segment. To get a peek at what I’m talking about at the engineering level, read this blog post from Steve Todd.
- There are huge performance increases across the board (a result of MCx more than of the hardware refresh – but yup, there’s a hardware bump too)
- There are important improvements in the data-efficiency services (thin, snap, replicate, dedupe, compress)
- There are important improvements in the NAS stack.
- More is now included with the platform (VNX Monitoring and Reporting, Unisphere Remote for monitoring multiple arrays, Unisphere Quality of Service, and Unisphere Analyzer are all included)
Often lost in these things is the “voice of the customer”. That frankly matters more than any pontification on my part or anyone else’s. Hear E*trade below:
Want more? Here’s Cbeyond. Here’s another:
“The new VNX is capable of handling our File needs without performance degradation. 98% of our environment is file, which our legacy vendor was not able to support.”
-Principal Storage Engineer @ $2.5B Global Semiconductor Company
I have a ton – but their voice is consistent – happy with their VNX, excited about the next-generation.
Yet, invariably, when something is done by a leader in the industry, you see debate. Some of it is in the mud-slinging category (which I never approve of when it comes from EMC), but there are two reasonable debates I’ve commonly seen amongst people I respect. I’ll take them on directly here:
- Statement: “scale-out is the future, scale-up is the past”
- My response: There’s no doubt that scale-out offers a simple scaling model that is nice and elegant. There’s also a reason why there are no true scale-out designs (I’m differentiating from “federation layered on top of scale-up” models – which IMO tend to fall short on the reasons scale-out can be good) that are really good in a multi-function way, covering a broad set of use cases, protocols, and functions. Most focus (arguably a good thing) on either “transactional”, or “streaming”, or “NAS”, or “VMs only”. Fundamentally, scale-up designs have simpler code stacks than scale-out designs – which is why they get more features, functions, use cases, and broader storage sweet spots. That “broad sweet spot” is why there is a “multi-purpose” storage product category. There’s also an interesting observation – it’s really hard to scale a true “scale-out” design down, WAY down. At small scale (think 1-4-ish nodes), the scale-out design strengths start to turn into weaknesses. What do I mean? Well – in these terms:
- compute/storage density being intrinsically linked (every node has storage cost + compute/memory cost – which don’t generally rise linearly together in terms of requirements). At large scale, it’s a simplicity tradeoff. At small scale – adding a node to just add 10-20TB of capacity adds unnecessary cost.
- IO density not being as good (in general, most server/storage platforms don’t have the disk density of dedicated enclosures)
- Dependency on “multiple copies” in a RAIN model starts to have a material cost impact (while you CAN have SOME dedupe/compress/thin, all these models treat the server as the failure unit – which means they DEPEND on keeping multiple copies of content). Again, at scale this can be outweighed by simplicity drivers – but at small scale it means really low utilization rates are common with scale-out (there’s a back-of-the-envelope sketch after these two debates).
- Am I saying “scale-out BAD”? NOPE. Remember, EMC doesn’t roll that way. XtremIO, Isilon, ScaleIO, and VMAX are all EMC scale-out assets, each of which rocks. None of them scale down into the VNX5200 (and below) band. None of them have the “broad sweet spot” of multipurpose arrays like VNX. But conversely, each has places where its design sweet spot means it smokes a VNX, even the new ones. That’s the core “why” of the EMC strategy of serving all these workloads the best we can, and not getting myopic about any one architecture.
- Statement: the “mixed purpose” arrays are under competitive pressure from “single workload” arrays and “converged” IO/compute stacks.
- My Response: Yes, they are :-) Short answer – that’s why EMC is diverse, and aims to be best of breed in every category. It’s NOT a good strategy to be a single-architecture storage player long term. Longer answer – usually when people bring this up, they’re referring to new entrants that are either “all-flash arrays” (of which new ones appear, and in some cases die off, so quickly that I won’t list them), or things like Tintri (arrays targeted at a single use case – VMware), or things like Nutanix or Simplivity (distributed software storage stacks that run co-resident with compute on vendor-provided servers). Every customer has choices. Some customers choose to deploy one thing and run every single workload as a VM. I get that. Some customers choose to deploy multiple platforms to support their diverse workloads. I get that too – particularly at larger scale. Others choose a model where they’re looking for a single platform that supports a very broad set of requirements really well. That’s the market VNX serves – none of the others listed above target it, and that’s A-OK. Conversely, if customers are looking at all-flash, EMC’s play there is XtremIO – and we think it’s great (read about how XtremIO supported the VMworld HoL here, and here). If customers are looking at storage stacks that can be deployed on servers as a distributed software stack (and frankly, bring the best servers to bear), choose VSAN if you’re all VMware, choose ScaleIO if you’re not. The reason customers and partners choose EMC (and more do every day) is that we offer choice. Purpose-built storage? Check! General-purpose storage? Check!
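To make the small-scale utilization point from the RAIN bullet above a bit more concrete, here’s the back-of-the-envelope sketch I promised. The node counts, replica factor, and parity overhead below are illustrative assumptions, not measurements of any specific product:

```python
# Back-of-the-envelope usable-capacity comparison: a small scale-out
# cluster that protects data with N copies (RAIN) vs. a scale-up array
# using parity RAID. All numbers below are illustrative assumptions.

def rain_usable_tb(nodes, tb_per_node, copies):
    """Usable capacity when every block is stored 'copies' times."""
    return nodes * tb_per_node / copies

def raid_usable_tb(raw_tb, data_disks=4, parity_disks=1):
    """Usable capacity with a simple RAID-5 style parity overhead."""
    return raw_tb * data_disks / (data_disks + parity_disks)

raw_tb = 4 * 10  # four nodes (or shelves) of 10 TB raw each

print(f"RAIN, 3 copies : {rain_usable_tb(4, 10, 3):.1f} TB usable")  # ~13.3 TB
print(f"RAIN, 2 copies : {rain_usable_tb(4, 10, 2):.1f} TB usable")  # 20.0 TB
print(f"RAID-5 (4+1)   : {raid_usable_tb(raw_tb):.1f} TB usable")    # 32.0 TB
```

At hundreds of nodes, the simplicity of the RAIN model can be well worth that overhead; at 1-4 nodes, the gap in utilization is hard to ignore – which is exactly the point of the bullet above.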
I did a post on the Next-Generation VNX new features at VMworld here.
Jason Nash from Varrow (a great partner – and even he made some of the observations above) did a great post here.
I’ve been asked what current generation VNX customers should expect to receive. The NAS improvements port back. Some of the VAAI XCOPY improvements are targeted to port back. Many of the other MCx improvements are highly dependent on system memory – and I wouldn’t expect them to port back.
Now, I’m going to try to contribute something NEW to the online dialog here – and share a WHACKLOAD of performance data beyond the marketing slides, and discuss what it means… Read on!
Now, while the theme of “speed2lead” was all about maximums, and people will focus on “hero numbers”, the point of the VNX family is that they scale DOWN, and people have expectations of data services, and a wide variety of mixed workloads (if you have just one workload you want to run – there MAY be better ways to do it).
That’s why the MCx rewrite is so important. It is the basis for everything – data services (thin device behavior, snapshot behavior, replication behavior, dedupe behavior, VAAI behavior) and also NAS behaviors.
I want to share some of the performance testing done internally at EMC, as well as some from our customers.
TLU Performance:
TLU = a thin device. This is a pooled LUN that uses the array’s thin-provisioning model. On all arrays (of any kind) the mechanisms for “indirection” vary, but all use some form of metadata map. Implementations differ wildly (for example, XtremIO has a model where 100% of the IOs are “indirect” and go through a distributed hash lookup – this is the essence of their inline dedupe). Regardless, this is a place where our customers demanded improvement in VNX – performance is one thing, but performance with layered data services is another. The design target of VNX “Rockies”, aka the next-gen VNX, was to deliver TLU performance that was as good as (minimum) or better than thick device performance on the previous generation.
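To illustrate what that “indirection” means in practice, here’s a deliberately simplified sketch of a thin-style allocation map. This is a conceptual toy, not the MCx/VNX implementation – the extent size, map structure, and names are all made up for illustration:

```python
# Toy model of thin-provisioning indirection: logical blocks are only
# backed by physical extents once they are written, via a metadata map.
# This is a conceptual sketch, not how MCx/VNX actually implements TLUs.

EXTENT_SIZE = 8192  # bytes per extent (illustrative)

class ThinLUN:
    def __init__(self):
        self.extent_map = {}   # logical extent index -> physical extent id
        self.next_extent = 0   # trivially "allocate" by handing out ids

    def write(self, logical_offset, data):
        idx = logical_offset // EXTENT_SIZE
        if idx not in self.extent_map:          # first write allocates
            self.extent_map[idx] = self.next_extent
            self.next_extent += 1
        return self.extent_map[idx]             # physical extent to write to

    def read(self, logical_offset):
        idx = logical_offset // EXTENT_SIZE
        # Unwritten regions need no backing storage; report them as empty.
        return self.extent_map.get(idx, None)

lun = ThinLUN()
lun.write(0, b"hello")            # allocates extent 0
lun.write(5 * EXTENT_SIZE, b"x")  # allocates extent 1 - holes cost nothing
print(lun.read(3 * EXTENT_SIZE))  # None: never written, no capacity consumed
```

Every I/O pays for that map lookup, which is exactly why keeping the map fast (and, as discussed below, resident in DRAM) determines whether thin performance keeps up with thick.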
In the graph below (the test basis for the 2x increase in the number of VMs recommended in VSPEX configurations) – we tested a VNX5700 vs. a VNX5800 with a similar configuration, but where the older VNX5700 was configured with DLUs, and the current generation was configured with TLUs.
Notice that the total IOPS delivered were close, but the CPU utilization was around half (driving a 2x higher total scaling factor). And remember – one was thick, one was thin.
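As a rough illustration of how “the same IOPS at roughly half the CPU” turns into a 2x scaling factor – the actual measured numbers are in the graph; the figures below are placeholders:

```python
# Rough arithmetic behind the 2x VSPEX scaling claim. The IOPS and CPU
# figures here are placeholders, not the measured results from the graph.

old_iops, old_cpu = 50_000, 0.80   # previous gen, DLU, near its CPU ceiling
new_iops, new_cpu = 50_000, 0.40   # next gen, TLU, same work at ~half the CPU

# Headroom: how far each array could scale before hitting the same CPU
# ceiling, assuming IOPS scale roughly linearly with CPU consumed.
ceiling = 0.80
old_headroom = old_iops * (ceiling / old_cpu)   # 50,000 IOPS
new_headroom = new_iops * (ceiling / new_cpu)   # 100,000 IOPS

print(new_headroom / old_headroom)  # ~2.0 -> roughly twice the VMs per array
```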
Here is another example – a Jetstress test of a VNX5500 running the Inyo codebase vs. a VNX5600 running the Rockies codebase. There’s a DOUBLING of TLU performance with a pure like-for-like configuration, and if you use a little bit of flash in the pool, it’s a 198% improvement in the number of IOPS.
This last example in this sequence is comparing all three device types (TLU:DLU:FLU aka “thin pooled”:”thick pooled”:”classic non-pooled”) under an OLTP workload.
The note is important – the key with these thin/snapped/deduped workloads is keeping the amount of metadata from exceeding the amount of system DRAM; when that happens, some of the metadata has to be read/written from persistent media, which changes the performance envelope dramatically. This is one of the reasons why backporting MCx is not only hard, but could actually hurt some customers (it depends on system RAM more than ever).
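A simple way to see why metadata spilling out of DRAM is so punishing – a hedged, illustrative latency model (the latencies and hit rates below are assumptions, not VNX measurements):

```python
# Illustrative model of average metadata-lookup cost as the metadata
# working set outgrows DRAM. Latencies and hit rates are assumptions.

DRAM_LOOKUP_US = 0.5     # metadata hit in memory
MEDIA_LOOKUP_US = 500.0  # metadata miss serviced from spinning disk

def avg_lookup_us(dram_hit_rate):
    return dram_hit_rate * DRAM_LOOKUP_US + (1 - dram_hit_rate) * MEDIA_LOOKUP_US

for hit in (1.0, 0.99, 0.95, 0.90):
    print(f"{hit:>4.0%} of metadata in DRAM -> {avg_lookup_us(hit):7.1f} us per lookup")

# Even a few percent of misses inflates the average lookup cost by one to
# two orders of magnitude - which is why the DRAM sizing note matters.
```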
What about NAS?
We have customers of every size and scale that use VNX for blended use cases that include NAS. In this blog post, I talked about a lot of the NAS improvements – around non-disruptive VDM mobility (think “vmotion of enterprise NAS”).
One of the largest VNX customers in the world (I think they might be THE largest?) did a whackload of testing (thank you to the CSE team for sharing!) of CIFS/SMB performance deltas against the previous generation (no slouch after all – VNX was well known for a great-performing CIFS/SMB implementation). Not only does the VNX refresh include full support for SMB 3.0 – which makes it a great choice for running Hyper-V and other app stacks, and brings features like BranchCache – it also delivers HUGE performance increases.
Here, we ran a bunch of IOmeter tests with increasing worker counts against a VNX5700 and a VNX7600. Each datapoint represents more workers/threads – increments of 36, 72, and 144. The test was really about looking at throughput (IOPS) and latency at various loads.
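For readers who want to reproduce this kind of sweep, here’s a sketch of the idea – not the actual IOmeter configuration we used, just the general shape of scaling worker counts and relating throughput to latency via Little’s Law (the latencies are made-up numbers):

```python
# Sketch of the worker-count sweep idea behind these charts. This is not
# the actual IOmeter configuration used in the tests - just an illustration
# of how concurrency, latency, and throughput relate (Little's Law).

worker_counts = [36, 72, 144]     # the increments used in the charts

def expected_iops(outstanding_ios, avg_latency_ms):
    """Little's Law: concurrency = throughput * latency.
    Assumes each worker keeps one I/O outstanding."""
    return outstanding_ios / (avg_latency_ms / 1000.0)

# Illustrative average latencies at each load point (made-up numbers):
for workers, latency_ms in zip(worker_counts, (1.5, 2.0, 3.5)):
    print(f"{workers:>3} workers, {latency_ms} ms avg latency "
          f"-> ~{expected_iops(workers, latency_ms):,.0f} IOPS")
```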
This first one is 100% random writes, 8K IO size.
The curve is markedly different between the previous gen and the current gen. BTW – a comparison between a 5700 and a 5800 would have been more apples-to-apples, but the shapes are similar.
The next test is a 100% random read example. Here the older generation fared a little better than it did on writes – but you can still see a night-and-day difference.
And this last one is a 50/50% random read/write mix.
Moral of the story – huge performance increases at all scales, across a broad set of use cases and protocol models.
So… Are you a VNX customer? Happy with what you have today? What do you think about the new stuff?
Wow. Amazing. Glad to see analyzer included with the core product now. Incredible performance improvements. I'll always have a soft spot in my heart for the Clariion, glad to see it keeps getting better and better.
Posted by: Nicholas York | September 04, 2013 at 10:41 PM
Hey Chad,
VNX customer, totally pissed off that a 1yro,2yro arrays won't receive the juice that MCx is expected to offer. It's all good and all evolution and because of memory and because of new hardware (hey, wasn't it "all about the software" ?), but there's one thing that stinks to high heaven: investment protection. Total, epic, fail.
I'm not talking about a dusty rusty clariion or celerra... and even IF I'd be willing to spend MORE money for another hw upgrade (which I'm not), why should I believe that my new investment would be protected this time?
Seriously, this matters to me (and most, I do believe) a lot more than a x% improvement on a workload most of us won't ever ever have.
If there's one good thing about all this stuff is that it gives even more credit to the SDS "vision", and the inherent/implied evolution that make us think and hope that this kind of "strategy" (buy big, cry in shame) isn't going to be viable for the future.
Posted by: PJ | September 05, 2013 at 03:26 AM
@Nick - hope you're well, and yes, it does keep getting better and better.
@PJ - I'm sorry. Now, there is a whole bunch of things that customers like you are getting. So - if you liked what you bought (a VNX), and are happy with what it is doing - it is going to get better - AT NO COST TO YOU.
- You will get VDMs and VDM mobility - non-disruptive mobility of NAS workloads. THIS IS HUGE.
- You will get a big lift in transactional NAS/SMB performance.
- You get SMB 3.0 (and BranchCache).
- You will see XCOPY performance improvements all getting back-ported.
One more thought - don't misunderstand the fully software-only storage data services planes - they all also have minimum requirements, just not a specific hardware platform - which is actually similar to the root issue here.
If that's insufficient - I've emailed you directly, and will work with you and your local team.
Posted by: Chad Sakac | September 11, 2013 at 02:03 PM