Today, EMC acquired a small startup called ScaleIO, which joins our Flash business unit – welcome to the folks there! You can read about it here: http://www.emc.com/about/news/press/2013/20130711-01.htm
It’s in the family of distributed storage stacks (think of the storage stacks in VMware vSAN, Simplivity, Nutanix, Isilon, XtremIO, VPLEX, VMAX) rather than the family of “clustered” storage stacks (think of VNX, NetApp FAS, Nexenta). The distributed storage stacks have “spread the data around” as their core design. Clustered storage stacks keep the data behind a given brain and access it through that brain (sometimes with an additional federation/abstraction model).
Some of these have a tight “software + hardware” coupling; others don’t, and instead use a “bring the hardware you want” model.
ScaleIO is definitely in the “it’s software” + “bring your own hardware” camp. It gloms together all the SSDs, PCIe flash, and HDDs across any number of hosts (could be tens, could be thousands), and that pool can then be sliced up and used in any number of ways. There’s also a distributed caching layer in there as well.
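To make the “pool everything, then carve it up” idea a bit more concrete, here’s a tiny conceptual sketch. To be clear: this is not the ScaleIO API – the class and method names (StoragePool, add_device, create_volume) are invented purely for illustration.

```python
# Conceptual sketch only -- not the ScaleIO API. All names here are
# invented to illustrate "pool all the devices, then carve volumes".

class StoragePool:
    def __init__(self):
        self.devices = []        # (host, device, capacity_gb) contributions
        self.allocated_gb = 0

    def add_device(self, host, device, capacity_gb):
        """Any host can contribute any SSD / PCIe flash / HDD it has."""
        self.devices.append((host, device, capacity_gb))

    @property
    def raw_capacity_gb(self):
        return sum(cap for _, _, cap in self.devices)

    def create_volume(self, name, size_gb):
        """Carve a volume out of the aggregate pool; the data is spread
        across the contributing hosts rather than living behind one brain."""
        if self.allocated_gb + size_gb > self.raw_capacity_gb:
            raise ValueError("pool exhausted")
        self.allocated_gb += size_gb
        return {"name": name, "size_gb": size_gb}


pool = StoragePool()
pool.add_device("host-01", "/dev/nvme0n1", 800)   # PCIe flash
pool.add_device("host-02", "/dev/sdb", 4000)      # HDD
pool.add_device("host-03", "/dev/sdc", 400)       # SSD
vol = pool.create_volume("vmfs-datastore-1", 1000)
print(pool.raw_capacity_gb, vol)
```

The point of the sketch: capacity comes from whatever the hosts happen to have, and a volume is an abstraction over the whole pool, not over any single box.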
It’s very interesting stuff. While it’s early days, it will accelerate EMC’s XtremSF (SF = server PCIe-based flash) and XtremSW (SW = software that adds cool stuff to server-side storage) solutions – and you can imagine all sorts of interesting ways the technology could be applied. [Update – one interesting thing to think about: ScaleIO as a ViPR data service on top of COTS hardware]
I think it’s clear that at EMC, we think the world of how information is persisted and handled is being disrupted on a whackload of fronts, and are investing to win, and willing to disrupt ourselves as needed. On the one hand, you have distributed object stores (HDFS/S3/Swift/Atmos) where we innovate organically like crazy (Atmos, ViPR). You have distributed NAS stacks (Isilon). You have this new category that leverages persistence layers in the server (ScaleIO, vSAN and others). I’m not saying by definition we’re going to win in all categories, but darn it – we’re going to do everything we can to make that be the case.
As always, there are those who focus on one architectural model and claim it’s always the answer – that’s not us. Our view is that there’s a continuum from the server to cold, capacity-optimized storage (including where backups/archives live), and everything in between.
At VMworld – I’m going to explore these architectural models a little more at the engineering level in some of my sessions.
A couple asides:
- It’s interesting to me that everyone often biases towards thinking “distributed” is intrinsically better. It’s not always true. Inherently, distributed stacks spend a lot more resources on distribution, caching, and lookup than the mature clustered stacks (VNX) and the tightly coupled software/hardware distributed architectures (VMAX). That ultimately shows up in $/IOps and $/GB metrics. It’s also sometimes harder to add data services to distributed architectures.
- It’s interesting to me to watch that in the “Software Defined” marketing orgy going on right now (which, yes, EMC contributes to as well), most folks auger in on “software implemented + bring your own hardware”. I have a news flash: all storage arrays have been “software implemented” for a long time, with only a few hold-outs focusing on some marginal hardware-based mojo. Don’t get me wrong, the “bring your own hardware” model has some advantages, and I get that. Here’s an interesting question – what’s wrong in this picture?
Do you see it? Yup, in the 3rd cabinet on the right, 10th enclosure from the bottom, the 4th disk from the left has failed. There’s a little light on. This is a VNX without the faceplates (in fact, it’s one of the VNXes that supports the EMC vLab in Durham, NC), so in Unisphere there’s an alert, and if you’re using higher-level software to manage many platforms, the error propagates upwards.
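That little light is the tip of a lot of engineering. As a rough sketch (with invented names – this is not how Unisphere or the VNX firmware actually works), the stack has to map a device failure to an exact physical location and push an alert up to whatever management layer sits above it:

```python
# Hypothetical sketch: detect a failed drive, map it to its physical
# location (cabinet / enclosure / slot), and propagate an alert upward.
# Names are invented for illustration; this is not EMC code.

from dataclasses import dataclass

@dataclass
class DriveSlot:
    cabinet: int
    enclosure: int
    slot: int
    healthy: bool = True

def check_drives(slots, notify):
    """Poll drive health; on failure, raise an alert that higher-level
    management tools can consume."""
    for s in slots:
        if not s.healthy:
            notify(severity="critical",
                   message=(f"Drive failed: cabinet {s.cabinet}, "
                            f"enclosure {s.enclosure}, slot {s.slot}"))

def console_alert(severity, message):
    # Stand-in for the element manager / upstream management integration
    print(f"[{severity.upper()}] {message}")

slots = [DriveSlot(cabinet=3, enclosure=10, slot=4, healthy=False),
         DriveSlot(cabinet=1, enclosure=2, slot=7)]
check_drives(slots, console_alert)
```

The hard part in real life isn’t the loop – it’s knowing, reliably, on arbitrary hardware, which physical slot a failed device actually lives in and how to signal it.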
When you want a ZFS stack, you can pick Nexenta or one of the many other variants out there. They work on almost any hardware. BUT, if you want an HA configuration, the hardware requirements narrow down much more specifically. This is mostly because handling failure conditions – and doing physical and logical notification – is important, and not easy on “any hardware”. This isn’t to pick on ZFS. Likewise, while there is a NetApp edge platform that is VM-only, it’s a non-clustered, relatively simple thing. Again, this isn’t “bad” – it’s a reflection of an intrinsic engineering challenge.
The “distributed” stacks seem to go at it another way:
- Tightly couple the hardware (while Nutanix and Simplivity are of course “only software”, they are packaged as appliances) – Isilon and XtremIO fall into that category for EMC. Count on the fact that we’re continuously trying to reduce any and all hardware dependencies – but a lot of it has to do with these failure conditions.
- Others (vSAN in its first release will fall into this category, and so does ScaleIO) offer much more rudimentary notification and handling of hardware failures, and depend on data distribution as a protection model (sketched below).
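Here’s a minimal sketch of what “data distribution as a protection model” means – not ScaleIO’s or vSAN’s actual placement algorithm, just the idea that every chunk of data gets multiple copies on distinct hosts, so losing a drive or a whole host loses no data:

```python
# Illustrative only: round-robin placement of N replicas per chunk on
# distinct hosts. Real products use far more sophisticated placement,
# rebalancing, and rebuild logic.

import itertools

def place_replicas(chunk_ids, hosts, copies=2):
    if copies > len(hosts):
        raise ValueError("need at least as many hosts as copies")
    placement = {}
    ring = itertools.cycle(range(len(hosts)))
    for chunk in chunk_ids:
        start = next(ring)
        placement[chunk] = [hosts[(start + i) % len(hosts)]
                            for i in range(copies)]
    return placement

layout = place_replicas(chunk_ids=range(6),
                        hosts=["host-01", "host-02", "host-03", "host-04"],
                        copies=2)
for chunk, where in layout.items():
    print(f"chunk {chunk}: {where}")
```

The trade-off versus the tightly coupled appliances above: protection comes from spreading copies around rather than from deep knowledge of (and control over) the specific hardware.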
This question of handling failure conditions – and of how the architecture of the software storage stack is built – is fundamentally why, although all storage stacks are implemented in software, some come with hardware/software packaging.
That’s why I keep sticking to my guns on the technical definition when it comes to all things “software defined” (there’s a rough sketch in code after the list):
- Control plane abstraction and separation from data plane.
- Control plane implemented in software, and able to operate on a wide variety of data plane implementations (any range of software/hardware models). Each data plane implementation drives all sorts of “operational envelope” results – economics, performance, failure behavior, etc.
- Ultimately, the services (whether we’re talking compute, network, storage, or other infrastructure services) are consumed through the control plane as programmable software – with nice, clean APIs.
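To make that definition a little more concrete, here’s a rough sketch – invented names, not ViPR’s (or anyone else’s) actual API – of a control plane as programmable software operating over interchangeable data-plane implementations:

```python
# Hypothetical sketch of control plane / data plane separation.
# None of these classes correspond to a real product API.

from abc import ABC, abstractmethod

class DataPlane(ABC):
    """Any persistence implementation -- external array, server flash
    pool, object store -- each with its own operational envelope."""
    @abstractmethod
    def provision(self, name, size_gb):
        ...

class ArrayBackedDataPlane(DataPlane):
    def provision(self, name, size_gb):
        return {"name": name, "size_gb": size_gb, "backend": "external array"}

class ServerFlashDataPlane(DataPlane):
    def provision(self, name, size_gb):
        return {"name": name, "size_gb": size_gb, "backend": "server-side flash pool"}

class ControlPlane:
    """What consumers program against; it doesn't care which data plane
    actually persists the bits."""
    def __init__(self, data_planes):
        self.data_planes = data_planes   # service level -> DataPlane

    def create_volume(self, name, size_gb, service_level):
        backend = self.data_planes[service_level]
        return backend.provision(name, size_gb)

cp = ControlPlane({"gold": ServerFlashDataPlane(),
                   "bronze": ArrayBackedDataPlane()})
print(cp.create_volume("db-vol", 500, service_level="gold"))
```

Swap in a different data-plane implementation and the economics, performance, and failure behavior all change – but the API the consumer programs against doesn’t.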
Just my 2 cents :-) Feedback – as always welcome – this is just one person’s opinion!
Sounds like more EMC hardware bias. I want discrete hardware stacks with a common management platform. Give me VMWare with whatever storage best meets the need for the particular business function and cost management container for the case.
Posted by: GW | July 11, 2013 at 04:54 PM
When I look at technology like this combined with hyper-scale server platforms such as Project Moonshot, I begin to finally see an emergent path for the infrastructure side of technology. This provides the groundwork for the evolution of software from silo'd, management-heavy deployments into massive-scale, self-managing systems.
I can now see imaginations becoming reality. It's not a far stretch to see how systemic things will become as developers are less and less constrained by technological lock-in and the minutiae of deployment. It's but a short time before applications are able to determine where their data lives and how it's protected. Real-time analytics built in, making decisions about dependencies, access, security, IO profile, etc. DR and HA determined by SLA and self-implemented, with open information available from the internet and the internet of things regarding trending fault potentials of the hardware said application lives on, the data centers those reside in, and the networks they communicate over. Costly migrations a thing of the past. Can't wait to see how far this goes.
Thanks EMC for yet again being innovatively disruptive at the right time with the right move. Still wondering where you found that magic crystal ball...
Posted by: Joseph Angeletti | July 14, 2013 at 01:13 AM
Just like the success of VMware has been tied to Intel's capacity to keep increasing its CPUs' power and functionality, I think that the success of this "distributed" model of infrastructure will heavily depend on the capacity of the new PCIe standards to deliver what they are promising.
The new PCIe features (increased bandwidth, PCIe switches and fabrics, cable connections to interconnect different chassis) could really be disruptive with respect to the current approach to data center infrastructure.
Just search the internet for things like: PCIe 3 over Cable or PCIe Fabric...
Posted by: Icilio Pascucci | July 15, 2013 at 01:05 PM