This is an important, strategic, but relatively complex (IMO) topic – and dear readers, if you don’t know me by now, that translates into what’s likely to be a long post :-) I’d like to thank you in advance for your time and attention investment! Perhaps it will help you decide whether to make that investment: I presented some of this to the VMware SEs and Services folks on Saturday at their Tech Summit. If you want to get into the heads of the VMware and EMC boardrooms and their respective engineering teams, read carefully.
So – at VMworld 2013, a huge central theme is around the path to ITaaS - which requires BOTH:
- a hybrid cloud approach that embraces the public cloud (like VMware’s vCloud Hybrid Service) – because some workloads are a better fit economically due to:
- transience [I know I don’t need it for long]
- lack of certainty of scale target [sometimes if you’re starting small, avoiding the “step in cost” of on prem capex can be a win]
- workload variability [if a workload will naturally and permanently vary in scale]
- a well-run on-premise cloud approach that delivers a similar agility value – because some workloads are a better fit economically due to:
- non-transience [if this is known, 1-3 year TCOs tend to favor on-prem capex models]
- certainty of scale target [if this is known, 1-3 year TCOs tend to favor on prem capex models]
- other factors like data gravity [large datasets are difficult to move in and out of public clouds], IO volume [public cloud storage economics outside vCHS are not currently favorable to IOps centric workloads], adjacency to internal-proprietary sources of information [lots of ongoing transit can sometimes be difficult], or workload-specific compliance/security/SLA constraints that for some reason aren’t met by public cloud choices [this isn’t intrinsic, I’m not suggesting the public cloud CANNOT meet these requirements, but sometimes an application cannot be re-written to expect lower degrees of infrastructure resiliency and build in more software resiliency]
The thesis of both VMware and EMC is that continual movement to a more “software defined” model is critical for customers and service providers to succeed at BOTH 1 + 2.
So – if we’re so kumbaya and sympatico between VMware and EMC – what’s up with VSAN? Doesn’t that compete with EMC’s core storage business? So, if so, is EMC ScaleIO a retaliation? What about ViPR? Doesn’t that do something similar as the Storage Policy Based Management (SPBM) and vVol ideas? What do I do when two sales teams tell me to do different things? Who’s in charge here – aren’t you a family?
Dear reader, read on past the break for my view.
First, it’s important to understand the EMC/VMware (and for that matter, Pivotal) core principle of FEDERATION.
IMO, Joe Tucci is a freakin’ genius. One of his mantras is “customer first” – and a philosophy that so long as you think of the customer first, and always, good things happen – and EMC shareholders benefit. I happen to agree (and this is an EMC Presales Manifesto principle).
He realized early that “holding on too tight” is a recipe for trouble, and also opens up more risk of missing a disruption (i.e. it’s better to have multiple parties adapting for survival in a way that is linked, but not so linked that it single-sources things).
The result of which is:
- EMC is independent of VMware and Pivotal. EMC can partner and innovate freely across the whole ecosystem. For example, EMC wants to have the best infrastructure for Hadoop – regardless of whether it’s Pivotal HD or Hortonworks or Cloudera. Customers like that. Pivotal may not, but hey. We will partner with VMware on NSX, but also with Cisco on Insieme.
- VMware is independent of EMC and Pivotal. VMware can partner and innovate freely across the whole ecosystem. For example, VMware wants to deliver a great public cloud service, and does now via the vCloud Hybrid Service. Customers like that. EMC feels uncomfortable about that, because other partners like AT&T are some of our biggest customers, but hey.
- Pivotal is independent of EMC and VMware. Pivotal can partner and innovate freely across the whole ecosystem. For example, Pivotal ONE works well on AWS. Customers like that. VMware feels uncomfortable about that, but hey. Likewise, Cloud Foundry (the open variation of Pivotal ONE’s PaaS stack) was selected by IBM to be their lead PaaS solution. Customers like that. EMC is an IBM competitor in many dimensions (and partner in others), so we feel uncomfortable about that, but hey.
Do you see the pattern? Choice and openness are a unique thing about the larger EMC federation – I can’t think of another example structurally like it in the industry.
Do you see another pattern? Customers are happy. The Federation members need to work a little harder to win on their own merits and partner naturally (vCHS for Pivotal vs. AWS, Serengeti being the best way to deploy Pivotal HD, EMC storage being the best with Pivotal HD or with vSphere, etc.) – but the outcome is good.
Do we partner? Yup – like no one else in the industry. We share a common goal, a common vision, and common board – but respect these boundaries and don’t “conspire” to constrain.
So the answer to the Q: “Who’s in charge here – aren’t you a family?” is a simple one. A: “Yes, we are a family – but the answer to ‘who’s in charge’ is YOU, the customer”.
Before going any further (and I get DEEP into the tech goop that is the outcome), we have the core answer:
- EMC and VMware share a strategy around the SDDC.
- Where it's right, we even share intellectual property (in an open way, using things like cross-licensing) to help
- We look at the world slightly differently.
- We both partner and compete in the SDS space.
- ... and that's good for customers.
Ok – with that out of the way, now let’s nerd out (as well as explore the strategic implications), and consider the Software Defined Datacenter (SDDC), and more specifically Software Defined Storage (SDS).
What is SDS?
- Decoupling and abstracting control and policy (control plane) from physical stuff that does work – this is important for abstraction, automation, pooling. This is what the NSX Controller does for SDN in the VMware world.
- Where the physical stuff that does work (data plane) can be software on commodity hardware, do it that way.
- Programmable infrastructure APIs: automate everything
If the idea of "control plane" and "data plane" sounds abstract, here's an analogy:
With trains, the control plane is the switch (it controls where the train goes, and changes infrequently); the data plane is the tracks (they carry the train’s payload, and need to have the right gauge and characteristics for the particular type of train).
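If you like seeing the idea in code rather than trains, here's a minimal sketch of the same split – toy classes with made-up names, not any real product's API: the control plane sets policy occasionally and out-of-band, while the data plane does the frequent work of carrying the payload.

```python
# Illustrative only: toy classes, not any real product's API.

class ControlPlane:
    """Decides policy: which 'track' a given workload should ride on."""
    def __init__(self):
        self.routes = {}  # workload name -> data plane

    def set_policy(self, workload, data_plane):
        # Happens rarely (like throwing a railway switch).
        self.routes[workload] = data_plane

class DataPlane:
    """Does the frequent, latency-sensitive work of carrying the payload."""
    def __init__(self, name):
        self.name = name

    def carry(self, payload):
        # Happens constantly (every packet, every IO).
        return f"{self.name} carried {payload!r}"

ctrl = ControlPlane()
fast_track = DataPlane("all-flash pool")
bulk_track = DataPlane("capacity pool")

ctrl.set_policy("oltp-db", fast_track)       # infrequent control-plane call
ctrl.set_policy("backup-repo", bulk_track)

print(ctrl.routes["oltp-db"].carry("4KB write"))   # frequent data-plane call
```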
Ok - with me so far? Let's examine the first principle (control plane abstraction). If you are EMC and VMware, and you are structured to think through this “customer first” + “federation model” lens, this leads to the following…
Let’s call the diagram below “the world of workloads and infrastructure”. I’m obviously trivializing, but take a look, and then read on.
As you can see – there’s a “choke” point, with an “hourglass” that fans out – LOTS of workload variation, and a ton of infrastructure variation. This point is the natural place to put a control plane that does policy control for compute, networking and storage – and lo and behold, that’s where it is emerging.
If you’re a federation member like VMware, and asking “how do we do SDS control plane for storage”, you end up building this:
You build a policy control layer (SPBM, and over time, vVols add VM-granularity), and look to the ecosystem and say “hey, build VASA providers so we can communicate your behaviors” and “build vCO/vCAC adapters so we can automate you!”. That’s GREAT. But, note that this means that BY DEFINITION it excludes Hyper-V, Xen, and KVM. It doesn’t exclude Openstack, but kind-of-does (for example, Openstack will use Cinder to programmatically tell the storage what to do). Is that a problem? Not if you as a customer are all in with VMware. Many are, and are happy – and that’s great.
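To make the SPBM idea concrete, here's a rough sketch – hypothetical data structures, not the real SPBM or VASA APIs: datastores advertise capabilities through a VASA-provider-like component, a VM carries a policy of required capabilities, and the control plane matches one against the other.

```python
# Hypothetical sketch of policy-based placement; not the real SPBM/VASA API.

# What a VASA-provider-like component might advertise per datastore.
datastore_capabilities = {
    "gold-vsan":   {"replication": True,  "flash": True,  "dedupe": False},
    "silver-nfs":  {"replication": True,  "flash": False, "dedupe": True},
    "bronze-sata": {"replication": False, "flash": False, "dedupe": False},
}

# A VM storage policy: the capabilities the workload requires.
vm_policy = {"replication": True, "flash": True}

def compliant_datastores(policy, capabilities):
    """Return datastores whose advertised capabilities satisfy the policy."""
    return [name for name, caps in capabilities.items()
            if all(caps.get(k) == v for k, v in policy.items())]

print(compliant_datastores(vm_policy, datastore_capabilities))
# -> ['gold-vsan']
```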
If you’re a federation member like EMC, and asking “how do we do SDS control plane for storage”, you end up building this:
You build a policy abstraction/automation layer that looks at SPBM, vVols, vCO, and vCAC as a critical interface point, but not to the exclusion of Hyper-V, Xen, KVM and anything else. Is this better? Well, through one lens, YES, if you prioritize that openness at that layer. If, conversely, you look through a lens of "you're all vSphere ESX", it adds an unnecessary layer of abstraction. Which is better? The answer comes from YOU the customer (and IMO, the answer varies customer to customer).
BTW - this is what installing ViPR looks like below. Brain-dead simple - both the controller (analogous to the NSX controller in SDN) and the ViPR data service - in this case webstorage (more on that later) - are virtual appliances. It's important to note that ViPR is VERY different from other "put an array in front of another array" models (think HDS or NetApp vFilers). It's not that those are intrinsically bad (after all, EMC does this with VPLEX and Federated Tiered Storage on VMAX). It's that those models are in the DATA PLANE. The characteristics of the storage data plane become the characteristics of the abstractor. In ViPR's case, it's analogous to NSX - it deals with the control plane alone, and leaves the data plane of block and NAS storage the way it is.
You can download the high-rez version of this video here!
And, here's how ViPR can abstract, pool, and automate storage heterogeneously (in this case EMC and NetApp), by creating virtual arrays and virtual storage pools, and then plugging those into vCO and vCAC.
You can download the high-rez version of this video here!
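For a rough feel of what "control plane only" automation looks like from a script, here's a hedged sketch of driving a ViPR-like REST controller with Python's requests library, creating the kind of virtual array and virtual pool shown above. The endpoint paths, payload fields and token handling are placeholders I've made up for illustration; the real ViPR REST API is documented separately.

```python
# Hypothetical sketch; endpoint paths and fields are made up for illustration.
import requests

BASE = "https://vipr.example.local:4443"     # placeholder controller address
HEADERS = {"X-Auth-Token": "<token>"}        # placeholder auth token

# Create a virtual array: a pooling boundary across heterogeneous arrays.
varray = requests.post(f"{BASE}/virtual-arrays",
                       json={"name": "prod-varray"},
                       headers=HEADERS, verify=False).json()

# Create a virtual pool: a class of service (protection, performance, protocol)
# that vCO/vCAC or Openstack Cinder can consume, without caring which
# physical array (EMC, NetApp, ...) actually backs it.
requests.post(f"{BASE}/virtual-pools",
              json={"name": "gold-block",
                    "virtual_array": varray["id"],
                    "protocol": "FC",
                    "protection": "mirrored"},
              headers=HEADERS, verify=False)
```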
But furthermore, here it is doing the same with KVM and Openstack.
You can download the high-rez version of this video here!
Ok - with me so far? Now let's talk about the 2nd principle of "Software Defined" (data plane done in software on commodity if possible). If you are EMC and VMware, and you are structured to think through this “customer first” + “federation model” lens, this leads to the following.
There are two important distinctions to draw here between the networking and storage domains when it comes to data planes. Consider the following comparison and examples of the data plane of compute/networking/storage:
Do you see a pattern? Well the control plane operations can ALL be done in abstracted software on commodity hardware - they occur on timescales that are infrequent compared with data plane operations.
Do you see one big difference? The networking data plane requires hyper-latency-sensitive hardware operations, which is why switches and routers have merchant silicon, FPGAs, and ASICs for various functions. Could you do it all on commodity hardware? Sure. You can use any off-the-shelf Linux distro and make a functional switch or router, but its performance would suck (the little compare operation would take WAY too long crossing PCIe lanes, reading in and out of registers, doing the compare and then the transit again). That's why "SDN" doesn't mean "remove all your switches/routers", it means "overlay/abstract them using a control plane" (whether the control plane is NSX or something else). Will we see new networking data planes from the major vendors that presume this model and innovate around it? I would expect we will :-)
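To put rough numbers on that "little compare operation": at 10GbE line rate with minimum-size frames you have on the order of tens of nanoseconds per packet, while a single round trip across PCIe to host memory is commonly quoted at several hundred nanoseconds or more. A quick back-of-envelope sketch (the PCIe figure is a ballpark assumption, not a measurement):

```python
# Back-of-envelope: why the network data plane wants ASICs/FPGAs.
LINK_BPS = 10e9                 # 10 Gb/s link
MIN_FRAME_BYTES = 64 + 20       # minimum Ethernet frame + preamble/inter-frame gap
PCIE_ROUND_TRIP_NS = 800        # rough assumed PCIe round-trip latency

packets_per_sec = LINK_BPS / (MIN_FRAME_BYTES * 8)
budget_ns = 1e9 / packets_per_sec

print(f"per-packet budget:  {budget_ns:.0f} ns")        # ~67 ns
print(f"one PCIe round trip: {PCIE_ROUND_TRIP_NS} ns")  # blows the budget many times over
```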
But what about the "one of these things is not like the other" in that list? Storage data plane operations occur on timescales that are measured in milliseconds, or in many microseconds (in my example, 5ms = 5,000,000ns). That's the nature of PERSISTENCE (persistent media, even Flash, is still much slower than DRAM or SRAM – at least for now). That means that the whole world of storage CAN run on pure software stacks running on off-the-shelf hardware. And, as a matter of fact, almost every single array on the market (all of them, in EMC's case) uses this model - software running on pure off-the-shelf hardware. If that's the case, why do arrays exist? Why doesn't everyone have a "bring your own hardware" model?
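To put the "milliseconds vs. nanoseconds" point in perspective, here are some ballpark latencies (rough, order-of-magnitude assumptions, not measurements) - with millions of nanoseconds to spend per disk IO, a pure software stack on off-the-shelf hardware is not the bottleneck:

```python
# Order-of-magnitude latencies (ballpark assumptions, not measurements).
latency_ns = {
    "DRAM access":        100,
    "PCIe round trip":    800,
    "flash read":     100_000,       # ~100 microseconds
    "disk IO":      5_000_000,       # ~5 milliseconds, as in the example above
}

for medium, ns in latency_ns.items():
    print(f"{medium:>16}: {ns:>10,} ns")

# A general-purpose CPU has millions of nanoseconds of headroom per disk IO,
# which is why the storage data plane CAN be pure software on commodity gear.
```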
People tend to think the reason is "performance". That's not the answer. The answer is handling failure conditions. This picture tells the story. It's a VNX (previous generation) that is supporting a cloud use case (automated via a control plane and vCAC out the yin-yang) - specifically our EMC vLab. Notice anything?
Well - first of all, it's clearly commodity hardware (the little bit in the lower left is the "brains", connected via SAS to a bunch of disks and flash). But the observant among you will see the little orange light. Yup - up in the rack on the right, 10 enclosures up, disk 3_10_4 is faulted. Making that little light go on is surprisingly difficult if it's "any hardware". Getting the notification that there is a fault is easy, but manifesting that linkage to the hardware today requires some engineering.
BTW - there is hardware innovation AROUND the commodity components (for example, the storage processor enclosure is commodity, but architected differently than an off-the-shelf server tends to be designed - with a bias to bandwidth, more PCIe lanes, and very functional hot-swappability).
Does this mean that all the emergent software stacks - including the storage stacks behind S3, Atmos, Swift (that don't have this hardware dependency) are wrong? What about VSAN and similar approaches?
The answer comes down to our train analogy from earlier. Look at these 3 trains:
One is a cargo train, one is a high-speed/high-volume passenger train, one is a mag-lev train. Do they use the same tracks (data plane)? NO.
This concept applies in storage land. Different workloads (payloads) tend to favor different dataplane architectures. This is why there is such crazy diversity in storage data-plane land - from server PCIe flash, to hybrids, to all-flash arrays, to scale-out NAS, to exa-scale object storage models. So long as we have diverse workloads, anyone who says "one architecture always" is out of their gourd, and you should walk away slowly.
Here's an evolution of a "storage architectures" slide that captures the core types that exist, and examples of folks that play in each type. Note that the distinction between software/hardware appliances, "software + bring your own hardware", and "storage stacks that run co-resident with compute" is not an architectural difference to the storage stack itself. To make that sentence make sense for the VMware aficionados out there: technically VMware has two persistence stacks - the VSA and VSAN. VSA fits into the "Type I" model. VSAN fits into the "Type II" model (in the left set of the architectural variation of a loosely coupled distributed stack).
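Since the slide itself isn't reproduced here, here's my own paraphrase of the buckets as a quick reference structure - the "shape" descriptions are my summary, and the examples are only the ones named in this post:

```python
# My paraphrase of the post's three buckets (a summary, not the original slide).
storage_types = {
    "Type I":   {"shape": "non-distributed / clustered transactional stack (my wording)",
                 "examples_from_post": ["VMware VSA"]},
    "Type II":  {"shape": "loosely coupled distributed transactional stack",
                 "examples_from_post": ["VMware VSAN", "EMC ScaleIO"]},
    "Type III": {"shape": "distributed object / non-POSIX stack, exascale design point",
                 "examples_from_post": ["Atmos", "Openstack Swift", "S3-style stores"]},
}

# Packaging (appliance vs. "bring your own hardware" vs. co-resident with compute)
# is orthogonal: it doesn't change which bucket the storage stack itself falls into.
for t, info in storage_types.items():
    print(t, "->", info["shape"])
```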
So. Let's get back on track. Let's say you're VMware - and thinking like a federation member, you say "we need to create an SDS data plane". You end up building something that looks like this:
You build VSAN, and you use data services like snapshots and vSphere Replication. You're completely OK with the fact that VSAN presumes vSphere (and excludes Hyper-V, Xen, and KVM). Thus, you specifically embed it in the vmkernel, tightly couple it with vCenter (for awesome simplicity and ease of use), and guide the use case where every workload is a virtual machine object and lives on a datastore. This is the best design point for a customer who is 100% vSphere (remember, to some degree VMware is heterogeneous too - a customer could use the vCloud Suite without vSphere ESX).
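To make "every workload is a VM object with a policy" concrete, here's an illustrative sketch - a made-up function and dict, not the real vSphere/VSAN API - with rule names loosely modeled on VSAN's "failures to tolerate" and "stripe width" settings:

```python
# Illustrative only: not the real vSphere or VSAN API.

def provision_vm(name, policy):
    """Pretend to create a VM whose objects inherit a per-VM storage policy."""
    print(f"creating {name}: tolerate {policy['failures_to_tolerate']} host failure(s), "
          f"stripe across {policy['stripe_width']} disks")

gold = {"failures_to_tolerate": 1, "stripe_width": 2}

# Because the policy is per-VM (and eventually per-vVol), there is no LUN or
# datastore tuning step: the datastore is a pool, and the policy travels with
# the VM object.
provision_vm("sql-prod-01", gold)
```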
Now let's say you're EMC - and thinking like a federation member, you say "we need to create an SDS data plane". You end up building something that looks like this:
You use ScaleIO to act as a software distributed storage stack that works across vSphere, Hyper-V, Xen and KVM. You're completely OK with the fact that ScaleIO doesn't have the tight coupling with VM-only use cases, or the "so embedded in vCenter it's invisible" model (because that would eliminate those other use cases). You use your IP for rich data services like vRecoverpoint (expect a post on this later this week), and vVPLEX. This is the best design point for a customer who is not 100% vSphere.
This is what ScaleIO looks like in action - very cool.
You can download the high-rez version of this video here!
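Conceptually, a ScaleIO-style deployment looks roughly like the sketch below - illustrative names only, not the actual ScaleIO CLI or REST interface: servers contribute their local disks into a shared pool, and volumes carved from that pool can be consumed by clients on any hypervisor or bare-metal OS.

```python
# Illustrative sketch of the ScaleIO-style idea; not the real ScaleIO interface.

class StoragePool:
    """Aggregates local disks contributed by many servers into one pool."""
    def __init__(self):
        self.contributors = {}   # server -> GB of local disk contributed

    def contribute(self, server, gb):
        self.contributors[server] = gb

    def create_volume(self, name, gb):
        # Data is spread (and copied) across contributing servers, so the
        # pool survives the loss of any single server - the server is the FRU.
        return {"name": name, "size_gb": gb, "spread_across": list(self.contributors)}

pool = StoragePool()
for host in ["esx-01", "hyperv-02", "kvm-03"]:   # mixed hypervisors on purpose
    pool.contribute(host, 2000)

vol = pool.create_volume("scale-out-vol-1", 500)
print(vol)
```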
I often get asked whether this is a better model than tightly coupled software/hardware appliances (think EMC VNX/VMAX/Isilon/XtremIO, NetApp FAS and Engenio, HP 3PAR, etc...)
The answer is that if your storage scales with compute, there's an argument for it. There's also a degree of operational simplicity of not needing to deal with shared storage. I would suspect that for customers whose entire storage needs can be met with IaaS (i.e. they don't have anything else), models like VSAN and ScaleIO will be very compelling.
The downside of these "distributed transactional storage stacks" is that they all use multiple copies of data across nodes (as the "server" is the "FRU" or "field replaceable unit") and are more "expensive" in "$/GB" and "$/IOps" terms (a little counterintuitive, but hey - VMAX Cloud Edition being lower cost than Amazon EBS, while delivering a lot more performance and availability, is perhaps the ultimate proof point). For customers with multiple workloads outside IaaS, I would suspect that hybrid arrays (clustered and distributed) will continue to dominate, and for specific workloads, the all-flash arrays will be really important.
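The "$/GB" point is just arithmetic: if the server is the unit of failure, the stack protects data with full copies across servers, so usable capacity is raw capacity divided by the copy count, versus the much smaller parity overhead of a classic array. A quick illustration with made-up placeholder numbers:

```python
# Made-up numbers purely to show the shape of the math.
raw_cost_per_gb = 0.10          # assumed $/GB of raw disk, same in both cases

# Distributed transactional stack: 2 full copies spread across servers.
mirror_copies = 2
mirrored_usable_cost = raw_cost_per_gb * mirror_copies        # $/usable GB

# Classic array: RAID-5 style 4+1 parity (80% of raw capacity usable).
raid_overhead = 5 / 4
raid_usable_cost = raw_cost_per_gb * raid_overhead            # $/usable GB

print(f"mirrored stack: ${mirrored_usable_cost:.3f} per usable GB")
print(f"parity array:   ${raid_usable_cost:.3f} per usable GB")
```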
The other case is where the workload has extreme reliability/availability/serviceability characteristics. This is where tightly coupled distributed storage stacks in appliance model (think VMAX/HDS) really tend to win.
Will be interesting to see how this plays out over time. One observation – once again, customers win. So do EMC and VMware shareholders, as this is a new storage market, one we're playing in to win!
Now, what about the other common type of "software only" storage - the web object storage market? In this market - best known for AWS S3, Openstack Swift, Facebook's Haystack, and EMC Atmos - you don't deal with an array at all (you just use object APIs), and it's VERY distributed.
If you look at the 3 architecture buckets I listed above, this is the "Type III" model - distributed object stacks with "exascale" design points. BTW - "non-POSIX filesystems" are things like HDFS, a kissing cousin of object storage models.
Not familiar with this type of thing? I'm not surprised. It doesn't exist commonly within enterprises, and never underneath vSphere (because these stacks are not transactional). BUT YOU USE IT EVERY DAY. Huh?
If you use Facebook and post photos, you're using that type of storage. If you use Dropbox or Syncplicity, you're using that type of storage. If you post an ad on eBay, you're using that type of storage. If you use iCloud, you're using that type of storage. If you make queries on Google, you're using that type of storage.
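What "you just use object APIs" means in practice: no LUNs, no filesystems, just HTTP verbs against a namespace of objects. Here's a minimal hedged sketch using Python's requests library - the endpoint, key and auth header are placeholders, and real S3/Swift/Atmos calls add signed authentication:

```python
# Minimal illustration of object-style access; endpoint/key/auth are placeholders.
import requests

ENDPOINT = "https://objects.example.com"        # placeholder object store
AUTH = {"X-Auth-Token": "<token>"}              # real stores use signed requests

# PUT an object: the photo/backup/blob plus a key, nothing else.
with open("booth.jpg", "rb") as blob:           # placeholder local file
    requests.put(f"{ENDPOINT}/photos/vmworld-2013/booth.jpg",
                 data=blob, headers=AUTH)

# GET it back from anywhere, by key.
photo = requests.get(f"{ENDPOINT}/photos/vmworld-2013/booth.jpg", headers=AUTH)
```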
This new category of storage is IMMENSE. These stacks are all software-only, and like the distributed transactional stacks (VSAN/ScaleIO), assume the "server" as the unit of failure. To put it in perspective, EMC's biggest customers by capacity (with tens to hundreds of PBs being a "unit of acquisition") are in this category. We've started calling these "sea of storage" use cases, with HDFS variations being "data lakes".
This is YET ANOTHER type of "software defined storage" data plane model, one that is designed for this specific type of workload.
Hey - I know it's VMworld - but look, there's a big EMC megalaunch coming up on Sept 4th... TUNE IN :-)
Wrapping up this long, rambling diatribe - I hope it helps explain where we stand. Yup - when it comes to SDS, there is a LOT going on. It's more than just marketing (though it's also clearly a huge buzzword today). EMC and VMware are 100% aligned, but also, in a sense, compete for your business:
- EMC and VMware share a strategy around the SDDC
- Where it's right, we even share intellectual property (in an open way, using things like cross-licensing) to help
- There is no single storage architecture which is “right”. There are some that can service broad workloads, but none that service all well
- EMC and VMware look at the world slightly differently about what varies in the overall “stack”, and what is constant.
- We both partner and compete in the SDS space.
And that's a great thing for YOU as a customer and for EMC shareholders. I would love your feedback. Does this help? What do you think of the strategy and the technology? Has this blog post been annoyingly long? :-)