In a nutshell: Today we announced and demonstrated “Project Caspian”.
What is it?
Project Caspian is intended to be the best industrialized software stack – offered both as software only and as industrialized converged infrastructure – for deploying “cloud native applications” for customers pursuing a “pure open source software + commodity hardware” platform 3 design center.
I want to be clear – the hardware is the LEAST interesting part of Project Caspian (in fact, the commodity hardware is very similar to what you saw in VxRack).
What’s more interesting is the software stack – which is designed with a “clean sheet” approach looking at exclusively P3 workloads.
Project Caspian “industrializes” a pure (directly from the trunk) OpenStack implementation into a turn-key solution. On stage, in the demo, we also demonstrated Project Caspian’s roadmap as well – being a turn-key delivery platform for Cloud Foundry and all the major Hadoop distributions (Cloudera and Hortonworks/Pivotal ODP). It is a Rack CI offer, and one designed for webscale, cloud native applications.
Why care?
Well, on one level - it’s really cool. It’s also an open source story at its core.
On another level – it reflects the 2nd way that the federation is tackling these new workloads (the first way being the VMware VIO and “Cloud Native Apps” efforts) – and is a proof point of our core tenet of “Choice”. Some will choose VIO + Photon + CF. Some will want a 100% open source model.
On perhaps the most important level – it reflects an important new way of tackling net new workloads and generating innovation by industrializing open source for the enterprise, something many are tackling, and EMC is doing it head on.
Ok – read on for background!
In my experience, there’s an interesting “fork” occurring in how many customers look at infrastructure.
- Some look at their existing “platform 2” application requirements and the virtualized/hardened infrastructure that supports it and say: “a) I can keep improving OPEX through ‘software defined’ automation/abstraction and CAPEX through applying newer technologies like industry standard servers and SDS/SDN – call that “platform 2.5”; AND b) I can use that same infrastructure and team for these new funky platform 3 ‘cloud native’ apps!”. The same stack, same team, same operational model.
- Some look at their existing “platform 2” application requirements and their “platform 3” application requirements and say: “these two are so different (both architecturally and operationally) that I will go ‘full bimodal’, with one stack optimized for platform 2, and another optimized for platform 3. I will also likely have two teams – one that operates around a focus of ‘reduce risk through careful process’ (like ITIL), and another team that has an integrated DevOps culture and structure and operates with a focus around ‘speed and iteration’”. Two stacks, two teams, two operational models.
This is really a fork in the road – and the jury is out on which of these is the “right choice” (I suspect the answer varies).
If you put together VMware’s announcements from the beginning of the year with today’s tech preview of Project Caspian, you can see how the federation is tackling BOTH of these routes.
For customers that see themselves in the first route… VMware Integrated OpenStack and VMware’s approach for “Cloud Native Applications” (See more on Project Photon here) represent the first of the two approaches. This approach looks at workloads and starts with a foundation of looking at the abstraction model assuming a kernel-mode VM (in which you can have everything, including containers). This approach assumes the base persistence strata is mostly a transactional layer (VSAN/ScaleIO). It’s the manifestation of the “One Cloud” approach. EMC will support that with hyperconverged offers in both open and flexible engineered systems (VxRack that we demonstrated on Monday) and appliances (the EVO program).
For customers that see themselves in the second route… Project Caspian is the “pure platform 3” approach, and the answer when you start with a “clean sheet of paper”.
This picture is a visualization of the idea:
There’s no value judgement in this – Prius is awesome, so is Tesla. For some customers, evolution is the answer, for some, revolution.
There’s an important note that flows from that… Nothing in Project Caspian focuses at all on apps that need infrastructure resilience. Put a “Pet” workload in it – and it will not do well. Caspian’s software stack is built exclusively for “Cattle”.
It’s built using an “open source always” model. It views the workloads as having some Nova instances, a lot of containers (Rocket, Docker, Diego), some bare-metal (for next generation data fabrics – which have their own abstraction) as the low level abstraction layer. It also has less transactional open SDS than VxRack – and a LOT of Object and HDFS via ECS and in the future DSSD. Object and HDFS tend to be the volume persistence layer for “pure P3” apps.
This customer decision tree looks something like this – and the “go left” or “go right” choices have no “value judgement”:
Also note that Project Caspian could very well fit into part of the “RACK” taxonomy of CI I talked about on Monday here.
Project Caspian’s software stack is also really about scale. It’s just not optimal if you don’t have a fair amount of scale. It’s not that it can’t be small, it’s just not the sweet spot. Also, it’s not just about scale.
Remember, “RACKS” and “APPLIANCES” can both use hyper-converged storage/compute designs – but “RACKS” bias towards “Flexibility” (in other words, a broader variation in personas, and hardware configurations) and “Appliances” (even those targeting rack scale deployments) bias towards “Simplicity” (narrower variation in personas and hardware configurations). Project Caspian has to cover a broad range of more disaggregated compute/memory/persistence – as at web-scale, people don’t use appliance form factors.
Put it this way – it would have to be able to run on a broad range of the stuff that’s in the Open Compute Project.
Here’s the Project Caspian Demo we did today:
As I noted, the hardware used in Project Caspian is not the main point. The main point is the software. The software is a cool story in itself – and has “OPEN” at its core:
- OpenStack. The CloudScaling acquisition had multiple purposes. One was to get EMC better at open source software. One was to inject experience in “industrializing” OpenStack into the Project Caspian team. CloudScaling’s approach to OpenStack was to try to stay as close to the “vanilla” OpenStack core as possible – and that’s the approach here.
- The Fabric. This is a fascinating story for another day (more work needed), but if you imagine what is needed to make something like this work and be elastic, you need a Cluster Manager. What do I mean? Well, Kubernetes and Mesos are examples of Cluster Managers. So are core elements of Cloud Foundry. Cluster managers are things that manage deployment, health and state of units of software on a strata of infrastructure. All of those examples are optimized around stateless/ephemeral workloads. What do I mean? Well – when you blam out a bunch of Redis instances, if one goes “poof”, you just restart a new one. But – there are some workloads that are STATEFUL. There is some relationship between them. The example that makes people “get” this the most quickly is to imagine an object store. When you deploy an object store, there is a relationship: put the first instance here, the second instance in another rack, and the third instance in another datacenter. When something goes bump in the night there is a relationship: oh oh – the 105th instance of the object store just flipped out because the hardware died, I need to restart it – but can’t restart here, I need to do it where I can maintain the state system rules. This highlights that a stateful cluster manager needs to have a more sophisticated set of mechanisms for handshaking, dependency mapping and more. In Project Caspian – the Fabric is a sophisticated home-grown stateful cluster manager. Of course all of the webscale folks have their own – so far they have only open-sourced their stateless examples (Google open-sourcing Kubernetes as an example). There’s more to this story – and more another day – but this could be a valuable contribution to the community at large.
- The Persistence Layer. Platform 2 workloads that want to use some of the ideas from “platform 3” (like “scale out SDS + commodity HW” = persistence) tend to bias towards a lot of transactional stuff. That’s why VxRack 1000 (open persona) and the VxRack EVO:RACK persona and VSPEX Blue are best characterized as “built for P2.5” and “able to run P3 workloads”. When you target P3 – you end up with only enough transactional storage (ScaleIO) to boot stuff and a small amount of performance-oriented database… But you end up with an ocean of object, HDFS (and hyper-performance in-memory persistence). Project Caspian uses a little of ScaleIO as an open transactional persistence layer, but not much of it (just enough for Cinder to consume in support of the Nova instances). Its primary persistence layer is the Object/HDFS layer provided by the evolution of ECS Software.
- The Hardware Abstraction layer. I’ll do a standalone post on this topic (big enough and cool enough to get its own post) – so read about it there, but Project Caspian will use OnRack (tm).
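To make the stateful cluster manager idea from “The Fabric” bullet concrete, here is a minimal sketch in Python. It is not how the Fabric actually works (that has not been published); it only illustrates the difference between “restart anywhere” and “restart where the placement rules still hold”. Every name in it (`Node`, `pick_node`, `place_replicas`, `replace_failed`, the node and rack labels) is hypothetical, and the greedy “least overlap” rule is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A host, tagged with the failure domains it lives in (hypothetical model)."""
    name: str
    rack: str
    datacenter: str


def pick_node(candidates, placed):
    """Pick the unused candidate that shares the fewest failure domains
    with the replicas that are already placed."""
    def overlap(node):
        same_rack = sum(1 for p in placed if p.rack == node.rack)
        same_dc = sum(1 for p in placed if p.datacenter == node.datacenter)
        # Name is only a tie-breaker so the result is deterministic.
        return (same_rack, same_dc, node.name)

    free = [n for n in candidates if n not in placed]
    if not free:
        raise RuntimeError("no free node available for placement")
    return min(free, key=overlap)


def place_replicas(nodes, count):
    """Initial deployment: place each replica relative to the previous ones."""
    placed = []
    for _ in range(count):
        placed.append(pick_node(nodes, placed))
    return placed


def replace_failed(nodes, placed, failed):
    """A node died: restart its replica where the spread across racks and
    datacenters is preserved, not simply on the first free host."""
    survivors = [n for n in placed if n != failed]
    healthy = [n for n in nodes if n != failed]
    return survivors + [pick_node(healthy, survivors)]


# Example inventory (assumed): two datacenters, four racks.
NODES = [
    Node("n1", "r1", "dc1"),
    Node("n2", "r1", "dc1"),
    Node("n3", "r2", "dc1"),
    Node("n4", "r3", "dc2"),
    Node("n5", "r4", "dc2"),
]

replicas = place_replicas(NODES, 3)
recovered = replace_failed(NODES, replicas, failed=NODES[3])  # n4 dies
print([n.name for n in replicas], "->", [n.name for n in recovered])
```

A stateless scheduler would be free to restart the lost instance on `n2`, next to a surviving replica in the same rack. The sketch instead picks the remaining host in the second datacenter, which is the kind of relationship-aware decision the bullet describes. A real stateful cluster manager would also need the handshaking and dependency mapping mentioned above, which this sketch leaves out entirely.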
What about the hardware? Here are 3 examples of Project Caspian builds – each with different core/memory/persistence mix.
The orange one uses the next-generation version (Haswell/Broadwell based) of the 4 module/2U design used in the VSPEX Blue hardware. It would be good for general purpose, and would use a mix of ScaleIO and ECS Object/HDFS as its persistence layer (it has a moderate amount of storage/IOps). The persona is a mix of OpenStack, CF, and a moderate amount of Hadoop/Object.
The yellow one targets a much denser core count, and a smaller amount of persistence capacity (but lots of IOps via local SSDs). The persona mix is a large amount of OpenStack, CF, and a small amount of Hadoop/Object.
The blue one targets a persistence capacity design center, and you can see two things: 1) the fact that Project Caspian builds on the ECS appliance experience; 2) the next-generation ECS is actually a Project Caspian variation – one that is very capacity-centric. Also look at the crazy capacity density! On this one, the persona mix is a small amount of OpenStack, CF, and a large amount of Hadoop/Object.
In the future, we will also include DSSD in these configurations when the persona mix includes a lot of in-memory data fabric and hyper-transactional workloads. You can see the space in the middle of the racks (and the fact that we separate the racks into networking/failure domains) above, and if you look at the examples below – you can see that DSSD can nicely fit right in there – and use PCIe/NVMe connectivity to all the hosts in the rack… Hence DSSD D5 is “Rack Scale Flash”. You can see that these are IOps/latency persistence layers that just melt faces.
Netting it out – Project Caspian has a laser-focus: creating the industry’s best hardened and industrialized Platform 3 stack for customers who are going “full bi-modal”, with a full embrace of open source and commodity hardware models.
It’s a “tech preview”, but we’re dead earnest. The first customer council to bring customers into the inner fold will be in June, and expect more on Caspian a little later in the year.
In this new phase of the OpenStack, Cloud Foundry, and Hadoop communities and lifecycles – it’s a race to try to make these open-source models work well in the enterprise.
There is a false meme out there that no one knows how to make money around open source software. That’s not true.
- Some companies are taking the “services” centric model (think Mirantis). That’s great, and works.
- Some companies are taking the “support” model (think Red Hat). That’s great, and works.
- Some companies are taking the “value in proprietary on top of opensource” (think Cloudera). That’s great, and works.
- Some companies are taking the “appliance” model (think Barracuda).
We’ve recently seen many of the early “industrialized OpenStack” offers (think Nebula) move on, and there is a need in the marketplace for people who make this easier for enterprises to consume.
Enterprises who have tried to deploy and maintain OpenStack have highlighted how hard this can be. Those that have tried to deploy Cloud Foundry on premise have said “unbelievably awesome once it was up – but even harder to get running right than OpenStack”.
Project Caspian is our effort to create the best way to deploy an “industrialized” platform for platform 3 “cloud native apps” via a clean sheet design for vanilla OpenStack, Cloud Foundry, and the major Hadoop distributions.
It will be interesting to see if the giants (as I’m sure others are working down this path) can make it work for “P3 purists” – and ultimately customers will choose. Those that turn to EMC will choose their path. Pure platform 3 = Project Caspian. “P3 on top of P2.5” – will go VIO and VMware’s Cloud Native Apps efforts, and run CF and Hadoop on top of the vSphere big data extensions.
As a federation, we’re all in, and playing to win!
Would love your input, your thoughts! Interesting stuff!
Great post as usual Chad. Thank you for the update on P3 automation. Lots of research by lots of companies in this space especially with the growth of object storage.
Posted by: DennisFaucher | May 08, 2015 at 07:13 AM
One of the most exciting announcement for me... Can't wait to see what comes next!!!! Really smart/nice move, specially with more and more applications being built cloud-native, a common/scalable/flexible platform becomes mandatory.
Posted by: Victor da Costa | May 10, 2015 at 03:49 PM
Chad, do you know when Caspian will be available for download or if there's an early beta program for the software? Thanks.
Posted by: Thomas Tar | May 12, 2015 at 10:48 AM
@Dennis - thanks for the comment!
@Victor - it still needs work, but it's cool to get a peek at what we're working on!
@Thomas - it's a tech preview, which means "not before the end of the current quarter". We're starting the first customer council here in June.
Posted by: Chad Sakac | May 13, 2015 at 06:00 PM
Kewl stuff...customers rarely understand what their needs are and this decision tree allows for their preference to drive the decision.
Posted by: sandy haddon | June 18, 2015 at 04:57 PM
Can't wait to see Caspian in action. Any ideas on integrating Isilon with Caspian for Hadoop workloads?
Posted by: Macario | June 25, 2015 at 05:33 PM