A couple of quick trite observations, and then on to business :-)
- My friends and colleagues at VMware are great technological serial innovators, but can sometimes stumble at product naming and numbering :-) I’m sure it’s a charge that’s been lobbed EMC’s way too – why ECS 2.2 was not called 3.0 is a perfect example (at least we coordinated our “.2” use :-). To call this Virtual SAN release “6.2” does it as much a disservice as calling NSX 6.2 “6.2” or ECS 2.2 “2.2”. This is a BIG, BIG release. It’s more correct to think of it as Virtual SAN’s “fourth generation” – and on a personal note, congratulations to the engineering and product team!
- SDS is an idea, not any one product. Just like there’s no “one ring to rule them all” in storage stacks in general – there will be multiple SDS stacks for different use cases and products. A nice property of being software is that, in some cases, they can coexist on the same hardware strata. EMC and VMware together have overwhelmingly the strongest SDS portfolio – bar none. Think of it this simple way:
- Virtual SAN is our best Federation high-performance transactional SDS stack with the deepest vSphere integration for customers uniquely focused on vSphere use cases – if that’s what you’re looking for, look no further. Get it here.
- ScaleIO is our open, heterogeneous, high-performance transactional SDS for customers who need an SDS for a broad range of use cases (including, but not limited to, vSphere) – and therefore it will never be as integrated with vSphere as VSAN, but it gains several strengths in return. Get it here.
- Isilon SD Edge is our unstructured NAS/Object stack. Get it here.
- ECS is our Object/HDFS geo distributed stack. Get it here.
- ViPR Controller is our open-source storage control plane (also a form of SDS – all of the above are data planes). Get it here.
- Hyper-converged = overused buzzword, but an important idea. To me, Hyper-converged represents a system architecture – one where compute and persistence are co-located. This system architecture is enabled by software. But – there are multiple consumption models:
- “DIY SDS that can enable hyper-converged architectures” (think VSAN, ScaleIO, NDFS, and all their ilk as an “ingredient”)
- “DIY SDS with some validated hardware in hyper-converged building blocks” (think VSAN-ready nodes, think VxRack Nodes)
- “Full Hyper-converged Systems” (which have SDS, run on industry standard hardware, and have tightly integrated management and orchestration – and, depending on scale, need other things too).
- Even that last category, “Hyper-converged Systems”, is not a single “one thing”. I personally (and this is the strategic view of EMC – hey, I run this business :-) think that “hyper-converged things” have a multitude of system-level design variations. Here’s a taxonomy that works for me:
- A: “designed to start small” = Hyper-converged appliances
- B: “designed to scale BIG” = Hyper-converged rack-scale systems
- If you like soundbites, stop at A and B – but if you seek to understand, keep reading. Pause and think about it more deeply – are these in fact different? After all, they look similar, and are commonly marketed together. Today, Gartner doesn’t make the distinction (yet – I think they will need to), though in my experience customers do, consciously or unconsciously. They REALLY look similar: they are both hyper-converged, and they both use software on industry standard hardware – so aren’t they the same? With an appliance, it’s “just use any switch you want – whatever”, and it’s about “click, order, and go – don’t think about any factors other than the appliance”. Conversely, when you scale big, things that are “optional” at small scale – the inter-rack spine/leaf network fabric, SDN overlays/abstractions, the need for broader industry standard hardware abstraction for more modular/composable stacks – cease being “optional” and become an absolutely critical part of the system. That means not just thinking about how to design it in, cable it, etc. – but deep integration into the full package and management stack. There’s a reason why none (!) of the hyper-scale or web-scale players use hyper-converged infrastructure appliances. NONE. Instead, they have architected their own hyper-converged rack-scale systems.
There’s a lot of marketing mumbo-jumbo that conflates being software defined (using open or closed source software stacks) and running on industry standard server hardware with being “web scale”. Yes, that’s how web-scale architectures are constructed – but that’s analogous to saying “trees are green and have leaves – but not all things that are green and have leaves are trees.”
Enough preamble, let’s talk about the awesomeness that is Virtual SAN 6.2!
So what did VMware launch today?
Virtual SAN 6.2, which is the following:
- Virtual SAN 6.2 is an industry-leading SDS for transactional workloads – and with 3,000+ customers, the run rate is massive. I talk to more and more customers using VSAN, and the feedback is exceedingly positive. On a trip last week that included visits to several of the giant New York financials, one mentioned that they were very, very happy with their recent test of the 6.2 Beta – so anyone who thinks VSAN is only for “small” customers is off. BTW – another giant financial I visited that same day is moving fast and furious with ScaleIO (because they wanted heterogeneity in spite of being mostly vSphere, and needed deployment variability from fully hyper-converged to a dense two-tier storage model where clusters presented SDS to distinct blade farms). So – anyone saying “SDS transactional storage stacks are hype”, or “having more than one SDS stack is bad”, is, well, “off” based on what I hear from customers.
- Virtual SAN 6.2 is a Hyper-converged software “ingredient” that is deployed and used in all forms (software only, VSAN-ready nodes, and in hyper-converged infrastructure appliances)
- Virtual SAN 6.2 is a modern SDS that is loaded for bear with the latest data services and all-flash configurations – and can enable $1/GB usable price points in all-flash configurations. At EMC, we think we have crossed the point where all-flash configurations are THE ANSWER for all transactional workloads in a modern datacenter. While we support hybrid configs, for transactional workloads, all-flash is the way to go, period. BTW – I still get infuriated when people say “all flash for everything” – it betrays ignorance of the fact that not all workloads are transactional, and it’s probably coming from the mouth of someone who has only one product :-)
- Virtual SAN 6.2 is wicked fast, and wicked low latency – even when the system load is through the roof. This is because it is a kernel module – not in the guest/user space. That’s where you want certain critical parts of a transactional (read: “low latency under all forms of load”) storage stack to be.
- Most importantly – Virtual SAN 6.2 is the BEST, SIMPLEST, MOST INTEGRATED SDS (bar none!) for customers who are uniquely focused on vSphere and have made vSphere their standard – they don’t need to look anywhere else.
I’ll say it again – Virtual SAN 6.2 is a HUGE release:
Support for All-Flash configurations.
VSAN is a cached architecture, which is why there is always an SSD requirement – that SSD is, in effect, acting as the write cache. But in VSAN 6.2, you can have very powerful all-flash configurations – where you configure the bulk of the SSDs/flash for capacity. This means the cache tier doesn’t do any read caching (the capacity NAND is fast, so a read cache has reduced benefit), and that in turn means more effective write cache capacity. While the performance of a VSAN node is of course configuration dependent, VMware notes that ~100,000+ write IOps per node are possible. That’s monstrous. (BTW – a post for another day, this is a notable ScaleIO/VSAN difference)
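To make that concrete, here’s a minimal sketch of the all-flash caching idea (in Python, purely illustrative – this is not VMware’s code, and the class and method names are my own): the cache-tier SSD absorbs and acknowledges writes, destages them lazily, and reads come straight from the capacity flash.

```python
class AllFlashDiskGroup:
    """Toy model of an all-flash disk group: cache SSD = write buffer only."""

    def __init__(self):
        self.write_buffer = {}  # cache-tier SSD: absorbs incoming writes
        self.capacity = {}      # capacity-tier flash: where data finally lives

    def write(self, lba, block):
        # Acknowledge as soon as the block is durable in the write buffer;
        # with no read cache to carve out, the whole device buffers writes.
        self.write_buffer[lba] = block

    def destage(self):
        # Lazily move buffered writes to the capacity tier -- coalescing
        # overwrites along the way, which also reduces flash wear.
        self.capacity.update(self.write_buffer)
        self.write_buffer.clear()

    def read(self, lba):
        # Recent writes may still sit in the buffer; otherwise, read the
        # capacity flash directly -- it is NAND, so it is fast on its own.
        if lba in self.write_buffer:
            return self.write_buffer[lba]
        return self.capacity.get(lba)
```

Dedicating the cache device entirely to writes is exactly what yields the “more effective write cache capacity” noted above.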
Data Deduplication & Compression
Data dedupe and compression are built around the architectural capabilities of all-flash systems – and both are inline. You enable it for the whole cluster, and for customers already on an earlier version of VSAN, it will go through and update all your VMs. This requires a non-disruptive low-level format change – and you can speed it up if you allow a period of reduced redundancy, VM by VM, as they transition.
Dedupe is performed as data is destaged from the cache tier, and uses a fixed 4K block size. It is done across a disk group – so duplicate data that sits in another disk group will not be deduped. I suspect this will trigger the debate of “one vs. multiple disk groups per node” (read Duncan’s post on the topic here). I personally believe that all-flash configs will rapidly become the dominant deployment model – and will bias toward smaller rather than larger disk groups – but I will defer to VMware for the official position.
Compression is done right before the data is committed to the capacity flash tier. Kind of neat: they will only compress the data if there is a 2x or better effect – otherwise they just write the 4K block as-is (to minimize computational load).
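Putting the two together, here’s a rough sketch of that destage path – fixed 4K blocks, a dedupe domain scoped to a single disk group, and compression kept only when it achieves 2x or better. To be clear: the fingerprint and compression algorithms below (SHA-1 and zlib) are stand-ins I picked for illustration, not a claim about what VSAN uses internally.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # dedupe works on fixed 4K blocks

class DiskGroup:
    """Toy dedupe/compression domain: one disk group, not the whole cluster."""

    def __init__(self):
        self.store = {}  # fingerprint -> stored (possibly compressed) block

    def destage(self, block: bytes) -> str:
        assert len(block) == BLOCK_SIZE
        fp = hashlib.sha1(block).hexdigest()
        if fp in self.store:
            # Dedupe hit -- but only within THIS disk group; an identical
            # block in another disk group would be stored again over there.
            return fp
        compressed = zlib.compress(block)
        if len(compressed) <= BLOCK_SIZE // 2:
            self.store[fp] = compressed  # 2x or better: keep compressed form
        else:
            self.store[fp] = block       # not worth the CPU: write 4K as-is
        return fp
```

A real implementation also reference-counts blocks so deletions work; that’s omitted here for brevity.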
Together with erasure coding (more on that next), these data reduction and efficiency approaches are important because they bring all-flash configurations into “no-brainer” territory. VMware claims up to 7x data reduction, and configurations that can hit an effective $1/GB cost – though of course the data reduction rates will vary with the data. This is significant for all workloads – but virtualization (general IaaS and VMs) and VDI/EuC are the workloads most materially affected by this economic effect.
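To see how the economics stack up, here’s the back-of-envelope arithmetic – with made-up, illustrative inputs (actual raw flash costs and reduction ratios will vary):

```python
# All inputs below are assumptions for illustration, not quoted prices.
raw_flash_cost = 2.50   # $/GB of raw flash (assumed)
ec_overhead    = 4 / 3  # RAID-5 erasure coding: 1.33x raw per usable GB
data_reduction = 4.0    # combined dedupe + compression ratio (varies by data)

effective_cost = raw_flash_cost * ec_overhead / data_reduction
print(f"effective cost: ${effective_cost:.2f}/GB usable")  # -> $0.83/GB
```

Even at reduction ratios well short of the claimed 7x maximum, the effective $/GB lands at or below typical hybrid price bands.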
I want to strongly reiterate something: a modern datacenter architecture in 2016 uses all-flash for all transactional workloads. PERIOD. XtremIO, and the new all-flash VMAX that uses the densest and lowest-cost 3D NAND, also bring the $/GB into the same price bands – and at that point, why do hybrid?
Now, this isn’t a critique of VSAN – but I want to point out some things that highlight why, at large IaaS/EuC scales, things like a Vblock 540 with XtremIO will be more efficient. Longtime readers often go back to this “Understanding Storage Architectures” post. XtremIO is a tightly coupled cluster (a type II), and VSAN is a loosely coupled cluster (a type III). Because of its very tightly coupled design, with extremely low-latency interconnects and a shared memory space, XtremIO’s dedupe domain is all the data on the cluster – so in practice, at scale, it will have higher data reduction rates, as well as very, very consistently low latency.
This isn’t about “VSAN versus XtremIO” – it’s that practitioners want to understand the tools in the toolbox, and figure out where to use the best tool. VSAN scales like crazy, so you can start small and keep growing – but if you KNOW you’re going to be north of around 3,000 or so VMs/EuC instances, you’ll tend to find that Vblock 540s (if you want converged solutions) or XtremIO X-Bricks are more cost-effective from a capex point of view.
Now, in favor of SDS models, there’s another interesting observation that has nothing to do with capex: they have a certain operational simplicity and flexibility. You don’t do “frame upgrades” with VSAN or ScaleIO, as an example. Migrations are non-disruptive. Scaling is non-disruptive. Updates are non-disruptive.
Erasure Coding
This is huge. In general, most SDS models use a form of data/object/chunk mirroring to protect against node failure. Prior to VSAN 6.2, the same was true of VSAN. The obvious downside was that this results in a “right out of the gate” effective capacity reduction that can be a minimum of 33% and 50% maximum.
However, now you can use a new Storage Policy-Based Management (SPBM) policy which uses erasure coding rather than mirroring objects. Since it’s an SPBM policy, this is a “per VM object” thing (in fact, you could have a policy for each VM disk) – which is cool.
While the analogy to RAID 5/6 makes sense, it’s important to realize that what we’re talking about here is a parity value distributed across hosts – there’s no RAID controller :-)
Failures to Tolerate (FTT) set to 1 or 2 results in the following effective capacity requirements:

FTT | Protection method | Minimum hosts | Capacity required
1 | Mirroring (RAID-1) | 3 | 2x
1 | Erasure coding (RAID-5) | 4 | 1.33x
2 | Mirroring (RAID-1) | 5 | 3x
2 | Erasure coding (RAID-6) | 6 | 1.5x
Again – note that the effective utilization rates are lower than in the “external storage array” category, but this is really leading in the SDS domain – and ultimately it factors into the “total solution cost” equation, which is also very important for all-flash systems and their economics.
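If you want to play with the math, here’s a trivial helper (my own, simply encoding the multipliers from the table above) that computes the raw capacity needed for a given usable amount:

```python
def raw_needed(usable_gb: float, ftt: int, method: str) -> float:
    """Raw GB required for a given usable GB, per the table above."""
    multiplier = {
        ("mirror", 1): 2.0,   # RAID-1: two full copies
        ("mirror", 2): 3.0,   # RAID-1: three full copies
        ("ec", 1): 4 / 3,     # RAID-5: 3 data + 1 parity
        ("ec", 2): 1.5,       # RAID-6: 4 data + 2 parity
    }[(method, ftt)]
    return usable_gb * multiplier

for method in ("mirror", "ec"):
    for ftt in (1, 2):
        print(f"{method}, FTT={ftt}: "
              f"{raw_needed(1000, ftt, method):.0f} GB raw per 1000 GB usable")
```

At FTT=1, erasure coding cuts the protection overhead from 100% (mirroring) to 33% – which is exactly why it pairs so well with all-flash economics.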
Also – note that unlike other approaches that use erasure coding, VSAN 6.2 doesn’t implement this as a “post-process” or “cold” task. It’s fascinating to hear people who have long professed that “data locality” is absolutely critical and paramount (the nice ones “professed” – the less polite screamed the argument at anyone who dared disagree) now pivot to “it’s OK for data to not be local” (erasure coding demands that you NOT have data locality). With VSAN, you don’t have to limit your use of erasure coding to “some” workloads (those that are colder by nature). Go to town.
VM-level QoS
This is the beginning of a rich set of QoS policies controlled by SPBM – so, like the protection policy, QoS is a VM-level object policy.
You can set IOps limits in VSAN 6.2, which can quell the “noisy neighbor” challenge – and expect the sophistication of the QoS engine to expand over time to become even more flexible.
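For intuition on how a per-object IOps limit tames a noisy neighbor, here’s a toy token-bucket limiter – the generic mechanism behind rate limits of this kind, not a claim about VMware’s actual scheduler internals:

```python
import time

class IopsLimiter:
    """Toy token bucket: each admitted IO spends one token; tokens refill
    at the configured IOps limit."""

    def __init__(self, iops_limit: float):
        self.rate = iops_limit        # tokens (IOs) added per second
        self.tokens = iops_limit      # start with a full bucket
        self.last = time.monotonic()

    def admit(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True               # under the limit: IO proceeds
        return False                  # over the limit: IO is queued/delayed
```

A greedy VM simply runs out of tokens and gets throttled, while its neighbors’ buckets keep refilling undisturbed.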
There’s a lot more in Virtual SAN 6.2, including:
- end-to-end CRC checks and disk scrubbing to catch silent/latent errors (a conceptual sketch follows below)
- Client Cache code changes that make all workloads perform better, and make EuC workloads rock even more
- much improved embedded health and performance monitoring – not as vCenter plugins, but embedded directly
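On the checksum point, here’s what “end-to-end CRC plus scrubbing” means conceptually (illustrative only – I’m using Python’s zlib.crc32 as the checksum; VSAN’s actual on-disk format and algorithm are its own):

```python
import zlib

def write_with_checksum(store: dict, key: str, data: bytes) -> None:
    # Compute and persist a checksum alongside the data at write time.
    store[key] = (data, zlib.crc32(data))

def read_and_verify(store: dict, key: str) -> bytes:
    # Re-verify on every read; a background scrubber runs the same check
    # over idle data to surface silent/latent errors before they matter.
    data, crc = store[key]
    if zlib.crc32(data) != crc:
        raise IOError(f"checksum mismatch on {key}: silent corruption detected")
    return data
```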
As you can see – a HUGE release – congrats to the VMware team!!!
So….what are we as EMC doing about this cool new release? EMC is embracing VSAN in two important ways.
First way EMC is leveraging VSAN: as pure software. VSAN is an incredible SDS for customers uniquely focused on vSphere – and can be acquired from VMware, EMC or our mutual partners.
Now – that said, people ask which use cases favor VSAN and which favor ScaleIO.
To help understand “where VSAN, and where ScaleIO”, here’s a simple framing that VMware and EMC have collaborated on to capture each product’s primary focus:
This then maps to a simple way to think about the best way forward, and guides (no hard and fast rule) toward which technology to use when.
I want to be clear – VSAN scales awesomely. VSAN can support mission-critical apps. The decision path for VSAN/ScaleIO is NOT about scale or performance. And while VSAN can absolutely be (and will be – stay tuned!) used in hyper-converged rack-scale systems (which, if you like my distinction above, require full integration of the networking domain and more modular, disaggregated approaches), the customer observation is that at large scale, heterogeneity starts to become more prevalent. You bet there are customers (I know them right now) who are uniquely focused on vSphere at Enterprise Datacenter scale, and who will want a hyper-converged rack-scale system using VSAN/vSphere/NSX tightly coupled at the core, and hyper-converged appliances using VSAN/vSphere at the enterprise edge.
The primary decision factors that steer one way or the other are not really scale or performance per se, but rather the three things on the bottom: customer complexity (for example, some customers need configurations that are NOT hyper-converged, but rather blends of two-tier compute-only and storage-only nodes – for operational, density, or political reasons), workload variation, and the tendency toward vSphere homogeneity or heterogeneity.
Also – there are areas in Enterprise datacenters where the answer is both – so we’ve made this simple: both VMware and EMC offer an SDS bundle that entitles the customer to BOTH VSAN and ScaleIO.
Now, haters are gonna hate :-) Some will claim that one SDS is the cure for world hunger and the key to peace in the Middle East. It’s become laughable, because “one thing is best” is a “zombie lie” (thanks to Bill Maher, whom I’m paraphrasing: a “zombie lie” is a lie that doesn’t die in spite of being clearly, demonstrably, and evidently wrong).
Note a pattern – people with one storage stack think it’s the best for all workloads, all the time. Coincidence? :-)
VMware and EMC are blessed with the industry’s best SDS portfolio – and Virtual SAN 6.2 makes it even stronger.
Second way EMC is leveraging VSAN: to build the industry’s best hyper-converged infrastructure appliance.
Hyper-converged infrastructure appliances depend on their SDS stack in the sense that it defines many of their attributes. But they go above and beyond in a critical dimension – a fully integrated stack, inclusive of the management and orchestration layer, that makes deployment, node add/remove, and system updates a single-click affair.
Something awesome this way comes – click on the image below, and save the date…
Tune in a week from now on Feb 16th to see what VMware and EMC have been exclusively working on for a while. I’m pretty excited to share it with you :-)
Hi Chad
As Jase (McCarty) pointed out, the erasure coding is per disk :)
"#VSAN62 Erasure Coding, is it per VM? Nope it is per object. So a single VM can have multiple “Failure Tolerance Method” policies Mirror/EC"
That looks perfect even for traditional workloads (imagine the usual database with a mix of RAID 5 and RAID 1 disks to separate log and data files).
Thank you
Have a great day!!
F.
Posted by: Fabio | February 11, 2016 at 01:10 AM
Yeah, Everything is awesome !!!!
/s
except one tiny detail: the VSAN licensing model – it makes VSAN a non-feasible option for many business cases. Obviously it's bundled with View – the perfect use case – but IMO VMware needs to unshackle it from the heavy burden of its overpriced licensing model for it to shine.
Posted by: BSA | February 11, 2016 at 10:30 PM
Great article! Very clear and useful to shorten someone's nose xD
"The obvious downside was that this results in a “right out of the gate” effective capacity reduction that can be a minimum of 33% and 50% maximum"
I believe these numbers are for the EC, not the Mirror, configuration! Looking at the table below it: FTT=2, 5 hosts, 3x capacity required – so it's a 66% capacity reduction.
PS The "Get it here" links in 2b, 2c, and 2d are broken
Posted by: DmitriyKmitty | May 30, 2016 at 11:12 AM