I hinted at this last year at VMworld 2011... This concept – tentatively called “VM Granular Storage” – is a real twist, and has immense potential. It won’t materialize in any significant way for some time to come – so it’s more about airing out the idea, sharing what VMware and EMC (and the broader storage community too) are thinking.
I was sworn to secrecy (this was an NDA-only topic), but VMware is opening up on the topic a bit more...
Duncan blogged about it here: http://www.yellow-bricks.com/2012/08/07/vmware-vstorage-apis-for-vm-and-application-granular-data-management/
And, you can watch VSP3205 from VMworld 2011 for the key concepts here:
Since VMware has outed it a little more – I’ve also made public a YouTube video that EMC made in 2011, which shows these concepts in more detail (EMC made it to help when Vinay and Satyam were producing the content for VSP3205 – and you can see bits of it in that session). It’s here:
To understand what this idea means, and also some of the other use cases we’re showing at VMworld 2012, read on!
What is VM Granular Storage all about?
- The “Datastore” is a clumsy concept in the world of virtualization. It means that the storage hardware layer can only apply policy at that level rather than the more natural level – which is a Virtual Machine, or grouping of Virtual Machines. All the things that do “VM-level” operations today are good (examples include the VM-level visibility in EMC Unisphere or the new VAAI NFS Fast Copy supported by EMC VNX and Isilon), but fundamentally a little bit of a hack. The storage subsystem – even when it’s NAS – doesn’t really KNOW that a given file or set of blocks is a Virtual Machine, and there is no API to communicate that policy to the storage.
- The management model of storage is… off. Don’t get me wrong, we’re all working to make this more integrated, easier, more automated – but man, managing multipathing and configuration of datastores at scale is kind of sucky. Ideally – the management complexity wouldn’t be linked to the number of VMs and Datastores.
- The current storage policy layer (SDRS/SIOC) and policy communication vehicle (VASA) and hardware acceleration (VAAI) are a step in the right direction, but insufficient. If you think about it, these are all modelled around the datastore, and necessitate all sorts of “place and move” logic. If the storage could instead respond and adapt to changing policy requests at the VM level of granularity, you would be able to do it all much more efficiently.
- It would need to be able to work on all sorts of different storage models – from Block, NAS – anything transactional (Object storage models would implement VM granularity easily, but tend to be bad for transactional workloads) – and would need to be something people and partners could “step into”. This is an important idea. I’ve said it before, and I bet I will say it again – people underestimate the impact of “persistence” on storage innovation. Since data persists on storage (as opposed to memory state, or contents of CPU registers, or network buffers which are all transient) – it means that storage hardening is tough. It also means that migrations are hard. These are all things that all the storage vendors live and die by – and it means that “cut-over disruptions”, or things that completely invalidate mature stacks in a single step, typically struggle. So… VM Granular Storage would need to be an idea the storage community could step into.
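To make the contrast in the points above concrete, here is a minimal sketch in Python. Everything in it (the class, the function names, the "gold"/"bronze" policy labels) is made up for illustration and is not any VMware or EMC API. It only tries to show why datastore-granular policy forces “place and move” logic, while a VM-granular model could treat a policy change as an in-place update.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """Today's model: policy is a property of the datastore, not of the VM."""
    name: str
    policy: str                          # hypothetical tier label, e.g. "gold"
    vms: set = field(default_factory=set)


def change_policy_datastore_model(vm, wanted, datastores):
    """Changing a VM's policy means moving it to a datastore that has that policy."""
    source = next(d for d in datastores if vm in d.vms)
    if source.policy == wanted:
        return "no-op"
    target = next(d for d in datastores if d.policy == wanted)
    source.vms.remove(vm)
    target.vms.add(vm)                   # stands in for an actual data move
    return f"moved {vm}: {source.name} -> {target.name}"


def change_policy_vm_granular_model(vm_policies, vm, wanted):
    """Hypothetical VM-granular model: the storage adapts in place, no data move."""
    vm_policies[vm] = wanted
    return f"updated {vm} in place"


datastores = [Datastore("ds-gold", "gold"), Datastore("ds-bronze", "bronze", {"vm01"})]
print(change_policy_datastore_model("vm01", "gold", datastores))
print(change_policy_vm_granular_model({"vm01": "bronze"}, "vm01", "gold"))
```

The first call has to find a target datastore and relocate the VM; the second only records the new request and leaves it to the storage to satisfy it.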
The concepts VM Granular Storage introduces – IO Demultiplexers (both block and NAS), Virtual Volumes (vVols – another shorthand for the concepts of VM Granular Storage), Capacity Pools – are all new ideas. Watch the videos to get the idea. The names are ones only an engineer could like :-) I personally hate the “Capacity Pool” one the most – it’s actually an IO/GB/Policy pool. It turns the idea of LUNs/Filesystems on its head – and says “hey, storage admin, carve the infrastructure into pools that can deliver a pool of IOps, a pool of capacity, and various capabilities from snaps, dedupe, encryption, whatever.”
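One way to picture the “IO/GB/Policy pool” idea is the sketch below. It is an illustration only: the `CapacityPool` class, its fields, and `provision_vvol` are hypothetical names, and the real design may look nothing like this. The point is simply that a pool carries an IOps budget, a capacity budget, and a set of capabilities, and a vVol is carved out against all three.

```python
from dataclasses import dataclass, field


@dataclass
class CapacityPool:
    """Hypothetical pool: an IOps budget, a GB budget, and a set of capabilities."""
    name: str
    iops_free: int
    gb_free: int
    capabilities: frozenset
    vvols: list = field(default_factory=list)

    def provision_vvol(self, vvol_name, iops, gb, required=frozenset()):
        """Carve a vVol out of the pool if budgets and capabilities allow it."""
        missing = set(required) - self.capabilities
        if missing:
            raise ValueError(f"pool {self.name} lacks: {sorted(missing)}")
        if iops > self.iops_free or gb > self.gb_free:
            raise ValueError(f"pool {self.name} has insufficient IOps or capacity")
        self.iops_free -= iops
        self.gb_free -= gb
        self.vvols.append(vvol_name)
        return vvol_name


pool = CapacityPool("pool-a", iops_free=10_000, gb_free=2_000,
                    capabilities=frozenset({"snaps", "dedupe"}))
pool.provision_vvol("vm01-disk0", iops=500, gb=40, required={"snaps"})
print(pool.iops_free, pool.gb_free)  # 9500 1960
```

Asking this pool for a vVol that requires "encryption" would fail, since that capability is not in the pool's (made-up) capability set.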
BTW – SDRS, VASA, VAAI are all ideas that are “versioning” us to this future state. SDRS will be the policy control layer. VASA today describes datastores, but ultimately will communicate the capabilities of Capacity Pools. VAAI of today will turn into the vVol-level VAAI operations of tomorrow.
This topic can also be considered part of “Software Defined Storage” in the “Software Defined Datacenter” vision – it’s only missing the analogous idea of decoupling the control plane from the infrastructure and running that in software on commodity hardware via an open API model (à la OpenFlow) – and yes, we’re working on that too.
Now – everyone starts thinking about this in terms of primary VM storage attributes today…but it could enable more.
Not only could this apply performance/availability policy on a per-VM basis, but it could also integrate with other things. Wondering how we might be able to do EMC VPLEX Geo on a per-VM basis? Wondering how we could auto-configure things like HA and Host Affinity by having the storage layer communicate whether a given VM can be stretched between places? You can see how this might start to work. We’re aiming to show this at VMworld 2012 in Barcelona (hinges on getting the latest engineering drop of code over the next couple of weeks).
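As a rough illustration of that last idea, the sketch below derives a host placement rule from what the storage reports about a single VM. All of it is assumed for the sake of the example: `StretchInfo`, `placement_rule`, and the site and host names are hypothetical, and this is not how VPLEX or vSphere actually expose this information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StretchInfo:
    """Hypothetical per-VM answer from the storage layer."""
    vm: str
    sites: tuple              # sites where this VM's storage is accessible


def placement_rule(info, hosts_by_site):
    """Derive a host-affinity rule from what the storage reports for one VM."""
    allowed = sorted(h for s in info.sites for h in hosts_by_site.get(s, ()))
    return {
        "vm": info.vm,
        "stretched": len(info.sites) > 1,
        "allowed_hosts": allowed,
    }


hosts = {"site-a": ["esx-a1", "esx-a2"], "site-b": ["esx-b1"]}
print(placement_rule(StretchInfo("vm01", ("site-a", "site-b")), hosts))
print(placement_rule(StretchInfo("vm02", ("site-a",)), hosts))
```

A stretched VM ends up allowed on hosts at both sites, while a non-stretched VM is pinned to the hosts at the one site where its storage lives.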
Another thing it could be used for would be to integrate and offload VM-level activities across integrated use cases. This is the example we showed at VMworld 2012 in San Francisco this week. Please bear in mind – this is pure technology preview – but pretty darn cool if you ask me! Thanks to Chris Horn and others in the EMC BRS team for helping pull this together.
This example shows:
- vSphere Data Protection (jointly developed and leveraging EMC Avamar technologies) asking to take a backup. This process requires a VM-level snap as part of the process.
- The storage used is an EMC VNXe – which is running prototype code that supports this VM Granular Storage (note how nicely this is integrated in the Unisphere build – you can see the progress made since 2011)
- The VM lives on a “Capacity Pool” and in a “Virtual Volume” with a set of data services, including VM-accelerated snaps.
- The engineering build of vSphere uses a VM Granular Storage API call to ask for a Virtual Volume-level snapshot, which is then used to accelerate the backup.
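The flow in the list above can be sketched roughly as follows. This is not the prototype API: `PrototypeArray`, `snapshot_vvol`, and `backup_vm` are hypothetical names, and the snapshot is just a generated identifier. It only shows the shape of the interaction, where the backup asks the storage for a snapshot of each of the VM's vVols instead of taking a software snapshot.

```python
class PrototypeArray:
    """Stand-in for storage that understands Virtual Volumes (hypothetical)."""

    def __init__(self):
        self.snapshots = {}              # vVol id -> list of snapshot ids

    def snapshot_vvol(self, vvol_id):
        """Pretend to take an array-side snapshot of one vVol and return its id."""
        snaps = self.snapshots.setdefault(vvol_id, [])
        snap_id = f"{vvol_id}@snap{len(snaps) + 1}"
        snaps.append(snap_id)
        return snap_id


def backup_vm(vm_vvols, array):
    """Snapshot each of the VM's vVols on the array; the returned snapshot ids
    are what the backup software would then read from."""
    return [array.snapshot_vvol(v) for v in vm_vvols]


array = PrototypeArray()
print(backup_vm(["vm01-config", "vm01-disk0"], array))
# ['vm01-config@snap1', 'vm01-disk0@snap1']
```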
Check it out below:
You can download the demo in high-rez MP4 format and WMV format.
The whole thing is, in effect, invisible – but that’s good. IMO, this highlights how infrastructure will innovate around the changes that the Software Defined Datacenter will drive, and the things that will be of value in the future. Remember that we can hack at it today on VNX and Isilon, and do management integration – but all storage operating at this VM-level is a big change demanding a lot of engineering.
BTW – this concept of VM-granular storage is something we’re fully invested in (as you can judge for yourself – look at the demos we have done in 2011 and now). I can say that for VMAX, VNX, Isilon, XtremIO, BRS – heck, pretty well everything – it’s something where VMware and EMC are working very, VERY closely together.
What do you think? Are we off our rockers?
Cool stuff indeed. I've encountered issues in the past with vADP backups not working on certain virtual machines that don't behave well with VMware's software-based snapshots, and there was no way to use a hardware-accelerated snapshot with vADP. This looks like the solution, only wish it wasn't so far away!
Posted by: INDStorage | August 28, 2012 at 01:07 PM
Hi Chad,
It is a difficult IT paradigm shift for most people, but Data must be classified and treated with the right service-levels related to the true business relevance. The days of treating all data the same are quickly disappearing with big data and massive multi-tenant designs. VM Volumes are a step in the right direction and enable data classification at volume granularity. Over time it seems even a lower level of granularity will be required. The VMDK is quickly becoming the new “junk drawer” full of relevant and reference data all mixed together.
BTW – “IO Demultiplexers” may be technically accurate, but is the worst term ever ;)
Posted by: Pete | August 30, 2012 at 09:01 AM
@INDStorage - you got it. Trust me, we're working furiously towards this.
@Pete - I was actually using shorthand, vVols are actually at the VMDK level of granularity. If you look at the example in the video, there are actually several vVols created for a VM - for metadata and the VMDKs.
BTW - I TOTALLY agree that "IO Demux" is the worst term ever :-)
Posted by: Chad Sakac | September 07, 2012 at 01:30 PM