The vStorage APIs for Array Integration (VAAI) have had a huge impact in the VMware and Storage landscape. One of the truest tests (this was actually one of my MBOs in 2010!) was to see how many RFPs started to ask specific questions about VAAI support and roadmaps. Needless to say, most large storage RFPs that cross my desk these days do exactly that.
The answer is simple. VAAI removes bottlenecks, and offloads “expensive” data operations (i.e. ones that place a heavy load on ESX resources) to storage arrays, which in many cases are in a better spot to support them. This has the effect of improving performance, scale, and efficiency – which is fundamentally a good thing.
A great proof point is this doc (click if you want the PDF) that shows the effect of the vSphere 4.1 APIs on a VMAX. I’ve pulled out a couple of the graphs – which show 3-8x improvements in time to complete tasks, and as importantly, a HUGE offload on the network and the ESX host.
Now – I do want to call attention to the fact that this hasn’t been perfectly smooth (new things rarely are). There have been issues, and missed opportunities:
- Implementation bugs (fixed) and optimizations we could have taken but haven’t (still underway). For more on these – I would HIGHLY recommend checking out this blog post.
- The vSphere 4.1 VAAI calls made little use of (and had no material awareness of) array thin provisioning, which is rapidly becoming the “de facto” provisioning model – not only because it is more efficient, but also because it enables people to minimize provisioning tasks and simplify storage provisioning to macro “pool level” operations.
- The first round of VAAI APIs was block only (no NAS), and in some cases the SCSI T10 standard hadn’t “settled” by the time of the vSphere 4.1 GA – resulting in extra “plugins” needed to match the various states of implementation across the vendor community.
So, what’s new?
- In vSphere 5, not only is vSphere/VMFS “thin aware”, but it actively exploits and leverages thin provisioning at the array level for both Block and NAS storage models.
- Wherever possible, “more standard” approaches have been used – with the goal of not needing any vendor plugins for block storage models.
- There is now support for NAS hardware assists which not only help with Thin Provisioning, but can leverage array functions (like file-level snapshots/versions).
And yup, this will be supported on EMC’s portfolio: VNX has the next VNX OE beta, called “Franklin”, underway right now (which has all the new VAAI assists, both block and NAS); VMAX core support is already in Enginuity and – after the vSphere 4.1 VAAI experience – is undergoing heavy testing; VNXe has VAAI support (along with VASA and SRM 5 support) targeted at a Q4 software update; and Isilon has a Q4 target.
Read on for more details!
OK – there are a total of 5 new APIs that are generally used and a 6th which is there for potential use with future releases of things like VMware View. I’m not going to talk about that one – we’ll save it for another day.
Block – Thin Provision Stun
Without this new API, when a datastore cannot allocate in VMFS because of an exhaustion of free blocks in the LUN pool (in the array), it causes VMs to crash, snapshots to fail, and all sorts of other badness. This isn’t a problem with “thick” devices, as allocation is fixed, but then again, thick provisioning models aren’t as efficient. Conversely, thin LUNs can absolutely fail to deliver a write BEFORE the VMFS is full. Think of it this way: thin provisioning is like the banking system :-) Horrifying as it may seem, your bank could NOT fulfill everyone’s withdrawals if they all came in at once, as it leverages everyone’s savings to gain investment benefits. As scary as that is – it’s that system-wide liquidity that makes the modern banking system work.
Of course, as people have realized, there’s a certain amount of capitalization needed in the system (which is best expressed as a question of risk, rate of change, and percentage of total assets). Some places call that “government regulation” – I call it being “smart” :-) Storage thin provisioning is the same. Careful management (setting thresholds and actions – and monitoring them) at both the VMware and array level is needed.
As a visual example – here we have three VMs that get created. With each one, there is a series of VMFS-level filesystem allocations, each of which consumes an LBA range, writing data. By the time the last one is underway, the storage pool at the array level is exhausted, and the write fails. When the write fails, the VMFS layer goes nuts – after all, a write just failed against an LBA range it thought it had. Cue VMs crashing and other badness.
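To make that failure mode concrete, here’s a toy sketch of a thin pool running dry. The class, pool size, and allocation sizes are all invented for illustration – this is a model of the behavior, not real array or VMFS code:

```python
# Toy model of a thin pool: VMFS believes it owns the LBAs, but backing
# blocks are only allocated on write. All sizes here are made up.
class ThinPool:
    def __init__(self, free_blocks):
        self.free_blocks = free_blocks

    def write(self, blocks):
        """Allocate backing blocks for a write; fail when the pool is dry."""
        if blocks > self.free_blocks:
            raise IOError("write failed: thin pool exhausted")
        self.free_blocks -= blocks

pool = ThinPool(free_blocks=100)
pool.write(40)       # VM1's files land
pool.write(40)       # VM2's files land
try:
    pool.write(40)   # VM3 needs 40, but only 20 blocks remain
except IOError as e:
    print(e)         # without the stun primitive, this is where VMs crash
```

The key point the sketch shows: the failure happens at the array layer while VMFS still thinks it has free space, which is exactly why the generic I/O error is so destructive.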
With the VAAI Thin Provisioning stun API, the array responds with a different error, which results in the VMs all being stunned and an error message like this one showing up in the vCenter UI – much better:
This gives the administrator an opportunity to expand the thin pool at the array level, saving the day. It does not eliminate the need for setting thresholds and actions. With any reasonable array, these thin pool thresholds/auto-expansion/alerts are robust – they certainly are with EMC. In vCenter, you can monitor datastore utilization using the datastore reports – piece o’ cake to set up.
For those of you keeping close score – this is the “hidden” 4th API in vSphere 4.1 (which has been in CLARiiON, VMAX, NS, and VNX for more than a year :-) Grab me at VMworld 2011 and I’ll explain :-)
Block – Thin Space Reclaim
Without this API (i.e. today), when VMFS deletes a file, the file allocations are returned for use at the filesystem layer, and in some cases SCSI WRITE ZERO would zero out the blocks. The circumstances that cause this are pretty obvious – anything that deletes a file at the VMFS layer (delete VM, delete snapshot, Storage VMotion). If the blocks were zeroed, array-level manual space reclamation (Space Reclaim on VMAX, LUN Migration to a thin target on VNX) at the device layer could help – but there is clearly a better way.
Here’s a diagram of “what happens”… Let’s say that we create two VMs. Each of them creates a set of files at the VMFS layer, which in turns issues a whackload of SCSI WRITE commands to blam stuff down. As they write, they consume space (free blocks) in the thin pool.
When you subsequently delete one of the VMs, VMFS deletes the files, which may or may not actually issue SCSI WRITE ZERO (in this example, it wouldn’t – the filesystem would simply reallocate the pointers for future use). Regardless, the second “chevron” on the right (the consumption of the 2nd file) WOULD NOT be returned to the free pool.
With the block thin space reclaim API, instead of SCSI WRITE ZERO (or just leaving the blocks untouched and simply reallocating the VMFS pointers), SCSI UNMAP is used. With that command, the array releases the specified LBA range back to the free pool.
Sweet and simple. This is a much more efficient provisioning model – and also beautiful and transparent.
With vSphere 5, SCSI UNMAP is used anytime VMFS deletes a file (svMotion, delete VM, delete snapshot), and it is also used in many other places where previously SCSI WRITE ZERO would be used (think of the “hardware accelerated zero”). SCSI UNMAP use depends on VMFS-5.
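The difference between zeroing a block and unmapping it is easy to model. Here’s a sketch from the thin pool’s point of view – again, the class and the sizes are invented for illustration, not real SCSI semantics:

```python
# Illustrative contrast between WRITE ZERO and UNMAP as seen by a thin pool.
# Zeroed blocks stay allocated; unmapped blocks return to the free pool.
class ThinPool:
    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.allocated = {}  # lba -> data

    def write(self, lba, data):
        self.allocated[lba] = data

    def write_zero(self, lba):
        # Zeroing overwrites the data but leaves the block consumed.
        self.allocated[lba] = 0

    def unmap(self, lba_range):
        # UNMAP releases the LBA range back to the free pool.
        for lba in lba_range:
            self.allocated.pop(lba, None)

    @property
    def free_blocks(self):
        return self.total_blocks - len(self.allocated)

pool = ThinPool(total_blocks=100)
for lba in range(10):
    pool.write(lba, "vm data")     # a VM's files consume 10 blocks
pool.write_zero(5)
print(pool.free_blocks)            # 90 -- zeroing reclaims nothing
pool.unmap(range(10))              # VMFS-5 deletes the file -> SCSI UNMAP
print(pool.free_blocks)            # 100 -- space is back in the pool
```

That’s the whole story of the primitive: same delete operation at the VMFS layer, but the pool actually gets its capacity back.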
BTW – there’s one important thing that TODAY vSphere 5 doesn’t do – thin provisioning commands are not “punched through” the virtual disk layer. This means that if a guest issued a SCSI UNMAP command to a virtual SCSI device, it wouldn’t result in a space reclaim at the array level – only vmkernel level space reclaim works. This is something you can count on in the future (and there may be interesting ways to do it before then, hint hint).
NFS – Full Copy
Since VI 3.x, NFS has been a handy-dandy storage option in conjunction with VMFS, but hadn’t seen any of the VAAI goodness that came in vSphere 4.1. That’s changed in vSphere 5.
OK – first of all, consider what happens when you create a VM clone on an NFS datastore without any API goodness. The NFS client reads the whole file to the vmkernel and the vmkernel writes it all out – often to the same NFS server, heck, sometimes the same filesystem.
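To put a rough number on that cost, here’s a back-of-the-envelope sketch. The 40 GB VMDK size is an invented example, not a measured figure:

```python
# Rough arithmetic on the network traffic of a host-based NFS clone:
# every byte crosses the wire twice (read into the vmkernel, written back out).
# A server-side copy, by contrast, moves nothing over the wire.
vmdk_gb = 40  # hypothetical VMDK size

host_based_clone_gb = vmdk_gb * 2   # read + write over the network
array_side_copy_gb = 0              # copy happens inside the array

print(f"host-based: {host_based_clone_gb} GB on the wire, "
      f"array-side: {array_side_copy_gb} GB")
```

Double the VMDK size in traffic, per clone, often to and from the same box – which is why the offload matters so much for mass-deployment workflows.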
There must be a better way. Of course there is. Some NFS servers like EMC’s VNX have the ability to create copies of a file in a filesystem (or across filesystems) quickly and easily.
Until now, this feature was not used for VMware operations triggered from the normal vCenter/ESX functions (which were traditional host-based file copy operations, as I noted).
Vendors would leverage them via vCenter plugins. As an example, EMC exposed this array feature via the Virtual Storage Integrator Plugin Unified Storage Module, which could employ “file versions” to rapidly create huge numbers of file-level VM copies.
Now, with the new API, this same function is called during vSphere “clone” or “deploy from template” operations. There are a few notes. Like the first wave of block VAAI functions, these are integrated via a vendor plugin. It’s also notable that while this is the “NAS analog of the block Hardware Accelerated Copy (XCOPY)” operation – unlike that API call, it is NOT used during Storage vMotion.
Hint – there is another “hidden” NFS API call not used right now, that can be used for future use cases that is very similar to this one.
NFS – Extended Stats
Unlike with VMFS, with NFS datastores, vSphere does not control the filesystem itself – and therefore only has visibility to what it can get via the NFS client (or other out of band technique). With the vSphere 4.x client and no out of band mojo – only basic file and filesystem attributes were used.
This led to challenges with managing space when thin VMDKs were used (which was always the case on NFS), and administrators had no visibility into the thin state and oversubscription of both datastores and VMDKs.
This may seem weird (though many of you may have run into it). Think about it this way: with thin LUNs under VMFS, the degree of “thinness” (oversubscription) and rate of consumption was invisible without opening the array UIs/APIs, but at least you could see details on thin VMDKs. On NFS-based datastores, not only was the “thinness” and rate of consumption of the datastore invisible, you also couldn’t see how much space was actually consumed by a VMDK – since only the “fully provisioned analog” was reported.
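The numbers the extended stats expose are simple arithmetic once you can see them. A quick sketch with hypothetical figures (all sizes invented for the example):

```python
# Hypothetical thin NFS datastore: three thin VMDKs, each reported at its
# fully provisioned size, versus the space actually consumed on the array.
provisioned_gb = [100, 100, 100]   # what the NFS client could always see
consumed_gb = [20, 30, 10]         # what only the array knew, pre-API
datastore_capacity_gb = 200

oversubscription = sum(provisioned_gb) / datastore_capacity_gb
utilization = sum(consumed_gb) / datastore_capacity_gb
print(f"oversubscribed {oversubscription:.1f}x, {utilization:.0%} consumed")
```

Before this API, only the first list was visible from vSphere on NFS; the consumed figures (and therefore both ratios) required opening the array’s own tools.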
With the new NFS Extended Stats API, which like all the NFS VAAI calls is implemented via NAS vendor plugin, the NFS client not only sees the normal file stats, but vCenter can see extended file/filesystem details, to bring thin provisioning reporting up to the same point that VMFS offers.
NFS – Space Reservation
While it was more often than not a nice thing to have NFS be very aggressive with “thin on thin” models since the early days, there were points where it was frustrating, and it would have been awesome to be able to specify “reserve all the space in the filesystem, and reserve all the underlying blocks”. Ergo – there was no way to do the equivalent of an “eagerzeroed thick” VMDK or a “zeroed thick” VMDK. An example is WSFC support, where you need to fully provision in advance.
With this API (again, like all the NFS APIs, implemented via a NAS vendor plugin) – it’s possible to reserve the complete space for a VMDK on an NFS datastore. When the API is in use, you get the same dialog options for VMDKs on NFS datastores as you do on VMFS.
To summarize – VAAI expands pretty dramatically in vSphere 5. Personally, I think thin space reclamation and NFS full (and fast) copy will prove, long term, to be the most material for customers.
EMC will be there day one – ready for our joint customers! Some parts will take a little bit longer than others, but will all be here before you know it! If you’re part of the Franklin Beta – feedback welcome!