With the awesome new thin provisioning GUI and more flexible virtual disk behavior (hallelujah – no more "clone/template = eagerzeroedthick"!) in vSphere, I'm getting more questions re: best practices when you have the choice of doing it at the array level or the VMware layer.
This is covered in Chapter 6 of the upcoming Mastering VMware vSphere 4.0 that Scott Lowe is authoring (more here). I've guest-authored that chapter for Scott; it's entitled "VMware vSphere 4.0 – Creating and Managing Storage Devices".
Read on for more details – and there’s LOTS more in the book!
Ok – first – some critical understanding:
Virtual Disks come in three formats:
- Thin - in this format, the size of the VMDK file on the datastore is only as large as the data actually used within the VM itself. For example, if you create a 500GB virtual disk and place 100GB of data in it, the VMDK file will be 100GB in size. As I/O occurs in the guest, the vmkernel zeroes out the space needed right before the guest I/O is committed, growing the VMDK file as it does so.
- Thick (otherwise known as zeroedthick) - in this format, the size of the VMDK file on the datastore is the size of the virtual disk that you create, but within the file, it is not "pre-zeroed". For example, if you create a 500GB virtual disk and place 100GB of data in it, the VMDK will appear to be 500GB at the datastore filesystem, but contains only 100GB of data on disk. As I/O occurs in the guest, the vmkernel zeroes out the space needed right before the guest I/O is committed, but the VMDK file size does not grow (since it was already 500GB).
- Eagerzeroedthick - in this format, the size of the VMDK file on the datastore is the size of the virtual disk that you create, and within the file, it is "pre-zeroed". For example, if you create a 500GB virtual disk and place 100GB of data in it, the VMDK will appear to be 500GB at the datastore filesystem, and contains 100GB of data and 400GB of zeros on disk. As I/O occurs in the guest, the vmkernel does not need to zero the blocks before the I/O occurs. This results in improved I/O latency and fewer back-end storage I/O operations during normal I/O, but significantly more back-end storage I/O operations up front, during the creation of the VM.
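If it helps to see those three behaviors side by side, here's a quick back-of-the-napkin sketch (plain Python, purely illustrative – the 500GB/100GB numbers are just the example above, nothing queried from a real array):

```python
# Illustrative only: how each virtual disk format behaves for a 500GB
# virtual disk with 100GB of guest data written into it.

def vmdk_footprint(fmt, provisioned_gb, written_gb):
    """Return (size reported on the datastore, zeroing behavior at write time)."""
    if fmt == "thin":
        # File grows on demand; vmkernel zeroes new blocks just before guest I/O.
        return written_gb, "zero-on-first-write"
    if fmt == "zeroedthick":  # the "Thick" default
        # Full size is reserved up front, but blocks are zeroed lazily.
        return provisioned_gb, "zero-on-first-write"
    if fmt == "eagerzeroedthick":
        # Full size is reserved AND pre-zeroed at creation time.
        return provisioned_gb, "pre-zeroed (no zeroing during guest I/O)"
    raise ValueError(fmt)

for fmt in ("thin", "zeroedthick", "eagerzeroedthick"):
    size, zeroing = vmdk_footprint(fmt, 500, 100)
    print(f"{fmt:18s} -> {size}GB on the datastore, {zeroing}")
```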
In VMware Infrastructure 3.5, the CLI tools (service console or RCLI) could be used to configure the virtual disk format to any type, but when virtual disks were created via the GUI, certain formats were used by default (with no GUI option to change the type):
- On VMFS datastores, new virtual disks defaulted to Thick (zeroedthick)
- On NFS datastores, new virtual disks defaulted to Thin
- Deploying a VM from a template defaulted to eagerzeroedthick format
- Cloning a VM defaulted to an eagerzeroedthick format
This is why the creation of a new virtual disk has always been very fast, but in VMware Infrastructure 3.x cloning a VM or deploying a VM from a template (even with virtual disks that are nearly empty) took much longer.
Also, storage array-level thin-provisioning mechanisms work well with Thin and Thick formats, but not with the eagerzeroedthick format (since all the blocks are zeroed in advance) - so potential storage savings of storage-array level thin provisioning were lost as virtual machines were cloned or deployed from templates.
Also – BTW – if you have TP at the array level and are using either NFS or VMFS, that clone/template behavior is why you can save a lot of storage $$ by moving to vSphere.
The virtual disk behavior in vSphere has changed substantially, resulting in significantly improved storage efficiency - most customers can reasonably expect up to 50% higher storage efficiency than with ESX/ESXi 3.5, across all storage types.
- The Virtual Disk format selection is available in the creation GUI
- vSphere still uses a default format of Thick (zeroedthick), but in the virtual disk creation dialog, there's a simple radio button to thin-provision the virtual disk (particularly handy if your block storage array doesn't support array-level thin provisioning).
- Also note that there is an option to support clustering features such as Fault Tolerance, which employs the eagerzeroedthick format on VMFS volumes.
Above is the new virtual disk configuration wizard. Note that in vSphere 4 the virtual disk type can be easily selected via the GUI, including thin provisioning across all array and datastore types. Selecting the “Support Clustering features such as Fault Tolerance” creates an eagerzeroedthick virtual disk on VMFS datastores.
Clone/Deploy from Template operations no longer always use the eagerzeroedthick format; rather, when you clone a VM or deploy from a template, this dialog box enables you to select the destination type (it defaults to the same type as the source).
Also, the virtual disk format can easily be changed from thin to eagerzeroedthick. It can be done via the GUI, but not in a "natural" location (which would be the Virtual Machine settings screen). If you navigate in the datastore browser to a given virtual disk and right-click, you'll see a GUI option as noted below.
You cannot “shrink” a thick or eagerzeroedthick disk to thin format directly through the virtual machine settings in the vSphere client, but this can be accomplished non-disruptively via the new storage vmotion (allowing VI3.x customers to reclaim a LOT of space).
The eagerzeroedthick virtual disk format is required for VMware Fault Tolerant VMs on VMFS (if they are thin, conversion occurs automatically as the VMware Fault Tolerance feature is enabled). It also continues to be mandatory for Microsoft clusters (refer to KB article) and recommended for the highest-I/O-workload Virtual Machines, where the slight latency and additional I/O created by the "zeroing" that occurs as part and parcel of virtual machine I/O to new blocks is unacceptable. From a performance standpoint, thick and pre-zeroed disks perform identically for I/Os to blocks that have already been written - within the margin of error of the test.
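If you want to audit which format your existing virtual disks are already using, here's a rough sketch using the vSphere API Python bindings (pyVmomi) – the hostname and credentials are placeholders, and it only inspects flat (FlatVer2) VMDK backings, so treat it as a starting point rather than a finished tool:

```python
# A rough audit of virtual disk formats across an environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if not vm.config:
            continue
        for dev in vm.config.hardware.device:
            if not isinstance(dev, vim.vm.device.VirtualDisk):
                continue
            backing = dev.backing
            if not isinstance(backing, vim.vm.device.VirtualDisk.FlatVer2BackingInfo):
                continue  # skip RDMs and other backing types
            if backing.thinProvisioned:
                fmt = "thin"
            elif backing.eagerlyScrub:
                fmt = "eagerzeroedthick"  # the format FT and MSCS need on VMFS
            else:
                fmt = "zeroedthick"
            print(f"{vm.name}: {dev.deviceInfo.label} -> {fmt}")
finally:
    Disconnect(si)
```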
So… what's right - thin provisioning at the VMware layer or the storage layer? The general answer is: BOTH.
If your array supports thin provisioning, you’ll generally get more efficiency using the array-level thin provisioning in most operational models.
- If you thick provision at the LUN or filesystem level, there will always be large amounts of unused space until you start to get it highly utilized - unless you start small and keep extending the datastore, which is operationally heavyweight and generally a PITA.
- When you use thin provisioning at the array level - with NFS or with VMFS on block storage - you always benefit. In vSphere, all the default virtual disk types - both Thin and Thick (with the exception of eagerzeroedthick) - are "storage thin provisioning friendly", since they don't "pre-zero" the files. Deploying from templates and cloning VMs also use Thin and Thick (not eagerzeroedthick, as was the case in prior versions).
- Thin provisioning also tends to be more efficient the larger the scale of the "thin pool" (i.e. the more oversubscribed objects) - and on an array, this construct (every vendor calls it something slightly different) tends to be broader than a single datastore, so the efficiency factor tends to be higher.
Obviously if your array (or storage team) doesn’t support thin provisioning at the array level – go to town and use Thin at the VMware layer as much as possible.
What if your array DOES support Thin, and you are using it that way - is there a downside to “Thin on Thin”? Not really, and technically it can be the most efficient configuration – but only if you monitor usage. The only risk with “thin on thin” is that you can have an accelerated “out of space condition”.
An example helps here.
Scenario:
At the VMware level you have 10 VMs, each VM is a 50GB Virtual Disk, and has 10GB of data on it.
- If provisioned as Thick, each is a 50GB file, but only containing 10GB of data. It could never get “bigger” than 50GB without extending it.
- If provisioned as Thin, each is a 10GB file that can grow to 50GB.
At the Datastore level:
- If you used Thick virtual disks, you would HAVE to have a 500GB (10x50GB) datastore (technically a lot more due to the extra stuff a VM needs, but for the sake of easy math I’m keeping it simple here…) In the Thick case you can’t run out of space at the VMware layer – so you don’t need to monitor that.
- If you used Thin virtual disks, you only needed a 100GB (10x10GB) datastore (more due to the extra stuff a VM needs, but for the sake of easy math…). In the Thin case you CAN run out of space at the VMware layer – so you DO need to monitor that (vSphere adds a simple alert on datastore thresholds).
At the storage layer:
- If you use Thick storage provisioning and Thick VMs, you would need to create a storage object (LUN or Filesystem) that is 500GB in size, though in reality, only 100GB is being used
- If you use Thick storage provisioning and Thin VMs, you would need to create a storage object (LUN or Filesystem) that is 100GB in size, but you HAD BETTER MONITOR IT and be ready to expand – as it will grow to up to 500GB in size.
- If you use Thin storage provisioning and Thick VMs, you would need to create a storage object (LUN or Filesystem) that is 500GB in size, but it would only consume 100GB. You wouldn’t need to monitor the LUN/filesystem, but instead the pool itself (because there isn’t actually 500GB available), and you could be more efficient.
- If you use Thin storage provisioning and Thin VMs, you could create a storage object (LUN or Filesystem) that is 100GB in size, but you should ACTUALLY configure one that is 500GB - in either case it would only consume 100GB, and by using the larger storage object you don't need to monitor the VMware layer as closely. As with the previous case, you monitor the pool rather than the LUN/filesystem (because there isn't actually 500GB available), and you can be more efficient.
If you look at those combinations, you can see that:
- Thick VMs on thin storage and Thin VMs on thin storage are operationally the same thing, and have the same space efficiency.
- Thick VMs on thin storage ("Thick on Thin") has less management complexity (you monitor at one layer, not both).
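If you prefer the arithmetic written out, here's the same 10-VM scenario as a trivial sketch (illustrative numbers only, ignoring the extra files a VM needs):

```python
# The 10-VM scenario from above: 10 VMs, each with a 50GB virtual disk
# holding 10GB of real data.
vms, provisioned_gb, used_gb = 10, 50, 10

# Datastore sizing at the VMware layer:
print("thick VMs need a", vms * provisioned_gb, "GB datastore (can't run out at the VMware layer)")
print("thin VMs need a",  vms * used_gb, "GB datastore (CAN run out - monitor it)")

# Storage-object sizing vs. real consumption at the array layer:
# (size you present, space actually consumed)
combos = {
    ("thick storage", "thick VMs"): (vms * provisioned_gb, vms * used_gb),
    ("thick storage", "thin VMs"):  (vms * used_gb,        vms * used_gb),  # and it will grow toward 500GB
    ("thin storage",  "thick VMs"): (vms * provisioned_gb, vms * used_gb),  # monitor the pool, not the LUN
    ("thin storage",  "thin VMs"):  (vms * provisioned_gb, vms * used_gb),  # same consumption - monitor the pool
}
for (storage, vm_fmt), (presented, consumed) in combos.items():
    print(f"{storage} + {vm_fmt}: present {presented}GB, actually consume {consumed}GB")
```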
If you DO use Thin on Thin, use VMware or 3rd-party usage reports in conjunction with array-level reports, and set thresholds with notification and automated action at both the VMware layer and the array level (if your array supports that). Why? Thin provisioning needs to be carefully managed for "out of space" conditions, since you are oversubscribing an asset which has no backdoor (unlike the way VMware oversubscribes guest memory, which can fall back on VM swap if needed). Thin on Thin can be very efficient, but it can "accelerate" the transition to oversubscription.
BTW – this is a great use of the new managed datastore alerts. Just set the alert thresholds below the array-level TP thresholds (and if your array supports auto-grow and notification, configure it to auto-grow to the maximum datastore size – BTW, all modern EMC arrays auto-grow and notify). Also, for EMC customers, use the vCenter plugins or Control Center/Storage Scope (which accurately show Virtual Provisioning state and usage) to monitor and alert at the array level.
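As a crude illustration of the VMware-side half of that monitoring, here's a sketch (again with the Python vSphere bindings, and made-up thresholds – tune them to sit below your array-level TP alerts) that flags datastores that are heavily oversubscribed or running low on free space; the array-side half comes from your array vendor's tools:

```python
# A rough datastore-side check. Assumes "content" is the ServiceInstance
# content from an existing connection (see the earlier pyVmomi sketch).
from pyVmomi import vim

FREE_PCT_WARN = 20    # warn when a datastore drops below 20% free
OVERSUB_WARN = 1.5    # warn when provisioned space exceeds 150% of capacity

def check_datastores(content):
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.Datastore], True)
    for ds in view.view:
        s = ds.summary
        # Provisioned = committed space plus what thin VMDKs could still grow into.
        provisioned = s.capacity - s.freeSpace + (s.uncommitted or 0)
        free_pct = 100.0 * s.freeSpace / s.capacity
        oversub = provisioned / s.capacity
        if free_pct < FREE_PCT_WARN or oversub > OVERSUB_WARN:
            print(f"WARNING {s.name}: {free_pct:.0f}% free, "
                  f"{oversub:.1f}x oversubscribed at the VMware layer")
```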
In the next minor release of vSphere, one of the areas of ongoing work in the vStorage API stack is thin provisioning integration. That means the reports (the actual array-level details) will also be available directly in vCenter (the vCenter 4.0 reports already show VMware-level provisioned vs. actual usage), at which point the management overhead shrinks further and we squeeze out even more efficiency.
There are only two exceptions to the "always thin provision at the array level if you can" guideline. The first is the most extreme performance use cases, as thin-provisioning architectures generally have a performance impact (usually marginal) vs. a traditional thick storage configuration. The second is large, high-performance database storage objects, which have internal logic that generally expects "IO locality" - a fancy way of saying they structure data expecting the on-disk layout to reflect their internal structure. UPDATED (5/1/2009, 1:32pm EST) – good feedback in the comments (see below) suggested the second case really only applies to local storage. So, when you couple that with the caveat on the first case – that the performance impact is marginal, and heck, there are benefits to the wide-striped approach of TP – there's almost no reason not to use array TP.
Have fun!
That latter "exception" doesn't really apply to an intelligently cached disk array like the Symmetrix DMX and V-Max.
The bias towards "IO Locality" really only works for locally-managed storage - with external storage, the actual locality is rarely as expected, since the array will distribute data across a large number of spindles or pack it into cache such that the "locality" no longer applies. Since databases rarely do long-run sequential reads (where minimized head movement might be of benefit), for the Symmetrix implementation of Virtual Provisioning you are probably still better off using the array-based thin implementation than the host-based approach.
YMMV on other platforms, of course.
Posted by: the storage anarchist | April 30, 2009 at 06:01 PM
While I completely agree with anarchist's premise that IO locality is not a material issue for an intelligently cached disk array, I am surprised by the unwritten message that only uber-expensive high-end arrays alleviate this issue. Only Symmetrix DMX and V-Max offer this? I can think of several mid-range priced arrays that are leaders in array based thin provisioning (and have brilliant caching and data layout schema) that perform extremely well in VMware environments.
YMMV is right and it may be better ;-)
Posted by: robcommins | May 01, 2009 at 12:48 AM
My previous post was a little quiz. To make it a little easier for ya, here's a disclaimer I should have put on the trailer of my comment: [I work at 3PAR] :-)
Posted by: robcommins | May 01, 2009 at 01:25 AM
Thanks for the comments Rob, Anarchist. I'll update the article accordingly - I think you are right on the ONE SMALL point in the whole article - the curse of the blog is that people tear apart one paragraph in a full post :-)
Rob - I'm missing 3PAR on the vStorage API calls for the TP management integration; as "leaders in array-based thin provisioning", you guys should be leading the charge :-)
3Par makes a fine product, and kudos for being one of the earliest pure block devices to ship TP. That said - it is now relatively mainstream - and Anarchist's comment doesn't have a hidden unwritten agenda - he's a plain-jane, up-front Symmetrix bigot (it's the part of the company he works for after all) - as I am a VMware bigot (though try to be as pragmatic as I can).
As he himself said - IO locality is only really needed in local storage use cases. The same comment would apply to all EMC arrays, mid-range and down. Every array varies, and customers should evaluate carefully on all fronts.
The guidance (use array-level TP; Thick on Thin and Thin on Thin yield roughly the same efficiency; only use Thin on Thin if you are careful about thresholds/alerts/actions) holds.
Posted by: Chad Sakac | May 01, 2009 at 01:17 PM
Great insightful article Chad, thanks a bunch!
Posted by: Paul Wegiel | May 01, 2009 at 03:12 PM
Thanks Chad - excellent article nonetheless. Very well done!
Rob
Posted by: robcommins | May 01, 2009 at 04:01 PM
Actually, there is a third angle on the "thin" story.
We have had the problem with clones inflating and taking up a lot of space. Also, due to some conservatism among sysadmins, everyone claimed they needed 36GB or more on their servers, so we had to give them the space.
It didn't help that we are using VMware on NFS.
Then we implemented deduplication on the primary storage on our NetApp. So now I have the best of all worlds: inflated VMware disks, yet using minimal space, as all the zeroed blocks (and MORE) are deduplicated away, totally transparently to VMware and the VM sysadmins.
Sure, thin provisioning the VM disks, in VMware or with "classical" array thin provisioning, would probably relieve some pressure from the dedup stage, but I have no headache anymore with sysadmins choosing the wrong format. And deduplication also releases more storage as it finds identical OS-related blocks, something VMware thin provisioning can't do.
Posted by: Dejan Ilic | May 04, 2009 at 05:10 AM
Dejan - thanks for the response.
I'm actually working on a multivendor NFS post with my NetApp colleagues - it's a great option for many use cases - hopefully we'll get it done soon.
Yup - that's also a way to be capacity efficient. I would argue that there's still a benefit (even if using production storage dedupe) to the new thick and thin defaults - cloning/template operations will be much faster, and you can avoid shrinking the flexvol - or over-provisioning before the post-process dedupe (unless using file-level snapshot approaches - which are an option in some of the use cases)
I believe that over time, production dedupe will be a ubiquitous feature - each vendor does certain things first, and NetApp did production dedupe first. We've just introduced our first production dedupe (you can download the Celerra VSA and give it a whirl), and will continue to evolve this as fast as we can to broaden use cases and platforms - efficiency needs to be applied at every level, every use case, and every location.
Deduplication techniques in general have broad applicability - backup, production, IO - everywhere.
Again - thanks for the comment!
Posted by: Chad Sakac | May 04, 2009 at 08:14 AM
I am wondering how VMware will handle disk fragmentation... I remember "growing up" on GSX and Workstation (before ESX was widely popular) - thin provisioning was a feature of both GSX and Workstation, and disk fragmentation was always a big issue as time progressed.
Will VMware have some sort of defrag utility to help defrag the thinly provisioned disks, which may now end up all over the VMFS?
I have not seen any mention of a built-in defragmenter so far.
Posted by: Paul Wegiel | May 06, 2009 at 01:49 PM
I think the issues of defrag will only be apparent on local storage with fewer spindles and traditional RAID-based partitions; the use of "Volumes or Meta LUNs" etc. on remote shared storage effectively spreads the VMFS partition across many more spindles than a traditional RAID-based partition ever would.
Posted by: Tom Howarth | May 09, 2009 at 05:43 AM
I tend to agree with Tom - I've never had this come up as an issue at any customer (which suggests it's a molehill rather than a mountain).
You can't compare with Workstation and GSX, as they of course live on the host filesystem.
Fragmentation is inherently not as bad with VMFS purely because you have a relatively small number of very large files compared with a "normal" filesystem use case.
VMFS-3 also doesn't "spray it all over the place" - initial file allocation is random, but subsequent allocations attempt to be sequential. This means the contents of a file are grouped together, even though the files themselves are scattered.
Of course, doing a storage vmotion to a new datastore would also "defrag" it, so there is already a non-disruptive workaround.
Posted by: Chad Sakac | May 09, 2009 at 08:48 AM
I am curious to know whether the Windows servers need to have their drives formatted as dynamic or basic. The physical servers require me to use basic format now to extend from the SAN, but dynamic drives can be extended if I need to grow them. Can I format them as dynamic and grow them with 4.0 thin provisioning, or do I need to keep them all basic and forever lose the ability to extend those volumes?
Posted by: Kevin | July 06, 2009 at 04:19 PM
@Kevin - you can have them basic and still expand them. You can grow basic disks using diskpart. In W2K8 that gets even easier (it's right in the GUI).
Posted by: Chad Sakac | July 23, 2009 at 12:18 PM
Excellent post, Chad. I found it from Vaughn Stewart's post and am pleasantly surprised to discover convergence of viewpoints around the fundamentals between your respective posts. I learnt quite a lot from them and have summarized my learning at
http://blog.sharevm.com/2009/12/03/thin-provisioning-when-to-use-benefits-and-challenges/
Thank you
Posted by: Paul Evans | December 03, 2009 at 03:41 AM
Is there any way to convert my eagerzeroedthick disks (created from templates on VI3) to the vSphere default (lazy) zeroedthick format?
I want to use only the thin provisioning on the storage array, and not thin on thin. According to VMware support this cannot be done.
Posted by: Peter | January 15, 2010 at 10:06 AM
@Peter,
You can go from eagerzeroedthick to zeroedthick by performing a Storage vMotion. After selecting your datastore, you have the option of choosing a format: same format as source, thin, or thick. Note that by selecting thick, you are choosing zeroedthick and not eagerzeroedthick.
Posted by: Son Cao | January 28, 2010 at 01:48 PM
Say a VM running MS Windows was newly created using TP, and a 10GB file was just copied to this VM. The size of the vdisk, as expected, would be expanded by 10GB. How do I reclaim this 10GB of space after the file is permanently deleted within the VM?
Posted by: Terry Tsang | February 09, 2010 at 02:10 AM
@Son Cao
Just tested migrating to thick, but it does not work like you say - the new disk takes up all the allocated space on the virtually provisioned LUN.
This is very different from creating a thick disk from scratch (which only takes up the used space in the VM and not all the allocated space).
Anyone have an answer?
Posted by: Peter Tak | February 15, 2010 at 04:59 AM
Chad,
First, thanks for another in a long line of great posts.
As this post was written about 10 months ago, I thought I would see if anything has changed your opinions on TP, in particular Thin on Thin.
I am currently running a VI 3.5 environment using a CX4-120 w/o TP. I am about to move to vSphere and am also looking at possibly replacing the CX4-120 with a NS-480. I am looking at the possibility of using TP and probably leaning towards trying Thin on Thin.
Along with this, on the NS-480 I am looking at using FAST for tiering, with about 1/3 SATA, 2/3 FC and a small amount of SSD. Does the introduction of tiering with FAST have any impact on the use of TP?
Thanks,
Rod
Posted by: ThatFridgeGuy | March 03, 2010 at 12:30 AM
Can you please explain the following statement about how "Thin on Thin" can have an accelerated "out of space condition"? The storage used by the VMware environment is NFS datastores on a NetApp storage array. We currently have Thin configured at the VMware layer but not yet on the NetApp storage device. We want to turn on Thin on the NetApp, but would like to get anyone's experience, feedback, and any issues they have encountered using Thin on Thin with NetApp NFS.
What if your array DOES support Thin, and you are using it that way - is there a downside to “Thin on Thin”? Not really, and technically it can be the most efficient configuration – but only if you monitor usage. The only risk with “thin on thin” is that you can have an accelerated “out of space condition”.
Posted by: morjo02 | December 09, 2010 at 10:34 AM
@morjo02 - thanks for the comment. I've often observed that people struggle to track a metric in two places (i.e. track storage utilization at BOTH the underlying storage AND the VMware datastore).
This is often compounded when you're talking about two different teams.
Thin on Thin obviously consumes the same (real) resources as Thin on Thick, but enables you to oversubscribe even further. In your case (and this applies across the storage vendors, just using different "words" for the features):
- your FlexVols will be oversubscribed (what is shown to the ESX host as 2TB, for example, might only actually HAVE 1TB behind it).
- your Datastores will be oversubscribed (It will LOOK like you can put in 20 x 200GB VMDKs for a total of 4TB - even though the datastore LOOKS to be 2TB in size, though of course, it might only have 1TB under the covers).
That double-oversubscription means that as you approach the actual allocation limits, the process of "running out of space" actually takes less time, giving you less time to respond.
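To put rough numbers on that double oversubscription (using the illustrative 1TB / 2TB / 4TB figures above):

```python
# Purely illustrative, using the numbers from the comment above:
# a FlexVol with 1TB behind it, a 2TB datastore presented to ESX,
# and 20 x 200GB thin VMDKs placed on it.
real_capacity_tb = 1.0      # what the FlexVol actually has behind it
datastore_size_tb = 2.0     # what the datastore/LUN presents to ESX
vmdk_allocated_tb = 4.0     # 20 x 200GB thin VMDKs

array_oversub = datastore_size_tb / real_capacity_tb      # 2x at the storage layer
vmware_oversub = vmdk_allocated_tb / datastore_size_tb    # 2x at the VMware layer
effective_oversub = vmdk_allocated_tb / real_capacity_tb  # 4x end to end

print(f"array {array_oversub:.0f}x * VMware {vmware_oversub:.0f}x "
      f"= {effective_oversub:.0f}x effective oversubscription")
# The higher that effective number, the faster "nearly full" becomes
# "actually out of space" - which is why you watch both layers.
```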
If you are committed and able to track & manage at both layers (which isn't rocket science, but falls into "operational excellence"), you're fine with Thin on Thin. My only guidance is to understand how important monitoring - via the Managed Datastore Reports in vCenter 4.x and your storage array's tools - is going to be to keeping the infrastructure highly available.
Posted by: Chad Sakac | December 15, 2010 at 09:07 AM
Chad, is there any tuning to be done on the Clariion pool side for thin lun performance? I've been messing with different configuration options for my ESXi 4.1 w/powerpath to CX4-480 setup and if I thin provision on the vmware side, thick LUN on the EMC side, I get nearly identical write performance to thick on thick, but read performance is only about 60% of what it is thick on thick. If I instead thin LUN on the EMC side and thick provision on the ESXi side, I gain huge on the read side, getting nearly the same as thick on thick, but my write performance drops to 33% of what it was thick on thick. So I'm stuck with a trade off of vmware thin if I want good writes but poor reads, EMC thin if I want good reads but much slower writes.
I suppose I need to do some playing with the Clariion's cache allocations; maybe that would help.
Posted by: David H | December 18, 2010 at 04:47 PM
I appreciate all your fine work here but perhaps you could explain more clearly what benefit we get from Thin on Thin. Let's just assume there are additional scales of economy and performance obtained from TP at the array level.
What is it that TP at the VMware level brings me, specifically, ON TOP of this when I go thin on thin?
Indeed, intuitively I would think it just adds overhead to the hypervisor, potential SCSI reservation contention (after all, don't increases in VMDK size require a SCSI reservation due to meta data updates?). I know you have a reason in mind but I'm not quite picking it up. Any way you might clarify? Thank you again for your post!
Posted by: Mark Singer | February 15, 2011 at 12:59 PM
Hi Chad,
This is a great article and certainly answers many questions; is the information contained within still relevant in vSphere 5(.1)?
Thanks,
Tom
Posted by: Tom Kivlin | September 04, 2012 at 03:32 AM
Hello,
Even though a Thick VMDK allocates all the space up front, does it really? Isn't it just a pointer to the end of the file? If so, isn't the storage array able to determine the real amount of blocks being used in a Thick VMDK on a Thin LUN using VAAI?
Posted by: Chris Anania | January 04, 2013 at 04:26 PM