UPDATED April 30th, 2010 @ 9:14pm ET (linked to WMV demo)
UPDATED Dec 1st @ 10:10am PST (minor updates/corrections – also posted the table in xls format based on demand)
“What is the best way to backup my VMware environment?”
Ah… There are some questions that are quick and easy to ask, but take a long time to answer. This is one of them. But the proliferation of tools for VMware-oriented backup (including VMware’s own VMware Data Recovery as well as other 3rd-party options) suggests that good vendor answers to that question have a lot of value to customers.
EMC Avamar has long been one of the most popular ways to do backup in the VMware space (with many, many customers of every size, including VMware and Cisco themselves – each with thousands of VMs). The reasons for its popularity in this context are the following:
- simple backup and simple single-step restore in all cases (not just one or two guest types)
- good application integration
- saves a ton on storage due to data deduplication (commonly 30x-50x)
- dramatically faster backup due to the fact that the dedupe occurs before the data is transferred (aka source-based dedupe – commonly reduces network load by 300x-500x)
- higher server and VDI consolidation ratios in dense VM configurations due to lower LAN use during backup
- great for remote office backup (particularly where the remote office is too small for the smallest Data Domain platform) due to lower WAN use for centralized backup
What people didn’t like about Avamar in earlier versions was that it was focused almost exclusively on the in-guest, source-based backup model. While there are cases where this is the right way, there are also use cases that are better served with VM-image-level backup. The historical frustration with those VM-level, proxy-based mechanisms was that VCB had challenges (proxy scaling, increased backup windows due to the various data-movement steps). All sorts of VM-level variants exist: 1) NFS datastores mounted in other places; 2) array-integrated snapshots; 3) NDMP backup of NFS datastores; 4) many, many other things. Each has advantages and disadvantages.
Don’t get me wrong – customers ideally want BOTH in-guest and VM-level, and use them selectively. In-guest always has the best application-integrated options, broad guest-OS file-level recovery, and simple restore/catalog. VM-level is (in theory, minus the VCB challenges) the simplest to deploy/manage, at the expense of more “moving pieces” for file-level restore.
The vStorage APIs for Data Protection in vSphere 4 are focused on improving the VM-level, proxy-based approaches. Avamar 5.0 is out now, and is very much focused on expanding both the in-guest and VM-level scenarios. There is a TON of vSphere integration.
Key among these:
- Full support for the vStorage APIs for Data Protection (“son of VCB” – but much better than VCB was). This means:
- use of Changed Block Tracking – much better backup performance. How much faster? Think 10x-20x faster!
- direct file-level restore into guests for Windows VMs without a guest agent
- vCenter integration. This means:
- Simple views of whether VMs have been backed up or not, and a simple ability to remediate (a rough scripted sketch of this kind of view follows this list).
- Showing HOW a VM was backed up (in-guest, VM-level, or not at all) and when.
- Automatically adding backup policy to virtual machines as they are added.
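As an aside, you can get a feel for the kind of “which VMs are protected?” view this integration provides with a quick script against the vSphere API. Here’s a minimal sketch (shown with pyVmomi purely for illustration; the hostname/credentials are placeholders, and the “LastBackupStatus” custom attribute name is hypothetical – it isn’t necessarily the attribute Avamar writes into vCenter):

```python
# Minimal sketch: report a backup-status custom attribute for every VM.
# "LastBackupStatus" is a hypothetical attribute name used for illustration;
# it is not necessarily what Avamar actually records in vCenter.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only - skips cert checks
si = SmartConnect(host="vcenter.example.com", user="administrator",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    # Map custom-field keys to names so we can find our attribute by name.
    field_names = {f.key: f.name for f in content.customFieldsManager.field}

    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        status = "no backup recorded"
        for cv in vm.customValue:
            if field_names.get(cv.key) == "LastBackupStatus":
                status = cv.value
        print(f"{vm.name:40s} {status}")
    view.DestroyView()
finally:
    Disconnect(si)
```

The point isn’t the script itself – it’s that Avamar 5 surfaces exactly this kind of “backed up or not, and how” view natively in its vCenter integration, without scripting.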
Beyond VMware-focused features, there are some other massive ones:
- New desktop/laptop backup functionality – a dramatically lighter-weight client, with simple end-user-initiated restore as an option.
- A much denser Avamar backup target – Avamar has always been awesome for hugely reducing the amount of backup storage needed (30x-50x less is common). Through a bunch of optimizations, we can now get 60% more stuff backed up in the same physical space, for the same dollar cost/watts, etc. The march of progress!
Networker 7.6 has also been released, and also has piles of additional VMware goodness. Very good visualization tools that integrate with vCenter, and a much, much simpler script-less proxy-based backup model. It integrates with Avamar, so large enterprises often use Networker for its very, very broad application-agent and backup-device/model support, but leverage Avamar for VMware-centric backup models – all managed from a single console. Note that until Networker 7.6 gets updated (see below), the new Avamar functions aren’t exposed in the Networker GUI.
For customers using Networker on its own without the Avamar integration, Networker 7.6 will also be getting similar vStorage APIs for Data Protection support (for the faster backup gains when using a proxy approach and the single-step file restore from VM-level backup) – but it’s not in the current release. It is expected in an upcoming, near-term patch.
If you want to see it in action (remember that this video focuses on what’s new, not the existing stuff) and get the nutshell version – watch the below (thank you to Mike Zolla of the Avamar product team – always awesome). Mike – next time crank up the volume a notch or two :-)
Get the high-resolution version here in MOV format, and here it is in WMV format (slightly cut down, but the same core)
Read on for more detail, as well as a general discussion of backup/recovery in the VMware context!
There was a Twitter dialog about backup in the VMware space – “what is the best way to backup VMware?”… Ah… like so many of these questions, it’s an “it depends”.
Some customers can be well served with a “one size fits all” model. Others need varying backup models based on recovery scenarios. Generally, in your “kit-bag” you have the following choices for the various local recovery scenarios:
- VMware Snapshots
- VMware-integrated Array Snapshots
- Backup to target via in guest agents
- Backup to target via a proxy method
Each of these is “modulated” by technology variations. For example, for the proxy method, increasingly popular variants are:
- disk-targets (much faster and more reliable restore than tape)
- deduplicating backup targets (to make the cost per GB match the cost of tape – e.g. Data Domain, Avamar)
- the new vStorage APIs for Data Protection from a backup app standpoint.
The “backup to target via in-guest agents” method is where most customers start. Put in the most basic way, this design is “don’t change how you do backup, just treat the VM like it was a physical host”. This design model is excellent for its “we catalog the files the same way we always did, we get all sorts of application-level backup support, and it’s easy to restore in every guest-level way”. Conversely, it’s bad for the “hey, my backup windows just got longer due to the consolidated backup load, and VM-level restore (the VM equivalent of bare-metal restore) doesn’t leverage virtualization in any way”.
Now – personally – I don’t think VMware snapshots or even VMware-integrated array snapshots should be considered a full “backup” choice (and I get mad when EMC or other storage vendors propose them as such – to me at least, this seems self-serving).
This isn’t because VMware-integrated array snapshot mechanisms shouldn’t be used. In fact, they can be an awesome tool to augment a backup strategy (think use cases that sound like “oops, that patch blew up the system”, or “oops, I shouldn’t have deleted that virtual disk”). The trade-offs with VMware or integrated array snapshots tend to become more and more substantial as you move in time away from the backup event (longer-term retention gets costly fast), or as you try to cover the more comprehensive (and often required) backup use cases (broad cataloging, broad application support, and so on). They are awesome choices to supplement backup, not to be the backup.
Here’s a summary table where I try to capture where the various backup technologies are most useful (IMO) in the VMware context, and the considerations for each.
Warning – this is my best effort to summarize a very complex topic and its architectural options, and to do it without too much “vendor koolaid”. Comments welcome!!!
BTW – I know this is unreadable :-) I struggled to get it into a good format (heck, I struggled with the summary in general) – so it’s also available as a PDF here, and as an XLS here.
To me, looking at the breadth of options, the best mix is to: 1) use VM-level backups as the baseline; 2) add guest-level backups as an option on guests that need application-integrated backup and restore, or that need guest-level restore and aren’t Windows; 3) augment with VMware-integrated array snapshots for the “omigod what did I just do” restore scenarios.
Ok – now for some Q n’ A
Q: What is the “Changed Block Tracking” you referred to?
A: In the past, VCB 1.x used the ESX snapshot method exclusively to create a point-in-time copy that was leveraged during backup, and several of the common backup scenarios involved reading/copying the full virtual disk (done in different ways depending on whether you were using block or NAS). This was a huge reason why, in many cases, VCB 1.x actually increased the backup window.
In vSphere 4, VMware introduced Changed Block Tracking (CBT) as well as the vStorage APIs for Data Protection (a series of APIs in the Virtual Disk Development Kit). Use of CBT during a proxy-based backup makes things MUCH faster – particularly over longer time periods where the rate of change is high (in those cases, ESX snapshot deletion is “expensive” from a resource standpoint). I’m planning a deep-dive on CBT to discuss the mechanics.
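In the meantime, if you want to poke at CBT yourself, here’s a minimal sketch of how it gets turned on per VM (pyVmomi used purely for illustration; assume `vm` is a vim.VirtualMachine object looked up the same way as in the earlier vCenter sketch):

```python
# Minimal sketch: enable Changed Block Tracking (CBT) on a single VM.
# Assumes `vm` is a vim.VirtualMachine obtained from a vCenter session
# (e.g. via a container view, as in the earlier sketch).
from pyVim.task import WaitForTask
from pyVmomi import vim

def enable_cbt(vm):
    """Turn on CBT for the VM if it isn't already enabled."""
    if vm.config.changeTrackingEnabled:
        return
    spec = vim.vm.ConfigSpec(changeTrackingEnabled=True)
    WaitForTask(vm.ReconfigVM_Task(spec))
    # CBT only starts tracking after the next stun/unstun cycle
    # (a power cycle, or creating and then deleting a snapshot, will do).
```

Once CBT is on, each backup records a “change ID”, and the next backup only has to ask for the blocks that changed since that ID – which is where the 10x-20x numbers above come from.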
Below is a table showing a series of VM backups in Avamar 5 – the ones in green use CBT, and the ones in red don’t. Notice that backup of a 100GB VM using CBT completed in 1:10 (minutes:seconds), and a 40GB VM using the old-school VCB way took 26:15. Wow. Personally, I wouldn’t look at a VMware-backup approach that didn’t leverage this, or offer an alternative model.
Q: What is the benefit of the vStorage APIs for Data Protection?
A: The answer is basic – using the vStorage APIs for Data Protection makes file-level recovery into guests from VM-level backups possible. The vStorage APIs for Data Protection are literally a collection of APIs – mostly centered around virtual disk handling (this is the VDDK, which is publicly available – you can read more here) and around getting the data off the datastores themselves (using SCSI hot-add in vSphere 4 rather than the kludge that was the VCB block-offset method).
The vStorage APIs for Data Protection (VADP) are a “son of VCB” only in the sense that they are both focused architecturally on the idea of a backup proxy and VM-image-level backup. Also, in the VADP case, a backup proxy that is itself a VM starts to make a lot more sense.
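To make the proxy flow concrete, here’s a rough sketch of the per-VM orchestration a VADP-style backup does (again pyVmomi, purely for illustration; assume `vm` is a CBT-enabled vim.VirtualMachine with a single disk). The actual reading of the changed extents happens through the VDDK – over SCSI hot-add or the network – and is only hinted at in a comment here:

```python
# Rough sketch of a VADP-style, CBT-aware proxy backup cycle for one VM.
# Assumes `vm` is a vim.VirtualMachine with CBT already enabled.
from pyVim.task import WaitForTask
from pyVmomi import vim

def proxy_backup(vm, last_change_id="*"):
    # 1. Take a quiesced snapshot - the point-in-time image the proxy reads.
    task = vm.CreateSnapshot_Task(name="vadp-backup",
                                  description="proxy backup (sketch)",
                                  memory=False, quiesce=True)
    WaitForTask(task)
    snap = task.info.result

    try:
        # Find the first virtual disk on the VM (single-disk VM assumed).
        disk = next(d for d in vm.config.hardware.device
                    if isinstance(d, vim.vm.device.VirtualDisk))
        capacity = disk.capacityInKB * 1024

        # 2. Ask CBT which regions changed since last_change_id
        #    ("*" means "everything allocated", i.e. a full backup).
        offset = 0
        while offset < capacity:
            info = vm.QueryChangedDiskAreas(snapshot=snap, deviceKey=disk.key,
                                            startOffset=offset,
                                            changeId=last_change_id)
            for area in info.changedArea:
                # 3. A real backup app reads these extents from the snapshot
                #    disk via the VDDK (hot-add/NBD), dedupes and ships them.
                print(f"would read {area.length} bytes at offset {area.start}")
            next_offset = info.startOffset + info.length
            if next_offset <= offset:  # defensive: avoid looping forever
                break
            offset = next_offset
    finally:
        # 4. Remove the snapshot so redo logs don't accumulate.
        WaitForTask(snap.RemoveSnapshot_Task(removeChildren=False))
```

Step 3 is where the heavy lifting happens (and, in Avamar’s case, where source-based dedupe on the proxy kicks in); everything else is just snapshot and CBT bookkeeping.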
Q: Should I go source based dedupe or target based dedupe – which is better?
A: There’s no black and white on this one. The only thing that is relatively black and white these days is “use backup to disk rather than tape if at all possible, and if using backup to disk, use dedupe”. The decision to go target-based is most often driven by “we can’t change our backup process, but we want to improve our restore speed and reliability using disk, and we want to plop in a deduped backup-to-disk target to minimize the impact on our business”.
The reason to go source-based is usually “look, sure, we want the restore speed/reliability of deduped backup to disk, but our bottleneck is the LAN/WAN”. The LAN bottleneck is pronounced in very highly consolidated VMware environments. Either you use a REALLY good proxy-based approach (one that uses CBT and the vStorage APIs) with a varying number of proxies, or you go in-guest with source-based dedupe. Target-based dedupe tends to need more proxies, since you have to account for the network traffic from the proxy to the backup target; source-based dedupe on the proxies reduces the number of proxies needed. Avamar 5.0 is (at least in my opinion) a good example of both: it uses CBT and the vStorage APIs when using the proxy approach and reduces the number of proxies via source-based dedupe; and used in-guest for application integration or the simplest file-level restore, it is the leading source-based in-guest model.
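Here’s a crude way to put numbers on that LAN-bottleneck reasoning. Every figure below is a made-up, illustrative assumption (VM count, daily change rate, on-the-wire dedupe ratio) – the point is the shape of the comparison, not the absolute values:

```python
# Crude comparison of data crossing the LAN per backup cycle.
# All numbers are illustrative assumptions, not benchmarks.

vms = 200
changed_gb_per_vm = 5        # daily changed data per VM (post-CBT, assumed)
wire_dedupe_ratio = 10       # commonality removed *before* sending
                             # (source-based dedupe) - conservative assumption

changed_gb = vms * changed_gb_per_vm

# Target-based dedupe: the full changed data crosses the LAN,
# and the dedupe happens at the backup target (e.g. a Data Domain box).
target_dedupe_lan_gb = changed_gb

# Source-based dedupe: only unique segments cross the LAN.
source_dedupe_lan_gb = changed_gb / wire_dedupe_ratio

print(f"Changed data per cycle      : {changed_gb:,} GB")
print(f"LAN traffic (target dedupe) : {target_dedupe_lan_gb:,} GB")
print(f"LAN traffic (source dedupe) : {source_dedupe_lan_gb:,.0f} GB")
```

Whether 1,000 GB or 100 GB a night matters depends entirely on how consolidated the hosts are and how much LAN headroom the backup window has – which is exactly why there’s no single right answer here.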
To understand this better, look at the graphs below, which show the impact on CPU, network and disk from a guest/vmkernel perspective during a backup. The pink section is a traditional backup agent (in this case Networker); the grey is Avamar. The Avamar CPU spike is higher, but the area under the curve – the CPU utilization over time – is much lower, and as you stack up more and more concurrent jobs, it’s that area under the curve that becomes the big challenge. The biggest win is the network, which in many cases is the core bottleneck (see the 2nd chart below).
CPU utilization during backup – traditional vs. source-based dedupe
Network utilization during backup – traditional vs. source-based dedupe
Disk utilization during backup – traditional vs. source-based dedupe
But – if you love the backup product you’re using (often people dig whatever they have) – it’s OK. You can ALWAYS use target-based dedupe (e.g. Data Domain) to increase backup/restore speed and reliability without changing your backup process. The storage savings (in $/GB) are about the same, but there isn’t a core change in the backup dynamics (or the other bottlenecks). Sometimes this is the right choice.
Q: What’s the CPU impact of the source-based dedupe (with Avamar 5 – this can be in-guest or in the vStorage API for Data Protection VM proxy)?
A: One legitimate question applied to source-based dedupe is the impact on the guest/ESX host where the dedupe is occurring (I want to keep reiterating – remember that with Avamar 5, this can be in-guest, or VM-level on the vStorage APIs for Data Protection proxy VM). This is mostly FUD, for the following reasons:
- most customers never run into this – and note that even unthrottled, the “area under the curve” of CPU utilization for a source-based backup is much lower than for a traditional backup (meaning you can do more with less)
- You have many more CPUs at your disposal on all your ESX hosts than you do in backup targets and storage targets.
- CPU power continues to climb
- CPU consumption during the dedupe backup process is usually centered at off hours
- The amount of time needed for a source-based dedupe is measured in seconds or short minutes.
That all said, for some customers this can be an issue, as the PEAK CPU workload is higher. So… we introduced CPU throttling as an option. The graph below shows the impact of backing up 12 100GB VMs at the same time. Without CPU throttling (grey shading), a backup took between 3 and 12 minutes. With CPU throttling (green shading), it took between 27 and 36 minutes, but CPU utilization only increased 12%.
Q: What is the Desktop/Laptop backup thing?
A: Avamar’s lightweight client and source-based dedupe approach is handy in the desktop/laptop use case (whether physical – where the LAN impact is high; or virtual – where the consolidation ratios are highest and a big factor in the economic model). Many of the members of Chad’s Army use Avamar Virtual Edition nodes as their laptop backup target. What this desktop/laptop client and use case does is formalize what a lot of customers were telling us – that they thought this was a great use case. It is an especially lightweight client focused on this use case, plus a self-service portal that is part of the core product. This enables self-service restore for the clients in a simple, easy and scalable way. A screenshot is below.
Thanks for spending the time – hope you like the vStorage APIs for Data Protection and how Avamar 5 leverages them along with the vCenter API for centralized management and visibility.
Comments and feedback ALWAYS welcome!
I can't believe the streak is over; you referenced Data Domain!! As great as Avamar is (I'm a fan) backup remains THE hardest thing to unglue from an environment (retention scheme expiration, etc). The fact is a vast volume of shops are looking for the "make backup better" thing. Crazy but that's where DD is ruling, and ruling in volume. Nice benefit is you can hit it with SQL dumps, Oracle RMAN, any of the VMware backup native schemes, or commonly used point products like Veeam.
The unfortunate truth is IT doesn't think or act strategic. That's my theory on why VCE might fail. They act in budget cycles and manage project portfolios. Strategically insignificant (simple) products like a little old dedupe NAS becomes a GREAT combo for backup, archiving, and DR. They "out NetApp'ed" NetApp.
Absolutely love the deep bomb (Vikings reference) on Avamar and VM backup, but I always love how DD gets overlooked. Simplicity
Posted by: Keith Norbie | November 30, 2009 at 09:03 PM
Hi Keith.
Data Domain is far from overlooked by any of us but they prefer to tell their own story as such I'd direct you to http://www.dedupematters.com/ for their commentary.
Though the way their systems just work when you slide them in one wonders how much illustration that point requires?
Posted by: Storagezilla | December 01, 2009 at 05:27 PM
Hi Chad,
I understand that AVE 5.0 is not yet released. When it is available will we see feature parity with the hardware appliance (i.e. RAIN, Accelerator node for NDMP backups, Tape out support)?
I also see with the new Rainfinity FMA/VE that it lacks some of the key new features (i.e. archive to Windows servers and Atmos) - is this likely to change soon?
I find it a little frustrating when we are telling customers to move to the 100% virtualised datacenter, that they must buy Avamar and Rainfinity hardware appliances when VEs are available!!!
Your thoughts would be appreciated.
Posted by: Mark Burgess | December 02, 2009 at 05:34 AM
Hi Mark - the answer is YES to your FMA/VE question. Look for an early 2010 update.
Posted by: Eric Kaplan | December 03, 2009 at 10:54 PM
Hi Chad,
Is it still possible to get a copy of your spreadsheet analysis? The Documentum links no longer have the document stored.
Great article, thanks.
Otto
Posted by: Otto le Roux | January 19, 2010 at 06:07 AM
Has the Networker patch been released yet? Any idea when we can expect to see the additional vStorage API support in Networker? This is particularly important now that VCB is going end of life...
Posted by: Stephen | February 24, 2010 at 12:18 PM
The link still seems to be invalid to download the high res WMV format.
Posted by: Tom Van Arkel | May 03, 2010 at 08:05 PM
Same question, Is it still possible to get a copy of your spreadsheet analysis? The Documentum links no longer have the document stored
Posted by: Tim Xiao | October 14, 2014 at 11:49 AM