In the most recent Avamar and Data Domain releases, a special feature called “instant access” is available when restoring an entire virtual machine from backups stored on a Data Domain system.
Instant access is similar to restoring the full image to a new virtual machine, except that the restored virtual machine can be booted directly from the Data Domain system (via an NFS datastore). This reduces the time required to restore an entire virtual machine to essentially zero, and is particularly handy with large VMs.
The instant access process works as follows:
- Instant access is initiated (this is a single click)
- Selected VMware backup is cloned (Data Domain fastcopy) to temporary NFS share on the Data Domain system (this happens automatically behind the scenes)
- Temporary NFS share is exported, and mounted on ESXi host (this happens behind the scenes)
- The admin can now play with the restored VM and make sure it’s right. At that point, to move it back onto a production datastore, power on the virtual machine from the vSphere client and initiate a storage vMotion of the virtual machine to a datastore within the vCenter
- When the storage vMotion is complete, the restored virtual machine files no longer exist on the Data Domain system
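The fastcopy/NFS plumbing in the steps above can be sketched roughly as the dry run below. This is illustrative only: it prints the kinds of commands involved rather than running them, and the paths and hostnames are hypothetical (exact syntax varies by DD OS and ESXi release). Avamar drives all of this automatically with a single click.

```shell
# Illustrative dry run of the behind-the-scenes plumbing. Paths and hostnames
# are hypothetical; Avamar performs these steps automatically. Nothing below
# touches a real system -- it only prints the command shapes involved.
BACKUP="/data/col1/avamar-backups/vm1"      # hypothetical backup location on the DD
TMPSHARE="/data/col1/instant-access/vm1"    # hypothetical temporary share
ESX_HOST="esx01.example.com"                # hypothetical ESXi host

# 1. Clone the selected backup with Data Domain fastcopy (run at the DD CLI)
echo "ddsh> filesys fastcopy source $BACKUP destination $TMPSHARE"

# 2. Export the temporary share over NFS to the ESXi host
echo "ddsh> nfs add $TMPSHARE $ESX_HOST"

# 3. Mount the share as an NFS datastore on the ESXi host
echo "esxcli storage nfs add --host dd990.example.com --share $TMPSHARE --volume-name vm1-instant"
```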
If you read the fine print you’ll see this… NOTICE: In order to minimize operational impact to the Data Domain system, only one (1) instant access is permitted at a time.
So, is one (1) Instant Access enough for your needs?
Probably not, so we didn’t hardcode this limitation. In fact, we anticipate that customers will need to restore multiple VMs at a time, especially in the vApp context or for multi-VM applications. But we are being cautious initially, because we want to ensure the Data Domain system remains available for its intended purpose: backups and restores, not running VMs.
If you read my post back here – you’ll see yet another example of my “different horses for different courses” view when it comes to storage stacks (yes, you CAN have a general-purpose storage stack like VNX or NetApp – good at many things, not great at any one thing – in fact, their “good at many” characteristic is why people dig them). People LOVE Data Domain as a backup target (for dedupable datasets), and we often get the “why not just use it always as an NFS datastore for VMs?” question. “Hey – it does inline dedupe (unlike VNX or NetApp, which do it as a post-process), and it seems that only all-flash arrays like XtremIO can do that… so why not?” The answer is that the things that make DD awesome for backup workloads (inline dedupe, huge ingest bandwidth) are intertwined with the things that make it NOT ideal for transactional NFS (small-IO latency characteristics, IOps density, and cost).
So, a characterization of “what does this load do to a DD system during the transient period” seems necessary… For that, read on dear reader for gobs of test data! (thank you BRS team!)
We tested the ability to use Avamar “Instant Access” to restore and power up five (5) virtual machines on a Data Domain system. Once the virtual machines were powered on, we storage vMotioned them to an EMC VMAX 40K, and during the S-vMotion we gathered ESXTop metrics to gain an understanding of the impact on the VMs during the process.
Q: How do you enable more than one (1) instant access at a time?
A: The Instant Access field in the Avamar MC GUI’s “Edit Data Domain System” dialog is read-only by default, with its value set to 1. To make it editable, modify the /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml file: locate the XML element <entry key="ddr_can_modify_ir_limit" value="false" /> and change the boolean value to true. Restart the MC, then return to the Edit Data Domain System dialog; the field should now accept a new value.
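The edit can be scripted with sed. The snippet below works against a local stand-in for the file, so it is safe to run anywhere; on a real MC server you would point it at /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml (after backing it up) and then restart the MC.

```shell
# Demonstration against a local stand-in for mcserver.xml; on a real Avamar MC
# the file lives at /usr/local/avamar/var/mc/server_data/prefs/mcserver.xml.
cat > mcserver.xml <<'EOF'
<preferences>
  <entry key="ddr_can_modify_ir_limit" value="false" />
</preferences>
EOF

cp mcserver.xml mcserver.xml.bak   # always keep a backup before editing

# Flip the boolean so the Instant Access field becomes editable in the GUI
sed -i 's|\(key="ddr_can_modify_ir_limit" value="\)false|\1true|' mcserver.xml

grep 'ddr_can_modify_ir_limit' mcserver.xml
```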
Q: What hardware and software will be used during the tests?
A: We used the following:
- Avamar Virtual Edition (AVE) 7.0 - image-level backup datasets were set to using Data Domain as the target.
- Data Domain DD990 running DDOS 5.3 with 180 disks (12 spares)
- Data Domain DD890 running DDOS 5.4 with 100 disks (13 spares)
- ESXi 5.1 – two hosts
- Microsoft Exchange and SQL VMs – load generated by JetStress and SQLIO respectively
Q: How were the tests run?
A: Here was the sequence for each test:
- Run backups for virtual machines
- After backups are complete, run “filesys restart” command at DD CLI prompt to ensure backup data is not in DD cache (for more realistic test results)
- Run instant access for virtual machines
- Start virtual machines
- Start load generation tools within each virtual machine (unrealistic that customers will start to pound on the virtual machines prior to S-vMotioning them to primary storage, but we wanted to know the impact)
- Start ESXTop data collection on ESXi hosts
- Start storage vMotions – by default, Data Domain restores the virtual machine as Eager Zero Thick; however, we wanted to know the effects of converting to thin during the S-vMotion.
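Two of the steps above have direct command-line counterparts. The dry run below just prints them; the esxtop interval and sample count are assumptions (adjust them to cover your S-vMotion window).

```shell
# Step 2: flush the Data Domain cache between runs (run at the DD CLI):
echo 'ddsh> filesys restart'

# Step 6: batch-mode ESXTop capture on each ESXi host (hypothetical
# interval/count: 5-second samples for one hour):
echo 'esxtop -b -d 5 -n 720 > /tmp/esxtop-svmotion.csv'
```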
Q: How much time did it take to complete each portion of the tests?
A: Here were the results:
Q: What can we conclude from these tests?
A: Well – we shouldn’t conclude TOO much – but here are some thoughts:
- Keep in mind, most people will not generate stressing loads on a virtual machine while it’s residing on a Data Domain system, let alone during a storage vMotion, if they don’t have to. These tests are probably unusual, but they help illustrate the effects on the Data Domain systems and the virtual machines themselves.
- In most of the tests, converting back to thin vDisks increased the duration of the storage vMotion. This may be a necessary evil if the “source” virtual machines used thin vdisks and storage resources are limited during the restoration.
- The durations appear to be linear based on previous single virtual machine testing (not shown here)
- The effects on Data Domain system CPU resources during the storage vMotion were similar to the effects during the initial backups, which leads us to believe that while it’s NOT the design center of Data Domain, the DD990 and DD890 are engineered to handle temporarily running virtual machines.
- The disk latency effects (for both reads and writes) on the virtual machines were unacceptable by application vendors’ standards (e.g., what Microsoft requires for Exchange and/or SQL operations) – again, see my comment about design centers of platforms; however, as noted in the first bullet, it’s unlikely that a load would be placed upon the virtual machine during the restore process, and the virtual machine would only briefly reside on the DD system.
Q: What was the effect on Data Domain resources?
A: Here’s all the test data (in a nutshell, the DD890 and DD990 CPU load stayed relatively low):
Figure 3 – DD990 During Instant Access Restore
Figure 5 – DD990 During VM startup
Q: How were the virtual machines affected during the S-vMotion?
A: In general, latency was higher than you would like on a VM running in production, but for the period where you’re checking it out before using storage vMotion to move back to a production datastore, not bad.
Figure 11 – DD990 All VMs from all tests - read latency
Figure 12 - DD890 All VMs from all tests - read latency
There you have it – gory detail on the performance impact and performance envelope of VM instant access!