Dehydrated archiving of backup jobs of VMware virtual machines

Issue

Archiving partial backups of VMware clients on the Backup & Disaster Recovery appliance with firmware version 5.0 or later takes far more space on the archive drive than reported in Jobs › History.

Description

Starting with Infrascale Backup & Disaster Recovery 5.0, backups of virtual machines (VMs) obtained through integration with VMware allow file-level restores from within the VM. To facilitate this, the system treats partial backups of VMware clients differently than it did in earlier versions.

In Infrascale Backup & Disaster Recovery 4.0, we introduced VMware integration. In version 4.0 and later, when we back up a VM, we take a snapshot of the current system state and save it. This snapshot can then be used to restore the entire VM to the state it was in when the snapshot was taken. You can either restore over the original VM or restore to a new VM.

Partial backups are taken by working with Changed Block Tracking (CBT) in VMware. If this feature is supported in your environment, it allows us to back up only the portions of the disk that have changed since the last backup.

To restore a VM from a partial backup in 4.0, you need the entire job chain on the system: the most recent full backup, the most recent differential backup, and any incremental backups between that differential and the incremental you are trying to restore. When you archive jobs from VM clients in this version, the system copies the data that was backed up to the archive disk. As a result, to restore a partial backup from an archive drive, you may need to import the necessary jobs from several archive drives back to the RAID to rebuild the full backup chain.
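The chain rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the appliance's implementation: the `Job` record, the `kind` labels, and the `restore_chain` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: int
    kind: str  # hypothetical labels: "full", "differential", "incremental"


def restore_chain(jobs, target_id):
    """Return the jobs needed to restore target_id, oldest first.

    `jobs` must be ordered from oldest to newest.
    """
    ids = [job.job_id for job in jobs]
    if target_id not in ids:
        raise ValueError(f"job {target_id} not found")
    target_index = ids.index(target_id)

    chain = []
    seen_differential = False
    # Walk backwards from the target towards the most recent full backup.
    for job in reversed(jobs[: target_index + 1]):
        if job.kind == "full":
            chain.append(job)
            return list(reversed(chain))
        if seen_differential:
            # Anything older than the differential is covered by it.
            continue
        chain.append(job)
        if job.kind == "differential":
            seen_differential = True
    raise ValueError("no full backup found before the target job")


history = [
    Job(1, "full"),
    Job(2, "incremental"),
    Job(3, "differential"),
    Job(4, "incremental"),
    Job(5, "incremental"),
]
needed = [job.job_id for job in restore_chain(history, 5)]
print(needed)  # [1, 3, 4, 5]
```

If any job in the returned list lives on a different archive drive, that drive has to be imported as well before the restore can run.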

In Infrascale Backup & Disaster Recovery 5.0, we introduced file-level restores for VM client backups. To allow this, we started storing all partial VM backups as synthetic fulls. A synthetic full takes the changed block information gathered at the time of the backup and combines it with the information from previous backups, so that the result is stored as if it were a full snapshot of the machine at the time the backup was taken. Storing the backup this way allows us to open it and restore individual files from within the backed-up disks.

Once a backup is stored as a synthetic full, the appliance treats it like a full backup from then on. Restoring the entire VM becomes easier: because every backup is a full, you only need the backup from the day you are restoring from rather than an entire chain.

One drawback of VM jobs being stored and treated as fulls is that when they are archived (or replicated), they are copied over as the full job rather than just the smaller changed block information. The system has to copy the entire size of the VM for every job it archives, so VM jobs can fill up archive drives quite quickly.

In return, recovering VMs from archive is much easier. Each job copied to the archive is a full, so you only need to import the job from the date you wish to restore to. VM jobs archived in this manner also retain the file-level restore ability once imported back onto the RAID.
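The idea behind a synthetic full can be illustrated with a minimal Python sketch. It assumes a disk image is modeled as a mapping from block number to block contents; the `synthesize_full` function and this block model are hypothetical and do not reflect the appliance's actual on-disk format.

```python
def synthesize_full(previous_full, changed_blocks):
    """Combine the previous full image with the changed blocks.

    Both arguments map a block number to that block's bytes.
    The previous image is left untouched; a new image is returned.
    """
    merged = dict(previous_full)   # start from the earlier full image
    merged.update(changed_blocks)  # overlay the blocks reported as changed
    return merged


monday_full = {0: b"boot", 1: b"data-v1", 2: b"logs-v1"}
tuesday_changes = {1: b"data-v2"}  # only block 1 changed

tuesday_full = synthesize_full(monday_full, tuesday_changes)
print(len(tuesday_changes), len(tuesday_full))  # 1 3
```

The result contains every block of the disk, which is why it can be opened for file-level restores, and also why archiving it copies the whole VM rather than the single changed block.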

In Infrascale Backup & Disaster Recovery 5.1, we added the dehydrated archiving feature. Enabling this feature reduces the amount of data copied to the archive drives, but it has some trade-offs.

To make this feature work, the system stores partial backup jobs on the appliance in a hybrid of the 4.0 and 5.0 formats. When a partial backup of a VM client is taken in 5.1 or later, it is still converted to a synthetic full as in 5.0, but the changed block information is also stored in the job as in 4.0. As a result, jobs stored on the RAID take up slightly more space than they would if stored strictly in the 5.0 format. The system stores backups this way whether or not the dehydrated archiving feature is enabled.

Enabling the feature causes the system to copy only the changed block data from the stored job to the archive drive, so VM jobs copied to the archive drive behave like jobs archived in 4.0. You lose the ability to perform file-level restores from jobs on the archive drive, and to recover a partial backup you have to import the entire job chain, which may again span several archive drives.
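To get a rough feel for the space trade-off, the two archiving modes can be compared with a back-of-the-envelope calculation. The sketch below is an assumption-laden estimate: the `archive_usage_gb` function and the sample sizes are hypothetical, and it ignores compression, deduplication, and any per-job overhead.

```python
def archive_usage_gb(vm_size_gb, changed_gb_per_job, dehydrated):
    """Estimate archive space used by a series of partial VM backup jobs.

    vm_size_gb:          size of the full VM image
    changed_gb_per_job:  changed-block data captured by each partial job
    dehydrated:          True if only changed blocks are archived
    """
    if dehydrated:
        # Only the changed block data of each job is copied.
        return sum(changed_gb_per_job)
    # Each job is archived as a synthetic full, i.e. the whole VM.
    return vm_size_gb * len(changed_gb_per_job)


daily_changes = [2, 3, 1, 4, 2]  # hypothetical GB changed per partial job

print(archive_usage_gb(100, daily_changes, dehydrated=False))  # 500
print(archive_usage_gb(100, daily_changes, dehydrated=True))   # 12
```

With these sample numbers, five partial jobs of a 100 GB VM need about 500 GB on the archive drive when archived as synthetic fulls, versus about 12 GB when dehydrated, at the cost of file-level restores and single-job recovery from the archive.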