Currently, Infrascale offers a post-process deduplication feature. This requires a backup job’s data to occupy space on your RAID while deduplication occurs. Deduplication occurs by reading the backup job data and copying unique files to a repository (that’s also on the RAID) where duplicate files can be referenced rather than copied. During this time, both the original backup job data and repository copies of unique files are on the RAID. In the worst case, (if a backup job consists entirely of unique files), free space equal to the backup job’s size is required for the repository copies. Consequently, prior to running a backup job, which will be deduplicated, the amount of free space required to backup and deduplicate may be as much as double the size of the data to be backed up. When deduplication completes, the original backup job data is then deleted from the RAID.
Given these requirements, the system checks the following before deduplicating a backup job on the RAID:
There must be at least 1.5 GB free space on the RAID.
There must be more free space on the RAID than the size of the backup job. For example, if a 100 GB backup job completes, the system requires at least 100 GB of free space to begin the deduplication process.
If your CFA is nearing its full capacity, you can try these suggestions to allow successful deduplication:
Stagger the full jobs of large clients. The time between each job should account for its backup, importing, and deduplication time. For example, if you have 3 clients that each take 30 minutes to backup, import, and deduplicate; schedule the full backups at least 45–60 minutes apart. The length of time it takes to backup and deduplicate varies with the size and number of files.
If a single client’s full backup is too large and won’t deduplicate, we recommend you break the client into two or more separate clients, each with a non-overlapping file set that has roughly a balanced amount of data. For example, if you have a client SRV with 1 TB of data on
C:and 1 TB of data on
D:, create a SRV-A client with
C:/in its file set, and a SRV-B client with
D:/in its file set. Stagger these clients’ schedules per the above recommendation. Separating the data in this manner will require a minimum of 2 TB of free space instead of 4 TB.
The Deduping Jobs subtab lists backup jobs that have been deduplicated, meaning that only one copy of a given file is kept in the backup.
The screen lists jobs that are pending, the job currently being deduped, and jobs that were most recently deduped.