Symptom

Backups fail with the following error message in the Message Logs of the backup:

bsock.c:184 Unable to connect to Storage daemon on 192.168.1.15:9103. ERR=Connection refused

and/or you see the following notices on the CFA:

primary database is inaccessible
insufficient storage space on /raid
storage space critically low /raid
storage space critically low /root full
storage space critically low /var/log/full

The RAID Usage bar in the bottom-left corner of the CFA shows few or no GB free.

Note: If you see storage space critically low /raid/:catalogs: or primary database is inaccessible while the RAID usage bar shows plenty of free space, this is a different error involving space on the database and requires additional assistance. Please contact Infrascale Support.
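Before clearing space, it can help to confirm which Storage daemon address the failing job tried to reach. The following is a minimal sketch, not an Infrascale tool: it extracts the host and port from the log line quoted above and, optionally, tries a plain TCP connection to it. The function names are hypothetical, and the sketch assumes you run it from a machine that can reach the CFA over the network.

```python
import re
import socket

# Matches the "Unable to connect to Storage daemon on HOST:PORT" log line.
_SD_ERROR = re.compile(
    r"Unable to connect to Storage daemon on (?P<host>[\w.\-]+):(?P<port>\d+)"
)


def parse_storage_daemon_address(log_line):
    """Return (host, port) from the error line, or None if it does not match."""
    match = _SD_ERROR.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def storage_daemon_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
```

For example, pass the message log line to `parse_storage_daemon_address`, then pass the resulting host and port to `storage_daemon_reachable`. If the result is `False` and the RAID usage bar shows little free space, continue with the steps under Resolution.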

Resolution

These messages typically indicate that the RAID of the CFA is full. The CFA does not overwrite the oldest jobs when it runs out of space; instead, the newest jobs fail.

Below are the necessary steps to clear space on the RAID in the order they should be attempted.

Delete erred and cancelled jobs

Erred jobs may still take up hidden space even if they show 0 bytes as saved.

It is always recommended to delete erred jobs unless you are troubleshooting an issue and need them for diagnostics. You can have the CFA automatically delete erred jobs by going to Jobs > Settings > Enable Automatic Deletion of Erred Jobs.

Note: You will still be able to see the erred jobs for the first 24-48 hours by going to Server > Recent Jobs.
  1. Go to Jobs > History.

  2. Select Filters (between Actions and Refresh at the top of the screen) > Show Filters.

    A new Filters toolbar should appear below the Filters option but above the Jobs.

  3. Click Status, and clear Successful and Warnings.

  4. Select the upper left checkbox to select all jobs.

  5. Go to Actions > Delete, and confirm you wish to delete the erred and cancelled jobs.

Delete unreferenced data

The Unreferenced Data subtab lists files that should have been removed when you deleted a job but were left behind. These unreferenced data files take up disk space and should be deleted, especially if you suspect that you are running out of disk space. Alternatively, you can attempt to re-import unreferenced data files that may have been misfiled or lost in the system.

  1. Go to Jobs > Unreferenced Data.

  2. Scan for unreferenced data.

  3. Select all.

  4. Delete the selected volumes, and confirm.

Manually delete jobs

If erred/cancelled jobs and unreferenced data have been cleared but you need additional space cleared on the RAID, you will need to delete older or less critical jobs.

You can use the same filter option as mentioned above with erred and cancelled jobs. Select by client or set a date range to make the job selection easier.

Select the check box to the left of the associated job. Once you have selected all the jobs you wish to delete, you can either right-click on one of them to click Delete, or you can go to the Actions menu above the tabs (far upper left corner) and click Delete.

Also, you can manually select jobs one at a time. For this, go to Jobs > History, and right-click a job. A menu should come up with Delete as an option.

To avoid any possible data loss, please make sure that you are keeping, at the very least, your most recent full and any subsequent backups on your RAID.

Run garbage collection

After attempting the above options, it is always a good idea to run garbage collection. This ensures that the deleted data is actually cleared from the RAID.

To manually run the garbage collection, go to Server > Storage > Start Garbage Collection.

Garbage collection can take a long time to run, especially if you have deleted a lot of data.

For clients using deduplication, the UCAR system runs a garbage collection process every day to find and purge any garbage. Some ways that data can become unreferenced garbage are when clients are deleted without their jobs being purged, or when old jobs were not removed completely. It is recommended to run garbage collection after deleting jobs to be sure the data is cleared completely.

Garbage collection typically runs automatically at a scheduled time.

If jobs are de-duplicating at that time, garbage collection is deferred, and its progress bar will say deferred.

The garbage collection will be deferred for up to 12 hours before it gives up, and will be retried at its regular time. One exception is if the system is running low on space, at which point the garbage collection will proceed whether there are jobs deduping or not.
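The deferral behaviour described above can be summarised as a small decision function. This is only an illustrative sketch of the rules as stated in this article, not the CFA's actual implementation; the function name, parameter names, and return values are hypothetical.

```python
from datetime import timedelta

# Maximum deferral stated in this article.
MAX_DEFERRAL = timedelta(hours=12)


def gc_decision(jobs_deduping, low_on_space, deferred_for):
    """Decide what scheduled garbage collection does, per the rules above.

    jobs_deduping: True if any job is currently de-duplicating.
    low_on_space:  True if the system is running low on space.
    deferred_for:  how long garbage collection has already been deferred.
    """
    if low_on_space:
        # Exception: low space forces the run even while jobs are deduping.
        return "run"
    if not jobs_deduping:
        return "run"
    if deferred_for < MAX_DEFERRAL:
        return "defer"
    # Gave up after 12 hours; it is retried at its regular scheduled time.
    return "retry_at_next_scheduled_time"
```

In this sketch, a system with jobs de-duplicating and plenty of space keeps deferring until the 12-hour limit is reached, while a system that is low on space runs garbage collection immediately.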

If you have block-level deduplication enabled, the block store may be holding some space that is not actually being used. Block deduplication has its own garbage collection process. To run this process, go to Server > Storage, and then scroll down to the section about block deduplication, where you should find the Reclaim Storage button.

Automatically recycle jobs

For long term management of space on the RAID, it is a good idea to set recycling schedules. This will allow the CFA to automatically delete jobs once they reach an expiration date.

Note: All customers using byte-level replication need a schedule and retention policy that leaves sufficient time for a second full backup to completely replicate over to the secondary before the oldest or original full backup is purged. It is recommended to keep a minimum of two full backups on the CFA at any given time; three is preferred.

Best practices for schedules and recycling depend on many things, such as the size of the backups and your company’s policies for data backup. It may be a good idea to start on a weekly schedule and see how the backups run from there. If you find that you are running out of space quickly, you may need to move to a monthly schedule or another custom schedule that you create (unless you are backing up Exchange, in which case you will need to stay on a weekly schedule).

Recycling can be set to echo the backup schedules:

  1. In Clients > Edit (after selecting a client from the list on the left), you will be able to edit various aspects of your backup.

  2. Once you’ve selected a schedule, you can set up recycling via the Job Recycling section. Make sure this fits with your chosen schedule’s details (which can be found or created in Clients > Schedules).

    • If you chose the weekly schedule, set all of the job recycling (full, incremental, and differential backups) to recycle weekly if you want the job immediately cleared off of the RAID, or set them to recycle after a longer period of time if you need them to stay around.

    • If you chose the monthly schedule and just wanted space cleared as soon as possible, you could have full backups recycle monthly, while incremental and differential backups recycle weekly.

  3. Go to Jobs > Settings, and make sure Enable Automatic Job Management is selected.

You can also select Enable Automatic Deletion of Erred Jobs.

Preserve single job set

It is highly recommended to enable Preserve Single Job Set. For this, go to Jobs > Settings, and set Job Retention Policy to Preserve Single Job Set.

This will tell the CFA not to recycle a job until there is a viable replacement. This option is recommended to be sure the most recent full and any subsequent differentials/incrementals remain on the CFA and are not automatically deleted.

Keep in mind, however, that this setting overrides the recycling/retention settings until a new full backup is available.

For example, if you have a client with the following settings:

  • Backup Schedule: Monthly (full backups – on the first Sunday, differential backups – on the second/fifth Sunday, and incremental backups – every Monday-Saturday)

  • Recycling/Retention:

    • Full 5 weeks

    • Diff 1 week

    • Inc 5 days

  • Preserve Single Job Set ENABLED

This is what you will see:

  • The full and incremental backups will run for the first week.

    Expired incremental backups will not be deleted because of the ‘preserve single job set’ setting.

  • The differential and incremental backups will run for the second week.

    • Expired incremental backups from the first week will be deleted because the differential has the needed data collected from the time the full backup was taken.

    • Expired incremental backups for the second week will not be deleted because of the Preserve Single Job Set setting.

  • Incremental backups will run for weeks three and four.

    Both the expired differential and the expired incremental backups from weeks two, three and four will not be deleted because of the Preserve Single Job Set setting.

  • The full will run the first Sunday of the following month.

    At this point, all expired differential and incremental backups from the previous month will be deleted (once they hit their retention/recycling setting).

The more time between full backups on a client, the greater the impact of the Preserve Single Job Set feature.
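The example above can be modelled in a few lines to see which jobs become eligible for recycling on a given day. This is a simplified sketch under stated assumptions, not the CFA's actual logic: it assumes a job is recycled only when it is both past its retention period and has a replacement, where a later full backup replaces any earlier job and a later differential replaces earlier incrementals. The `Job` class, function names, and the 2025 dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Higher rank: a later job of this level makes earlier lower-ranked jobs unnecessary.
RANK = {"inc": 0, "diff": 1, "full": 2}


@dataclass(frozen=True)
class Job:
    level: str            # "full", "diff", or "inc"
    run_on: date          # day the backup ran
    retention: timedelta  # recycling/retention setting for this level


def is_expired(job, today):
    """True once the job has reached its retention/recycling setting."""
    return today >= job.run_on + job.retention


def has_replacement(job, jobs):
    """Assumed rule: a later full replaces anything; a later diff replaces incs."""
    return any(
        other.run_on > job.run_on
        and (other.level == "full" or RANK[other.level] > RANK[job.level])
        for other in jobs
    )


def recyclable(jobs, today):
    """Jobs that Preserve Single Job Set would allow to be recycled today."""
    return [j for j in jobs if is_expired(j, today) and has_replacement(j, jobs)]


# Example client: full on the first Sunday, incrementals Monday to Saturday,
# differential on the second Sunday.
full = Job("full", date(2025, 1, 5), timedelta(weeks=5))
week1_incs = [
    Job("inc", date(2025, 1, 6) + timedelta(days=d), timedelta(days=5))
    for d in range(6)
]
diff = Job("diff", date(2025, 1, 12), timedelta(weeks=1))
```

Calling `recyclable([full] + week1_incs, date(2025, 1, 11))` returns nothing, because the expired incremental has no replacement yet. Once `diff` is added to the list, the expired first-week incrementals become eligible, while `diff` itself stays on the RAID until a later full backup exists.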

If you find yourself running out of space because differential and incremental backups are not automatically deleted when their retention settings say they should be, check to see if you have Preserve Single Job Set selected. If this is the case, it is likely that the CFA is working as designed and the backup schedules and retention settings need to be modified. Try a weekly schedule rather than a Monthly schedule and set retention settings accordingly. If you need one month’s worth of backups available for restoration but do not seem to have the capacity on the RAID, remember you can also make use of the archiving feature to save the backup jobs for a month off the RAID.

Other space considerations

Delayed calculation
The storage space calculations, used in the bottom bar, are automatically performed every 12 hours and cannot be forced.

If the space still does not add up, the database may be the cause. The database is used extensively for the accounting calculations, so if it has errors, the accounting will likely be incorrect. In that case, please contact Infrascale Support.

If replication has been suspended due to lack of space on the RAID and does not automatically start up again, go to Replication > Status on the primary CFA, click Actions, and then select Reconcile Secondary.

Space usage is difficult to pin down while other processes are running in the background (de-duplicating, importing, or new jobs, for instance). Even if all jobs are deleted from the RAID, the CFA still has to check for unreferenced data, recalculate space, and run the garbage collector to get an accurate reading, and all of this has to happen before the next set of jobs runs.

Several situations can delay or skew the calculation:

  • If any jobs were de-duplicating when they were deleted, the calculation process can be held up. De-duplication of backups takes almost double the space of the backup while in progress, although it can save you space in the long run once it is done.

  • If garbage collection runs while jobs are still importing or de-duplicating, those jobs are not part of the space recalculation and can offset the numbers.

  • If a large amount of data is deleted at once, the CFA has to check itself to make sure that no bits and pieces of these jobs are left behind. This is similar to, but not exactly like, defragmenting any other computer system, except that the CFA is doing it while simultaneously receiving new data.

Finally, the only part of the space usage that is set to recalculate more frequently than once per day is the free space, but any of the above scenarios can throw that off.