Overview

The Deduplicated File System section shows information about garbage collection in your Unique Content-Addressable Repository (UCAR).

For clients using deduplication, the UCAR system runs a garbage collection process every day to find and purge any files that are no longer referenced.

Data can become unreferenced garbage when, for example, clients are deleted without their jobs being purged, or when old jobs were not removed completely. It is recommended to run garbage collection after deleting jobs to ensure the data is cleared completely. This is similar to jobs with unreferenced data (see Unreferenced Data).

Garbage collection is deferred while deduplication jobs are running, for up to 12 hours; if it still cannot start within that window, the process times out and is retried at its next regular time. There is one exception: if the system is running low on space, garbage collection proceeds whether or not any jobs are deduplicating.
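
As a rough illustration of these rules, the sketch below models the run/defer/skip decision. It is a hypothetical model only, not Infrascale's actual implementation; all names in it are invented for the example.

```python
from datetime import datetime, timedelta

MAX_DEFERRAL = timedelta(hours=12)  # deferral window described above

def gc_decision(scheduled_at: datetime, now: datetime,
                dedup_jobs_running: bool, low_on_space: bool) -> str:
    """Decide whether a scheduled garbage collection run should
    proceed, keep waiting, or give up until the next schedule."""
    if low_on_space:
        return "run"    # low space overrides: collect even during dedup jobs
    if not dedup_jobs_running:
        return "run"
    if now - scheduled_at < MAX_DEFERRAL:
        return "defer"  # wait for deduplication jobs to finish
    return "skip"       # timed out; retried at the next regular time
```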

Unique Content-Addressable Repository

Garbage Collection: Starts garbage collection manually.
Garbage Collection Time of Day: Sets the time of day when garbage collection runs automatically.
Compact Online: Starts online DDFS compaction manually.
Verify UCAR: Verifies UCAR integrity. This systematically reads all the files in UCAR and checks whether each file's computed signature matches the recorded one; if not, the file is quarantined (see the sketch after this table). The process is extremely I/O intensive and can take weeks to complete on systems with large amounts of stored data. Use it only when instructed by Infrascale Support.
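
Conceptually, verifying a content-addressable repository means recomputing every file's signature and comparing it with the signature the file was recorded under, which is why the process must read every stored byte. Below is a minimal sketch of the idea, assuming SHA-256 and a hypothetical layout in which each file is named after its own digest; UCAR's real hash algorithm and on-disk layout may differ.

```python
import hashlib
from pathlib import Path

def verify_repository(root: Path, quarantine: Path) -> int:
    """Recompute each file's digest and quarantine any file whose
    content no longer matches the signature it was recorded under."""
    quarantine.mkdir(parents=True, exist_ok=True)  # keep this outside root
    mismatches = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != path.name:                  # signature mismatch
            path.rename(quarantine / path.name)  # quarantine the file
            mismatches += 1
    return mismatches
```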

Realtime Processing

This group shows the following data:

Total UCAR Bytes
Processed Files
Processed Bytes
Duplicate Files
Duplicate Bytes
Quarantined Bytes
Quarantined Files

Garbage Collection History

This group shows information in a table with the following columns:

Date
Files Removed
Bytes Removed
Total Files
Unreferenced Files
DB Errors
FS Errors
Missing Files
Elapsed Time

Block Deduplication

Block Deduplication Statistics

Blocks Written: The total number of full blocks that have been written to DDFS since it was configured initially.
Block Size: The size of the blocks files are divided into during the deduplication process. This value is not configurable.
Total Blocks: The total number of blocks, both full and partial, that have been written to DDFS since it was configured initially.
Total Bytes: The total number of bytes that have been written to DDFS since it was configured initially.
Partial Blocks: The number of partial blocks that have been written to DDFS since it was configured initially. A partial block occurs at the end of a file that does not divide evenly into blocks; for example, a 96 KB file is divided into one 64 KB full block and one 32 KB partial block (see the sketch after this table).
Partial Bytes: The sum of the sizes of all the partial blocks that have been written to DDFS.
Duplicate Blocks: The number of times a block already existed in the block store and did not need to be written again, thus saving space.
Duplicate Bytes: The number of bytes that did not have to be written to the RAID because a copy of the block already existed.
Free Blocks: The number of blocks marked as free in the block store.
Free Bytes: The sum of the sizes of all the blocks marked as free in the block store.
Blocks Read: A counter of the times blocks have been read back from DDFS.
Allocated Bytes: The size of the block stores, including both used and free blocks.
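
The full/partial split is simple arithmetic on the fixed block size. A minimal sketch, assuming the 64 KB block size from the example above:

```python
BLOCK_SIZE = 64 * 1024  # bytes; the example above uses 64 KB blocks

def split_into_blocks(file_size: int) -> tuple[int, int]:
    """Return (full_blocks, partial_block_bytes) for a file of the given size."""
    return file_size // BLOCK_SIZE, file_size % BLOCK_SIZE

# A 96 KB file yields one 64 KB full block and one 32 KB partial block:
assert split_into_blocks(96 * 1024) == (1, 32 * 1024)
```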

Storage Stats

Block Address Map Size
Block Address Map Incore
Block Address Map Modified

App Statistics

Heap Used
Heap Free
Heap Maximum