Sometimes a VMware VM backup may fail with the following log message:
"VixEOUTOFMEMORY: Memory allocation failed. Out of memory."
Below is an explanation of what probably happened, why it happened, and what you can do to avoid it in the future:
Besides the limit on the number of connections to vCenter/ESXi, ESXi itself is also limited by a transfer buffer shared by all NFC connections. This limit is enforced by the host and cannot be bypassed or determined in advance. The sum of all NFC connection buffers on an ESXi host cannot exceed 32 MB, and by default it is configured as 16 MB.
The primary physical CFA uses the NBD protocol to back up VM disks. NBD, in turn, uses the VMware Network File Copy (NFC) protocol and is therefore subject to the limitations described above.
WHAT: "VixEOUTOFMEMORY: Memory allocation failed. Out of memory." occurred because the ESXi server could not serve the request: it did not have enough free resources (NFC connection buffer).
WHY: At the time when N jobs failed with the VixEOUTOFMEMORY error, you had N+1 or more FULL backups running simultaneously, all of them backing up VMs located on a single ESXi host. We use a buffer of up to 10 MB to transfer data, so there is a chance of hitting the NFC buffer limit on the ESXi host, which is what happened here. This does not mean that parallel backup jobs always fail this way; it depends on many factors. We also try to recover from this situation during the backup, but sometimes that is not enough.
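The arithmetic behind this can be sketched roughly as follows. This is a simplified worst-case model, not a description of how ESXi actually accounts for NFC buffers: it assumes every stream takes the full 10 MB mentioned above, and the function name is hypothetical.

```python
def worst_case_streams(host_budget_mb: int, per_stream_mb: int = 10) -> int:
    """Number of streams that fit if every stream uses its maximum buffer.

    Simplified model: the host budget is divided evenly into
    worst-case per-stream buffers (assumption, for illustration only).
    """
    if per_stream_mb <= 0:
        raise ValueError("per_stream_mb must be positive")
    return host_budget_mb // per_stream_mb


if __name__ == "__main__":
    # 16 MB is the default host limit, 32 MB the maximum (see above).
    for budget_mb in (16, 32):
        streams = worst_case_streams(budget_mb)
        print(f"{budget_mb} MB budget: {streams} worst-case stream(s)")
```

Because 10 MB is an upper bound per stream, real jobs often use less, which is consistent with parallel backups succeeding most of the time and failing only occasionally.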
WHAT YOU CAN DO:
- Nothing, provided that no other backups fail with the VixEOUTOFMEMORY error on your CFA in subsequent runs. Of course, it is disappointing that a backup failed after 1 or 2 hours for a reason we can overcome.
- You can also optimize ESXi network (NBD) performance by increasing the NFC buffer size from 16 MB to 32 MB and reducing the cache flush interval, as suggested in VMware KB article 2052302 (https://kb.vmware.com/s/article/2052302). Do this on all of your ESXi hosts. You can query the current values with the following commands (run on the ESXi host): "esxcfg-advcfg -g /BufferCache/MaxCapacity" and "esxcfg-advcfg -g /BufferCache/FlushInterval". This does not guarantee that VixEOUTOFMEMORY will never happen again, but it will reduce its probability. It is also a good idea in general, since you run many simultaneous backups.
- Consider upgrading your network to 10GbE. The upgrade should cover every network link in the chain between the CFA and the VMware host.
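The buffer check from the second item above can be sketched as a small helper that reads the text printed by the query command and reports whether the host is still below the 32 MB target. This is only an illustration under two assumptions that are not stated in this note: that the command prints a line of the form "Value of MaxCapacity is 16384", and that the value is in KB (so 32 MB is 32768). The function names are hypothetical; verify both assumptions on your own hosts before relying on it.

```python
import re

# Assumption: MaxCapacity is expressed in KB, so 32 MB corresponds to 32768.
TARGET_MAX_CAPACITY_KB = 32 * 1024

# Assumption: the query prints a line such as "Value of MaxCapacity is 16384".
_VALUE_LINE = re.compile(r"Value of (\S+) is (\d+)")


def parse_advcfg_output(text: str) -> tuple[str, int]:
    """Extract (option name, integer value) from the query output."""
    match = _VALUE_LINE.search(text)
    if match is None:
        raise ValueError(f"unrecognized output: {text!r}")
    return match.group(1), int(match.group(2))


def needs_buffer_increase(text: str, target_kb: int = TARGET_MAX_CAPACITY_KB) -> bool:
    """Return True if the reported value is below the target size."""
    _, value = parse_advcfg_output(text)
    return value < target_kb


if __name__ == "__main__":
    sample = "Value of MaxCapacity is 16384"  # hypothetical sample output
    print(parse_advcfg_output(sample), needs_buffer_increase(sample))
```

Running the same check against every ESXi host makes it easier to confirm that the change was applied everywhere, as the note recommends.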