VM Component Protection

VM Component Protection (VMCP) protects against datastore accessibility failures that can affect a virtual machine running on a host in a vSphere HA cluster. When a datastore accessibility failure occurs, the affected host can no longer access the storage path for a specific datastore. You can configure how vSphere HA responds to such a failure, from generating event alarms to restarting virtual machines on other hosts.

Types of Failure:

There are two types of datastore accessibility failure:

All Paths Down (APD) – The ESXi host detects a problem with the storage device and cannot access it, but it cannot determine whether the device is gone permanently or will come back after some time. During the APD timeout period, the ESXi host continues to send I/O requests to the storage device. Once the APD timeout is reached, VM I/O is still sent to the storage device, but non-VM I/O (e.g. mounting the storage device) is fast-failed with status NO_CONNECT. The APD timeout has existed for a few years; before it was introduced, the ESXi hostd process could consume all available CPU and leave the ESXi host more or less unresponsive.

Permanent Device Loss (PDL) – PDL differs from APD in that the ESXi host receives information from the storage array, via a SCSI sense code, that the storage device is unavailable. From the ESXi host's perspective, I/O to the storage device stops immediately once the PDL state is detected. PDL handling has also been around for some time, but it used to be configured through advanced settings or specific files on the ESXi hosts.
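To make the difference between the two failure types concrete, here is a simplified model of how a host treats a single I/O request in each state, as described above. It is an illustrative sketch, not ESXi code: the names `DeviceState`, `io_outcome` and `APD_TIMEOUT_S` are hypothetical, and the 140-second value is the default APD timeout quoted later in this article.

```python
from enum import Enum


class DeviceState(Enum):
    HEALTHY = "healthy"
    APD = "all_paths_down"
    PDL = "permanent_device_loss"


# Hypothetical constant: default APD timeout in seconds.
APD_TIMEOUT_S = 140


def io_outcome(state: DeviceState, seconds_in_apd: float, is_vm_io: bool) -> str:
    """Hypothetical model of how the host handles one I/O request."""
    if state is DeviceState.HEALTHY:
        return "sent"
    if state is DeviceState.PDL:
        # The array reported PDL through a SCSI sense code: I/O stops at once.
        return "stopped"
    # APD: the host cannot tell whether the device will come back,
    # so it keeps sending I/O until the APD timeout is reached.
    if seconds_in_apd < APD_TIMEOUT_S:
        return "sent"
    # After the APD timeout, VM I/O is still sent to the device,
    # but non-VM I/O (for example, mounting the device) is fast-failed.
    return "sent" if is_vm_io else "fast-failed (NO_CONNECT)"
```

For example, `io_outcome(DeviceState.APD, 200, is_vm_io=False)` models a mount attempt after the APD timeout and returns the fast-fail result, while the same call with `is_vm_io=True` still returns `"sent"`.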


Options for VM Component Protection (VMCP):

Response for Datastore with Permanent Device Loss (PDL)

Disabled – No action will be taken against the affected VMs.
Issue events – No action will be taken against the affected VMs; vSphere HA only generates an event when a PDL occurs.
Power off and restart VMs – The affected VMs are powered off and restarted on other hosts, as HA normally does.

Response for Datastore with All Paths Down (APD)

Disabled – No action will be taken against the affected VMs.
Issue events – No action will be taken against the affected VMs; vSphere HA only generates an event when an APD occurs.
Power off and restart VMs (conservative) – Restart only if there is sufficient capacity on healthy hosts.
Power off and restart VMs (aggressive) – vSphere HA attempts to restart the affected VMs without first checking for available resources. Some VMs might not be restarted if the other hosts in the cluster have no resources available.
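The difference between the conservative and aggressive APD responses comes down to whether capacity is checked first. The sketch below is a hypothetical model of the four options listed above; `ApdResponse` and `apd_action` are illustrative names, and the single boolean stands in for whatever capacity evaluation vSphere HA actually performs.

```python
from enum import Enum


class ApdResponse(Enum):
    DISABLED = "Disabled"
    ISSUE_EVENTS = "Issue events"
    RESTART_CONSERVATIVE = "Power off and restart VMs (conservative)"
    RESTART_AGGRESSIVE = "Power off and restart VMs (aggressive)"


def apd_action(response: ApdResponse, healthy_hosts_have_capacity: bool) -> str:
    """Hypothetical sketch of what each APD response setting does."""
    if response is ApdResponse.DISABLED:
        return "no action"
    if response is ApdResponse.ISSUE_EVENTS:
        return "event only"
    if response is ApdResponse.RESTART_CONSERVATIVE:
        # Conservative: act only when healthy hosts can take the VMs.
        if healthy_hosts_have_capacity:
            return "power off and restart"
        return "no action (insufficient capacity)"
    # Aggressive: no resource check up front; the restart may still
    # fail if the other hosts have no resources available.
    return "power off and attempt restart"
```

The conservative option trades availability for safety: it avoids powering off a VM that might then have nowhere to run.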

Delay for VM failover for APD

Once the APD timeout has been reached (default: 140 seconds), VMCP waits an additional period before taking action against the affected VMs. By default, this waiting period is 3 minutes, so in total VMCP waits 5 minutes 20 seconds (140 s + 180 s) before taking action against VMs. The sum of the APD timeout and the Delay for VM failover is also known as the VMCP timeout.
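The VMCP timeout arithmetic is simple enough to check in a few lines. This sketch uses the two defaults stated above; the constant and function names are hypothetical.

```python
# Hypothetical names; values are the defaults stated in this article.
APD_TIMEOUT_S = 140           # default APD timeout
VM_FAILOVER_DELAY_S = 3 * 60  # default Delay for VM failover for APD


def vmcp_timeout(apd_timeout_s: int = APD_TIMEOUT_S,
                 delay_s: int = VM_FAILOVER_DELAY_S) -> int:
    """VMCP timeout = APD timeout + Delay for VM failover (seconds)."""
    return apd_timeout_s + delay_s


minutes, seconds = divmod(vmcp_timeout(), 60)
print(f"VMCP timeout: {vmcp_timeout()} s ({minutes}m:{seconds:02d}s)")
# prints "VMCP timeout: 320 s (5m:20s)"
```

Shortening the Delay for VM failover is therefore the knob that brings the VMCP timeout closer to the APD timeout itself.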

Response for APD recovery after APD timeout

This setting tells vSphere HA what to do if an APD condition clears after the APD timeout has been reached but before the Delay for VM failover has expired.

Disabled – No action will be taken against the affected VMs.
Reset VMs – The VMs are hard reset on the same host.

This option exists because some applications or guest operating systems may become unstable after losing access to storage for an extended period, and this setting tells vSphere HA how to handle that situation.
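The recovery setting only matters inside a specific window: after the APD timeout, but before the VMCP timeout. The sketch below models that window using the default timings; `on_apd_cleared` and its string arguments are hypothetical, chosen to mirror the option labels above.

```python
def on_apd_cleared(cleared_after_s: float, reaction: str,
                   apd_timeout_s: int = 140, delay_s: int = 180) -> str:
    """Hypothetical model of the 'Response for APD recovery after APD timeout'.

    cleared_after_s: seconds between the start of the APD and its recovery.
    reaction: "Disabled" or "Reset VMs".
    """
    in_window = apd_timeout_s <= cleared_after_s < apd_timeout_s + delay_s
    if not in_window:
        # Cleared before the APD timeout, or after the VMCP timeout:
        # this particular setting does not come into play.
        return "not applicable"
    return "hard reset on same host" if reaction == "Reset VMs" else "no action"
```

With the defaults, an APD that clears 200 seconds in falls inside the 140 to 320 second window, so `on_apd_cleared(200, "Reset VMs")` returns the hard-reset result.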

VM Component Protection is configured in the vSphere Web Client: select the cluster, go to the Configure tab, click vSphere Availability, and then click Edit. Under Failures and Responses, you can select Datastore with PDL or Datastore with APD. The storage protection levels and virtual machine remediation actions available differ depending on the type of datastore accessibility failure.
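For scripted setups, the same choices can be expressed as data. The sketch below translates the UI labels used in this article into a plain Python dict. The key names and enum strings are our assumption of how the vSphere API's `VmComponentProtectionSettings` data object spells them, and should be checked against the API reference before use; the code does not contact vCenter, and `build_vmcp_settings` is a hypothetical helper.

```python
# Assumed mapping from UI labels to vSphere API enum strings (verify first).
PDL_RESPONSES = {
    "Disabled": "disabled",
    "Issue events": "warning",
    "Power off and restart VMs": "restartAggressive",
}
APD_RESPONSES = {
    "Disabled": "disabled",
    "Issue events": "warning",
    "Power off and restart VMs (conservative)": "restartConservative",
    "Power off and restart VMs (aggressive)": "restartAggressive",
}
APD_CLEARED_RESPONSES = {"Disabled": "none", "Reset VMs": "reset"}


def build_vmcp_settings(pdl: str, apd: str, delay_s: int = 180,
                        on_apd_cleared: str = "Disabled") -> dict:
    """Hypothetical helper: UI choices -> dict keyed by assumed API names.

    Raises KeyError for an unknown label and ValueError for a negative delay.
    """
    if delay_s < 0:
        raise ValueError("Delay for VM failover must be non-negative")
    return {
        "vmStorageProtectionForPDL": PDL_RESPONSES[pdl],
        "vmStorageProtectionForAPD": APD_RESPONSES[apd],
        "vmTerminateDelayForAPDSec": delay_s,
        "vmReactionOnAPDCleared": APD_CLEARED_RESPONSES[on_apd_cleared],
    }
```

Keeping the mapping in tables makes an invalid combination, such as a conservative restart for PDL (which the PDL options above do not offer), fail loudly with a `KeyError` instead of silently producing an unsupported value.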

