VMware HA isolation caveats
Updated to reflect Duncan’s comments below…
VMware HA clusters whose hosts span sites, or in which more than one node can be cut off from the rest at the same time, may not trigger an HA isolation event, because each node still receives a heartbeat from at least one other node in the same cluster.
Primary nodes send heartbeats to primaries and secondaries; secondary nodes send heartbeats only to primaries. (Quoted from Duncan’s HA Deepdive book.)
An isolation response will only be triggered if and when the following two requirements are met:
– Heartbeats have not been received from ANY host in the cluster, whether primary or secondary
– The Isolation Address cannot be pinged
Only in this case will a host trigger the isolation response, regardless of whether it is a primary or a secondary node.
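To make the two requirements concrete, here is a minimal sketch of the decision in Python. It is only an illustration of the rule as described above, not VMware's implementation; the function name and its parameters are made up for this example.

```python
def should_trigger_isolation_response(heartbeat_sources, isolation_address_pingable):
    """Return True only when BOTH requirements are met.

    heartbeat_sources: set of host names (hypothetical) from which this host
        has recently received a heartbeat, primary or secondary.
    isolation_address_pingable: True if the isolation address answers a ping.
    """
    no_heartbeats_from_any_host = len(heartbeat_sources) == 0
    return no_heartbeats_from_any_host and not isolation_address_pingable
```

A host that still hears even one other host is never treated as isolated in this sketch, whatever the state of the isolation address.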
When hosts are split like this, they do not attempt to contact the isolation address and do not shut down or power off their VMs, because they are not isolated. The Failover Coordinator, a randomly selected primary node, will attempt to power on the VMs from the hosts it can no longer reach, but this will fail: those VMs are still running on the hosts on the other side of the split, so their files remain locked.
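The split scenario can be modelled with a small toy simulation in Python. This is a sketch under stated assumptions, not how HA is actually implemented: the host names, the "primary"/"secondary" role labels and the function names are all hypothetical, and a partition is simply a set of hosts that can still reach each other.

```python
def heartbeat_senders(host, roles, partitions):
    """Hosts whose heartbeats still reach `host` after the split."""
    reachable = next(p for p in partitions if host in p)
    senders = set()
    for peer in reachable:
        if peer == host:
            continue
        # Primaries send heartbeats to everyone; secondaries only to primaries.
        if roles[peer] == "primary" or roles[host] == "primary":
            senders.add(peer)
    return senders


def isolated_hosts(roles, partitions, can_ping_isolation_address):
    """Hosts that hear no heartbeats AND cannot ping the isolation address."""
    return {
        host
        for host in roles
        if not heartbeat_senders(host, roles, partitions)
        and not can_ping_isolation_address[host]
    }


# Hypothetical four-host cluster, two hosts per site.
roles = {"esx01": "primary", "esx02": "secondary",
         "esx03": "primary", "esx04": "secondary"}
nobody_can_ping = {host: False for host in roles}

# Site split: each side still hears a heartbeat, so no host is isolated.
site_split = [{"esx01", "esx02"}, {"esx03", "esx04"}]
print(isolated_hosts(roles, site_split, nobody_can_ping))

# A single host cut off on its own does meet both requirements.
single_host = [{"esx01"}, {"esx02", "esx03", "esx04"}]
print(isolated_hosts(roles, single_host, nobody_can_ping))
```

In the site split no host qualifies as isolated even though none of them can ping the isolation address, which is why the VMs keep running and keep their files locked.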
Therefore it’s better for ESX clusters not to span sites, so that this issue cannot occur.
Thanks to Michael Francis and Duncan Epping for confirming this information for me.