VMware HA isolation caveats

Updated to reflect Duncan’s comments below…

VMware HA clusters whose hosts span sites, or in which more than one node can be cut off from the rest together, may not trigger an HA isolation event as long as each node still receives a heartbeat from at least one other node in the same cluster.

Primary nodes send heartbeats to primaries and secondaries; secondary nodes send heartbeats only to primaries. (quote from Duncan’s HA Deepdive book)

An isolation response will only be triggered if and when the following two requirements are met:

– Heartbeats have not been received from ANY host in the cluster, whether primary or secondary
– The Isolation Address cannot be pinged

Only in this case will a Host (primary or secondary node is irrelevant) trigger the isolation response.
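
The two requirements above can be written as a single boolean check. The sketch below is only an illustrative model, not VMware code or a VMware API: the `HostView` class and its field names are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class HostView:
    """What one host currently observes (hypothetical model, not a VMware API)."""
    # Names of the hosts whose heartbeats have been received recently.
    heartbeats_received_from: set = field(default_factory=set)
    # Whether the configured isolation address answers a ping.
    isolation_address_pingable: bool = True


def should_trigger_isolation_response(view: HostView) -> bool:
    """Return True only when BOTH requirements are met:
    no heartbeats from any host, and the isolation address cannot be pinged."""
    no_heartbeats = len(view.heartbeats_received_from) == 0
    return no_heartbeats and not view.isolation_address_pingable


# A host that still hears a single other node is not isolated,
# even if it cannot reach the isolation address.
split_host = HostView({"esx2"}, isolation_address_pingable=False)
lonely_host = HostView(set(), isolation_address_pingable=False)
```

In this model, `should_trigger_isolation_response(split_host)` is false and `should_trigger_isolation_response(lonely_host)` is true; the role of the host (primary or secondary) does not appear in the check at all.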

When hosts are split like this, they do not attempt to contact the isolation address and do not shut down or power off VMs, because they aren’t isolated. The Failover Coordinator, a randomly selected primary node, will attempt to power on the VMs, but this will fail because the files are locked: those VMs are still running on the hosts split from the Failover Coordinator.
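
To make the split scenario concrete, here is a small simulation of a four-host cluster divided across two sites. It is a rough sketch under stated assumptions: the host names, site labels, VM names and the `try_power_on` helper are all hypothetical, and file locking is reduced to a simple set lookup. The heartbeat rule follows the quote above (primaries send to everyone, secondaries send only to primaries).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    role: str  # "primary" or "secondary"
    site: str  # which side of the split the host sits on (hypothetical label)


def heartbeat_senders(receiver, hosts):
    """Names of hosts whose heartbeats still reach `receiver` during the split."""
    senders = set()
    for h in hosts:
        # Skip the host itself and anything on the other side of the split.
        if h.name == receiver.name or h.site != receiver.site:
            continue
        # Primaries send to everyone; secondaries send only to primaries.
        if h.role == "primary" or receiver.role == "primary":
            senders.add(h.name)
    return senders


def is_isolated(receiver, hosts, isolation_address_pingable):
    """Isolated only if no heartbeats arrive AND the isolation address is unreachable."""
    return not heartbeat_senders(receiver, hosts) and not isolation_address_pingable


def try_power_on(vm, locked_vms):
    """Simplified restart attempt: fails while the VM's files are still locked."""
    return vm not in locked_vms


hosts = [
    Host("esx1", "primary", "A"),
    Host("esx2", "secondary", "A"),
    Host("esx3", "primary", "B"),
    Host("esx4", "secondary", "B"),
]

# Even with the isolation address unreachable, no host declares itself isolated.
isolated = [h.name for h in hosts if is_isolated(h, hosts, isolation_address_pingable=False)]

# VMs on site B keep running, so their files stay locked when site A tries a restart.
locked_vms = {"vm-b1", "vm-b2"}
restart_results = {vm: try_power_on(vm, locked_vms) for vm in sorted(locked_vms)}
```

Every host here still hears a neighbour on its own site, so `isolated` stays empty and no isolation response fires, while every restart attempt in `restart_results` fails because the VMs never stopped running.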

It is therefore better for ESX clusters not to span sites, which avoids this issue.

Thanks to Michael Francis and Duncan Epping for confirming this information for me.

Categories: VMware
  1. January 24, 2011 at 8:57 pm

    An isolation response will only be triggered if and when the following two requirements are met:

    – Heartbeats have not been received from ANY host in the cluster, whether primary or secondary
    – The Isolation Address cannot be pinged

    Only in this case will a Host (primary or secondary node is irrelevant) trigger the isolation response.

    So to be clear, it has nothing to do with being able to contact other hosts; it has everything to do with NOT receiving any heartbeats from any of the other nodes in the cluster.

    • January 24, 2011 at 9:10 pm

      Thanks for clearing that up for me Duncan, I’ll update the post accordingly.

  2. January 24, 2011 at 9:57 pm

    You are using the word “primary” too often when it is irrelevant.

    “when a Primary node is isolated but can still contact at least one other Primary or Secondary nodes.” –> technically speaking, when it is still receiving heartbeats the node isn’t even isolated. Your site may be in a campus-cluster, but that is something completely different.

    “When hosts are isolated like this they do not attempt to make contact with the isolation address and do not shut down or power off VM’s, even if configured to do. ” –> Again the host isn’t isolated as it is still receiving heartbeats from someone. Maybe “split” is a better word.

    “The Primaries still able to communicate with vCenter will attempt to power on the VM’s, believing the hosts isolated but this will fail as the files will be locked as those VM’s will still be running on the isolated Hosts (in Fibre Channel Storage environment).” –> vCenter has got nothing to do with it. HA doesn’t use vCenter at all to power-on VMs. Depending on where the storage resides even with iSCSI or NFS it could be that the files are still locked.
