Exchange 2010 DAG and VMware HA
Update: Microsoft has changed its support stance; see the article below.
Updated post on this issue here
As Paul Cunningham from Exchange Server Pro correctly points out in this post, Microsoft does not support Live Migration or vMotion of Mailbox Servers that are part of a DAG (Database Availability Group). Paul also mentions something Scott Schnoll talked about at TechEd last week: Microsoft does not support high availability solutions in the hypervisor, i.e. VMware HA. Refer to the following passage in the system requirements:
“Microsoft doesn’t support combining Exchange high availability solutions (database availability groups (DAGs)) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers. DAGs are supported in hardware virtualization environments provided that the virtualization environment doesn’t employ clustered root servers, or the clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server.”
I had a discussion on this same topic with Sandy Miller at TechEd. I believe there is either a misunderstanding on Microsoft's part of how VMware HA works, or Microsoft is placing the same limitation on VMware that it has with its own hypervisor so as not to disadvantage its own product. To see why, we first need to clearly understand the differences between how the two hypervisors implement HA.
VMware HA continuously monitors all ESX Server hosts within a cluster and acts when it detects a failure. An agent placed on each host maintains a “heartbeat” with the other hosts in the cluster, and loss of that heartbeat initiates a restart of all virtual machines that were registered on the affected host. Contrary to popular belief, this is nothing more than automatically powering the VM back on; it does not vMotion or move a running resource, because the VM is already powered off.
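The heartbeat-and-restart behaviour described above can be sketched as a toy model. This is only an illustration, not VMware's implementation: the `VM` class, the function names, the host names and the 15 second timeout are all hypothetical. It also assumes the restart lands on a surviving host, chosen here by simple round-robin.

```python
from dataclasses import dataclass

HEARTBEAT_TIMEOUT = 15.0  # seconds; hypothetical value for illustration only


@dataclass
class VM:
    name: str
    host: str
    powered_on: bool = True


def failed_hosts(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return hosts whose last heartbeat is older than the timeout."""
    return sorted(h for h, t in last_heartbeat.items() if now - t > timeout)


def restart_vms(vms, dead_hosts, surviving_hosts):
    """Cold-restart the VMs of dead hosts on surviving hosts (round-robin)."""
    restarted = []
    affected = [vm for vm in vms if vm.host in dead_hosts]
    for i, vm in enumerate(affected):
        vm.powered_on = False  # the host died, so the VM is already off
        vm.host = surviving_hosts[i % len(surviving_hosts)]
        vm.powered_on = True   # a plain power-on; no running state is carried over
        restarted.append(vm.name)
    return restarted


# Hypothetical cluster: esx02 stopped sending heartbeats at t=70.
heartbeats = {"esx01": 100.0, "esx02": 70.0, "esx03": 99.0}
vms = [VM("mbx01", "esx02"), VM("mbx02", "esx01")]
dead = failed_hosts(heartbeats, now=101.0)
alive = [h for h in heartbeats if h not in dead]
print(dead, restart_vms(vms, dead, alive))
```

The point of the sketch is that the only action taken on the VM is a power-on after the fact; nothing in the flow touches a running guest.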
Hyper-V uses Windows Clustering and Cluster Shared Volumes (CSVs), which are NTFS volumes shared by the Hyper-V hosts. The VM becomes a clustered resource, similar to other Microsoft clustered resources. That VM resource can then be moved between hosts (Live Migration), or in the event of a failure the cluster service will automatically move the resource and restart the VM on the remaining hosts.
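For comparison, here is an equally rough sketch of the clustered-resource model, where the VM is an object owned by one host and the cluster can change that owner. Again, every name here (`ClusteredVM`, `live_migrate`, `fail_over`, the host names) is made up for illustration and does not correspond to any real Hyper-V or Failover Clustering API.

```python
from dataclasses import dataclass


@dataclass
class ClusteredVM:
    name: str
    owner: str          # host that currently owns the cluster resource
    running: bool = True
    restarts: int = 0


def live_migrate(vm, target):
    """Planned move: ownership changes while the VM keeps running."""
    if not vm.running:
        raise ValueError("live migration needs a running VM")
    vm.owner = target


def fail_over(vm, surviving_hosts):
    """Unplanned move: the owner died, so the VM restarts on another host."""
    vm.running = False
    vm.owner = surviving_hosts[0]
    vm.running = True
    vm.restarts += 1


vm = ClusteredVM("mbx01", owner="hv01")
live_migrate(vm, "hv02")
after_migration = (vm.owner, vm.restarts)
fail_over(vm, ["hv03"])
after_failover = (vm.owner, vm.restarts)
print(after_migration, after_failover)
```

In this model the same resource object supports both a move of a running VM and an automatic failover, which is the combination the support statement is worded around.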
Therefore only the “hypervisor-based solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers” are not supported. I do not believe that VMware HA meets either of these criteria, “moving” or “failover” of mailbox servers. VMware HA simply powers on a failed VM. I suspect this limitation does affect Hyper-V due to its use of Windows Clustering.
When Microsoft states that a Mailbox Server protected by high availability solutions in the hypervisor is not supported, what exactly does that mean? If you contacted Microsoft about a general application error and they identified that you had HA enabled, would you receive no support at all, even if the issue was unrelated to HA? I understand that if your issue was corrupted Information Stores and you were Live Migrating the affected Mailbox Server, you would clearly be stuck with an unsupported configuration. The blanket statement “unsupported” is, to me, a cop-out! VMware HA is a tick box that automates pressing the power button on a VM.
Unfortunately I can only really vent my displeasure here in this blog. Ultimately, as an IT professional with Microsoft certifications who works for a Microsoft Partner, I will ensure that any Mailbox Servers I deploy are neither Live Migrated nor protected by HA, so that my customers have supported configurations.