EMC VPLEX vMSC (vSphere Metro Storage Cluster) support and certification process. I have compiled the following details on the requirements from both VMware and EMC.
There is no formal process to certify a vMSC installation, but as long as the storage infrastructure is supported by EMC, the equipment is on the VMware HCL, and the following KB articles are followed, the environment will be supported. Ultimately, the configuration defined in the KB articles below was verified by, and is directly supported by, EMC.
VMware KB articles:
vSphere 4.x : Using VPLEX Metro with VMware HA (1026692)
vSphere 5.x : Implementing vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (2007545)
VMware published the following best practices whitepaper.
Duncan Epping also blogged about the PDL (permanent device loss) condition:
Ensure the attached documents are followed as per the details below.
Attached is the EMC "Simple Support Matrix for VMware vSphere 5.x". GeoSynchrony 5.1 references the following known issue, so Patch 2 should be applied or the workaround deployed.
emc299427: VPLEX: Fabric frame drops due to SAN congestion
EMC Recommendations/Best Practices for Cluster Cross-Connect for VMware ESXi (docu9765 Technical Notes, page 62)
EMC encourages any customer moving to a VPLEX Metro to move to ESX 5.0 Update 1, to benefit from all the HA enhancements in ESX 5.0 as well as the APD/PDL handling enhancements provided in Update 1.
• Applies to vSphere 4.1 and newer and VPLEX Metro Spanned SAN configuration
• HA/DRS cluster is stretched across the sites. This is a single HA/DRS cluster with ESXi hosts at each site
• A single standalone vCenter will manage the HA/DRS cluster
• The vCenter host will be located at the primary datacenter
• The HA/VM /Service console/vMotion networks should use multiple NIC cards on each ESX for redundancy
• The latency limitation of 1ms is applicable to both Ethernet Networks as well as the VPLEX FC WAN networks
• The ESXi servers should use internal disks or local SAN disks for booting. The Distributed Device should not be used as a boot disk
• All ESXi host initiators must be registered as "default" type in VPLEX
• VPLEX Witness must be installed at a third location isolating it from failures that could affect VPLEX clusters at either site
• It is recommended to place the VM in the preferred site of the VPLEX distributed volume (that contains the datastore)
• In case of a Storage Volume failure or a BE array failure at one site, VPLEX will continue to operate with the site that is healthy. Furthermore, if a full VPLEX failure or WAN COM failure occurs and the cluster cross-connect is operational, these failures will be transparent to the host
• Create a common storage view for ESX nodes on site 1 on VPLEX cluster-1
• Create a common storage view for ESX nodes on site 2 on VPLEX cluster-2
• All Distributed Devices common to the same set of VMs should be in one consistency group
• All VMs associated with one consistency group should be collocated at the same site, with the bias on the consistency group set to that site
• If using ESX Native Multi-Pathing (NMP), use the FIXED policy and ensure the path(s) to the local VPLEX are the primary path(s) and the path(s) to the remote VPLEX are standby only
• vMSC is supported for both non-uniform and uniform (cross-connect) host access configurations
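The NMP fixed-path recommendation above can be applied from the ESXi 5.x shell with esxcli. A minimal sketch only: the device identifier and path runtime name below are hypothetical placeholders, and you would substitute the distributed volume's NAA ID and a path that terminates on the local VPLEX cluster.

```shell
# Sketch only: the device ID and path name below are hypothetical placeholders.
DEVICE="naa.60001440000000103038221919c5158e"

# Set the path selection policy for the distributed volume to FIXED
esxcli storage nmp device set --device "$DEVICE" --psp VMW_PSP_FIXED

# Pin the preferred path to a path that lands on the local VPLEX cluster
esxcli storage nmp psp fixed deviceconfig set --device "$DEVICE" \
    --path vmhba2:C0:T0:L0

# Verify which path is now marked as preferred
esxcli storage nmp psp fixed deviceconfig get --device "$DEVICE"
```

Repeat for each distributed volume presented to the host; the remote VPLEX paths then remain standby only.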
The following configuration requirements are taken from the VMware article KB 2007545 above.
These requirements must be satisfied to support this configuration:
• The maximum round trip latency on both the IP network and the inter-cluster network between the two VPLEX clusters must not exceed 5 milliseconds round-trip-time for a non-uniform host access configuration and must not exceed 1 millisecond round-trip-time for a uniform host access configuration. The IP network supports the VMware ESXi hosts and the VPLEX Management Console. The interface between two VPLEX clusters can be Fibre Channel or IP.
• The ESXi hosts in both data centers must have a private network on the same IP subnet and broadcast domain.
• Any IP subnet used by the virtual machine that resides on it must be accessible from ESXi hosts in both datacenters. This requirement is important so that clients accessing virtual machines running on ESXi hosts on both sides are able to function smoothly upon any VMware HA triggered virtual machine restart events.
• The data storage locations, including the boot device used by the virtual machines, must be active and accessible from ESXi hosts in both datacenters.
• vCenter Server must be able to connect to ESXi hosts in both datacenters.
• The VMware datastores for the virtual machines running in the ESX cluster are provisioned on Distributed Virtual Volumes.
• The maximum number of hosts in the HA cluster must not exceed 32 hosts.
• The configuration option auto-resume for VPLEX consistency groups must be set to true.
• The ESXi hosts forming the VMware HA cluster can be distributed across two sites. HA can restart a virtual machine on a surviving ESXi host, and that ESXi host accesses the Distributed Virtual Volume through the storage paths at its site.
• VPLEX 5.0 and above and ESXi 5.0 are tested in this configuration with the VPLEX Witness.
For any additional requirement for VPLEX Distributed Virtual Volumes, see the EMC VPLEX best practices document.
• The front-end zoning should be done in such a manner that an HBA port is zoned to either the local or the remote VPLEX cluster.
• The path policy should be set to FIXED to avoid writes to both legs of the distributed volume by the same host.
emc299427: Workaround and Permanent fixes for VPLEX GeoSynchrony 5.1
• VMware ESX and ESXi 5.x hosts can be configured not to send the VAAI CAW command to the VPLEX. The following actions must be completed on all ESX and ESXi 5.x hosts connected to the VPLEX to accomplish this.
• The setting is represented by the "HardwareAcceleratedLocking" variable in ESX:
a. Using vSphere client, go to host > Configuration > Software > Advanced Settings > VMFS3
b. Set the HardwareAcceleratedLocking value from 1 to 0. By default this is 1 in ESX or ESXi 5.x environments.
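The same change can be made from the ESXi 5.x command line rather than the vSphere Client; a sketch:

```shell
# Disable VAAI CAW (ATS) on an ESXi 5.x host; configuration sketch only
esxcli system settings advanced set \
    --option /VMFS3/HardwareAcceleratedLocking --int-value 0

# Confirm the change (the Int Value column should now read 0)
esxcli system settings advanced list \
    --option /VMFS3/HardwareAcceleratedLocking
```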
The change to the above setting can be verified by reviewing the VMkernel logs at /var/log/vmkernel or /var/log/messages:
cpuN:1234)Config: 297: "HardwareAcceleratedMove" = 1, Old Value: 0, (Status: 0x0)
cpuN:1234)Config: 297: "HardwareAcceleratedInit" = 1, Old Value: 0, (Status: 0x0)
cpuN:1234)Config: 297: "HardwareAcceleratedLocking" = 0, Old Value: 1, (Status: 0x0)
• VPLEX GeoSynchrony 5.1 only utilizes the VAAI CAW [HardwareAcceleratedLocking] command, so this is the only value that needs to be set to 0.
• The values of HardwareAcceleratedMove and HardwareAcceleratedInit can be either 1 or 0.
Caution! There is an option in VPlexcli to set the 'caw-enabled' property under the storage-views context. Do not turn off the Compare and Write feature using this property, as doing so may have unexpected negative consequences.
• Apply 5.1 Patch 2
vCenter Operations Manager is deployed as a vApp, and if you are using vSphere Essentials or Advanced the deployment fails because the vApp requires DRS.
The following article explains how to get around this requirement by using a standalone ESX host.
There appears to be an issue with EMC VNX and CX arrays on VMware 4.x when ALUA (failover mode 4) is used.
If no data LUN numbered 0 is presented to the VMware hosts, the hosts' failover mode can switch from mode 4 back to mode 1.
Refer to Primus case emc262738.
Symptom: After a storage processor reboot (either because of a non-disruptive upgrade [NDU] or another reboot event), the failover mode for the ESX 4.x hosts changed from 4 (ALUA) to 1 on all host initiators.
Cause: On this particular array, a Host LUN zero was not configured for each Storage Group, which allowed the array to present a "LUNZ" to the host. All host initiators had been configured to failover mode 4 (ALUA). When the storage processor rebooted for a non-disruptive upgrade (NDU) and the connection was re-established, the ESX host saw the LUNZ as an active/passive device and sent a command to the array to set the failover mode to 1. This changed the failover mode settings for all the LUNs in the Storage Group, and since the Failover Policy on the host was set to FIXED, the host lost access to the LUNs while one SP was rebooting.
Fix: VMware will fix this issue in an upcoming patch for ESX 4.0 and 4.1. ESX 5.x does not have this issue.
To work around this issue, you can bind a small LUN, add it to the Storage Group and configure it as Host LUN 0 (zero). You will need to reboot each host after adding the HLU 0, and each Storage Group will need an HLU 0. See solution emc57314 for information on changing the HLU.
These are the directions from VMware for the workaround:
Present a 1.5 GB or larger LUN 0 to all ESX hosts. (This volume does not need to be formatted, but must be equal to or larger than 1.5 GB.)
Roll a reboot through all hosts to guarantee that they see LUN 0 instead of the LUNZ. A rescan may work, but a reboot guarantees that they will not have any legacy data for the CommPath volume.
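The workaround above can be scripted with Navisphere Secure CLI; a hedged sketch only, where the SP address, LUN number, RAID group and storage group name are all hypothetical placeholders for your environment:

```shell
# Sketch only: the SP address, LUN/RAID-group numbers and group name are
# hypothetical placeholders, not values from the original case.
SP=10.0.0.1

# Bind a small LUN (2 GB here, comfortably above the 1.5 GB minimum)
naviseccli -h "$SP" bind r5 200 -rg 0 -cap 2 -sq gb

# Present it to the storage group as Host LUN 0 (HLU 0)
naviseccli -h "$SP" storagegroup -addhlu -gname ESX_Cluster_SG -hlu 0 -alu 200

# Repeat the -addhlu step for every Storage Group, then roll a reboot
# through the ESX hosts so they pick up LUN 0 instead of the LUNZ.
```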
Thanks to Glen for pointing out the reason and solution.
There is a fix in ESX version 5, so those hosts aren’t affected.
Current customers with fibre channel environments looking to refresh their infrastructure always ask me one question: can we replace fibre channel with FCoE end to end across the whole environment?
For most customers, 100% virtualisation is a dream that won't be happening in the near future; replacing fibre channel with FCoE is in the same boat.
Unfortunately, unless you are replacing all of your compute, storage and connectivity, end-to-end FCoE is simply not possible, and if you have non-virtualised workloads it's even further away!
Vendors consistently tell me they have customers going 100% FCoE, only for it to turn out to be a small environment where the requirements are specific enough to make end-to-end FCoE achievable.
Unfortunately we are at the stage where the hype and vendor claims don't meet reality, and that lands us fair and square in the trough of disillusionment.
vSphere 5 will totally change the way you design and size your LUNs for VMFS. Features like VAAI (hardware-offloaded locking), the VMFS maximum size increasing to 64TB, a single 1MB block size, Storage DRS and Storage Profiles mean the old notions of VMFS design no longer hold.
Previously, both SCSI reservations and the VMFS maximum size of 2TB placed a ceiling on LUN size, but this ceiling has been removed. That doesn't mean you should create 50TB VMFS volumes; I suggest the number and size of VMDKs will now drive LUN size.
Good design principles still consider the workloads being placed on the LUNs to size VMFS volumes correctly, but I expect the average VMFS size to increase considerably, from 500GB to 2TB on vSphere 4 up to, say, 2TB to 10TB or even larger depending on use case.
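As a rough illustration of sizing by VMDK count rather than by a fixed ceiling, the sketch below works through one example; the VMDK count, average size and headroom figure are assumptions for illustration, not recommendations.

```shell
# Illustrative only: the VMDK count, average size and headroom are assumed.
vmdk_count=20        # VMDKs planned for the datastore
avg_vmdk_gb=100      # average VMDK size in GB
headroom_pct=25      # free-space headroom for snapshots, swap and growth

lun_gb=$(( vmdk_count * avg_vmdk_gb * (100 + headroom_pct) / 100 ))
echo "Suggested VMFS size: ${lun_gb} GB"
```

With these figures the suggested datastore lands at 2.5TB, comfortably inside the new 64TB ceiling but driven by the workload rather than the old 2TB limit.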
Update: Thanks to Scott and Joe for pointing out that, as with VCP3 to VCP4, there is an approximately six-month period (until February 2012) in which existing VCPs need only sit the exam. This is great news!
So the upgrade path from VCP4 to VCP5 requires you to attend a "What's New in vSphere 5" course (after February 2012). I have always found introduction courses light on technical content and heavy on marketing and general feature overviews.
I would much prefer a specific upgrade course that was a heavy technical deep dive followed by a difficult exam. Considering VMware will be requiring this course, they must ensure that their customers and partners get value for money.
Unless you have been living under a rock, you will have heard VMware announce the release of vSphere 5 today. I have spent a few hours sifting through the information on the new features and changes. There is a lot of great content being released today, from bloggers but more importantly on the Partner Portal. VMware has obviously invested a lot of time and resources in getting the content out so quickly, and this is something other vendors should take on board.
Although there are a lot of great new features in vSphere 5, I believe its release will be remembered for the changes to the licensing model. This is a shame, as some great new features may well not get the focus they deserve. I totally understand why VMware made changes to the licensing model, and the market has been expecting a change of some sort. As Intel continues to produce CPUs with more cores, and servers are capable of being fitted with more and more RAM, the old license model was doomed.
New licensing details: http://www.vmware.com/files/pdf/vsphere_pricing.pdf
I personally believe the new licensing model (vRAM) is the right model, but the amount of vRAM allocated per license is inadequate. VMware should instead have used numbers that reflect what customers are using in their environments now (probably nearly twice what VMware decided).
In my experience, customers deploying VMware 4.1 Enterprise Plus on new dual-socket hardware would fit between 96GB and 146GB of physical RAM. Factoring in oversubscription of about 30% vRAM to physical RAM, at 80% utilization of a host with 146GB of RAM I would estimate about 152GB of vRAM in total; divided by two for dual sockets, that makes 76GB of vRAM per socket. Therefore, to ensure customers with existing infrastructure can upgrade from 4.1 to vSphere 5 without purchasing additional licenses, VMware should look to increase the vRAM entitlement to about 76GB per processor for Enterprise Plus.
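The arithmetic above can be checked directly; note that the 80% utilization and 30% oversubscription figures are my own estimates from the paragraph, not VMware numbers.

```shell
# My estimates, not VMware figures: 80% utilization, 30% oversubscription
phys_ram_gb=146      # physical RAM in the example dual-socket host
sockets=2

# 146 * 0.80 * 1.30 = 151.84, which rounds to 152 GB of vRAM
vram_total_gb=$(awk -v r="$phys_ram_gb" 'BEGIN { printf "%d", r * 0.80 * 1.30 + 0.5 }')
vram_per_socket_gb=$(( vram_total_gb / sockets ))

echo "Total vRAM: ${vram_total_gb} GB, per socket: ${vram_per_socket_gb} GB"
```

Compare the 76GB per-socket figure against the published Enterprise Plus entitlement to see the shortfall I am describing.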
The new licensing model will no doubt be attacked by many people (customers, competitors and partners alike), but ultimately everyone should agree something had to change.