Archive for the ‘EMC’ Category


May 28, 2013

EMC VPLEX vMSC (VMware Metro Stretched Cluster) support and certification process. I have compiled the following detail on the requirements from both VMware and EMC.


There is no formal process to certify a vMSC installation, but as long as the storage infrastructure is supported by EMC, the equipment is on the VMware HCL and the following KB articles are followed, the environment will be supported. Ultimately, the configuration defined in the KB articles below was verified by, and is directly supported by, EMC.

VMware kb articles:

vSphere 4.x : Using VPLEX Metro with VMware HA (1026692)

vSphere 5.x : Implementing vSphere Metro Storage Cluster (vMSC) using EMC VPLEX (2007545)

VMware published the following best practices whitepaper

Also Duncan Epping blogged about PDL (permanent device loss) condition:


You need to ensure the attached documents are followed as per the detail below.

Attached is the EMC “Simple Support Matrix for VMware vSphere 5.x”. GeoSynchrony 5.1 references the following known issue, so Patch 2 should be applied or the workaround deployed.

emc299427: VPLEX: Fabric frame drops due to SAN congestion

EMC Recommendations/Best Practices for cluster cross-connect for VMware ESXi (docu9765 Technical Notes, page 62)

EMC encourages any customer moving to a VPLEX Metro to move to ESX 5.0 Update 1 to benefit from all the HA enhancements in ESX 5.0, as well as the APD/PDL handling enhancements provided in Update 1.

• Applies to vSphere 4.1 and newer and VPLEX Metro Spanned SAN configuration
• HA/DRS cluster is stretched across the sites. This is a single HA/DRS cluster with ESXi hosts at each site
• A single standalone vCenter will manage the HA/DRS cluster
• The vCenter host will be located at the primary datacenter
• The HA/VM/Service Console/vMotion networks should use multiple NIC cards on each ESX host for redundancy
• The latency limitation of 1ms is applicable to both Ethernet Networks as well as the VPLEX FC WAN networks
• The ESXi servers should use internal disks or local SAN disks for booting. The Distributed Device should not be used as a boot disk
• All ESXi host initiators must be registered as “default” type in VPLEX
• VPLEX Witness must be installed at a third location isolating it from failures that could affect VPLEX clusters at either site
• It is recommended to place the VM in the preferred site of the VPLEX distributed volume (that contains the datastore)
• In case of a Storage Volume failure or a BE array failure at one site, VPLEX will continue to operate with the site that is healthy. Furthermore, if a full VPLEX failure or WAN COM failure occurs and the cluster cross-connect is operational, these failures will be transparent to the host
• Create a common storage view for ESX nodes on site 1 on VPLEX cluster-1
• Create a common storage view for ESX nodes on site 2 on VPLEX cluster-2
• All Distributed Devices common to the same set of VMs should be in one consistency group
• All VMs associated with one consistency group should be collocated at the same site, with the bias set on the consistency group to that site
• If using ESX Native Multi-Pathing (NMP), make sure to use the FIXED policy, and make sure the path(s) to the local VPLEX are the primary path(s) and the path(s) to the remote VPLEX are standby only
• vMSC is supported for both non-uniform and uniform (cross-connect) host access configurations
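For reference, the consistency-group bias described in the bullets above is set from the VPlexcli with the set-detach-rule command. A rough sketch follows; the consistency group name, cluster name and delay are placeholders, and the exact syntax can vary between GeoSynchrony releases, so check the VPLEX CLI guide for your version:

```shell
# From the VPlexcli prompt: make cluster-1 the winning (bias) site for
# consistency group cg_prod, with a 5 second detach delay.
# cg_prod, cluster-1 and 5s are placeholders; verify syntax for your release.
consistency-group set-detach-rule winner --cluster cluster-1 --delay 5s --consistency-groups cg_prod
```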

The following configuration requirements are from the above VMware article (KB 2007545).
These requirements must be satisfied to support this configuration:
• The maximum round trip latency on both the IP network and the inter-cluster network between the two VPLEX clusters must not exceed 5 milliseconds round-trip-time for a non-uniform host access configuration and must not exceed 1 millisecond round-trip-time for a uniform host access configuration. The IP network supports the VMware ESXi hosts and the VPLEX Management Console. The interface between two VPLEX clusters can be Fibre Channel or IP.
• The ESXi hosts in both data centers must have a private network on the same IP subnet and broadcast domain.
• Any IP subnet used by a virtual machine must be accessible from ESXi hosts in both datacenters. This requirement is important so that clients accessing virtual machines running on ESXi hosts on both sides can continue to function smoothly after any VMware HA triggered virtual machine restart events.
• The data storage locations, including the boot device used by the virtual machines, must be active and accessible from ESXi hosts in both datacenters.
• vCenter Server must be able to connect to ESXi hosts in both datacenters.
• The VMware datastores for the virtual machines running in the ESX cluster must be provisioned on Distributed Virtual Volumes.
• The maximum number of hosts in the HA cluster must not exceed 32 hosts.
• The configuration option auto-resume for VPLEX consistency groups must be set to true.
• The ESXi hosts forming the VMware HA cluster can be distributed across the two sites. HA can restart a virtual machine on a surviving ESXi host, and that ESXi host accesses the Distributed Virtual Volume through the storage paths at its site.
• VPLEX 5.0 and above and ESXi 5.0 are tested in this configuration with the VPLEX Witness.
For any additional requirement for VPLEX Distributed Virtual Volumes, see the EMC VPLEX best practices document.
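A quick sanity check of the round-trip latency requirement can be run from the ESXi shell with vmkping; the address below is a placeholder for a host or VPLEX management IP at the other site:

```shell
# Ping a remote-site address from the ESXi host's vmkernel network stack
# and inspect the reported round-trip times (address is a placeholder).
vmkping -c 10 192.168.100.10
```

The average round-trip time reported should stay under 1 ms for uniform (cross-connect) configurations and under 5 ms for non-uniform configurations, as per the requirements above.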
VPLEX zoning:
• The front-end zoning should be done in such a manner that an HBA port is zoned to either the local or the remote VPLEX cluster.
• The path policy should be set to FIXED to avoid writes to both legs of the distributed volume by the same host.
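On ESXi 5.x the FIXED policy and preferred path can also be set from the shell rather than the vSphere Client. A sketch follows; the naa device identifier and the path name are placeholders that you would take from your own host:

```shell
# List devices and their current path selection policy
esxcli storage nmp device list

# Set the path policy to FIXED for a given device (naa.xxxx is a placeholder)
esxcli storage nmp device set --device naa.xxxx --psp VMW_PSP_FIXED

# Make the path to the local VPLEX the preferred path for the FIXED policy
# (the vmhba path below is a placeholder)
esxcli storage nmp psp fixed deviceconfig set --device naa.xxxx --path vmhba1:C0:T0:L0
```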
emc299427: Workaround and Permanent fixes for VPLEX GeoSynchrony 5.1
• VMware ESX and ESXi 5.x hosts can be configured to NOT send the VAAI-CAW command to the VPLEX. On all ESX and ESXi 5.x hosts connected to the VPLEX, the following actions must be completed to accomplish this.
• The setting is represented by the “HardwareAcceleratedLocking” variable in ESX:

a. Using vSphere client, go to host > Configuration > Software > Advanced Settings > VMFS3
b. Set the HardwareAcceleratedLocking value from 1 to 0. By default this is 1 in ESX or ESXi 5.x environments.
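The same change can also be made from the ESXi 5.x shell instead of the vSphere Client; a sketch, to be applied on every host connected to the VPLEX:

```shell
# Disable VAAI-CAW (ATS) by setting HardwareAcceleratedLocking to 0
esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 0

# Verify the current value
esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking
```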
The change of the above settings can be verified by reviewing VMkernel logs at /var/log/vmkernel or /var/log/messages:

cpuN:1234)Config: 297: “HardwareAcceleratedMove” = 1, Old Value: 0, (Status: 0x0)
cpuN:1234)Config: 297: “HardwareAcceleratedInit” = 1, Old Value: 0, (Status: 0x0)
cpuN:1234)Config: 297: “HardwareAcceleratedLocking” = 0, Old Value: 1, (Status: 0x0)
• VPLEX GeoSynchrony 5.1 only utilizes the VAAI-CAW [HardwareAcceleratedLocking] command, hence this is the only value that needs to be set to 0.
• The values of HardwareAcceleratedMove and HardwareAcceleratedInit can be either 1 or 0.
Caution! There is an option in VPlexcli to set the ‘caw-enabled’ property under the storage-views context. Do not turn off the Compare and Write feature using the ‘caw-enabled’ property under the VPLEX storage-views context, as this may have unexpected negative consequences. This must not be done from the VPlexcli.
Permanent Fix:
• Apply 5.1 Patch 2

Categories: EMC, VMware

EMC VNX and CX issues with ALUA failover mode 4

May 21, 2012

It seems that there is an issue with EMC VNX and CX with VMware 4.x when ALUA (failover mode 4) is used.
If there are no data LUNs numbered 0 presented to the VMware hosts, the hosts’ failover mode can switch back from mode 4 to mode 1.

Refer to primus case, emc262738

Symptom: After a storage processor reboot (either because of a non-disruptive upgrade [NDU] or other reboot event), the failover mode for the ESX 4.x hosts changed from 4 (ALUA) to 1 on all host initiators.

Cause: On this particular array, a Host LUN Zero was not configured for each Storage Group. This allowed the array to present a “LUNZ” to the host. All host initiators had been configured to failover mode 4 (ALUA). When the storage processor rebooted due to a non-disruptive upgrade (NDU) and the connection was reestablished, the ESX host saw the LUNZ as an active/passive device and sent a command to the array to set the failover mode to 1. This changed the failover mode settings for all the LUNs in the Storage Group, and since the Failover Policy on the host was set to FIXED, the host lost access to the LUNs while one SP was rebooting.

Fix: VMware will fix this issue in an upcoming patch for ESX 4.0 and 4.1. ESX 5.x does not have this issue.

To work around this issue, you can bind a small LUN, add it to the Storage Group and configure it as Host LUN 0 (zero). You will need to reboot each host after adding the HLU 0, and each Storage Group needs its own HLU 0. See solution emc57314 for information on changing the HLU.

These are the directions from VMware for the workaround:

1. Present a 1.5 GB or larger LUN 0 to all ESX hosts. (This volume does not need to be formatted, but must be equal to or larger than 1.5 GB.)
2. Roll a reboot through all hosts to guarantee that they see the LUN 0 instead of the LUNZ. A rescan may work, but a reboot guarantees that they will not have any legacy data for the CommPath volume.
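Adding the small LUN as Host LUN 0 can be done with naviseccli rather than the Navisphere GUI; a sketch, where the Storage Group name, SP address, credentials and ALU number are all placeholders for your own environment:

```shell
# Present an already-bound small LUN (ALU 100 here is a placeholder)
# as Host LUN 0 in an existing Storage Group
naviseccli -scope 0 -user username -password password -address spaddress \
    storagegroup -addhlu -gname ESX_Prod_SG -hlu 0 -alu 100

# List the Storage Group to confirm HLU 0 is now present
naviseccli -scope 0 -user username -password password -address spaddress \
    storagegroup -list -gname ESX_Prod_SG
```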

Thanks to Glen for pointing out the reason and solution.

There is a fix in ESX version 5, so those hosts aren’t affected.

Categories: EMC, VMware

FCoE end to end and the trough of disillusionment

December 23, 2011

Current customers with fibre channel environments looking to refresh their infrastructure always ask me one question: can we replace fibre channel with FCoE end to end for the whole environment?
For most customers, 100% virtualisation is a dream that won’t be happening in the near future; replacing fibre channel with FCoE is in the same boat.
Unfortunately, unless you are replacing all of your compute, storage and connectivity, end-to-end FCoE is simply not possible, and if you have non-virtualised workloads it’s even further away!
I have vendors consistently telling me they have customers going 100% FCoE, only to find it is a small environment where the requirements are specific enough to make end-to-end FCoE happen.
Unfortunately we are at the stage where hype and vendor promises don’t meet expectations, and that lands us fair and square in the trough of disillusionment.

Categories: Cisco, EMC, IBM, UCS, Uncategorized, VMware

Flare code warning

May 13, 2011

If you have EMC CX4 storage and are using Mirrorview, I suggest you hold off any Flare upgrades past 009 unless directed by EMC Technical Support. Some EMC customers have had issues with 012 including serious Mirrorview performance problems.

EMC engineering have pulled version 009 and prior from Powerlink due to “emc261544: Detecting memory leaks with Pool LUNs in FLARE Release 30,” which is resolved in a later FLARE release. That code, however, introduces a different issue: the management server requires reboots from time to time to allow Unisphere management.

There is also a required FRUMON update (emc261645) for the LCC to version 7.85 to protect against a possible dual power supply failure; this is unlikely, but bad stuff does happen.

Categories: EMC

Useful naviseccli commands

May 12, 2011

The following are some handy naviseccli commands; replace username, password and the address with your storage login details and SP IP address.

Check the FRUMON code level on the LCCs
naviseccli -scope 0 -user username -password password -address spaddress getcrus -lccreva
naviseccli -scope 0 -user username -password password -address spaddress getcrus -lccrevb

Check the backend bus speeds
naviseccli -scope 0 -user username -password password -address spaddress backendbus -get -speeds 0

SP cache details
naviseccli -scope 0 -user username -password password -address spaddress getcache

Get all the details of the LUNs on the array
naviseccli -scope 0 -user username -password password -address spaddress getlun

Review IO ports on an array
naviseccli -h sanipaddress -user username -password password -scope 0 ioportconfig -list |more

All details from the array
naviseccli -scope 0 -user username -password password -address spaddress getall
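Since several of these commands report per-SP information, it can be handy to wrap them in a small loop over both storage processors; a sketch, where the SP IP addresses and credentials are placeholders:

```shell
#!/bin/sh
# Run a naviseccli query against both storage processors in turn.
# SP IP addresses and credentials below are placeholders.
for SP in 10.0.0.1 10.0.0.2; do
    echo "=== SP at $SP ==="
    naviseccli -scope 0 -user username -password password -address "$SP" getcache
done
```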

Categories: EMC

Data Domain introduction training notes

April 15, 2011

SISL (Stream-Informed Segment Layout)

Leverages the continued advancement of CPU performance to add direct benefit to system throughput scalability.

Other deduplication technologies require additional disk drives or “spindles” to achieve the throughput speeds needed for efficient deduplication. Ironically, these other hybrid technologies that mandate the use of more disk drives require more storage, time and cost to achieve a similar, yet fundamentally inferior result.

Data Domain SISL Technology Provides Many Unique Advantages
99% of duplicate data segments are identified in RAM, inline, before storing to disk.
Block data transfers with related segments and fingerprints are stored together, so large groups are written or read at once.
Efficient disk access minimizes disk seeks to enable increased performance and minimizes the number of large capacity, cost-efficient SATA disks needed to deliver high throughput.
Minimal spindle count reduces the amount of total physical storage needed, along with associated storage management.
In SISL, Data Domain has developed a proven architecture that uses deduplication to achieve high throughput with economical storage hardware. Over time, this will allow the continued scaling of CPUs to add direct benefit to system scalability in the form of additional throughput while minimizing the storage footprint.


File dedup (not efficient)
Segment-based dedup (fixed segment size)
Variable segment size (not fixed)
Inline and post-process (post-process limited by disk)

First full backup: 2-4x
First week’s backup: 7-10x
Second Friday full backup: 50-60x


Up to 99% of duplicate segments identified inline in RAM
Related segments stored in RAM before being written out to disk

Data streams into RAM
Sliced into segments of 4-12 KB
A fingerprint is computed for each segment
Segment fingerprints are compared

Summary vector used
Segment localities contain all similar data
Unique segments stored into containers

DIA (Data Invulnerability Architecture)

Defense against integrity issues

End-to-end data verification
- reading after it’s written

Self-healing file system
- actively re-verifies data

NVRAM for fast restarts

Data Domain Replication

Source to Destination

license both systems

Replication types:

Collection – full system mirror; changes only on source; destination is read-only

Directory – replication at the directory level; any system can be source or destination; the destination must have the post-compression size of the maximum expected data; CIFS and NFS are OK but use separate directories

Pool – VTL pools; works like directory replication

Replication pair = context

Replication streams:

Model          Source  Destination

DD140, DD610   15      20
DD630          30      20
DD670          60      90
DD860          90      90
DD890          135     270

Replication Topologies

One to one
src to des

Bi-directional
src to des
des to src

Many to one
src to des

One to many
src to des

Cascaded
src to pri to des


src to pri to des
src des

Data Domain Supported Protocols

FC Eth

Dedupe Storage

Data Paths

eth cifs/nfs
eth replication
fc vtl

Data Domain FS

ddvar administration file system
NFS: /ddvar
CIFS: \ddvar

These contain DD system core and log files
- can’t rename/delete
- can’t access all dirs
- data streams change per OS version and DD model

MTrees storage file system
(DDOS 5.0 and later)

NFS: /backup
CIFS: \backup

/data/col1/backup is the default MTree (can’t be deleted or renamed)

You can add up to 14 MTrees under /data/col1/

You can manage each MTree separately (compression rates etc.)
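On the DDOS command line, MTrees are managed with the mtree command; a quick sketch (the MTree name below is a placeholder):

```shell
# Create an additional MTree under /data/col1 (name is a placeholder)
mtree create /data/col1/backup_sql

# List all MTrees and their status
mtree list
```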

DD Products

DLH (data-less head) – controller

Speed with DD Boost (expects 10GbE)
Speed with other protocols (NFS, CIFS or VTL)
Logical capacity = total data including dedupe
Usable capacity = physical storage space

ES20 Expansion shelf, 16 drives

Models supporting external storage only:

DD690, DD860, DD880, DD890, DD Archiver and GDA
(have 4 internal disks for DDOS, boot and logs)

Models supporting internal storage only:

DD610 and DD630 (7 disks expandable to 12)
DD140 branch office (fixed 5 drives, RAID 5 only)

DD800 Series


Dual socket, six core 2.8GHz
Two 1GB NVRAM cards
Four 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
dual path exp shelf connectivity


Dual socket, quad core
36GB exp to 72GB RAM
One 1GB NVRAM cards
Four 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
dual path exp shelf connectivity

DD600 Series


Single socket, quad core
Two 1GB NVRAM cards
12 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
up to two 32TB exp
up to four 16TB exp

DD140 Remote Office Appliance

3 Disks, RAID5
2 ETH port

Data Domain Archiver

Larger tier of storage behind a standard DD
one controller
up to 24 ES20-32TB exp shelves
570TB usable storage
30 x logical data capacity

Data migration

The active tier receives the data
Based on the data movement schedule, data is moved from the active tier to the first archive unit in the archive tier
DIA checks files after they are moved
Data is removed from the active tier only once DIA verifies it

Archive tier sealing
One or multiple shelves can be configured as an archive unit
An archive unit is automatically sealed when it fills up
Data is not written into a sealed unit, but files can be deleted

DD Archiver Hardware

One 1GB NVRAM card
Three quad-port SAS cards
One to 24 ES20 shelves
With 24 shelves, all 12 ports are used, dual-pathed
Two 1Gb eth ports
Optional 1Gb or 10Gb NIC

DD Archiver replication

Controller to Controller replication

Global Deduplication Array (GDA)

Largest system

750 TB usable

2 DD890 controllers

NetWorker supports DD Boost and DD VTL

Either VTL or DD Boost can be used, not both

Data Domain System Management

DD Enterprise Manager

DD Management Framework (CLI)

IPMI management power (Power, status, off, on, cycle)

SOL Serial over LAN

Data Domain Software licenses

DD Boost
VTL with IBM i
DD Encryption
DD Retention Lock

Hardware and Capacity Licenses

Expanded storage 7 to 12 disks DD510 or DD630
DD Archiver
Capacity Active 1 shelf
Capacity Archive 1 shelf

DD Boost

1. Improved throughput or retaining data (OST)
2. Backup Server controller replication
3. Backup Server replica awareness

DSP (Distributed Segment Processing)

Backup server:
1. Segments
2. Fingerprints
4. Compresses

Data Domain system:
3. Filters
5. Writes

DD Boost enables Advanced Load Balancing and Link Failover


DD systems support backups over the SAN and LAN
Backup application manages all data movement to and from the DD system
Backup application manages physical tape creation
DD Replication software manages virtual tape replication
DD Enterprise Manager is used to configure and manage tape emulations

Up to 64 tape libraries per DD system
Up to 256 VTLs per system for single-node systems
VTL slots up to 800GB

NDMP Tape Server support for NAS backup

DD Replicator

async IP replication

supports SSL encryption
minimal performance impact

DD Encryption


Data-in-flight (as data is transported)
Data-at-rest (data stored encrypted)


Encrypt before dedupe
Encrypt after dedupe (requires hardware)
Integrated dedupe and encryption

DD Inline Encryption

Encrypts immediately; SISL is used to optimize; no hardware needed

DD Retention Lock

Electronic Data Shredding
Enforced Retention for active archiving

policy based, file, database, email

Categories: Data Domain, EMC

How to Failover EMC Clariion Mirrorview/S LUNs on VMware

January 12, 2011

This is the process you can use to fail over EMC MirrorView/S LUNs presented to VMware ESX (3.0.2, old skool!)

1. Pick a LUN/Datastore to perform the cutover on.
2. Shut down all the VMs on that Datastore.
3. Remove all the VMs on that Datastore from the Virtual Center inventory: right-click and remove.
4. Switch to Navisphere and check the mirror sync state = synchronized (right-click on the mirror, Properties, Secondary tab).
5. Expand Storage Groups, select the Prod ESX host group, right-click and select Properties, LUNs, note the LUN ID, then select the LUNs and remove them from the Storage Group.
6. Switch to VI Client, Select a DR Host, configuration, Storage adapters, Rescan Datastores.
7. Switch to Navisphere, expand Remote Mirrors, right click on secondary image and promote.
8. Switch to VI Client, select DR ESX host, Configuration tab, advanced, Set LVM.EnableResignature to 1 (required for ESX 3.0).
9. Switch to Navisphere, add the LUN to the DR ESX host groups on the DR CX, ensuring the same host LUN ID is used.
10. Switch to VI Client, Rescan ESX in DR with Enable resignature setting enabled, check LUN is found, if not rescan again and reboot if necessary.
11. Refresh storage on all hosts in the cluster, ensuring the old Datastore appears (without this you can’t do the next step and rename the snapshot).
12. Ensure the Datastore appears on the host where the rescan was run, then rename the Datastore snap-000000X-datastorename to the old Datastore name (it may not need to be renamed).
13. Browse the Datastore and add the VMs to the inventory by right-clicking on the vmx files.
14. Power on each VM and select to create a new identifier (the safest option, though keeping it as a copy also works).
15. Turn off resignature on DR ESX host, Configuration, advanced, Set LVM.EnableResignature to 0.

Categories: EMC, VMware