Data Domain introduction training notes

April 15, 2011

SISL (Stream-Informed Segment Layout)

Leverages the continued advancement of CPU performance to add direct benefit to system throughput scalability.

Other deduplication technologies require additional disk drives or “spindles” to achieve the throughput speeds needed for efficient deduplication. Ironically, these other hybrid technologies that mandate the use of more disk drives require more storage, time and cost to achieve a similar, yet fundamentally inferior result.

Data Domain SISL Technology Provides Many Unique Advantages
99% of duplicate data segments are identified in RAM, inline, before storing to disk.
Block data transfers with related segments and fingerprints are stored together, so large groups are written or read at once.
Efficient disk access minimizes disk seeks to enable increased performance and minimizes the number of large capacity, cost-efficient SATA disks needed to deliver high throughput.
Minimal spindle count reduces the amount of total physical storage needed, along with associated storage management.
In SISL, Data Domain has developed a proven architecture that uses deduplication to achieve high throughput with economical storage hardware. Over time, this will allow the continued scaling of CPUs to add direct benefit to system scalability in the form of additional throughput while minimizing the storage footprint.

Deduplication

File Dedup (not efficient)
Segment based Dedup (fixed seg)
Variable Segment Size (not fixed seg)
Inline and post process (post process limited to disk)

First full backup: 2-4X
First week of backups: 7-10X
Second Friday full backup: 50-60X
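
The "X" figures above read as deduplication ratios: data sent by the backup application divided by data actually stored. A minimal sketch of that arithmetic, with made-up sizes (the function name and the numbers are assumptions, not from the training):

```python
def dedupe_ratio(logical_bytes: float, stored_bytes: float) -> float:
    """Ratio of data sent by the backup app to data actually written to disk."""
    if stored_bytes <= 0:
        raise ValueError("stored_bytes must be positive")
    return logical_bytes / stored_bytes

# Hypothetical sizes in GB: a 1000 GB full backup that only adds 20 GB on disk
second_full = dedupe_ratio(1000, 20)   # 50.0, i.e. "50X"
first_full = dedupe_ratio(1000, 250)   # 4.0, i.e. "4X"
```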

SISL

Up to 99% identified inline in RAM
Stores related segments together in RAM before they are written out to disk

Data stream into RAM
Slices into segments 4-12K
fingerprint for each segment
compares segment fingerprints

summary vector used
segment localities contain all similar data
storing unique segments into containers
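
The steps above can be sketched as a toy pipeline. This is not Data Domain's implementation: the chunking rule, the use of SHA-1, the Bloom filter standing in for the summary vector, and all names (split_segments, SummaryVector, ingest) are assumptions made for illustration.

```python
import hashlib

MIN_SEG, MAX_SEG = 4 * 1024, 12 * 1024  # 4-12K segment sizes from the notes


def split_segments(data: bytes, mask: int = 0x1FFF) -> list:
    """Toy variable-size chunking: cut when a running hash matches a pattern."""
    segments, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SEG and (h & mask) == 0) or length >= MAX_SEG:
            segments.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        segments.append(data[start:])  # trailing segment may be shorter
    return segments


class SummaryVector:
    """Toy Bloom filter: answers 'definitely new' or 'possibly seen before'."""

    def __init__(self, bits: int = 1 << 20, hashes: int = 3):
        self.bits, self.hashes = bits, hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fp: bytes):
        for k in range(self.hashes):
            digest = hashlib.sha1(bytes([k]) + fp).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, fp: bytes) -> None:
        for p in self._positions(fp):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, fp: bytes) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(fp))


def ingest(stream: bytes, summary: SummaryVector, index: dict, container: list) -> int:
    """Store only unseen segments; return how many were written."""
    written = 0
    for seg in split_segments(stream):
        fp = hashlib.sha1(seg).digest()            # fingerprint per segment
        if summary.might_contain(fp) and fp in index:
            continue                               # duplicate, nothing to store
        summary.add(fp)
        index[fp] = len(container)
        container.append(seg)                      # unique segment into container
        written += 1
    return written
```

The summary vector is checked first because a "no" from it is definitive, so most new segments never need the more expensive index lookup.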

DIA (Data Invulnerability Architecture)

Defense against integrity issues

End to End data verification
-reading after it’s written
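
A minimal sketch of the read-after-write idea, assuming a plain file and a SHA-256 checksum (both are illustrative choices; the notes do not say how DIA verifies data, and write_verified is a hypothetical helper):

```python
import hashlib
import os
import tempfile


def write_verified(path: str, data: bytes) -> str:
    """Write data, read it back, and compare checksums before reporting success."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the device
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise IOError(f"verification failed for {path}")
    return expected


def demo() -> bool:
    """Write a small file to a temp dir and confirm the returned checksum."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.bin")
        digest = write_verified(path, b"backup data")
        return digest == hashlib.sha256(b"backup data").hexdigest()
```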

Self-healing file system
-Actively re-verifies data

Other
RAID6
NVRAM fast restarts
Snapshots

Data Domain Replication

Source to Destination

license both systems

Replication types:

Collection – full system mirror, changes only on source, destination is read only

Directory – replication at the directory level, all systems can be source or destination, destination must have capacity for the maximum expected post-compressed size, CIFS and NFS ok but in separate dirs

Pool – VTL pools, works like dir replication

Replication pair = context

Replication streams:

Model          Source   Destination

DD140, DD610   15       20
DD630          30       20
DD670          60       90
DD860          90       90
DD890          135      270
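
For planning, the table can be kept as a small lookup. The sketch below assumes the figures are per-model stream limits for the source and destination roles; the fits helper is hypothetical.

```python
# (source, destination) stream counts per model, copied from the table above
REPLICATION_STREAMS = {
    "DD140": (15, 20),
    "DD610": (15, 20),
    "DD630": (30, 20),
    "DD670": (60, 90),
    "DD860": (90, 90),
    "DD890": (135, 270),
}


def fits(model: str, role: str, streams: int) -> bool:
    """True if the planned stream count is within the table's figure for that role."""
    if role not in ("source", "destination"):
        raise ValueError("role must be 'source' or 'destination'")
    source, destination = REPLICATION_STREAMS[model]
    limit = source if role == "source" else destination
    return streams <= limit
```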

Replication Topologies

One to One
src to des

bi-directional

src to des
des to src

many to one

src
src to des
src

one to many

des
src to des
des

cascaded

src
src to pri to des
src

Cascaded

src
src to pri to des
src des

Data Domain Supported Protocols

FC Eth
VTL DD Boost,NFS,CIFS,NDMP

DD OS
DDFS
Dedupe Storage

Data Paths

eth cifs/nfs
eth replication
fc vtl

Data Domain FS

ddvar administration file system
NFS /ddvar
CIFS \ddvar

These contain DD system core and log files
-can’t rename/delete
-Can’t access all dirs
-Data streams change per OS version and DD model

mtrees Storage File system
5.0 and later

backup
nfs /backup
cifs \backup

data \ col1 \backup Mtree /a (can’t be deleted or renamed)
/b
\Mtree /a
/b

Mtree – you can add up to 14 Mtrees (dirs) under /data/col1/

you can manage each mtree dir separately (compression rates etc)

DD Products

DLH data less head – Controller

Speed DD Boost (expects 10GbE)
Speed other (NFS, CIFS or VTL)
Logical capacity = total data including dedupe
Usable capacity = storage space

ES20 Expansion shelf, 16 drives

Models supporting external storage only:

DD690, DD860, DD880, DD890, DD Archiver and GDA
(have 4 internal disks for DDOS, boot and logs)

Models supporting internal storage only:

DD610 and DD630 (7 disks expandable to 12)
DD140 branch office (fixed 5 drives, RAID 5 only)

DD800 Series

DD890

Dual socket, six core 2.8GHz
96GB RAM
Two 1GB NVRAM cards
Four 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
dual path exp shelf connectivity

DD860

Dual socket, quad core
36GB exp to 72GB RAM
One 1GB NVRAM card
Four 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
dual path exp shelf connectivity

DD600 Series

DD670

Single socket, quad core
96GB RAM
Two 1GB NVRAM cards
12 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
up to two 32TB exp
up to four 16TB exp

DD140 Remote Office Appliance

3 Disks, RAID5
2 ETH port
1 NVRAM

Data Domain Archiver

Larger tier of storage behind a standard DD
one controller
up to 24 ES20-32TB exp shelves
570TB usable storage
30 x logical data capacity

Data migration

Active tier receives the data
Based on the data movement schedule, data is moved from the active tier to the first archive unit in the archive tier
DIA checks files after they are moved
data removed from active tier only once DIA verifies data
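
The move, verify, then delete order above can be sketched with ordinary files. The directories stand in for the active and archive tiers and the SHA-256 comparison stands in for the DIA check; the migrate helper is hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str) -> str:
    """Checksum a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(name: str, active_dir: str, archive_dir: str) -> str:
    """Copy to the archive tier, verify the copy, and only then remove the original."""
    src = os.path.join(active_dir, name)
    dst = os.path.join(archive_dir, name)
    shutil.copy2(src, dst)
    if sha256_of(dst) != sha256_of(src):
        os.remove(dst)  # discard the bad copy; the active-tier file stays in place
        raise IOError(f"verification failed for {name}")
    os.remove(src)      # safe to free active-tier space now
    return dst
```

Deleting the source last means a failed or interrupted move never leaves the file missing from both tiers.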

Archive Tier sealing
one or multiple shelves can be configured as an archive unit
Archive unit automatically sealed when it fills up
data is not written into a sealed unit but files can be deleted
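
The sealing rules can be modelled in a few lines. This is a toy in-memory model under the assumption that "fills up" means reaching a fixed byte capacity; ArchiveUnit and SealedUnitError are hypothetical names.

```python
class SealedUnitError(Exception):
    """Raised when a write is attempted on a sealed archive unit."""


class ArchiveUnit:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.files = {}
        self.sealed = False

    def used(self) -> int:
        return sum(len(data) for data in self.files.values())

    def write(self, name: str, data: bytes) -> None:
        if self.sealed:
            raise SealedUnitError("unit is sealed; no new data accepted")
        current = self.used() - len(self.files.get(name, b""))
        if current + len(data) > self.capacity:
            raise ValueError("data does not fit in this unit")
        self.files[name] = data
        if self.used() >= self.capacity:
            self.sealed = True  # seal automatically once full

    def delete(self, name: str) -> None:
        del self.files[name]    # deletes stay allowed on a sealed unit
```

In this sketch a sealed unit stays sealed after a delete, matching the note that data is not written into a sealed unit.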

DD Archiver Hardware

DD860
72GB RAM
One NVRAM card 1GB
three quad port SAS cards
One to 24 ES20 shelves
24 shelves use all 12 ports, dual pathed
two 1Gb eth ports
optional 1Gb or 10Gb NIC

DD Archiver replication

Controller to Controller replication

Global Deduplication Array GDA

Largest system

750 TB usable

2 DD890 controllers

Networker supports DD boost and DD VTL

Either VTL or DD Boost, not both

Data Domain System Management

DD Enterprise Manager

DD Management Framework (CLI)

IPMI power management (power status, off, on, cycle)

SOL Serial over LAN

Data Domain Software licenses

DD Boost
VTL
VTL with IBM i
DD Encryption
DD Retention Lock

Hardware and Capacity Licenses

Expanded storage 7 to 12 disks DD510 or DD630
GDA
DD Archiver
Capacity Active 1 shelf
Capacity Archive 1 shelf

DD Boost

1. Improved throughput or retaining data (OST)
2. Backup Server controller replication
3. Backup Server replica awareness

DSP Distributed Segment Processing

Backup Server

1. Segments
2. Fingerprints
4. Compresses

DD

3. Filters
5. Writes
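
The numbering above shows the order of the five steps and which side performs each one. A rough sketch, assuming fixed-size segments, SHA-1 fingerprints and zlib compression (all simplifications; DDSystem and backup are hypothetical names, not the DD Boost API):

```python
import hashlib
import zlib


class DDSystem:
    """Stands in for the Data Domain side: filters fingerprints and writes segments."""

    def __init__(self):
        self.store = {}

    def filter_new(self, fingerprints) -> set:
        # Step 3 (DD): report which fingerprints are not stored yet
        return {fp for fp in fingerprints if fp not in self.store}

    def write(self, fp: bytes, compressed: bytes) -> None:
        # Step 5 (DD): write the compressed unique segment
        self.store[fp] = compressed


def backup(data: bytes, dd: DDSystem, seg_size: int = 8192) -> int:
    """Run the backup-server steps; return the number of payload bytes sent."""
    segments = [data[i:i + seg_size] for i in range(0, len(data), seg_size)]  # 1. Segments
    fps = [hashlib.sha1(seg).digest() for seg in segments]                    # 2. Fingerprints
    needed = dd.filter_new(fps)                                               # 3. Filters
    sent = 0
    for fp, seg in zip(fps, segments):
        if fp in needed:
            payload = zlib.compress(seg)                                      # 4. Compresses
            dd.write(fp, payload)                                             # 5. Writes
            sent += len(payload)
            needed.discard(fp)  # send each unique segment only once
    return sent
```

Only segments the DD system does not already hold are compressed and sent, which is where the reduced network traffic comes from.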

DD Boost enables Advanced Load Balancing and Link Failover

DD VTL

DD systems support backups over the SAN and LAN
Backup application manages all data movement to and from the DD system
Backup application manages physical tape creation
DD Replication software manages virtual tape replication
DD Enterprise Manager is used to configure and manage tape emulations

DD up to 64 Tape Libraries per system
LTO1-LTO3
up to 256 VTL per system for single node systems
VTL Slots up to 800GB

NDMP Tape Server support for NAS backup

DD Replicator

async IP replication

supports SSL encryption
minimal performance impact

DD Encryption

Types:

Data-in-flight (as data is transported)
Data-at-rest (data stored encrypted)

Challenges

Encrypt before dedupe
Encrypt after dedupe (requires hardware)
Integrated dedupe and encryption
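
One reason the ordering matters: if identical segments are encrypted with a fresh random nonce before deduplication, their ciphertexts differ and no longer match by fingerprint. The sketch below shows this with a deliberately insecure toy cipher (toy_encrypt is an assumption for illustration, not a real encryption scheme and not what Data Domain uses).

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """NOT secure: XOR with a SHA-256 keystream derived from a random nonce."""
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def unique_fingerprints(segments) -> int:
    """Number of distinct segments a fingerprint-based dedupe would have to store."""
    return len({hashlib.sha1(seg).digest() for seg in segments})


segments = [b"same backup data" * 512] * 10   # ten identical segments
key = b"k" * 32

dedupe_first = unique_fingerprints(segments)
encrypt_first = unique_fingerprints([toy_encrypt(key, seg) for seg in segments])
```

With dedupe first the ten identical segments collapse to one stored copy; with encryption first, each ciphertext is distinct and all ten would be stored.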

DD Inline Encryption

Encrypts data inline (immediately), SISL used to optimize, no extra hardware needed

DD Retention Lock

Electronic Data Shredding
Enforced Retention for active archiving

policy based, file, database, email
