Data Domain introduction training notes
SISL (Stream-Informed Segment Layout)
Leverages the continued advancement of CPU performance to add direct benefit to system throughput scalability.
Other deduplication technologies require additional disk drives or “spindles” to achieve the throughput speeds needed for efficient deduplication. Ironically, these other hybrid technologies that mandate the use of more disk drives require more storage, time and cost to achieve a similar, yet fundamentally inferior result.
Data Domain SISL Technology Provides Many Unique Advantages
99% of duplicate data segments are identified in RAM, inline, before storing to disk.
Related segments and fingerprints are stored together, so large groups are written or read at once in block data transfers.
Efficient disk access minimizes disk seeks to enable increased performance and minimizes the number of large capacity, cost-efficient SATA disks needed to deliver high throughput.
Minimal spindle count reduces the amount of total physical storage needed, along with associated storage management.
In SISL, Data Domain has developed a proven architecture that uses deduplication to achieve high throughput with economical storage hardware. Over time, this will allow the continued scaling of CPUs to add direct benefit to system scalability in the form of additional throughput while minimizing the storage footprint.
Deduplication
File Dedup (not efficient)
Segment based Dedup (fixed seg)
Variable Segment Size (not fixed seg)
Inline and post process (post process limited to disk)
First full backup 2-4X
First week of backups 7-10X
Second Fri full backup 50-60X
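The ratios above can be read as logical data protected divided by physical data stored. A minimal sketch of that arithmetic follows; the function name and the sizes are hypothetical, chosen only so the result lands inside the first quoted range.

```python
def dedupe_ratio(logical_bytes: float, physical_bytes: float) -> float:
    """Return the deduplication ratio: logical data protected / physical data stored."""
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    return logical_bytes / physical_bytes


# Hypothetical figures (not measurements): a 1,000 GB first full backup
# that occupies 400 GB on disk after dedupe and compression.
first_full = dedupe_ratio(1_000, 400)
print(f"{first_full:.1f}X")  # 2.5X
```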
SISL
Up to 99% identified inline in RAM
Storing related segments in RAM before written out to Disk
Data stream into RAM
Slices into segments 4-12K
fingerprint for each segment
compares segment fingerprints
summary vector used
segment localities contain all similar data
storing unique segments into containers
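The SISL steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Data Domain's implementation: segments are fixed 8K slices (the notes describe variable 4-12K segments), SHA-1 stands in for the fingerprint function, the summary vector is modeled as a small Bloom filter, and the names SummaryVector and ingest are made up for the example.

```python
import hashlib

SEGMENT_SIZE = 8 * 1024  # assumption: fixed 8K slices instead of variable 4-12K segments


class SummaryVector:
    """Toy Bloom filter held in RAM: answers 'definitely new' or 'possibly seen'."""

    def __init__(self, bits: int = 1 << 16, hashes: int = 3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits // 8)

    def _positions(self, fingerprint: bytes):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, fingerprint: bytes) -> None:
        for pos in self._positions(fingerprint):
            self.array[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


def ingest(stream: bytes, vector: SummaryVector, index: dict, container: list) -> int:
    """Slice the stream, fingerprint each segment, store only unique segments.

    Returns the number of segments that were new.
    """
    new_segments = 0
    for offset in range(0, len(stream), SEGMENT_SIZE):
        segment = stream[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha1(segment).digest()
        # The summary vector rules out new segments without touching the index.
        if vector.might_contain(fingerprint) and fingerprint in index:
            continue  # duplicate: only a reference would be kept
        vector.add(fingerprint)
        index[fingerprint] = len(container)
        container.append(segment)
        new_segments += 1
    return new_segments


vector, index, container = SummaryVector(), {}, []
backup = b"A" * SEGMENT_SIZE + b"B" * SEGMENT_SIZE + b"A" * SEGMENT_SIZE
print(ingest(backup, vector, index, container))  # 2
print(ingest(backup, vector, index, container))  # 0
```

In this sketch the index is an in-memory dict; on a real system the full index lives on disk, which is why filtering in RAM first matters.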
DIA (Data Invulnerability Architecture)
Defense against integrity issues
End to End data verification
-reading after it’s written
Self-healing file system
-Actively re-verifies data
Other
RAID6
NVRAM fast restarts
Snapshots
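The "read after write" and "actively re-verify" ideas in the DIA notes above can be illustrated with checksums. This is a generic sketch, not the DIA mechanism itself; the function names write_verified and reverify are hypothetical, and SHA-256 is an arbitrary choice of checksum.

```python
import hashlib
import os
import tempfile


def write_verified(path: str, data: bytes) -> str:
    """Write data, read it back from the file, and confirm the checksum matches."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected:
        raise IOError(f"verification failed for {path}")
    return expected


def reverify(path: str, expected: str) -> bool:
    """Later scrub pass: recompute the checksum and compare it to the stored one."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "segment.bin")
    checksum = write_verified(target, b"backup data")
    ok_before = reverify(target, checksum)
    with open(target, "wb") as f:  # simulate silent corruption
        f.write(b"backup dat4")
    ok_after = reverify(target, checksum)

print(ok_before, ok_after)  # True False
```

On a general-purpose OS the read-back may be served from the page cache rather than the disk, so this only shows the shape of the check.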
Data Domain Replication
Source to Destination
license both systems
Replication types:
Collection – full system mirror, changes only on source, destination is read only
Directory – directory based at dir level, all systems can be source or destination, destination must have capacity for the maximum expected post-compressed size, CIFS and NFS ok but separate dirs
Pool – VTL pools, works like dir replication
Replication pair = context
Replication streams:
Model          Source  Destination
DD140, DD610   15      20
DD630          30      20
DD670          60      90
DD860          90      90
DD890          135     270
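As a worked example of using the stream table above, the sketch below looks up the limits for a replication pair. It assumes the Source and Destination columns are the maximum streams a model supports in each role and that a pair is bounded by the lower of the two; the function name pair_stream_limit is hypothetical.

```python
# Replication stream limits per model, copied from the table above:
# model -> (streams as source, streams as destination)
STREAM_LIMITS = {
    "DD140": (15, 20),
    "DD610": (15, 20),
    "DD630": (30, 20),
    "DD670": (60, 90),
    "DD860": (90, 90),
    "DD890": (135, 270),
}


def pair_stream_limit(source_model: str, destination_model: str) -> int:
    """Assumed rule: a pair is limited by the lower of the source and destination limits."""
    source_limit = STREAM_LIMITS[source_model][0]
    destination_limit = STREAM_LIMITS[destination_model][1]
    return min(source_limit, destination_limit)


print(pair_stream_limit("DD630", "DD890"))  # 30
print(pair_stream_limit("DD890", "DD630"))  # 20
```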
Replication Topologies
One to One
src to des
bi-directional
src to des
des to src
many to one
src
src to des
src
one to many
des
src to des
des
cascaded
src
src to pri to des
src
Cascaded
src
src to pri to des
src des
Data Domain Supported Protocols
FC Eth
VTL DD Boost,NFS,CIFS,NDMP
DD OS
DDFS
Dedupe Storage
Data Paths
eth cifs/nfs
eth replication
fc vtl
Data Domain FS
ddvar administration file system
NFS /ddvar
CIFS \ddvar
These contain DD system core and log files
-can’t rename/delete
-Can’t access all dirs
-Data streams change per OS version and DD model
mtrees Storage File system
5.0 and later
backup
nfs /backup
cifs \backup
data \ col1 \backup Mtree /a (can't be deleted or renamed)
/b
\Mtree /a
/b
Mtree – you can add up to 14 Mtree dirs under /data/col1/
you can manage each mtree dir separately (compression rates etc)
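The MTree rules noted above (a fixed /data/col1/backup, up to 14 added MTrees) can be modeled in a few lines. This is a hypothetical in-memory model for study purposes, not a DD OS interface; the class name is invented, and it assumes the built-in backup MTree does not count toward the limit of 14.

```python
MTREE_ROOT = "/data/col1"
MAX_MTREES = 14  # limit quoted in the notes


class MTreeNamespace:
    """Hypothetical in-memory model of MTrees under /data/col1 (not a DD OS API)."""

    def __init__(self):
        # /data/col1/backup always exists and cannot be deleted or renamed.
        self.mtrees = {"backup"}

    def create(self, name: str) -> str:
        if name in self.mtrees:
            raise ValueError(f"MTree {name!r} already exists")
        user_created = len(self.mtrees) - 1  # assumption: 'backup' is not counted
        if user_created >= MAX_MTREES:
            raise RuntimeError(f"limit of {MAX_MTREES} MTrees reached")
        self.mtrees.add(name)
        return f"{MTREE_ROOT}/{name}"

    def delete(self, name: str) -> None:
        if name == "backup":
            raise PermissionError("/data/col1/backup cannot be deleted or renamed")
        self.mtrees.remove(name)


ns = MTreeNamespace()
print(ns.create("oracle"))  # /data/col1/oracle
```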
DD Products
DLH data less head – Controller
Speed DD Boost (expects 10GbE)
Speed other (NFS, CIFS or VTL)
Logical capacity = total data including dedupe
Usable capacity = storage space
ES20 Expansion shelf, 16 drives
Models supporting external storage only:
DD690, DD860, DD880, DD890, DD Archiver and GDA
(have 4 internal disks for DDOS, boot and logs)
Models supporting internal storage only:
DD610 and DD630 (7 disks expandable to 12)
DD140 branch office (fixed 5 drives, RAID 5 only)
DD800 Series
DD890
Dual socket, six core 2.8GHz
96GB RAM
Two 1GB NVRAM cards
Four 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
dual path exp shelf connectivity
DD860
Dual socket, quad core
36GB exp to 72GB RAM
One 1GB NVRAM card
Four 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
dual path exp shelf connectivity
DD600 Series
DD670
Single socket, quad core
96GB RAM
Two 1GB NVRAM cards
12 1TB disks
Two Quad-port SAS cards (up to 12 ES20’s)
up to two 32TB exp
up to four 16TB exp
DD140 Remote Office Appliance
3 Disks, RAID5
2 ETH port
1 NVRAM
Data Domain Archiver
Larger tier of storage behind a standard DD
one controller
up to 24 ES20-32TB exp shelves
570TB usable storage
30 x logical data capacity
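The Archiver capacity figures above can be checked with simple arithmetic. The sketch assumes each ES20-32TB shelf contributes 32 TB raw and reads "30 x" as logical capacity being 30 times usable capacity; both readings are assumptions, not statements from the notes.

```python
SHELVES = 24
RAW_TB_PER_SHELF = 32   # assumption: ES20-32TB means 32 TB raw per shelf
USABLE_TB = 570         # usable storage quoted in the notes
DEDUPE_FACTOR = 30      # "30 x" from the notes, read as logical = 30 x usable

raw_tb = SHELVES * RAW_TB_PER_SHELF
logical_tb = USABLE_TB * DEDUPE_FACTOR

print(raw_tb)                        # 768
print(round(USABLE_TB / raw_tb, 2))  # 0.74
print(logical_tb / 1000)             # 17.1
```

Under those assumptions, roughly three quarters of raw capacity is usable, and 570 TB usable corresponds to about 17.1 PB of logical data.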
Data migration
Active tier receives the data
Based on the data movement schedule, data is moved from the active tier to the first archive unit in the archive tier
DIA checks files after they are moved
data removed from active tier only once DIA verifies data
Archive Tier sealing
one or multiple shelves can be configured as an archive unit
Archive unit automatically sealed when it fills up
data is not written into a sealed unit but files can be deleted
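The migration and sealing rules above can be sketched as a small model: copy to the archive unit, verify, then remove from the active tier, and seal the unit once it fills. The class and function names are hypothetical, capacity is counted in bytes, and sealing only when the unit is exactly full is a simplification.

```python
import hashlib


class ArchiveUnit:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.files = {}
        self.sealed = False

    def used(self) -> int:
        return sum(len(data) for data in self.files.values())

    def write(self, name: str, data: bytes) -> None:
        if self.sealed:
            raise PermissionError("sealed archive unit accepts no new data")
        if self.used() + len(data) > self.capacity:
            raise OSError("archive unit is full")
        self.files[name] = data
        if self.used() == self.capacity:
            self.sealed = True  # seal automatically once the unit fills up

    def delete(self, name: str) -> None:
        del self.files[name]  # deletes are still allowed after sealing


def migrate(active: dict, unit: ArchiveUnit, name: str) -> None:
    """Copy a file to the archive unit, verify it, then remove it from the active tier."""
    data = active[name]
    unit.write(name, data)
    if hashlib.sha256(unit.files[name]).digest() != hashlib.sha256(data).digest():
        raise IOError("verification failed; file stays in the active tier")
    del active[name]  # removed only after verification succeeds


active = {"jan.bak": b"x" * 6, "feb.bak": b"y" * 4}
unit = ArchiveUnit(capacity=10)
migrate(active, unit, "jan.bak")
migrate(active, unit, "feb.bak")
print(unit.sealed, active)  # True {}
unit.delete("jan.bak")      # allowed even though the unit is sealed
```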
DD Archiver Hardware
DD860
72GB RAM
One NVRAM card 1GB
three quad port SAS cards
One to 24 ES20 shelves
24 shelves use all 12 ports, dual-pathed
two 1Gb eth ports
optional 1Gb or 10Gb NIC
DD Archiver replication
Controller to Controller replication
Global Deduplication Array (GDA)
Largest system
750 TB usable
2 DD890 controllers
Networker supports DD boost and DD VTL
Either VTL or DD Boost, not both
Data Domain System Management
DD Enterprise Manager
DD Management Framework (CLI)
IPMI power management (power status, off, on, cycle)
SOL Serial over LAN
Data Domain Software licenses
DD Boost
VTL
VTL with IBM i
DD Encryption
DD Retention Lock
Hardware and Capacity Licenses
Expanded storage 7 to 12 disks DD510 or DD630
GDA
DD Archiver
Capacity Active 1 shelf
Capacity Archive 1 shelf
DD Boost
1. Improved throughput or retaining data (OST)
2. Backup server controlled replication
3. Backup Server replica awareness
DSP Distributed Segment Processing
Backup Server
1. Segments
2. Fingerprints
4. Compresses
DD
3. Filters
5. Writes
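The five DSP steps above split work between the backup server (1, 2, 4) and the DD system (3, 5). A rough sketch of that split is below. It is not the DD Boost protocol: segments are fixed-size for brevity, SHA-1 and zlib stand in for the real fingerprint and compression functions, and the names DDSystem and backup are hypothetical.

```python
import hashlib
import zlib


class DDSystem:
    """Stand-in for the Data Domain side: filters fingerprints and writes new segments."""

    def __init__(self):
        self.store = {}

    def filter_new(self, fingerprints):  # step 3: Filters
        return [fp for fp in fingerprints if fp not in self.store]

    def write(self, compressed_by_fp):  # step 5: Writes
        self.store.update(compressed_by_fp)


def backup(data: bytes, dd: DDSystem, segment_size: int = 4096) -> int:
    """Backup-server side. Returns the number of compressed bytes sent to the DD system."""
    # Step 1: Segments (fixed-size here for brevity).
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    # Step 2: Fingerprints.
    by_fp = {hashlib.sha1(seg).digest(): seg for seg in segments}
    # Step 3 runs on the DD system: which fingerprints are new?
    needed = dd.filter_new(list(by_fp))
    # Step 4: Compresses only the segments the DD system lacks.
    payload = {fp: zlib.compress(by_fp[fp]) for fp in needed}
    dd.write(payload)
    return sum(len(blob) for blob in payload.values())


dd = DDSystem()
data = bytes(range(256)) * 64  # 16 KiB: four 4 KiB segments with identical content
first = backup(data, dd)
second = backup(data, dd)
print(first > 0, second)  # True 0
```

The second backup sends nothing because every fingerprint is already known, which is the source of the throughput gain: only new, compressed segments cross the network.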
DD Boost enables Advanced Load Balancing and Link Failover
DD VTL
DD systems support backups over the SAN and LAN
Backup application manages all data movement to and from the DD system
Backup application manages physical tape creation
DD Replication software manages virtual tape replication
DD Enterprise Manager is used to configure and manage tape emulations
DD up to 64 Tape Libraries per system
LTO1-LTO3
up to 256 VTL per system for single node systems
VTL Slots up to 800GB
NDMP Tape Server support for NAS backup
DD Replicator
async IP replication
supports SSL encryption
minimal performance impact
DD Encryption
Types:
Data-in-flight (as data is transported)
Data-at-rest (data stored encrypted)
Challenges
Encrypt before dedupe
Encrypt after dedupe (requires hardware)
Integrated dedupe and encryption
DD Inline Encryption
encrypts immediately (inline), SISL used to optimize, no additional hardware needed
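Why the ordering of dedupe and encryption matters can be shown with a toy example. Identical segments encrypted with fresh nonces produce different ciphertexts, so encrypting first leaves nothing to deduplicate. The cipher below is a deliberately simple stand-in built from SHA-256 and is NOT secure; all names are hypothetical, and this is not how DD Encryption is implemented.

```python
import hashlib
import itertools

_nonce_counter = itertools.count()


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher for illustration only (NOT secure): fresh nonce + SHA-256 keystream."""
    nonce = next(_nonce_counter).to_bytes(8, "big")
    keystream = b""
    block = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + nonce + block.to_bytes(4, "big")).digest()
        block += 1
    return nonce + bytes(p ^ k for p, k in zip(plaintext, keystream))


def unique_fingerprints(segments) -> int:
    return len({hashlib.sha1(seg).digest() for seg in segments})


key = b"demo-key"
segments = [b"same segment"] * 3 + [b"other segment"]

# Encrypt before dedupe: every ciphertext differs, so nothing deduplicates.
encrypt_first = unique_fingerprints([toy_encrypt(key, s) for s in segments])

# Dedupe first, then encrypt only the unique segments (the integrated approach).
unique = {hashlib.sha1(s).digest(): s for s in segments}
stored = [toy_encrypt(key, s) for s in unique.values()]

print(encrypt_first, len(stored))  # 4 2
```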
DD Retention Lock
Electronic Data Shredding
Enforced Retention for active archiving
policy based, file, database, email
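Enforced retention, as noted above, means a file cannot be deleted or replaced until its retention period has passed. A minimal sketch of that rule follows; the class and method names are hypothetical and this is not the DD Retention Lock interface.

```python
from datetime import datetime, timedelta


class RetentionLockedStore:
    """Hypothetical model of enforced retention (not the DD Retention Lock interface)."""

    def __init__(self):
        self.files = {}  # name -> (data, retain_until)

    def _locked(self, name: str, now: datetime) -> bool:
        return name in self.files and now < self.files[name][1]

    def commit(self, name: str, data: bytes, retain_for: timedelta, now: datetime) -> None:
        if self._locked(name, now):
            raise PermissionError(f"{name} is locked and cannot be overwritten")
        self.files[name] = (data, now + retain_for)

    def delete(self, name: str, now: datetime) -> None:
        if self._locked(name, now):
            retain_until = self.files[name][1]
            raise PermissionError(f"{name} is locked until {retain_until.isoformat()}")
        del self.files[name]


store = RetentionLockedStore()
start = datetime(2024, 1, 1)
store.commit("mail.pst", b"archived mail", timedelta(days=365), now=start)
try:
    store.delete("mail.pst", now=start + timedelta(days=30))
except PermissionError as err:
    print(err)
store.delete("mail.pst", now=start + timedelta(days=366))
print(len(store.files))  # 0
```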