Data Replication and Recovery Using EMC SnapView and MirrorView

By Richard Hou, Steve Feibus, and Patty Young

EMC SnapView 2.1 and MirrorView 1.7 software help administrators protect and recover their data in Dell|EMC storage area networks (SANs). This article explains the differences between these applications, demonstrating how they can be used to achieve high availability in a disaster recovery environment, reduce backup windows, and remove the processing overhead for backups from production servers.

Storage area networks (SANs) offer an effective means of storing and sharing data. As the amount of data stored on a SAN increases, however, backup windows lengthen and disaster recovery requires more time. EMC SnapView 2.1 and MirrorView 1.7 storage management software can help facilitate efficient backups and disaster recovery in Dell|EMC SAN environments.

To demonstrate how IT administrators can improve SAN management in a typical data center environment using EMC storage management software, this article presents a scenario using a fictional company called Acme. In this scenario, Acme has a local data center that uses a Dell|EMC SAN for consolidated, redundant, high-availability storage. In addition, the company has heterogeneous servers sharing the storage and tape library.

To protect the company's valuable business data, the Acme IT department implemented a disaster recovery plan. The plan involved deploying a fully redundant Dell|EMC SAN, which allows Acme to achieve high availability at the hardware level. To prevent failures at the operating system (OS) and application levels, the IT department deployed the company's main application in a Microsoft Cluster Service (MSCS) environment. Every application server that is connected to the SAN has two host bus adapter (HBA) cards to provide redundancy and increase bandwidth. Acme connected a tape library to the SAN for backup and restore; tapes are stored at a remote site after tapes at the production site are backed up and verified. The remote site also acts as a disaster recovery site for the primary (production) site, and the IT department established a secondary SAN for the applications running at the remote location.

Currently, Acme faces three business challenges:

Increasing backup window: As the database grows, the backup window will soon exceed the time available for the daily backups.

Lengthy time to recovery: The length of time required for disaster recovery, referred to as mean time to recovery (MTTR), is growing because larger databases require longer restore times. The company's main application is inaccessible during restoration.

Overhead on the production environment: Acme uses its production database to perform application development work; however, developer access to the production database creates overhead on the database engine. Performing online backups also incurs overhead that affects the performance of the production servers.




To resolve these issues, Acme decided to update its disaster recovery plan. The Acme IT department connected the primary-site SAN with the secondary-site SAN, as shown in Figure 1. The company plans to use SnapView snapshots and clones to create replicas for online backups and for development use. Acme will use MirrorView software across the Dell|EMC SANs to create remote copies for disaster recovery. Using SnapView and MirrorView will also enable Acme to create a plan for recovery at the file, logical unit number (LUN), and array levels, as well as to complete online backups without affecting the production environment.

    Creating snapshots and clones for backups using SnapView

SnapView 2.1 creates either a virtual point-in-time copy (snapshot) of the original data or a full, physical point-in-time copy (clone) of the original data. Currently, SnapView is supported on Dell|EMC FC4700-2, CX400, and CX600 storage arrays as a nondisruptive upgrade, meaning that the software can be added at any time without disturbing the production environment.

    Snapshots depend on source LUN

At Acme, a long backup window consumes resources from the database engine and creates overhead on the production environment. The need for Acme development engineers to access the production database for development work contributes to both problems. SnapView can help resolve both of these business issues.

Using SnapView, administrators can create up to eight point-in-time snapshots of a LUN, which can subsequently be made accessible to as many as eight hosts. For example, the Acme SAN administrator can make a snapshot accessible to a backup server, allowing the production server to continue processing without the downtime traditionally associated with backup processes. An administrator also can create additional snapshot sessions for use by the development engineers without affecting the data on the production, or source, LUN.

The snapshot feature uses a cache-and-pointer design, where a chunk map table keeps track of data chunks (groups of blocks) based on their state at a given time. As the first write request to a block is made to the source LUN, the chunk to be modified is copied to a snapshot cache on private LUNs, a process known as copy on first write (COFW). The source LUN, the snapshot cache, and the chunk map table work together to create the virtual snapshot LUN.
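To make the COFW bookkeeping concrete, the following minimal Python sketch models a chunk map and a snapshot cache sitting over a source LUN. It is purely illustrative: the SnapshotSession class, its methods, and the chunk handling are invented for this example and do not represent SnapView internals.

# Conceptual model of copy-on-first-write (COFW), not EMC SnapView code.
class SnapshotSession:
    """Preserves the point-in-time view of a source LUN by copying each
    chunk to a cache the first time it is modified after session start."""

    def __init__(self, source_lun, chunk_size_blocks=64):
        self.source = source_lun              # list of blocks (production data)
        self.chunk_size = chunk_size_blocks
        self.cache = {}                       # chunk index -> original chunk contents
        self.chunk_map = set()                # chunks already copied (COFW done)

    def _chunk_index(self, block):
        return block // self.chunk_size

    def write(self, block, value):
        """Host write to the source LUN; copy the chunk on its first write."""
        idx = self._chunk_index(block)
        if idx not in self.chunk_map:         # first write to this chunk
            start = idx * self.chunk_size
            self.cache[idx] = list(self.source[start:start + self.chunk_size])
            self.chunk_map.add(idx)
        self.source[block] = value            # the production write then proceeds

    def read_snapshot(self, block):
        """Read the point-in-time view: cached data for changed chunks,
        otherwise the unchanged data still on the source LUN."""
        idx = self._chunk_index(block)
        if idx in self.chunk_map:
            return self.cache[idx][block - idx * self.chunk_size]
        return self.source[block]

lun = [0] * 256                               # a toy source LUN of 256 blocks
snap = SnapshotSession(lun)
snap.write(3, 99)                             # first write to chunk 0 triggers COFW
print(lun[3], snap.read_snapshot(3))          # 99 on the source, 0 in the snapshot view

Reads against the snapshot return the cached copy of any chunk that changed after the session started and fall through to the source LUN otherwise, which is why the snapshot consumes cache space only for modified chunks.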

The snapshot LUN is an exact copy of the production LUN, and thus the snapshot must be accessed by a different host, such as a development or backup server. The backup server can read from and write to a snapshot LUN, but any changes made to the snapshot LUN do not replicate back to the source LUN. When the snapshot session is deactivated, the virtual snapshot LUN will be invisible to the server.

As Figure 2 indicates, every source LUN can have as many as eight sessions and eight snapshots. Snapshots have a one-to-one relationship with a server: each snapshot must be assigned to a different server, whereas sessions can be related to any server, depending on which session is activated and when it is activated.
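These limits can be pictured as a small data model. The sketch below is hypothetical (the class, method names, and error messages are invented here); it only encodes the eight-session, eight-snapshot, one-server-per-snapshot rules described above.

# Illustrative constraint model, not a SnapView API.
MAX_SESSIONS_PER_SOURCE = 8
MAX_SNAPSHOTS_PER_SOURCE = 8

class SourceLun:
    def __init__(self, name):
        self.name = name
        self.sessions = []                    # point-in-time sessions on this source LUN
        self.snapshots = {}                   # snapshot LUN name -> assigned server

    def start_session(self, label):
        if len(self.sessions) >= MAX_SESSIONS_PER_SOURCE:
            raise RuntimeError("a source LUN supports at most eight sessions")
        self.sessions.append(label)

    def add_snapshot(self, snap_name, server):
        if len(self.snapshots) >= MAX_SNAPSHOTS_PER_SOURCE:
            raise RuntimeError("a source LUN supports at most eight snapshot LUNs")
        if server in self.snapshots.values():
            raise RuntimeError("each snapshot must be assigned to a different server")
        self.snapshots[snap_name] = server

lun = SourceLun("LUN_5")
lun.start_session("monday-1800")
lun.add_snapshot("LUN_5_snap1", server="backup-server")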

Figure 1. Primary and secondary SANs connected for disaster recovery

Figure 2. Source LUN, session, and snapshot LUN relationship

The most common use of a snapshot is to produce a backup copy of a large database. Performing an online backup of a database can help to shorten the backup window without interrupting production access to the database.


However, online backups create overhead on the production database server, sometimes even requiring that the database be stopped during the backup window. A SnapView snapshot allows the database to be replicated instantaneously. The replica can then be used for online backups, as well as for development work, without putting additional overhead on the application server.

SnapView snapshots also improve and simplify file-level recovery. Administrators can maintain a repository of snapshot sessions across multiple days on the network attached storage (NAS) server connected to the SAN, as shown in Figure 3. If, for example, a user wants to access files from the Friday snapshot session, the SAN administrator can simply activate the Friday session and share that snapshot LUN with the user. The user can then retrieve the needed files by copying files from the snapshot LUN to the source LUN.
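A rolling set of daily sessions like the one shown in Figure 3 behaves like a simple repository keyed by weekday. The sketch below is hypothetical: the class, labels, and the activate-and-share step are stand-ins for the SnapView session management and NAS sharing an administrator would actually perform.

# Illustrative weekday repository of snapshot sessions; not SnapView code.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

class SessionRepository:
    def __init__(self):
        self.sessions = {}                    # weekday -> session label

    def take_daily_session(self, weekday, label):
        """Record the 6:00 P.M. session for a weekday, replacing last week's."""
        self.sessions[weekday] = label

    def recover_files(self, weekday):
        """Activate the requested day's session so its snapshot LUN can be
        shared with the user, who copies the needed files back."""
        label = self.sessions.get(weekday)
        if label is None:
            raise KeyError("no snapshot session recorded for " + weekday)
        print("Activate session " + label + " and share the snapshot LUN with the user.")
        return label

repo = SessionRepository()
for day in WEEKDAYS:
    repo.take_daily_session(day, day.lower() + "-1800")
repo.recover_files("Friday")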

    Clones produce full, independent copies

Although SnapView reduces the backup window and removes backup overhead from the production server, the snapshot feature's cache-and-pointer design means that snapshot LUNs depend on the existence of the source LUN. If the source LUN is damaged or destroyed, administrators would need to rebuild the source LUN and recover the data from tape or another backup medium (assuming that the local Dell|EMC storage array is still up and running). The MTTR after such an event might be hours, depending on the size of the LUN and the speed of the tape technology. For a company requiring fast disaster recovery, as in the Acme scenario, snapshot LUNs (that is, virtual LUNs) are not an ideal solution.

To decrease MTTR, administrators can use the SnapView clone function to create LUN copies that are independent of the source LUN. Unlike snapshots, which are point-in-time views of a source LUN, clones are synchronous copies of the source LUN. Each clone LUN consumes exactly the same amount of physical space as the source LUN. Essentially a local mirror of the source LUN, a clone offers high availability and can withstand storage processor failures or source LUN failures, as well as path failures, provided that EMC PowerPath or Application Transparent Failover (ATF) software is installed and properly configured. Clones, therefore, are business continuance volumes (BCVs).

To create a clone, the initial data is copied, or synchronized, to the clone (see Figure 4). During synchronization, any host write requests made to the source LUN are copied to the clone. Once the clone is 100 percent synchronized, it is fractured manually at a point in time to create a stand-alone BCV that is independent of the source LUN. Servers cannot access the clone LUN until it is fractured, though application I/O can still access the source LUN during synchronization.

Resynchronization can occur in either direction. To recover data from the clone to the source LUN, administrators can use the reverse synchronization feature while I/O continues to the source LUN. A clone becomes available for read and write access once it is fractured. Administrators also can access a clone by creating a snapshot of it and then assigning that snapshot to a second server's storage group, as long as the snapshot is in a different storage group than the source LUN. This manner of implementation not only removes the overhead on the server, but also allows the snapshot to be accessed without imposing I/O overhead on the source LUN.

After synchronization and fracturing, a clone becomes a fully populated, physical copy of its source LUN. Because clones are not pointer-based replicas, they are not affected by the COFW performance penalty; the data is replicated to the clone instead of being copied to nonvolatile memory along with the modified chunks. This process results in lower performance overhead for clones than for snapshots.

A clone is commonly used in environments that require quick MTTR or online backups based on point-in-time copies that have zero impact on the production data. A server can read from and write to a fractured clone without affecting the source LUN. Also, resynchronizing the clone is fast because clones use a space in memory called the clone private log (CPL) to keep track of the changes that occur after they have been fractured. For efficiency, 100 percent resynchronization is avoided; only post-fracture changes are resynchronized.
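The fracture-and-resynchronize cycle can be modeled as a small state machine. The following sketch is illustrative only: the class, the chunk-level granularity, and the log structure are assumptions made for this example rather than the actual SnapView design, and it tracks only source-side changes after the fracture.

# Conceptual clone lifecycle with CPL-style change tracking; not SnapView code.
class Clone:
    def __init__(self, source):
        self.source = source                  # dict: chunk index -> data
        self.copy = {}                        # the clone's own physical copy
        self.fractured = False
        self.cpl = set()                      # chunks changed since the fracture

    def synchronize(self):
        """Initial full copy; host writes during sync are mirrored to the clone."""
        self.copy = dict(self.source)
        self.fractured = False
        self.cpl.clear()

    def write_source(self, chunk, data):
        """Host write to the source LUN."""
        self.source[chunk] = data
        if self.fractured:
            self.cpl.add(chunk)               # remember the delta for incremental resync
        else:
            self.copy[chunk] = data           # still synchronized: mirror the write

    def fracture(self):
        """Split the clone off as a stand-alone, point-in-time BCV."""
        self.fractured = True

    def resynchronize(self):
        """Copy only the chunks recorded in the log back onto the clone."""
        for chunk in self.cpl:
            self.copy[chunk] = self.source[chunk]
        self.cpl.clear()
        self.fractured = False

clone = Clone({0: "a", 1: "b"})
clone.synchronize()
clone.fracture()
clone.write_source(0, "a2")                   # logged; the fractured clone still holds "a"
clone.resynchronize()                         # copies only chunk 0 back to the clone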

Figure 3. File recovery from a snapshot LUN to the source LUN

Figure 4. Clone creation and access

Enabling array-level disaster recovery through MirrorView

The Acme disaster recovery plan protects critical business data by outlining a procedure for recovery when the primary site is down.


The plan also addresses the replication of data from the primary location to the secondary location so that applications running at the secondary site can access the same business data. To implement these processes, the Acme scenario uses the EMC MirrorView add-on software option. MirrorView is similar to the SnapView clone option, but it works between Dell|EMC arrays instead of within a single array. Because MirrorView is array-based software, it does not use server I/O or CPU resources, and it supports all of the operating systems used on the array.

Provision for disaster recovery is the major benefit of MirrorView mirroring. As shown in Figure 5, multiple arrays in different locations can mirror to a common disaster recovery site, which makes it the central mirroring site for disaster recovery. If a disaster cripples the primary site, a MirrorView secondary image can be used to recover data and operations at the disaster recovery site.

MirrorView runs redundantly across arrays. If one storage processor fails, MirrorView, running on the other storage processor, will take ownership of the mirrored LUNs. If the host can fail over I/O to the remaining storage processor (using PowerPath software), then mirroring will continue as normal. After the primary-site array has been recovered, the data at the secondary site can be synchronized back to the primary site. Although the mirrored target cannot be directly assigned to a server while it is acting as a mirrored target, SnapView software can be used to take a snapshot of the secondary mirrored LUN and then assign the snapshot to the servers at the secondary site for immediate access, even while the two sites are mirroring.

MirrorView mirroring is synchronous; thus, the longer the distance, the longer the delay, because the application must wait for a commitment to be returned from the remote array. For disaster recovery, primary and secondary storage systems should be relatively far apart (up to 10 km for Fibre Channel-based mirroring) and connected through dedicated redundant pairs of fiber-optic cabling. For longer distances, other solutions exist.
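The distance sensitivity is easy to estimate. Assuming signals travel through optical fiber at roughly two-thirds the speed of light (about 5 microseconds per kilometer one way), the propagation component of the synchronous write penalty works out as follows; this is a back-of-the-envelope figure, not a MirrorView specification, and it ignores switch, protocol, and array latencies.

# Rough propagation delay for a synchronous remote commit; illustrative only.
ONE_WAY_US_PER_KM = 5.0                       # assumed fiber propagation delay

def round_trip_delay_us(distance_km):
    """Minimum extra latency a synchronous write pays waiting for the
    remote array's acknowledgment, from propagation alone."""
    return 2 * ONE_WAY_US_PER_KM * distance_km

for km in (1, 10, 60, 100):
    print(km, "km:", round_trip_delay_us(km), "microseconds per write")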

MirrorView can ensure that data from the primary storage system replicates to the secondary array (see Figure 6). The host (if any) connected to the secondary array might normally sit idle until the primary site fails. With SnapView at the secondary site, the host at the secondary site can take snapshot copies of the mirror images (that is, secondary LUNs) and back them up to other media. This technique provides point-in-time snapshots of production data with little impact on production server performance.

MirrorView provides a synchronous mirroring solution, which can help ensure that any write to the primary array is also committed on the secondary array before the production server receives an acknowledgment. Although this technique is common among mirroring technologies, it requires that the latency between the two storage arrays be calculated and considered to prevent performance degradation. Currently, MirrorView runs through either Fibre Channel (using dedicated fiber-optic cables) or Fibre Channel over IP (using routers and sufficient dedicated bandwidth on an IP wide area network, or WAN).
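The acknowledgment rule can be expressed as a short conceptual model: the host's write completes only after both arrays have committed it. The sketch below is not MirrorView code; the Array class and its latency parameter are invented to show where the remote commit adds to write latency.

# Conceptual synchronous mirrored write; illustrative only.
import time

class Array:
    def __init__(self, name, commit_latency_s=0.0):
        self.name = name
        self.commit_latency_s = commit_latency_s
        self.blocks = {}

    def commit(self, lba, data):
        time.sleep(self.commit_latency_s)     # stands in for link plus remote commit time
        self.blocks[lba] = data

def synchronous_write(primary, secondary, lba, data):
    """Return only after both arrays have committed, mimicking the
    acknowledgment rule described above."""
    primary.commit(lba, data)
    secondary.commit(lba, data)               # host ack is withheld until this returns
    return "acknowledged to host"

primary = Array("primary-site")
secondary = Array("secondary-site", commit_latency_s=0.0001)   # about 100 us for roughly 10 km
print(synchronous_write(primary, secondary, lba=42, data=b"payload"))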

    Selecting the appropriate data-protection strategy

SnapView snapshots, SnapView clones, and MirrorView mirrors provide different levels of data protection. Snapshots are most likely to be used in a parallel processing environment to provide online backups or file-level recovery, whereas clones and mirrors are more often used in disaster recovery situations.

Clones may be used for fast recovery of local corrupt LUNs; clones support read and write access to both the source LUN and the clone once the clone has been fractured. Mirrors usually enable recovery of arrays or sites. Mirrors also can be used to replicate data to multiple sites, and then used with snapshots for remote access. Mirroring provides read and write capability only to the source LUN, but read and write access to the remote copy of the data can be accomplished by using SnapView on the target array to take a snapshot of the mirror.
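The guidance in this section, and the spirit of the decision tree in Figure 7, can be condensed into a small selection helper. The function below is a hypothetical sketch: its inputs and keyword matching are invented here, and a real decision would also weigh the array model, operating system, and distance between sites.

# Hypothetical replica-selection helper summarizing the guidance above.
def choose_replica(scope, purpose):
    """scope: 'single-array' or 'multi-array'; purpose: free-text goal."""
    purpose = purpose.lower()
    if scope == "multi-array":
        return "MirrorView mirror (add SnapView snapshots on the target array for remote access)"
    if any(k in purpose for k in ("online backup", "decision support", "file-level", "testing")):
        return "SnapView snapshot"
    if any(k in purpose for k in ("fast recovery", "bcv", "replication within the array")):
        return "SnapView clone"
    return "review requirements against Figure 7"

print(choose_replica("single-array", "online backup of a large database"))
print(choose_replica("single-array", "fast recovery of a corrupt LUN (BCV)"))
print(choose_replica("multi-array", "disaster recovery at a remote site"))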

To support either MirrorView or SnapView, administrators must install the EMC Access Logix tool. This software masks source and target LUNs to different servers to prevent LUN corruption.
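Conceptually, this masking amounts to making sure that no single host is presented both a source LUN and its replica. The toy model below is an assumption, not Access Logix behavior; the names and the check are invented to illustrate the idea of per-host storage groups.

# Toy model of LUN masking with storage groups; illustrative only.
class StorageGroup:
    def __init__(self, host):
        self.host = host
        self.luns = set()

    def add_lun(self, lun):
        self.luns.add(lun)

def safe_to_present(groups, source_lun, replica_lun):
    """Return True if no single host would see both the source and its replica."""
    for group in groups:
        if source_lun in group.luns and replica_lun in group.luns:
            return False
    return True

production = StorageGroup("production-server")
backup = StorageGroup("backup-server")
production.add_lun("LUN_10")                  # source LUN
backup.add_lun("LUN_10_snapshot")             # replica presented to a different host
print(safe_to_present([production, backup], "LUN_10", "LUN_10_snapshot"))   # True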

Figure 5. Central mirroring for disaster recovery

Figure 6. Using MirrorView for data replication


Combined solutions reduce backup window and production server overhead

In the Acme scenario, administrators were able to use both SnapView and MirrorView to solve the three business problems that the company faced. The company now uses its NAS server, to which any user can map, for storing snapshots. This server enables administrators to recover data from a specific point in time without a large backup window. The company created a local clone as a development server for its main clustered application, removing overhead from the production environment. Acme also mirrored its data to the remote site and created a snapshot of the mirror to enable online backups that will not affect the production environment.

Mirroring the company's main application to the remote site provides quick MTTR and allows for remote backups in case of disaster at the primary site. Through snapshots, data can be assigned to servers at the remote location for other applications. Figure 7 provides a decision tree to help administrators choose the right replication and recovery tools for their own company's specific implementations.

Enabling comprehensive data-recovery plans using EMC software

Dell|EMC SANs provide a reliable environment for data consolidation. The optional SnapView and MirrorView software add-ons enable administrators to create a comprehensive data-recovery plan for different disaster scenarios. When administrators use the features provided in SnapView and MirrorView, they enable online development work or data mining to be performed without affecting the production environment. These features also provide a way to replicate data to multiple locations as well as maintain data consistency.

Richard Hou ([email protected]) is a systems engineer and consultant for the Dell Enterprise Technology and Education Center (ETEC), part of the Dell Enterprise Services and Support Group, where he specializes in SAN and Microsoft solutions. Richard has an M.S. in Electrical and Computer Engineering from The University of Texas at Austin and a B.S. in Mechanical Engineering from Zhejiang University, Hangzhou, China.

Steve Feibus ([email protected]) has been a storage enterprise technologist in the Advanced Systems Group at Dell for the past two years and was recently promoted to manager of the Client Technologist team at Dell. Steve has a B.S. in Electrical Engineering from the University of Florida and has spent many years solving customer storage issues using the latest technologies and products.

Patty Young ([email protected]) is a storage enterprise technologist in the Advanced Systems Group at Dell. She has been working with storage solutions for many years, supporting field system consultants in architecting storage solutions for their customers and providing feedback from customers to Dell regarding storage challenges and requirements. Patty has a B.A. from North Carolina State University.


Figure 7. Decision tree for selecting snapshot, cloning, or mirroring (decision factors include the arrays to be utilized, the customer operating systems, single versus multiple arrays, the purpose of the data copy, and the distance between mirrored locations)

FOR MORE INFORMATION

    EMC: http://www.emc.com

    Dell|EMC: http://www.dell.com/emc