Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery

  • LeRoy Tuttle, Jr.

    Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra

    Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones

    Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

    A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.

    Category: Quick Guide
    Applies to: SQL Server 2012
    Source: White paper (link to source content)
    E-book publication date: May 2012
    32 pages

    http://sqlcat.com/sqlcat/b/whitepapers/archive/2012/02/25/microsoft-sql-server-alwayson-solutions-guide-for-high-availability-and-disaster-recovery.aspx

  • Copyright 2012 by Microsoft Corporation

    All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

    Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. This book expresses the authors' views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

  • Contents

    High Availability and Disaster Recovery Concepts
    Describing High Availability
    Planned vs. Unplanned Downtime
    Degraded Availability
    Quantifying Downtime
    Recovery Objectives
    Justifying ROI or Opportunity Cost
    Monitoring Availability Health
    Planning for Disaster Recovery
    Overview: High Availability with Microsoft SQL Server 2012
    SQL Server AlwaysOn
    Significantly Reduce Planned Downtime
    Eliminate Idle Hardware and Improve Cost Efficiency and Performance
    Easy Deployment and Management
    Contrasting RPO and RTO Capabilities
    SQL Server AlwaysOn Layers of Protection
    Infrastructure Availability
    Windows Operating System
    Windows Server Failover Clustering
    WSFC Cluster Validation Wizard
    WSFC Quorum Modes and Voting Configuration
    WSFC Disaster Recovery through Forced Quorum
    SQL Server Instance-Level Protection
    Availability Improvements - SQL Server Instances
    AlwaysOn Failover Cluster Instances
    Database Availability
    AlwaysOn Availability Groups
    Availability Group Failover
    Availability Group Listener
    Availability Improvements - Databases
    Client Connectivity Recommendations
    Conclusion

  • High Availability and Disaster Recovery Concepts

    You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders have a shared understanding of the related business drivers, challenges, and objectives of planning, managing, and measuring RTO and RPO objectives.

    Readers who are familiar with these concepts can move ahead to the Overview: High Availability with Microsoft SQL Server 2012 section of this paper.

    Describing High Availability

    For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.

    The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.

    A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:

    Availability = (Actual uptime / Expected uptime) × 100%

    The resulting value is often expressed by industry in terms of the number of 9s that the solution provides, which is meant to convey an annual number of minutes of possible uptime or, conversely, minutes of downtime.

    Number of 9s    Availability Percentage    Total Annual Downtime
    2               99%                        3 days, 15 hours
    3               99.9%                      8 hours, 45 minutes
    4               99.99%                     52 minutes, 34 seconds
    5               99.999%                    5 minutes, 15 seconds
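
    The downtime figures in the table follow directly from the availability percentage. As a minimal sketch, assuming a 365-day year (the function names here are illustrative and do not come from the paper), the arithmetic can be checked like this:

```python
from datetime import timedelta

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap, 365-day year


def availability_percent(actual_uptime: float, expected_uptime: float) -> float:
    """Availability as a percentage of expected uptime (any consistent time unit)."""
    if expected_uptime <= 0:
        raise ValueError("expected_uptime must be positive")
    return actual_uptime / expected_uptime * 100.0


def annual_downtime(nines: int) -> timedelta:
    """Yearly downtime allowed by an availability of N nines (2 -> 99%, 3 -> 99.9%)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    # N nines leaves a 1 / 10**N fraction of the year unavailable.
    return timedelta(seconds=round(SECONDS_PER_YEAR / 10 ** nines))


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, annual_downtime(n))
```

    The computed values agree with the table once the smallest unit is dropped; for example, two 9s works out to 3 days, 15 hours, and 36 minutes.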

    Planned vs. Unplanned Downtime

    System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be considered negatively if it is appropriately managed. There are two key types of foreseeable downtime:

    Planned maintenance. A time window is pre-announced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline re-indexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other potentially more severe unplanned outage scenarios.

    Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable, but considered either too unlikely to occur, or are considered to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then reestablishes fault tolerance.

    When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.

    Degraded Availability

    High availability should not be considered as an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:

    Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.

    Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be over-committed or under-sized. User experience may suffer, but work may still get done in a less productive manner.

    Partial, transient, or impending failures. Robustness in the application logic or hardware stack allows the system to retry or self-correct upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.

    Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.

    The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.

    Quantifying Downtime

    When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.

    At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and if needed, reestablish fault tolerance.

    Recovery Objectives

    Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation.

    You can both measure the impact and set recovery goals in terms of how long it takes to get back in business, and how much time latency there is in the last transaction recovered:

    Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.

    Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.

    You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.
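
    To make the two definitions concrete, the following sketch measures the recovery time and recovery point actually achieved in one outage and compares them with the agreed objectives. The record type, its field names, and the timestamps are hypothetical and only illustrate the definitions above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageRecord:
    """Hypothetical timestamps collected for a single outage."""
    last_commit_before_failure: datetime
    failure_time: datetime
    last_transaction_recovered: datetime
    full_service_restored: datetime

    def recovery_time(self) -> timedelta:
        # Duration of the outage, measured until new transactions can take place.
        return self.full_service_restored - self.failure_time

    def recovery_point(self) -> timedelta:
        # Gap between the last commit before the failure and the most recent data recovered.
        return self.last_commit_before_failure - self.last_transaction_recovered


def meets_objectives(outage: OutageRecord, rto: timedelta, rpo: timedelta) -> bool:
    """True when the measured outage stayed within both objectives."""
    return outage.recovery_time() <= rto and outage.recovery_point() <= rpo


if __name__ == "__main__":
    t0 = datetime(2012, 5, 1, 12, 0, 0)
    outage = OutageRecord(
        last_commit_before_failure=t0,
        failure_time=t0 + timedelta(seconds=5),
        last_transaction_recovered=t0 - timedelta(seconds=20),
        full_service_restored=t0 + timedelta(minutes=3, seconds=5),
    )
    print(outage.recovery_time(), outage.recovery_point())
```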

    Justifying ROI or Opportunity Cost

    The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:

    Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.

    Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.

    Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It also can be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.

    For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.

    Monitoring Availability Health

    From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO.

    In the event of an outage, you should also limit the initial time spent investigating the root cause during the outage, and instead focus on validating the health of your recovery environment, and then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.

    Planning for Disaster Recovery

    While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to reestablish high availability after the outage.

    As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to pre-established RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:

    Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.

    Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible by appropriate parties.

    Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?

    Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.

    Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?

    Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity so that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a run book or a cookbook.

    Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regular rotation of hosting the primary production site on the primary and each of the disaster recovery sites.
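
    One branch of such a decision tree might be sketched as follows. The policy shown here is an assumption made for illustration, not a rule stated in this paper: triage is allowed to consume a fixed fraction of the RTO, after which failover proceeds only if the data latency observed on the standby is within the RPO. The function name, the return values, and the `triage_fraction` parameter are all hypothetical.

```python
from datetime import timedelta


def failover_decision(
    elapsed_downtime: timedelta,
    standby_data_latency: timedelta,
    rto_threshold: timedelta,
    rpo_threshold: timedelta,
    triage_fraction: float = 0.5,  # assumed share of the RTO reserved for investigation
) -> str:
    """Return a coarse recommendation for the current point in an outage."""
    if not 0 < triage_fraction <= 1:
        raise ValueError("triage_fraction must be in (0, 1]")
    triage_budget = rto_threshold * triage_fraction
    if elapsed_downtime < triage_budget:
        return "continue triage"
    if standby_data_latency <= rpo_threshold:
        # Standby latency is used as the proxy for the data loss a failover would incur.
        return "initiate failover"
    return "escalate to business decision"


if __name__ == "__main__":
    print(failover_decision(
        elapsed_downtime=timedelta(minutes=6),
        standby_data_latency=timedelta(seconds=10),
        rto_threshold=timedelta(minutes=10),
        rpo_threshold=timedelta(seconds=30),
    ))
```

    Writing the criteria down in this form before an outage keeps the decision repeatable; the thresholds themselves should come from the agreed RTO and RPO goals.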

  • Overview: High Availability with Microsoft SQL Server 2012

    Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protection of critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping the cost and complexity low.

    Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the SQL Server AlwaysOn Layers of Protection section of this paper.

    SQL Server AlwaysOn

    AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across data centers, and improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.

    An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:

    AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure availability of application databases, and they enable zero data loss through log-based data movement for data protection without shared disks.

    Availability groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.

    AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.

    Significantly Reduce Planned Downtime

    The key reason for application downtime in any organization is planned downtime caused by operating system patching, hardware maintenance, and so on. This can constitute almost 80 percent of the outages in an IT environment.

    SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:

    Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent.

    Online Operations. Enhanced support for online operations like LOB re-indexing and adding columns with default values helps to reduce downtime during database maintenance operations.

    Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime.

    SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.

    Eliminate Idle Hardware and Improve Cost Efficiency and Performance

    Typical high availability solutions involve deployment of costly, redundant, passive servers. AlwaysOn Availability Groups enable you to utilize secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to simultaneously utilize both the primary and secondary database replicas helps improve performance of all workloads due to better resource balancing across your server hardware investments.

    Easy Deployment and Management

    Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.

    Contrasting RPO and RTO Capabilities

    The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution. This table offers a rough comparison of the type of results that those different solutions may achieve:

    High Availability and Disaster Recovery SQL Server Solution   Potential Data Loss (RPO)   Potential Recovery Time (RTO)   Automatic Failover   Readable Secondaries (1)
    AlwaysOn Availability Group - synchronous-commit              Zero                        Seconds                         Yes (4)              0-2
    AlwaysOn Availability Group - asynchronous-commit             Seconds                     Minutes                         No                   0-4
    AlwaysOn Failover Cluster Instance                            NA (5)                      Seconds to minutes              Yes                  NA
    Database Mirroring (2) - High-safety (sync + witness)         Zero                        Seconds                         Yes                  NA
    Database Mirroring (2) - High-performance (async)             Seconds (6)                 Minutes (6)                     No                   NA
    Log Shipping                                                  Minutes (6)                 Minutes to hours (6)            No                   Not during a restore
    Backup, Copy, Restore (3)                                     Hours (6)                   Hours to days (6)               No                   Not during a restore

    (1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
    (2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
    (3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
    (4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
    (5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
    (6) Highly dependent upon the workload, data volume, and failover procedures.
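
    The table can also be read as a filter: given the data loss and recovery time a business can tolerate, which solutions remain candidates? The sketch below encodes the table's rough magnitudes on an ordinal scale. The encoding is an assumption made for illustration: where the table gives a range, the upper bound is used, and the FCI row is left out because its RPO is listed as NA.

```python
# Ordinal scale for the coarse magnitudes used in the comparison table.
MAGNITUDE = {"zero": 0, "seconds": 1, "minutes": 2, "hours": 3, "days": 4}

# (solution, worst-case RPO, worst-case RTO, automatic failover), taken from the table.
SOLUTIONS = [
    ("AlwaysOn Availability Group - synchronous-commit", "zero", "seconds", True),
    ("AlwaysOn Availability Group - asynchronous-commit", "seconds", "minutes", False),
    ("Database Mirroring - High-safety (sync + witness)", "zero", "seconds", True),
    ("Database Mirroring - High-performance (async)", "seconds", "minutes", False),
    ("Log Shipping", "minutes", "hours", False),
    ("Backup, Copy, Restore", "hours", "days", False),
]


def candidates(max_rpo: str, max_rto: str, need_automatic_failover: bool = False) -> list:
    """Names of solutions whose rough RPO and RTO fit within the stated tolerances."""
    return [
        name
        for name, rpo, rto, automatic in SOLUTIONS
        if MAGNITUDE[rpo] <= MAGNITUDE[max_rpo]
        and MAGNITUDE[rto] <= MAGNITUDE[max_rto]
        and (automatic or not need_automatic_failover)
    ]


if __name__ == "__main__":
    for name in candidates("zero", "seconds", need_automatic_failover=True):
        print(name)
```

    Because footnote (6) marks several cells as highly workload dependent, a lookup like this can only narrow the options; it does not replace testing against real workloads.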

  • SQL Server AlwaysOn Layers of Protection

    SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been a common practice to have a separation of duties and responsibilities for the various involved audiences and roles, such that each was predominantly concerned with only a portion of those solution layers.

    This section of the paper is organized to walk through a deeper description of each of those layers, and to offer rationale and guidance for your design discussions and implementation decisions.

    A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:

    Infrastructure level. Server-level fault tolerance and intra-node network communication leverage Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.

    SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).

    Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.

    Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.

    The logical topology of a representative AlwaysOn solution is illustrated in this diagram:

    [Diagram: logical topology of a representative AlwaysOn solution]

    Infrastructure Availability

    Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.

    Windows Operating System

    SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring.

    The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server 2008 R2 Standard operating system, Windows Server 2008 R2 Enterprise operating system, and Windows Server 2008 R2 Datacenter operating system.

    For more information, see: Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).

    Windows Server Core Installation Option

    As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command prompt environment are enabled.

    This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.

    A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.

    Optimizing SQL Server for Private Cloud

    High availability and disaster recovery scenarios are increasingly critical in the Private Cloud environment. Deploy SQL Server to your Private Cloud to help ensure that your computer, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. It helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.

    In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering.

    For more information, see Private Cloud Computing - Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).

    Windows Server Failover Clustering

    Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server.

    If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies to both FCIs and to availability groups.

    The nodes in the WSFC cluster work together to collectively provide these types of capabilities:

    Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.

    Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.

    Health monitoring. Inter-node and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.

    Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
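
    The quorum vote mentioned under health monitoring can be illustrated with a simple majority check. This is a sketch under a simplifying assumption, namely that every voting member carries exactly one vote and that a strict majority is required; the quorum modes and voting configuration that WSFC actually supports are covered later in this paper. The function names are illustrative.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """True when more than half of all configured votes are currently online."""
    if total_votes <= 0 or not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and a positive total_votes")
    return votes_online > total_votes // 2


def tolerated_vote_failures(total_votes: int) -> int:
    """Largest number of votes that can be lost while a strict majority remains."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return (total_votes - 1) // 2


if __name__ == "__main__":
    for total in (3, 4, 5):
        print(total, tolerated_vote_failures(total))
```

    Under this model a four-vote cluster tolerates no more failures than a three-vote cluster, which is one reason the number of votes deserves deliberate planning.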

    For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-main.aspx).

    Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.

    WSFC Storage Configurations

    Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust, and therefore if the storage device attached to a node is unavailable, the cluster node is considered to be at fault.

    For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.

    SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration combinations, including:

    Direct-attached vs. remote. Storage devices are directly physically attached to the server, or they are presented by a remote device through a network or host bus adapter (HBA). Remote storage technologies include Storage Area Network (SAN) based solutions such as iSCSI or Fibre Channel, as well as Server Message Block (SMB) file share based solutions.

    Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk volume configuration and file paths are presented to each node in the cluster. The physical implementation and capacity of the underlying disk volumes can vary.

    Dedicated vs. shared. Dedicated storage is reserved for use and assigned to a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multi-node hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multi-node access to a shared volume.

    Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated, local or remote storage.

    WSFC Resource Health Detection and Failover

    Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A variety of circumstances may indicate a cluster resource failure, including: power failure, disk or memory errors, network communication errors, misconfiguration, or nonresponsive services.

    You can make WSFC cluster resources such as networks, storage, or services dependent upon one another. The cumulative health of a resource is determined by successive rollup of its health with the health of each of its resource dependencies.
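    The rollup of health across resource dependencies can be pictured with a small model. The following is only an illustrative sketch, not WSFC code: the resource names, the two dictionaries, and the function name are hypothetical, and the dependency graph is assumed to contain no cycles.

```python
from typing import Dict, List


def cumulative_health(resource: str,
                      own_health: Dict[str, bool],
                      depends_on: Dict[str, List[str]]) -> bool:
    """Return True only if the resource and all of its dependencies are healthy."""
    if not own_health[resource]:
        return False
    # Roll up the health of each dependency, recursively.
    return all(cumulative_health(dep, own_health, depends_on)
               for dep in depends_on.get(resource, []))


# Hypothetical resources: a service that depends on a network name and a disk.
depends_on = {
    "service": ["network_name", "disk"],
    "network_name": ["ip_address"],
}
own_health = {
    "service": True,
    "network_name": True,
    "ip_address": False,  # the only resource reporting a failure
    "disk": True,
}

print(cumulative_health("service", own_health, depends_on))
print(cumulative_health("disk", own_health, depends_on))
```

    In this model the service reports itself as healthy, but its cumulative health is unhealthy because the IP address it indirectly depends on has failed.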

    For AlwaysOn Availability Groups, the availability group and the availability group listener are registered as WSFC cluster resources. For AlwaysOn Failover Cluster Instances, the SQL Server service and the SQL Server Agent service are registered as WSFC cluster resources, and both are made dependent upon the instance's virtual network name resource.

    If a WSFC cluster resource experiences a set number of errors or failures over a period of time, the configured failover policy causes the cluster service to do one of the following:

    • Restart the resource on the current node.
    • Set the resource offline.
    • Initiate an automatic failover of the resource and its dependencies to another node.


    Note: WSFC cluster resource health detection has no direct impact on the individual node's health or the overall health of the cluster.

    WSFC Cluster Validation Wizard

    The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server 2008 and Windows Server 2008 R2. It is a key tool for a database administrator to use to help ensure that a clean, healthy, stable WSFC environment exists, before deploying a SQL Server AlwaysOn solution.

    With the cluster validation wizard, you can run a set of focused tests on either a collection of servers that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC cluster would be supported on a given configuration.

    This validation process consists of a series of tests and data collection on each node in these categories:

    Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.

    Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.

    Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.

    System configuration. Validates Active Directory configuration, that drivers are signed, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.

    The results of these validation tests give you information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference.

    You should run these tests before and after you make any changes to WSFC configuration, before you install SQL Server, and as a part of any disaster recovery process. A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration.

    For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).

    Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps.

    For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).


    WSFC Quorum Modes and Voting Configuration

    WSFC uses a quorum-based approach to monitoring overall cluster health and maximizing node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.

    Cluster Health Detection by Quorum

    Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.

    A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance.

    The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.

    Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
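    The majority rule behind a quorum vote comes down to simple arithmetic. The sketch below is a simplified model rather than the actual WSFC algorithm; the function names are hypothetical, and each voting node or witness is assumed to carry exactly one vote.

```python
def has_quorum(affirmative_votes: int, total_votes: int) -> bool:
    """A quorum exists when strictly more than half of the possible votes are affirmative."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return 2 * affirmative_votes > total_votes


def tolerated_vote_losses(total_votes: int) -> int:
    """Largest number of votes that can be lost while a majority still remains."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return (total_votes - 1) // 2


for total in (3, 4, 5):
    print(total, "votes tolerate", tolerated_vote_losses(total), "lost vote(s)")
```

    Under this model, a cluster with four votes tolerates no more lost votes than a cluster with three, which is why an odd number of votes is generally preferred.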

    Quorum Modes

    A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster.

    One of the following quorum modes determines what constitutes a quorum of votes:

    Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.

    Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.

    As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.

    Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.


    Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.

    For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).

    Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.
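    The rule of thumb in the preceding note can be written as a small decision function. This is a hedged sketch: the function and parameter names are hypothetical, and because the note gives no recommendation for clusters where every node uses the same shared witness disk, the function returns None in that case instead of guessing.

```python
from typing import Optional


def recommend_quorum_mode(voting_nodes: int,
                          all_nodes_share_witness_disk: bool = False) -> Optional[str]:
    """Apply the odd/even guideline from the note; None means the note does not cover the case."""
    if voting_nodes < 1:
        raise ValueError("voting_nodes must be at least 1")
    if all_nodes_share_witness_disk:
        return None  # outside the scope of the guideline
    if voting_nodes % 2 == 1:
        return "Node Majority"
    # An even node count needs one extra witness vote to avoid ties.
    return "Node and File Share Majority"


print(recommend_quorum_mode(3))
print(recommend_quorum_mode(4))
```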

    Voting and Non-Voting Nodes

    By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.

    Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, or appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of that node.

    For all of the quorum models except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You should trust the quorum vote when all nodes are on the same physical subnet.

    However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.

    If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.

    Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.

    To simplify your quorum configuration and increase uptime, you may want to adjust a node's NodeWeight setting (a value of 0 or 1), setting it to 0 so that the node's vote is not counted towards the quorum.


    Recommended Adjustments to Quorum Voting

    To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:

    1. No vote by default. Assume that each node should not vote without explicit justification.

    2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.

    3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.

    4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.

    5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.

    6. Re-assess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.
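    Guidelines 1 through 5 can be applied mechanically to a description of the cluster. The following sketch is one possible reading of them, not a Microsoft tool: the Node fields, function names, and node names are hypothetical, and guideline 4 is treated as overriding guidelines 2 and 3, which is a simplification of "in general".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    hosts_primary: bool = False              # primary replica host or preferred FCI owner
    automatic_failover_target: bool = False  # could become primary through automatic failover
    at_secondary_site: bool = False          # resides at the disaster recovery site


def assign_votes(nodes: List[Node]) -> Dict[str, int]:
    """Return a vote weight (0 or 1) for each node, following guidelines 1 to 4 in order."""
    weights = {}
    for node in nodes:
        vote = 0                                                # 1. no vote by default
        if node.hosts_primary or node.automatic_failover_target:
            vote = 1                                            # 2 and 3. primary and failover owners
        if node.at_secondary_site:
            vote = 0                                            # 4. exclude secondary site nodes
        weights[node.name] = vote
    return weights


def needs_witness(weights: Dict[str, int]) -> bool:
    """5. An even number of node votes calls for a witness to prevent ties."""
    return sum(weights.values()) % 2 == 0


cluster = [
    Node("primary-1", hosts_primary=True),
    Node("primary-2", automatic_failover_target=True),
    Node("dr-1", at_secondary_site=True),
]
weights = assign_votes(cluster)
print(weights, "witness needed:", needs_witness(weights))
```

    Guideline 6 is not modeled; after a failover the same function would need to be re-run against the new roles of each node.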

    For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).

    You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.

    Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting.

    For more information, see Monitor Availability Groups (http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).


    WSFC Disaster Recovery through Forced Quorum

    Quorum failure is usually caused by a systemic disaster or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and Availability Groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.

    To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology.

    You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.

    This type of disaster recovery process should include the following steps:

    1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.

    2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica.

    For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).

    Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.

    3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes.

    As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.

    Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is, a split-brain scenario. If your findings in step 1 are accurate, this should not occur.


    4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration.

    Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.

    Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, or the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.

    5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps.

    You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.

    6) Repair or replace failed components and re-validate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.

    Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow and you will run the risk of running out of transaction log space on the other replicas.

    7) Repeat step 4 as needed. The goal is to re-establish the appropriate level of fault tolerance and high availability for healthy operations.

    8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document actual Recovery Point and Recovery Time experiences.
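    Two points from steps 5 and 6 lend themselves to a short illustration: the order in which replicas are brought back online, and why one failed replica holds back log truncation everywhere. The sketch below is a simplified model with hypothetical names; LSNs are represented as plain integers, which is an assumption made only for readability.

```python
from typing import Dict, List, Tuple

# Recovery sequence from step 5: primary, then synchronous, then asynchronous secondaries.
RECOVERY_RANK = {
    "primary": 0,
    "synchronous_secondary": 1,
    "asynchronous_secondary": 2,
}


def recovery_order(replicas: List[Tuple[str, str]]) -> List[str]:
    """Order (name, kind) pairs by the recommended bring-up sequence."""
    ordered = sorted(replicas, key=lambda replica: RECOVERY_RANK[replica[1]])
    return [name for name, _kind in ordered]


def log_truncation_limit(last_known_lsn: Dict[str, int]) -> int:
    """The log cannot be truncated past the farthest-behind replica's last known point."""
    return min(last_known_lsn.values())


replicas = [
    ("replica-c", "asynchronous_secondary"),
    ("replica-a", "primary"),
    ("replica-b", "synchronous_secondary"),
]
print(recovery_order(replicas))

# replica-c failed long ago and still pins the log at LSN 120.
print(log_truncation_limit({"replica-a": 900, "replica-b": 880, "replica-c": 120}))
```

    Removing the failed replica from the dictionary raises the limit to 880 in this model, which mirrors the reason the note asks you to repair or remove failed replicas.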


    SQL Server Instance Level Protection

    The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.

    Availability Improvements - SQL Server Instances

    These are new SQL Server 2012 instance-level features that enhance availability for AlwaysOn Failover Cluster Instances, as well as for standalone instances that host AlwaysOn Availability Groups.

    These improvements represent enhancements for managing and troubleshooting failover scenarios:

    Flexible Failover Policy. The output of the new system stored procedure used for robust failure detection, sp_server_diagnostics, uses the FailureConditionLevel property to convey the severity of a failure affecting the SQL Server instance. A WSFC failover policy governs how this value impacts the SQL Server instance, ranging from relative tolerance of errors to being sensitive to any SQL Server internal component error.

    You can configure failover to be triggered by any one of a range of error levels, including: server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies.

    Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover.

    For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).

    Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that captures and dumps information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.

    For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).

    SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both standalone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can very nearly approximate that of direct-attached storage.

    For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sql-databases-on-file-shares-it-s-time-to-reconsider-the-scenario.aspx).


    Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.

    WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.

    AlwaysOn Failover Cluster Instances

    The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.

    An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but only active on one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.

    Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node. Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence.

    Database files are stored on shared symmetrical storage volumes that are registered as a resource with the WSFC cluster, and are owned by the node that currently hosts the FCI.

    For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).

    FCI Failover Process

    If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to do a failover:

    1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A timeout in the restart attempt indicates a resource failure.

    2) A failover is indicated. A Failover Policy check indicates the need for a node failover.

    3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server service is attempted.

    4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.


    5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending timeout period, the cluster service puts the resource on this new node in a failed state.

    6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.
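    Step 4 of this process transfers ownership to the next preferred node owner. A minimal sketch of that selection is shown below. It is an illustration only: the function and node names are hypothetical, and wrapping around to the start of the preferred sequence is an assumption, not documented WSFC behavior.

```python
from typing import List, Optional, Set


def next_preferred_owner(preferred_owners: List[str],
                         current_owner: str,
                         available: Set[str]) -> Optional[str]:
    """Return the first available node after current_owner in the preferred sequence."""
    if current_owner not in preferred_owners:
        raise ValueError("current_owner is not a possible owner")
    start = preferred_owners.index(current_owner)
    count = len(preferred_owners)
    # Walk the sequence once, skipping the current owner (assumed wrap-around).
    for offset in range(1, count):
        candidate = preferred_owners[(start + offset) % count]
        if candidate in available:
            return candidate
    return None  # no other possible owner is available


owners = ["node-1", "node-2", "node-3"]
print(next_preferred_owner(owners, "node-1", {"node-2", "node-3"}))
print(next_preferred_owner(owners, "node-1", {"node-3"}))
```

    When the function returns None, the model corresponds to the case where the resource cannot be brought online on any other node.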

    FCI Improvements

    Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability robustness and serviceability:

    Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency.

    Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.

    Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes.

    For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sql-server-2012-alwayson_3a00_-multisite-failover-cluster-instance.aspx).

    Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information.

    Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query timeout, for whatever reason, triggered a failover with very little available diagnostic information.

    For more information, see sp_server_diagnostics (http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).

    There is now broader support for FCI storage scenarios:

    Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource dependency during setup.

    tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.


    Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with other system databases.

    Note: The location of tempdb is stored in the master database, which moves between nodes during failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.


    Database Availability

    The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.

    AlwaysOn Availability Groups

    An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.

    AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage.

    For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).

    Availability Replicas and Roles

    Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.

    One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations.

    An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.

    Availability Replica Synchronization

    The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.

    Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes.

    In this manner, new entries in the primary replica transaction log are appended onto each of the secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.


    Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.

    In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:

    Synchronous-commit mode. The primary replica commits a given transaction only after all synchronous-commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to 2 synchronous-commit secondary replicas.

    Synchronous-commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.

    Asynchronous-commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary replica has hardened its transaction log. An availability group can have up to 4 asynchronous-commit secondary replicas, but no more than a total of 4 secondary replicas of any type.

    Asynchronous-commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.

    For more information, see Availability Modes (http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).
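    The difference between the two availability modes, and the replica limits quoted above, can be captured in a few lines. The sketch below is a simplified model rather than SQL Server's implementation: the class, field, and function names are hypothetical, and LSNs are treated as plain integers for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Secondary:
    name: str
    synchronous: bool   # True for synchronous-commit, False for asynchronous-commit
    hardened_lsn: int   # last LSN this replica reported as hardened


def primary_can_commit(txn_lsn: int,
                       primary_hardened_lsn: int,
                       secondaries: List[Secondary]) -> bool:
    """The primary waits only for synchronous-commit secondaries to harden the transaction."""
    if primary_hardened_lsn < txn_lsn:
        return False  # the local log must be hardened first
    return all(s.hardened_lsn >= txn_lsn for s in secondaries if s.synchronous)


def within_replica_limits(secondaries: List[Secondary]) -> bool:
    """At most 4 secondary replicas in total, of which at most 2 are synchronous-commit."""
    synchronous_count = sum(1 for s in secondaries if s.synchronous)
    return len(secondaries) <= 4 and synchronous_count <= 2


secondaries = [
    Secondary("sync-1", synchronous=True, hardened_lsn=100),
    Secondary("async-1", synchronous=False, hardened_lsn=40),  # lagging, but not waited on
]
print(primary_can_commit(100, 100, secondaries))
print(primary_can_commit(101, 101, secondaries))
print(within_replica_limits(secondaries))
```

    The lagging asynchronous replica never delays a commit in this model, but everything between its hardened LSN and the primary's is exposed to loss if that replica has to take over.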

    The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.

    Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for a synchronous-commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and caught back up with the primary replica, it automatically reverts to normal synchronous-commit mode operations.

    Availability Group Failover

    The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica.

    An availability group failover policy uses the FailureConditionLevel property to indicate the severity tolerance level for a failure affecting the availability group, in conjunction with the sp_server_diagnostics system stored procedure. This same mechanism is used for FCI failover policies.
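    The way a severity tolerance level gates failover can be sketched as a comparison between a configured level and an observed condition. The example below uses the five conditions this paper lists for the FailureConditionLevel property; numbering them 1 through 5 in order of increasing sensitivity, and the names used here, are assumptions made for the sketch.

```python
from enum import IntEnum


class FailureCondition(IntEnum):
    # Assumed numbering: a lower number means a more severe condition.
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def policy_reacts(configured_level: FailureCondition,
                  observed: FailureCondition) -> bool:
    """A policy set to a given level reacts to that condition and to every more severe one."""
    return observed <= configured_level


policy = FailureCondition.CRITICAL_ERROR
print(policy_reacts(policy, FailureCondition.SERVER_DOWN))
print(policy_reacts(policy, FailureCondition.MODERATE_ERROR))
```

    Raising the configured level makes the policy more sensitive: at ANY_QUALIFIED_ERROR, every condition in the list causes a reaction.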


    Intheeventofafailover,insteadoftransferringownershipofsharedphysicalresourcestoanothernode,WSFCisleveragedtoreconfigureasecondaryreplicaonanotherSQLServerinstancetotakeovertheroleofprimaryreplica.Theavailabilitygroup'svirtualnetworknameresourceisthentransferredtothatinstance.Allclientconnectionstotheinvolvedavailabilityreplicasarereset.

    Baseduponthecurrenthealth,synchronizationstate,andavailabilitymodeofthereplicas,eachreplicahasacompositefailoverreadinessstatethatindicatesthepotentialfordataloss.ThisreplicahealthinformationisviewableintheAlwaysOnDashboard,orinthesys.dm_hadr_availability_replica_statessystemview.

    Eachavailabilityreplicaalsohasaconfiguredfailovermode,whichgovernsreplicabehaviorwhenfailoverisindicated.

    Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention.

    The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous-commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum.

    Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.

    Manual failover. This allows the administrator to assess the state of the primary replica, and make a decision to deliberately fail over to a secondary replica or not.

    Depending upon the availability mode and synchronization state, you have these choices:

    o Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.

    o Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous-commit availability mode, or if it is not synchronized with the primary replica.

    Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous-commit and then perform a planned manual failover.

    For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).

    You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:

    Failover mode is set to manual.

    Availability mode is set to asynchronous-commit.

    Replica resides on an FCI.

    For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).
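
    The failover rules in this section can be summarized in a short sketch. The following Python model is illustrative only: the Replica class, its field names, and both functions are hypothetical and are not part of any SQL Server API. It encodes only the conditions stated above.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical description of one availability replica."""
    failover_mode: str        # "automatic" or "manual"
    availability_mode: str    # "synchronous_commit" or "asynchronous_commit"
    on_fci: bool = False      # True if the replica resides on an FCI
    healthy: bool = True


def automatic_failover_possible(primary, secondary, sync_state, quorum_healthy):
    """Return True only if every stated precondition for automatic failover holds."""
    for replica in (primary, secondary):
        if replica.failover_mode != "automatic":
            return False
        if replica.availability_mode != "synchronous_commit":
            return False
        if replica.on_fci:
            # Automatic failover is blocked when either replica is on an FCI.
            return False
    return sync_state == "Synchronized" and quorum_healthy


def manual_failover_kind(primary, secondary, sync_state):
    """Return "planned" (no data loss) or "forced" (potential data loss)."""
    if secondary.availability_mode == "asynchronous_commit":
        return "forced"
    if primary.healthy and secondary.healthy and sync_state == "Synchronized":
        return "planned"
    return "forced"
```

    In this model, a target secondary in asynchronous-commit mode always yields a forced failover, which matches the warning above to switch the involved replicas to synchronous-commit mode first whenever the primary replica is still healthy.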

    Note: After a failover, if the new primary replica is not set to the synchronous-commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous-commit mode.

    Availability Group Listener

    An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.

    The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.

    To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.

    At run time, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses, until it is successful, or until it reaches the connection timeout. The client will attempt to make these connections in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.
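
    To illustrate why parallel connection attempts shorten client failover, the following Python sketch estimates how long a client waits before it reaches the first responsive IP address. It is a simplified model, not a driver implementation: the function name, the per_attempt_timeout parameter, and the assumption that a reachable address answers immediately are all hypothetical.

```python
def seconds_until_connected(reachable, per_attempt_timeout, parallel):
    """Estimate the wait before the first successful connection.

    reachable: list of booleans, one per IP address returned for the
               listener, in the order the client tries them.
    per_attempt_timeout: seconds an attempt against an unreachable address
               takes to fail (an assumed value supplied by the caller).
    parallel: True models MultiSubnetFailover=True (all addresses at once);
              False models sequential attempts.
    Returns None if no address is reachable.
    """
    if not any(reachable):
        return None
    if parallel:
        # All addresses are tried at once, so the reachable one answers first.
        return 0
    # Sequential: every unreachable address tried earlier costs one timeout.
    failed_first = reachable.index(True)
    return failed_first * per_attempt_timeout
```

    With two addresses where only the second responds, and an assumed 21-second attempt timeout, the sequential model waits 21 seconds, while the parallel model does not wait on the failed address at all.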

    In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports.

    For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).

    Application Intent Filtering

    While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or whether it will exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.

    For the primary role and secondary role of each availability replica, you can also specify a connection access property that will be used as a connection-level filter on the client's application intent. By default, invalid application intent and connection access combinations result in a refused connection. SQL Server should filter out client connection requests using the following rules.

    While the availability replica is in the primary role, and connection access is equal to:

    Allow any application intent. Do not filter any client connections for application intent.

    Allow only explicit read/write intent. If client specifies read-only, reject connection.

    While the availability replica is in the secondary role, and connection access is equal to:

    No connections allowed. Refuse all connections; replica is used only for disaster recovery.

    Allow any application intent. Do not filter any client connections for application intent.

    Read-only application intent. If client does not specify read-only, reject connection.

    For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).
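
    The filtering rules above amount to a small lookup table. The following Python sketch is a model for discussion only; the function name and the short labels ("ALL", "READ_WRITE", "NO", "READ_ONLY") are chosen here to stand for the connection access options listed above, and the code does not call SQL Server.

```python
def connection_accepted(role, connection_access, intent="ReadWrite"):
    """Apply the connection access filter for a replica in the given role.

    role: "primary" or "secondary".
    connection_access: "ALL" or "READ_WRITE" for the primary role;
                       "NO", "ALL", or "READ_ONLY" for the secondary role.
    intent: "ReadWrite" (the client default) or "ReadOnly".
    """
    rules = {
        ("primary", "ALL"): lambda i: True,
        ("primary", "READ_WRITE"): lambda i: i != "ReadOnly",
        ("secondary", "NO"): lambda i: False,
        ("secondary", "ALL"): lambda i: True,
        ("secondary", "READ_ONLY"): lambda i: i == "ReadOnly",
    }
    try:
        rule = rules[(role, connection_access)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {role}/{connection_access}"
        ) from None
    return rule(intent)
```

    Because the default intent is read-write, a client that omits the application intent is refused by a secondary replica that accepts only read-only intent.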

    Application Intent Read-Only Routing

    A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replica.

    Workloads that can be readily adapted to run on a read-only secondary replica include: reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad hoc queries.

    For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.

    Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function.

    For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
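
    The routing behavior described above can be sketched as a first-match search over the routing list. The Python below is a hypothetical model: the function, the replica names, and the dictionary layout are assumptions made for illustration, and the access labels reuse the secondary-role options from the filtering section.

```python
def route_read_only(routing_list, replicas, primary_online=True):
    """Return the first usable secondary for a read-only connection, or None.

    routing_list: replica names in priority order, as configured on the
                  replica that currently holds the primary role.
    replicas: dict mapping replica name to a dict with the keys
              "available" (bool) and
              "secondary_access" ("NO", "ALL", or "READ_ONLY").
    primary_online: redirection goes through the listener on the primary,
                    so nothing is routed while the primary is offline.
    """
    if not primary_online:
        return None
    for name in routing_list:
        replica = replicas.get(name)
        if replica is None or not replica["available"]:
            continue
        # Only secondaries that admit read-only intent are valid targets.
        if replica["secondary_access"] in ("ALL", "READ_ONLY"):
            return name
    return None
```

    The list is sequential rather than load balanced in this model: the second entry is used only when the first is unavailable or does not accept read-only connections.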

    Availability Improvements - Databases

    SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities.

    The following improvement reduces recovery time:

    Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based upon the estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint, and increasing recovery time (RTO) predictability.

    Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times.

    For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
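
    As an illustration of the per-database setting, the target recovery time is configured through the TARGET_RECOVERY_TIME option of ALTER DATABASE. The Python helper below is a hypothetical convenience that only formats such a statement for an administration script; the function name and the database name used with it are assumptions, and the exact syntax should be confirmed against the Database Checkpoints topic for your version.

```python
def target_recovery_time_statement(database, seconds):
    """Format an ALTER DATABASE statement that sets the target recovery time."""
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    # Escape closing brackets so the name is a valid bracketed identifier.
    quoted = "[" + database.replace("]", "]]") + "]"
    return (
        f"ALTER DATABASE {quoted} "
        f"SET TARGET_RECOVERY_TIME = {int(seconds)} SECONDS;"
    )
```

    For a placeholder database named SalesDb and a 60-second target, the helper produces a single statement that can be reviewed before it is run.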

    These improvements mitigate common scenarios that can drive planned downtime:

    Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.

    Online schema modification for new NOT NULL columns. If a new NOT NULL column is added with a default value to a SQL Server 2012 database table, only a schema lock is required to update system metadata; all rows do not have to be populated during the ALTER TABLE statement.

    SQL Server will physically persist the default column value only if a row is actually modified or re-indexed. Queries return the default value from metadata, unless an actual column value exists.

    The following improvement is an example of broader support for storage scenarios:

    Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica.

    Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas.

    For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).

    Client Connectivity Recommendations

    Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012 AlwaysOn technologies:

    AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.02, and the SQL Server Native Client 11.0.

    Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener or for an FCI that has IP addresses in multiple subnets.

    Connection provider property: ApplicationIntent=ReadOnly. Where practical, offload read-only workloads from your primary replica onto the secondary replicas.

    Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts, so when multiple IP addresses are present, they try to connect to each of them sequentially, until they encounter a TCP timeout, or until they make a successful connection.

    You should adjust your connection timeout on legacy clients to accommodate the potential sequential timeouts and retries when multiple IP addresses are present, to a value that is at least 15 seconds + 21 seconds for every secondary replica.
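
    The two connection properties and the legacy timeout guideline can be put together in a short Python sketch. Everything named here is an assumption for illustration: the helper functions, the listener and database names, and the .NET-style keyword spelling of the connection string (other client libraries may spell these keywords differently). Only the 15 + 21 seconds rule and the two property names come from the guidelines above.

```python
def legacy_connect_timeout(secondary_replicas):
    """Minimum connection timeout in seconds for a legacy client.

    Implements the guideline above: 15 seconds plus 21 seconds for every
    secondary replica.
    """
    if secondary_replicas < 0:
        raise ValueError("secondary_replicas must not be negative")
    return 15 + 21 * secondary_replicas


def listener_connection_string(listener, database, read_only=False):
    """Build a connection string that targets the availability group listener."""
    parts = [
        f"Server=tcp:{listener}",
        f"Database={database}",
        "Integrated Security=SSPI",
        # Ask the client library to try all listener IP addresses in parallel.
        "MultiSubnetFailover=True",
    ]
    if read_only:
        # Declare read-only intent so the connection can be routed
        # to a readable secondary replica.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts)
```

    For an availability group with two secondary replicas, the guideline gives a legacy client timeout of at least 15 + 2 x 21 = 57 seconds.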

    Conclusion

    This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.

    Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).

    SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios, in a manner that can be well justified using RPO and RTO goals.

    For more information:

    http://www.microsoft.com/sqlserver/: SQL Server Web site

    http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter

    http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter

    Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5 (excellent), how would you rate this paper and why have you given it this rating? For example:

    Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?

    Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?

    This feedback will help us improve the quality of white papers we release.

    Send feedback.

    Version 1.1, 21 February 2012.
