Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery
LeRoy Tuttle, Jr.
Contributors: Lindsey Allen, Justin Erickson, Min He, Cephas Lin, Sanjay Mishra
Reviewers: Kevin Farlee, Shahryar G. Hashemi (Motricity), Allan Hirt (SQLHA), Alexei Khalyako, Wolfgang Kutschera (Bwin Party), Charles Matthews, Ayad Shammout (Caregroup), David P. Smith (ServiceU), Juergen Thomas, Benjamin Wright-Jones
Summary: This white paper discusses how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.
A key goal of this paper is to establish a common context for related discussions between business stakeholders, technical decision makers, system architects, infrastructure engineers, and database administrators.
Category: Quick Guide
Applies to: SQL Server 2012
Source: White paper (link to source content)
E-book publication date: May 2012
32 pages
http://sqlcat.com/sqlcat/b/whitepapers/archive/2012/02/25/microsoft-sql-server-alwayson-solutions-guide-for-high-availability-and-disaster-recovery.aspx
Copyright 2012 by Microsoft Corporation
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.
Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. This book expresses the author's views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers, or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery iv
Contents
High Availability and Disaster Recovery Concepts
  Describing High Availability
    Planned vs. Unplanned Downtime
    Degraded Availability
  Quantifying Downtime
    Recovery Objectives
    Justifying ROI or Opportunity Cost
    Monitoring Availability Health
    Planning for Disaster Recovery
Overview: High Availability with Microsoft SQL Server 2012
  SQL Server AlwaysOn
  Significantly Reduce Planned Downtime
  Eliminate Idle Hardware and Improve Cost Efficiency and Performance
  Easy Deployment and Management
  Contrasting RPO and RTO Capabilities
SQL Server AlwaysOn Layers of Protection
  Infrastructure Availability
    Windows Operating System
    Windows Server Failover Clustering
    WSFC Cluster Validation Wizard
    WSFC Quorum Modes and Voting Configuration
    WSFC Disaster Recovery through Forced Quorum
  SQL Server Instance-Level Protection
    Availability Improvements: SQL Server Instances
    AlwaysOn Failover Cluster Instances
  Database Availability
    AlwaysOn Availability Groups
    Availability Group Failover
    Availability Group Listener
    Availability Improvements: Databases
  Client Connectivity Recommendations
Conclusion
High Availability and Disaster Recovery Concepts
You can make the best selection of a database technology for a high availability and disaster recovery solution when all stakeholders share an understanding of the related business drivers and challenges, and of how recovery time objectives (RTO) and recovery point objectives (RPO) are planned, managed, and measured.
Readers who are familiar with these concepts can move ahead to the "Overview: High Availability with Microsoft SQL Server 2012" section of this paper.
Describing High Availability
For a given software application or service, high availability is ultimately measured in terms of the end user's experience and expectations. The tangible and perceived business impact of downtime may be expressed in terms of information loss, property damage, decreased productivity, opportunity costs, contractual damages, or the loss of goodwill.
The principal goal of a high availability solution is to minimize or mitigate the impact of downtime. A sound strategy for this optimally balances business processes and Service Level Agreements (SLAs) with technical capabilities and infrastructure costs.
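A platform is considered highly available per the agreement and expectations of customers and stakeholders. The availability of a system can be expressed as this calculation:
$$\text{Availability} = \frac{\text{Actual uptime}}{\text{Expected uptime}} \times 100\%$$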
The industry often expresses the resulting value as the number of 9s that the solution provides, which conveys an annual number of minutes of possible uptime or, conversely, of downtime.
Number of 9s | Availability percentage | Total annual downtime
2 | 99% | 3 days, 15 hours
3 | 99.9% | 8 hours, 45 minutes
4 | 99.99% | 52 minutes, 34 seconds
5 | 99.999% | 5 minutes, 15 seconds
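The downtime column follows directly from the availability percentage. The minimal Python sketch below, assuming a non-leap 365-day year (525,600 minutes), reproduces the table's figures; the function names are illustrative only.

```python
from datetime import timedelta

# Assumption: a non-leap, 365-day year; a leap year adds one day of expected uptime.
MINUTES_PER_YEAR = 365 * 24 * 60

def availability_percent(actual_uptime: float, expected_uptime: float) -> float:
    """Availability = actual uptime / expected uptime x 100%."""
    return actual_uptime / expected_uptime * 100.0

def annual_downtime(nines: int) -> timedelta:
    """Maximum yearly downtime for an availability of `nines` nines (3 -> 99.9%)."""
    unavailable_fraction = 10.0 ** -nines
    return timedelta(minutes=MINUTES_PER_YEAR * unavailable_fraction)

for n in range(2, 6):
    print(f"{n} nines: {100.0 - 100.0 * 10.0 ** -n:g}% available, "
          f"up to {annual_downtime(n)} of downtime per year")
```

The table truncates or rounds these values; for example, two nines works out to 3 days, 15 hours, and 36 minutes.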
Planned vs. Unplanned Downtime
System outages are either anticipated and planned for, or they are the result of an unplanned failure. Downtime need not be viewed negatively if it is appropriately managed. There are two key types of downtime:
Planned maintenance. A time window is preannounced and coordinated for planned maintenance tasks such as software patching, hardware upgrades, password updates, offline reindexing, data loading, or the rehearsal of disaster recovery procedures. Deliberate, well-managed operational procedures should minimize downtime and prevent any data loss. Planned maintenance activities can be seen as investments needed to prevent or mitigate other, potentially more severe, unplanned outage scenarios.
Unplanned outage. System-level, infrastructure, or process failures may occur that are unplanned or uncontrollable, or that are foreseeable but considered either too unlikely to occur or to have an acceptable impact. A robust high availability solution detects these types of failures, automatically recovers from the outage, and then reestablishes fault tolerance.
When establishing SLAs for high availability, you should calculate separate key performance indicators (KPIs) for planned maintenance activities and unplanned downtime. This approach allows you to contrast your investment in planned maintenance activities against the benefit of avoiding unplanned downtime.
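To keep planned and unplanned downtime as separate KPIs, the availability calculation can be run once per category against the same reporting period. This Python sketch assumes a simple list of outage records tagged as planned or unplanned; the `Outage` type, its fields, and the 30-day period are hypothetical and are not part of any SQL Server or WSFC API.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    """Hypothetical outage record used only for this illustration."""
    minutes: float
    planned: bool  # True for a preannounced maintenance window

def downtime_kpis(outages: list[Outage], period_minutes: float) -> dict[str, float]:
    """Availability percentages computed separately for each downtime category."""
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)

    def pct(down: float) -> float:
        return (period_minutes - down) / period_minutes * 100.0

    return {
        "planned_availability_pct": pct(planned),
        "unplanned_availability_pct": pct(unplanned),
        "overall_availability_pct": pct(planned + unplanned),
    }

month = 30 * 24 * 60  # assumed 30-day reporting period, in minutes
kpis = downtime_kpis([Outage(120, True), Outage(15, False)], month)
print(kpis)
```

Reporting the two categories separately shows whether a two-hour maintenance window is buying a measurable reduction in unplanned minutes.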
Degraded Availability
High availability should not be considered an all-or-nothing proposition. As an alternative to a complete outage, it is often acceptable to the end user for a system to be partially available, or to have limited functionality or degraded performance. These varying degrees of availability include:
Read-only and deferred operations. During a maintenance window, or during a phased disaster recovery, data retrieval is still possible, but new workflows and background processing may be temporarily halted or queued.
Data latency and application responsiveness. Due to a heavy workload, a processing backlog, or a partial platform failure, limited hardware resources may be overcommitted or undersized. User experience may suffer, but work may still get done in a less productive manner.
Partial, transient, or impending failures. Robust application logic or a robust hardware stack can retry or self-correct upon encountering an error. These types of issues may appear to the end user as data latency or poor application responsiveness.
Partial end-to-end failure. Planned or unplanned outages may occur gracefully within vertical layers of the solution stack (infrastructure, platform, and application), or horizontally between different functional components. Users may experience partial success or degradation, depending upon the features or components that are affected.
The acceptability of these suboptimal scenarios should be considered as part of a spectrum of degraded availability leading up to a complete outage, and as intermediate steps in a phased disaster recovery.
Quantifying Downtime
When downtime does occur, either planned or unplanned, the primary business goal is to bring the system back online and minimize data loss. Every minute of downtime has direct and indirect costs. With unplanned downtime, you must balance the time and effort needed to determine why the outage occurred, what the current system state is, and what steps are needed to recover from the outage.
At a predetermined point in any outage, you should make or seek the business decision to stop investigating the outage or performing maintenance tasks, recover from the outage by bringing the system back online, and, if needed, reestablish fault tolerance.
Recovery Objectives
Data redundancy is a key component of a high availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight may be rolled back, or they may be lost on the secondary instances due to delays in data propagation.
You can both measure the impact and set recovery goals in terms of how long it takes to get back in business and how much time latency there is in the last transaction recovered:
Recovery Time Objective (RTO). This is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.
Recovery Point Objective (RPO). This is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending upon the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.
You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health.
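Measured after an incident, RTO and RPO reduce to two time differences. This Python sketch assumes you recorded three timestamps: when the outage began, when full read-write service was restored, and the commit time of the last transaction that survived on the recovered copy. All function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def observed_rto(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Duration of the outage, up to the point new transactions can take place."""
    return service_restored - outage_start

def observed_rpo(last_commit_before_failure: datetime,
                 last_commit_recovered: datetime) -> timedelta:
    """Gap between the last committed transaction and the most recent one recovered."""
    return last_commit_before_failure - last_commit_recovered

def meets_goals(rto: timedelta, rpo: timedelta,
                rto_goal: timedelta, rpo_goal: timedelta) -> bool:
    """True when both measured values are within the business goals."""
    return rto <= rto_goal and rpo <= rpo_goal

# Hypothetical incident timeline.
start = datetime(2012, 5, 1, 9, 0, 0)
rto = observed_rto(start, datetime(2012, 5, 1, 9, 4, 30))
rpo = observed_rpo(datetime(2012, 5, 1, 8, 59, 58), datetime(2012, 5, 1, 8, 59, 55))
print(rto, rpo, meets_goals(rto, rpo, timedelta(minutes=5), timedelta(seconds=5)))
```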
Justifying ROI or Opportunity Cost
The business costs of downtime may be either financial or in the form of customer goodwill. These costs may accrue with time, or they may be incurred at a certain point in the outage window. In addition to projecting the cost of incurring an outage with a given recovery time and data recovery point, you can also calculate the business process and infrastructure investments needed to attain your RTO and RPO goals or to avoid the outage altogether. These investment themes should include:
Avoiding downtime. Outage recovery costs are avoided altogether if an outage doesn't occur in the first place. Investments include the cost of fault-tolerant and redundant hardware or infrastructure, distributing workloads across isolated points of failure, and planned downtime for preventive maintenance.
Automating recovery. If a system failure occurs, you can greatly mitigate the impact of downtime on the customer experience through automatic and transparent recovery.
Resource utilization. Secondary or standby infrastructure can sit idle, awaiting an outage. It also can be leveraged for read-only workloads, or to improve overall system performance by distributing workloads across all available hardware.
For given RTO and RPO goals, the needed availability and recovery investments, combined with the projected costs of downtime, can be expressed and justified as a function of time. During an actual outage, this allows you to make cost-based decisions based on the elapsed downtime.
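One way to express downtime cost as a function of elapsed time is a per-minute accrual plus one-time penalties that are triggered at fixed points in the outage window, such as an SLA breach. The rates and thresholds in this Python sketch are invented placeholders, and `downtime_cost` is a hypothetical helper; substitute figures from your own business analysis.

```python
def downtime_cost(elapsed_min: float,
                  cost_per_min: float,
                  step_penalties: list[tuple[float, float]]) -> float:
    """Cost accrued after `elapsed_min` minutes of downtime.

    step_penalties holds (threshold_minutes, one_time_cost) pairs; each penalty
    is incurred once the outage reaches its threshold.
    """
    accrued = elapsed_min * cost_per_min
    steps = sum(cost for threshold, cost in step_penalties if elapsed_min >= threshold)
    return accrued + steps

# Hypothetical figures: $200 per minute of lost business, an SLA penalty at
# 60 minutes, and a contractual penalty at 240 minutes.
penalties = [(60.0, 10_000.0), (240.0, 50_000.0)]
for minutes in (15, 60, 240):
    print(minutes, downtime_cost(minutes, 200.0, penalties))
```

Plotting this curve next to the cost of recovering at each minute can help locate the predetermined point at which to stop investigating and bring the system back online.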
Monitoring Availability Health
From an operational point of view, during an actual outage, you should not attempt to consider all relevant variables and calculate ROI or opportunity costs in real time. Instead, you should monitor data latency on your standby instances as a proxy for expected RPO.
In the event of an outage, you should also limit the time initially spent investigating the root cause. Instead, focus on validating the health of your recovery environment, and then rely upon detailed system logs and secondary copies of data for subsequent forensic analysis.
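A minimal sketch of using secondary data latency as an RPO proxy: compare the last commit time on the primary with the last commit time reported for each secondary, and flag any secondary whose gap exceeds the RPO goal. In SQL Server 2012 such timestamps could be read from the `last_commit_time` column of the `sys.dm_hadr_database_replica_states` dynamic management view; here they are passed in as plain `datetime` values, and the replica names are hypothetical.

```python
from datetime import datetime, timedelta

def replicas_over_rpo(primary_last_commit: datetime,
                      secondary_last_commit: dict[str, datetime],
                      rpo_goal: timedelta) -> dict[str, timedelta]:
    """Return the estimated data-loss window for each secondary that exceeds the goal."""
    lagging = {}
    for replica, last_commit in secondary_last_commit.items():
        # Approximate data loss if this secondary had to take over right now.
        latency = primary_last_commit - last_commit
        if latency > rpo_goal:
            lagging[replica] = latency
    return lagging

# Hypothetical replica names and timestamps.
now = datetime(2012, 5, 1, 12, 0, 0)
alerts = replicas_over_rpo(
    now,
    {"DR-SITE-1": now - timedelta(seconds=2), "DR-SITE-2": now - timedelta(seconds=45)},
    rpo_goal=timedelta(seconds=30),
)
print(alerts)
```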
Planning for Disaster Recovery
While high availability efforts entail what you do to prevent an outage, disaster recovery efforts address what is done to reestablish high availability after the outage.
As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. Based upon active monitoring and alerts, the decision to initiate an automated or manual failover and recovery plan should be tied to preestablished RTO and RPO thresholds. The scope of a sound disaster recovery plan should include:
Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.
Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible to appropriate parties.
Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?
Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of RPO and RTO goals, and prescribed recovery steps.
Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?
Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a runbook or a cookbook.
Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO goals, and consider regularly rotating which site hosts primary production between the primary site and each of the disaster recovery sites.
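The decision tree item can be captured as code so that it is predetermined and repeatable. The Python sketch below is one hypothetical encoding: the inputs (elapsed outage time and estimated data loss), the 50 percent triage budget, and the three outcomes are assumptions for illustration, not a prescribed SQL Server procedure.

```python
from datetime import timedelta
from enum import Enum

class Action(Enum):
    CONTINUE_TRIAGE = "continue triage on the primary"
    FAIL_OVER = "fail over to the disaster recovery site"
    ESCALATE = "escalate: failover would exceed the RPO goal"

def next_action(elapsed: timedelta, estimated_data_loss: timedelta,
                rto_goal: timedelta, rpo_goal: timedelta,
                triage_budget_fraction: float = 0.5) -> Action:
    """Pick the next recovery step from preestablished RTO and RPO thresholds."""
    # Hypothetical rule: spend at most a fixed share of the RTO investigating.
    if elapsed < rto_goal * triage_budget_fraction:
        return Action.CONTINUE_TRIAGE
    # Failing over is within policy only if the expected data loss meets the RPO.
    if estimated_data_loss <= rpo_goal:
        return Action.FAIL_OVER
    # Otherwise the business must decide whether to accept the data loss.
    return Action.ESCALATE

print(next_action(timedelta(minutes=6), timedelta(seconds=1),
                  rto_goal=timedelta(minutes=10), rpo_goal=timedelta(seconds=5)))
```

The `ESCALATE` branch corresponds to making or seeking the business decision described earlier, rather than having an operator accept data loss unilaterally.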
Overview: High Availability with Microsoft SQL Server 2012
Achieving the required RPO and RTO goals involves ensuring continuous uptime of critical applications and protection of critical data from unplanned and planned downtime. SQL Server provides a set of features and capabilities that can help achieve those goals while keeping the cost and complexity low.
Readers who have a high-level familiarity with the new AlwaysOn capabilities can move ahead to the deeper coverage in the "SQL Server AlwaysOn Layers of Protection" section of this paper.
SQL Server AlwaysOn
AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution. It can provide data and hardware redundancy within and across data centers, and it improves application failover time to increase the availability of your mission-critical applications. AlwaysOn provides flexibility in configuration and enables reuse of existing hardware investments.
An AlwaysOn solution can leverage two major SQL Server 2012 features for configuring availability at both the database and the instance level:
AlwaysOn Availability Groups, new in SQL Server 2012, greatly enhance the capabilities of database mirroring and help ensure availability of application databases. In synchronous-commit mode, they enable zero data loss through log-based data movement, providing data protection without shared disks.
Availability groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
AlwaysOn Failover Cluster Instances (FCIs) enhance the SQL Server failover clustering feature and support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.
Significantly Reduce Planned Downtime
The main cause of application downtime in most organizations is planned downtime caused by operating system patching, hardware maintenance, and so on. Planned downtime can constitute almost 80 percent of the outages in an IT environment.
SQL Server 2012 helps reduce planned downtime significantly by reducing patching requirements and enabling more online maintenance operations:
Windows Server Core. SQL Server 2012 supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2008 and Windows Server 2008 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent.
Online Operations. Enhanced support for online operations like LOB reindexing and adding columns with default values helps to reduce downtime during database maintenance operations.
Rolling Upgrade and Patching. AlwaysOn features facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime.
SQL Server on Hyper-V. SQL Server instances hosted in the Hyper-V environment receive the additional benefit of Live Migration, which enables you to migrate virtual machines between hosts with zero downtime. Administrators can perform maintenance operations on the host without impacting applications.
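As a rough illustration of the rolling approach, the sketch below generates a patch order for an availability group: secondary replicas are patched first, one at a time, then a manual failover moves the primary role to an already-patched replica before the former primary is patched. This is a simplified Python sketch with hypothetical replica names; it assumes the first secondary is a suitable (for example, synchronous-commit) failover target, and a real runbook would add health checks and synchronization waits between steps.

```python
def rolling_patch_plan(primary: str, secondaries: list[str]) -> list[str]:
    """Return an ordered list of steps that keeps one replica serving at all times."""
    if not secondaries:
        raise ValueError("a rolling upgrade needs at least one secondary replica")
    # Patch every secondary while the primary keeps serving the workload.
    steps = [f"patch {replica}" for replica in secondaries]
    # Assumption: the first secondary is the intended failover target.
    new_primary = secondaries[0]
    steps.append(f"manual failover: {primary} -> {new_primary}")
    # The former primary is now a secondary and can be patched last.
    steps.append(f"patch {primary}")
    return steps

for step in rolling_patch_plan("NODE1", ["NODE2", "NODE3"]):
    print(step)
```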
Eliminate Idle Hardware and Improve Cost Efficiency and Performance
Typical high availability solutions involve deploying costly, redundant, passive servers. AlwaysOn Availability Groups enable you to use secondary database replicas on otherwise passive or idle servers for read-only workloads such as SQL Server Reporting Services report queries or backup operations. The ability to use both the primary and secondary database replicas simultaneously helps improve the performance of all workloads through better resource balancing across your server hardware investments.
Easy Deployment and Management
Features such as the Configuration Wizard, support for the Windows PowerShell command-line interface, dashboards, dynamic management views (DMVs), policy-based management, and System Center integration help simplify deployment and management of availability groups.
Contrasting RPO and RTO Capabilities
The business goals for Recovery Point Objective (RPO) and Recovery Time Objective (RTO) should be key drivers in selecting a SQL Server technology for your high availability and disaster recovery solution. This table offers a rough comparison of the type of results that the different solutions may achieve:
High availability and disaster recovery SQL Server solution | Potential data loss (RPO) | Potential recovery time (RTO) | Automatic failover | Readable secondaries (1)
AlwaysOn Availability Group, synchronous-commit | Zero | Seconds | Yes (4) | 0-2
AlwaysOn Availability Group, asynchronous-commit | Seconds | Minutes | No | 0-4
AlwaysOn Failover Cluster Instance | NA (5) | Seconds to minutes | Yes | NA
Database Mirroring (2), high-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring (2), high-performance (async) | Seconds (6) | Minutes (6) | No | NA
Log Shipping | Minutes (6) | Minutes to hours (6) | No | Not during a restore
Backup, Copy, Restore (3) | Hours (6) | Hours to days (6) | No | Not during a restore
(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) This feature will be removed in a future version of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(3) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
(4) Automatic failover of an availability group is not supported to or from a failover cluster instance.
(5) The FCI itself doesn't provide data protection; data loss is dependent upon the storage system implementation.
(6) Highly dependent upon the workload, data volume, and failover procedures.
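The table can also be read as a filter over candidate technologies. This Python sketch encodes the table's rows with coarse ordinal ranks (zero < seconds < minutes < hours < days), using the best case of each range, and returns the options whose RPO and RTO fit a requirement. The ranking scheme and the `candidates` helper are illustrative simplifications; footnote (6) still applies, since actual results depend on workload, data volume, and procedures.

```python
RANK = {"zero": 0, "seconds": 1, "minutes": 2, "hours": 3, "days": 4}

# (solution, best-case RPO, best-case RTO, automatic failover); None = not applicable
SOLUTIONS = [
    ("AlwaysOn Availability Group, synchronous-commit", "zero", "seconds", True),
    ("AlwaysOn Availability Group, asynchronous-commit", "seconds", "minutes", False),
    ("AlwaysOn Failover Cluster Instance", None, "seconds", True),
    ("Database Mirroring, high-safety", "zero", "seconds", True),
    ("Database Mirroring, high-performance", "seconds", "minutes", False),
    ("Log Shipping", "minutes", "minutes", False),
    ("Backup, Copy, Restore", "hours", "hours", False),
]

def candidates(max_rpo: str, max_rto: str, need_auto_failover: bool = False) -> list[str]:
    """Solutions whose best-case RPO and RTO fall within the given limits."""
    result = []
    for name, rpo, rto, auto in SOLUTIONS:
        if rpo is None:  # an FCI alone provides no data protection (footnote 5)
            continue
        if RANK[rpo] > RANK[max_rpo] or RANK[rto] > RANK[max_rto]:
            continue
        if need_auto_failover and not auto:
            continue
        result.append(name)
    return result

print(candidates("zero", "seconds", need_auto_failover=True))
```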
SQL Server AlwaysOn Layers of Protection
SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been common practice to separate duties and responsibilities among the various audiences and roles involved, such that each was predominantly concerned with only a portion of those solution layers.
This section of the paper walks through a deeper description of each of those layers and offers rationale and guidance for your design discussions and implementation decisions.
A successful SQL Server AlwaysOn solution requires understanding and collaboration across these layers:
Infrastructure level. Server-level fault tolerance and internode network communication leverage Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
SQL Server instance level. A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across, and can fail over to, server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).
Database level. An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
Client connectivity. Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and availability group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
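The database-level rules in this list lend themselves to a quick consistency check. The Python sketch below validates a hypothetical availability group description: exactly one primary replica, one to four secondary replicas, and every replica hosted on a different WSFC node. The `Replica` structure and all names are assumptions made for illustration; this is not how SQL Server stores availability group metadata.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    """Hypothetical description of one availability replica."""
    instance: str   # SQL Server instance name (FCI or non-FCI)
    wsfc_node: str  # WSFC cluster node that hosts the instance
    role: str       # "primary" or "secondary"

def validate_availability_group(replicas: list[Replica]) -> list[str]:
    """Return a list of rule violations; an empty list means the topology is valid."""
    problems = []
    primaries = [r for r in replicas if r.role == "primary"]
    secondaries = [r for r in replicas if r.role == "secondary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly 1 primary replica, found {len(primaries)}")
    if not 1 <= len(secondaries) <= 4:
        problems.append(f"expected 1 to 4 secondary replicas, found {len(secondaries)}")
    nodes = [r.wsfc_node for r in replicas]
    if len(nodes) != len(set(nodes)):
        problems.append("each replica must be hosted on a different WSFC node")
    return problems

ok = [Replica("SQL1", "NODE1", "primary"), Replica("SQL2", "NODE2", "secondary")]
bad = [Replica("SQL1", "NODE1", "primary"), Replica("SQL2", "NODE1", "secondary")]
print(validate_availability_group(ok), validate_availability_group(bad))
```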
The logical topology of a representative AlwaysOn solution is illustrated in this diagram:
Infrastructure Availability
Both AlwaysOn Availability Groups and AlwaysOn Failover Cluster Instances leverage the Windows Server operating system and WSFC as a platform technology. More than ever before, successful Microsoft SQL Server database administrators will rely upon a solid understanding of these technologies.
Windows Operating System
SQL Server relies upon the Windows platform to provide foundational infrastructure and services for networking, storage, security, patching, and monitoring.
The different editions of SQL Server 2012 progressively build upon the increasing capabilities and capacity of similar editions of the Windows Server 2008 R2 operating system, including Windows Server 2008 R2 Standard, Windows Server 2008 R2 Enterprise, and Windows Server 2008 R2 Datacenter.
For more information, see: Hardware and Software Requirements for Installing SQL Server 2012 (http://msdn.microsoft.com/en-us/library/ms143506(SQL.110).aspx).
Windows Server Core Installation Option
As a key high availability feature, SQL Server 2012 supports deployment on the Server Core installation option in Windows Server 2008 or later. The Server Core installation option provides a minimal environment for running specific server roles with limited functionality and very limited GUI application support. By default, only necessary services and a command prompt environment are enabled.
This mode of operation reduces the operating system attack surface and system overhead, and it can significantly reduce ongoing maintenance, servicing, and patching requirements.
A key consideration for deploying SQL Server 2012 on Windows Server Core is that all deployment, configuration, administration, and maintenance of SQL Server and of the operating system must be done using a scripting environment such as Windows PowerShell, or through the use of command-line or remote tools.
Optimizing SQL Server for Private Cloud
High availability and disaster recovery scenarios are increasingly critical in the private cloud environment. Deploying SQL Server to your private cloud helps ensure that your compute, network, and storage resources are used efficiently, reducing both physical footprint and capital and operational expenses. A private cloud deployment helps you consolidate deployments, scale your resources efficiently, and deploy resources on demand without compromising control.
In addition to Windows Server Failover Clustering support for both Hyper-V host and guest systems, SQL Server also supports Live Migration, which is the ability to move virtual machines between hosts with no discernible downtime. Live Migration also works in conjunction with guest clustering.
For more information, see Private Cloud Computing: Optimizing SQL Server for Private Cloud (http://www.microsoft.com/SqlServerPrivateCloud).
Windows Server Failover Clustering
Windows Server Failover Clustering (WSFC) provides infrastructure features that support the high availability and disaster recovery scenarios of hosted server applications such as Microsoft SQL Server.
If a WSFC cluster node or service fails, the services or resources that were hosted on that node can be automatically or manually transferred to another available node in a process known as failover. With AlwaysOn solutions, this process applies to both FCIs and availability groups.
The nodes in the WSFC cluster work together to collectively provide these types of capabilities:
Distributed metadata and notifications. WSFC service and hosted application metadata is maintained on each node in the cluster. This metadata includes WSFC configuration and status in addition to hosted application settings. Changes to the metadata or status on one node are automatically propagated to the other nodes in the cluster.
Resource management. Individual nodes in the cluster may provide physical resources such as direct-attached storage (DAS), network interfaces, and access to shared disk storage. Hosted applications, such as SQL Server, register themselves as a cluster resource, and they can configure startup and health dependencies upon other resources.
Health monitoring. Internode and primary node health detection is accomplished through a combination of heartbeat-style network communications and resource monitoring. The overall health of the cluster is determined by the votes of a quorum of nodes in the cluster.
Failover coordination. Each resource is configured to be hosted on a primary node, and each can be automatically or manually transferred to one or more secondary nodes. A health-based failover policy controls automatic transfer of resource ownership between nodes. Nodes and hosted applications are notified when failover occurs so that they can react appropriately.
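The idea of a quorum of voting nodes can be made concrete with a majority rule: the cluster remains healthy only while the voting members it can reach hold more than half of all configured votes. This Python sketch models that rule only, similar in spirit to a node-majority configuration; it does not capture the specific WSFC quorum modes or witness options, and the node names are hypothetical.

```python
def has_quorum(votes: dict[str, int], reachable: set[str]) -> bool:
    """True if the reachable voters hold a strict majority of all configured votes."""
    total = sum(votes.values())
    present = sum(v for member, v in votes.items() if member in reachable)
    return present * 2 > total

# Three nodes with one vote each: losing one node keeps quorum, losing two does not.
votes = {"NODE1": 1, "NODE2": 1, "NODE3": 1}
print(has_quorum(votes, {"NODE1", "NODE2"}))  # two of three votes present
print(has_quorum(votes, {"NODE1"}))           # one of three votes present
```

The strict-majority test is what prevents a "split brain": with four equal votes, a two-two partition leaves neither side with quorum.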
For more information, see Windows Server | Failover Clustering and Node Balancing (http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-main.aspx).
Note: It is now critically important that database administrators understand the inner workings of WSFC clusters and quorum management. AlwaysOn health monitoring, management, and failure recovery steps are all intrinsically tied to your WSFC configuration.
WSFC Storage Configurations
Windows Server Failover Clustering relies upon each node in the cluster to manage its connected storage devices, disk volumes, and file system. WSFC assumes that the storage subsystem is extremely robust, and therefore if the storage device attached to a node is unavailable, the cluster node is considered to be at fault.
For write-based operations, a disk volume is logically attached to a single cluster node at a time using a SCSI-3 persistent reservation. Depending upon storage subsystem capabilities and configuration, if a node fails, logical ownership of the disk volume can be transferred to another node in the cluster.
SQL Server AlwaysOn solutions both leverage and are restricted to certain WSFC storage configuration combinations, including:
Direct-attached vs. remote. Storage devices are either directly physically attached to the server, or they are presented by a remote device through a network or host bus adapter (HBA). Remote storage technologies include Storage Area Network (SAN)-based solutions such as iSCSI or Fibre Channel, as well as Server Message Block (SMB) file share-based solutions.
Symmetric vs. asymmetric. Storage devices are considered symmetric if exactly the same logical disk volume configuration and file paths are presented to each node in the cluster. The physical implementation and capacity of the underlying disk volumes can vary.
Dedicated vs. shared. Dedicated storage is reserved for use by, and assigned to, a single node in the cluster. Shared storage is accessible to multiple nodes in the cluster. Control and ownership of compliant shared storage devices can be transferred from one node to another using SCSI-3 protocols. WSFC supports the concurrent multinode hosting of cluster shared volumes for file sharing purposes. However, SQL Server does not support concurrent multinode access to a shared volume.
Note: SQL Server FCIs still require symmetrical shared storage to be accessible by all possible node owners of the instance. However, with the introduction of AlwaysOn Availability Groups, you can now deploy different non-FCI instances of SQL Server in a WSFC cluster, each with its own unique, dedicated, local or remote storage.
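Whether storage is symmetric, as an FCI requires, can be checked by comparing the logical layout each node sees. The Python sketch below assumes you have already collected, for each node, the set of logical volume or file paths it presents (the paths and node names shown are hypothetical), and reports the nodes whose view differs from the first node. Capacity and physical implementation are deliberately ignored, since those may vary between nodes.

```python
def asymmetric_nodes(volumes_by_node: dict[str, set[str]]) -> list[str]:
    """Nodes whose logical volume/file-path layout differs from the first node's."""
    if not volumes_by_node:
        return []
    nodes = list(volumes_by_node)
    reference = volumes_by_node[nodes[0]]
    return [n for n in nodes[1:] if volumes_by_node[n] != reference]

# Hypothetical layouts gathered from three cluster nodes.
layout = {
    "NODE1": {r"E:\SQLData", r"F:\SQLLogs"},
    "NODE2": {r"E:\SQLData", r"F:\SQLLogs"},
    "NODE3": {r"E:\SQLData"},  # the log volume path is not presented here
}
print(asymmetric_nodes(layout))
```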
WSFC Resource Health Detection and Failover
Each resource in a WSFC cluster node can report its status and health, periodically or on demand. A variety of circumstances may indicate a cluster resource failure, including power failure, disk or memory errors, network communication errors, misconfiguration, or nonresponsive services.
You can make WSFC cluster resources such as networks, storage, or services dependent upon one another. The cumulative health of a resource is determined by successive rollup of its health with the health of each of its resource dependencies.
For AlwaysOn Availability Groups, the availability group and the availability group listener are registered as WSFC cluster resources. For AlwaysOn Failover Cluster Instances, the SQL Server service and the SQL Server Agent service are registered as WSFC cluster resources, and both are made dependent upon the instance's virtual network name resource.
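The rollup rule can be sketched as a recursive walk over the dependency graph: a resource is healthy only if it and every resource it depends on are healthy. The Python example below reuses the FCI dependencies just described; the resource names and health values are illustrative, and the sketch treats every dependency as required, which is a simplifying assumption.

```python
def cumulative_health(resource: str,
                      own_health: dict[str, bool],
                      depends_on: dict[str, list[str]],
                      _visiting: frozenset[str] = frozenset()) -> bool:
    """A resource is healthy only if it and all of its dependencies are healthy."""
    if resource in _visiting:
        raise ValueError(f"circular dependency involving {resource}")
    if not own_health[resource]:
        return False
    visiting = _visiting | {resource}
    # Successive rollup: every dependency must itself roll up as healthy.
    return all(cumulative_health(dep, own_health, depends_on, visiting)
               for dep in depends_on.get(resource, []))

deps = {
    "SQL Server": ["Virtual Network Name"],
    "SQL Server Agent": ["Virtual Network Name"],
}
health = {"SQL Server": True, "SQL Server Agent": True, "Virtual Network Name": False}
# Both services roll up as unhealthy because the network name they depend on failed.
print(cumulative_health("SQL Server", health, deps))
print(cumulative_health("SQL Server Agent", health, deps))
```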
IfaWSFCclusterresourceexperiencesasetnumberoferrorsorfailuresoveraperiodoftime,theconfiguredfailoverpolicycausestheclusterservicetodooneofthefollowing:
Restarttheresourceonthecurrentnode. Settheresourceoffline. Initiateanautomaticfailoveroftheresourceanditsdependenciestoanothernode.
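As a rough illustration of such a policy, the following Python sketch counts failures inside a sliding time window. The class, its parameters (`max_failures`, `period_seconds`, `allow_failover`), and the escalation rule are assumptions made for the example; actual WSFC restart and failover thresholds are configured per resource and per group and have more options.

```python
from __future__ import annotations

from collections import deque
from enum import Enum


class Action(Enum):
    RESTART = "restart on current node"
    OFFLINE = "set offline"
    FAILOVER = "fail over to another node"


class FailoverPolicy:
    """Hypothetical sliding-window failure policy (illustrative, not WSFC code)."""

    def __init__(self, max_failures: int, period_seconds: float, allow_failover: bool = True):
        self.max_failures = max_failures
        self.period_seconds = period_seconds
        self.allow_failover = allow_failover
        self._failures: deque[float] = deque()

    def on_failure(self, timestamp: float) -> Action:
        self._failures.append(timestamp)
        # Forget failures that fall outside the observation window.
        while self._failures and timestamp - self._failures[0] > self.period_seconds:
            self._failures.popleft()
        # Below the threshold, restart in place; at the threshold, escalate.
        if len(self._failures) < self.max_failures:
            return Action.RESTART
        return Action.FAILOVER if self.allow_failover else Action.OFFLINE


policy = FailoverPolicy(max_failures=3, period_seconds=60)
for t in (0, 10, 20):
    print(t, policy.on_failure(t).value)
# 0 restart on current node
# 10 restart on current node
# 20 fail over to another node
```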
-
MicrosoftSQLServerAlwaysOnSolutionsGuideforHighAvailabilityandDisasterRecovery 11
Note: WSFC cluster resource health detection has no direct impact on the individual nodes' health or the overall health of the cluster.
WSFC Cluster Validation Wizard
The cluster validation wizard is a feature that is integrated into failover clustering in Windows Server 2008 and Windows Server 2008 R2. It is a key tool that a database administrator can use to help ensure that a clean, healthy, stable WSFC environment exists before deploying a SQL Server AlwaysOn solution.
With the cluster validation wizard, you can run a set of focused tests on either a collection of servers that you intend to use as nodes in a cluster, or on an existing cluster. This process tests the underlying hardware and software directly, and individually, to obtain an accurate assessment of how well a WSFC cluster would be supported on a given configuration.
This validation process consists of a series of tests and data collection on each node in these categories:
Inventory. Information on BIOS versions, environment levels, host bus adapters, RAM, operating system versions, devices, services, drivers, and so on.
Network. Information on NIC binding order, network communications, IP configuration, and firewall configuration. Validates inter-node communications on all NICs.
Storage. Information on disks, drive capacity, access latency, file systems, and so on. Validates SCSI commands, disk failover functionality, and symmetric or asymmetric storage configuration.
System configuration. Validates Active Directory configuration, driver signing, memory dump settings, required operating system features and services, compatible processor architecture, and service pack and Windows Software Update levels.
The results of these validation tests give you the information needed to fine-tune a cluster configuration, track the configuration, and identify potential cluster configuration issues before they cause downtime. You can save a report of the test results as an HTML document for later reference.
You should run these tests before and after you make any changes to the WSFC configuration, before you install SQL Server, and as part of any disaster recovery process. A cluster validation report is required by Microsoft Customer Support Services (CSS) as a condition of Microsoft supporting a given WSFC cluster configuration.
For more information, see Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster (http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx).
Note: If your cluster configuration has asymmetric storage, as is the case with hardware-based geo-clustering storage solutions, or as may be the case with AlwaysOn Availability Groups, you may need to apply a number of hotfixes to prevent the cluster validation wizard from failing the storage validation steps.
For more information, see Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff878487(SQL.110).aspx#SystemReqsForAOAG).
WSFC Quorum Modes and Voting Configuration
WSFC uses a quorum-based approach to monitoring overall cluster health and maximizing node-level fault tolerance. A fundamental understanding of WSFC quorum modes and node voting configuration is very important to designing, operating, and troubleshooting your AlwaysOn high availability and disaster recovery solution.
Cluster Health Detection by Quorum
Each node in a WSFC cluster participates in periodic heartbeat communication to share the node's health status with the other nodes. Unresponsive nodes are considered to be in a failed state.
A quorum node set is a majority of the voting nodes and witnesses in the WSFC cluster. The overall health and status of a WSFC cluster is determined by a periodic quorum vote. The presence of a quorum means that the cluster is healthy enough to provide node-level fault tolerance.
The absence of a quorum indicates that the cluster is not healthy. Overall WSFC cluster health must be maintained in order to ensure that healthy secondary nodes are available for primary nodes to fail over to. If the quorum vote fails, the entire WSFC cluster is set offline as a precautionary measure. This also causes all SQL Server instances registered with the cluster to be stopped.
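The majority rule reduces to simple arithmetic: with n possible votes, at least floor(n/2) + 1 must be affirmative. The Python sketch below uses illustrative function names that are not part of any WSFC API. It also shows why an even vote count adds no tolerance: four votes and three votes both survive the loss of only one vote.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of affirmative votes that is strictly more than half."""
    if total_votes <= 0:
        raise ValueError("the cluster must have at least one vote")
    return total_votes // 2 + 1


def has_quorum(affirmative_votes: int, total_votes: int) -> bool:
    """True when the affirmative votes form a strict majority of all possible votes."""
    if not 0 <= affirmative_votes <= total_votes:
        raise ValueError("affirmative_votes must be between 0 and total_votes")
    return affirmative_votes >= votes_needed(total_votes)


for total in (3, 4, 5):
    tolerated = total - votes_needed(total)
    print(f"{total} votes: need {votes_needed(total)}, tolerate {tolerated} lost")
# 3 votes: need 2, tolerate 1 lost
# 4 votes: need 3, tolerate 1 lost
# 5 votes: need 3, tolerate 2 lost
```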
Note: If a WSFC cluster is set offline because of quorum failure, manual intervention is required to bring it back online. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
Quorum Modes
A quorum mode is configured at the WSFC cluster level to specify the methodology used for quorum voting. The Failover Cluster Manager utility recommends a quorum mode based on the number of nodes in the cluster.
One of the following quorum modes determines what constitutes a quorum of votes:
Node Majority. More than one-half of the voting nodes in the cluster must vote affirmatively for the cluster to be healthy.
Node and File Share Majority. Similar to Node Majority quorum mode, except that a remote file share is also configured as a voting witness, and connectivity from any node to that share is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
As a best practice, the witness file share should not reside on any node in the cluster, and it should be visible to all nodes in the cluster.
Node and Disk Majority. Similar to Node Majority quorum mode, except that a shared disk cluster resource is also designated as a voting witness, and connectivity from any node to that shared disk is also counted as an affirmative vote. More than half of the possible votes must be affirmative for the cluster to be healthy.
Disk Only. A shared disk cluster resource is designated as a witness, and connectivity by any node to that shared disk is counted as an affirmative vote.
For more information, see Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Cluster (http://technet.microsoft.com/en-us/library/cc770620(WS.10).aspx).
Note: Unless each node in the cluster is configured to use the same shared storage quorum witness disk, you should generally use the Node Majority quorum mode if you have an odd number of voting nodes, or the Node and File Share Majority quorum mode if you have an even number of voting nodes.
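The quorum modes differ mainly in which witnesses add a vote. The Python sketch below counts the possible votes per mode and encodes the rule of thumb from the note above. Treating Disk Only as a single deciding vote is a simplification, and returning Node and Disk Majority when all nodes share a witness disk is an assumption inferred from the note's "unless" clause, not a documented rule. The enum and function names are hypothetical.

```python
from enum import Enum


class QuorumMode(Enum):
    NODE_MAJORITY = "Node Majority"
    NODE_AND_FILE_SHARE_MAJORITY = "Node and File Share Majority"
    NODE_AND_DISK_MAJORITY = "Node and Disk Majority"
    DISK_ONLY = "Disk Only"


def total_votes(mode: QuorumMode, voting_nodes: int) -> int:
    """Possible votes under each mode: one per voting node, plus one per witness."""
    if mode is QuorumMode.DISK_ONLY:
        return 1  # simplification: only the shared witness disk decides
    witness = 0 if mode is QuorumMode.NODE_MAJORITY else 1
    return voting_nodes + witness


def recommended_mode(voting_nodes: int, shared_witness_disk: bool = False) -> QuorumMode:
    """Rule of thumb from the note; the shared-disk branch is an assumption."""
    if shared_witness_disk:
        return QuorumMode.NODE_AND_DISK_MAJORITY
    if voting_nodes % 2 == 1:
        return QuorumMode.NODE_MAJORITY
    # Even node count: a file share witness makes the total odd.
    return QuorumMode.NODE_AND_FILE_SHARE_MAJORITY


for n in (2, 3, 4, 5):
    mode = recommended_mode(n)
    print(n, mode.value, total_votes(mode, n))
# 2 Node and File Share Majority 3
# 3 Node Majority 3
# 4 Node and File Share Majority 5
# 5 Node Majority 5
```

Either way, the recommendation keeps the total number of votes odd, which avoids ties in the quorum vote.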
Voting and Non-Voting Nodes
By default, each node in the WSFC cluster is included as a member of the cluster quorum; each node, file share witness, and disk witness has a single vote in determining the overall cluster health. The quorum discussion to this point in this paper has carefully qualified the set of WSFC cluster nodes that vote on cluster health as voting nodes. In some circumstances, you may not want every node to have a vote.
Each node in a WSFC cluster continuously attempts to establish a quorum. No individual node in the cluster can definitively determine that the cluster as a whole is healthy or unhealthy. At any given moment, from the perspective of each node, some of the other nodes may appear to be offline, appear to be in the process of failover, or appear unresponsive due to a network communication failure. A key function of the quorum vote is to determine whether the apparent state of each node in the WSFC cluster is indeed the actual state of those nodes.
For all of the quorum models except Disk Only, the effectiveness of a quorum vote depends on reliable communications among all of the voting nodes in the cluster. You should trust the quorum vote when all nodes are on the same physical subnet.
However, if a node on another subnet is seen as nonresponsive in a quorum vote, but it is actually online and otherwise healthy, that is most likely due to a network communications failure between subnets. Depending upon the cluster topology, quorum mode, and failover policy configuration, that network communications failure may effectively create more than one set (or subset) of voting nodes.
If more than one subset of voting nodes is able to establish a quorum on its own, that is known as a split-brain scenario. In such a scenario, the nodes in the separate quorums may behave differently, and in conflict with one another.
Note: The split-brain scenario is possible only if a system administrator manually performs a forced quorum operation, or in very rare circumstances, a forced manual failover, explicitly subdividing the quorum node set. For more information, see the WSFC Disaster Recovery through Forced Quorum section later in this paper.
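Because a quorum requires strictly more than half of all votes, two disjoint subsets of nodes can never both hold a majority; a split-brain can only appear once a forced quorum bypasses that check. The Python sketch below is a model under those assumptions: the `votes` mapping stands for each node's vote weight (0 or 1), and `forced` is a hypothetical set of nodes on which an administrator forced quorum.

```python
from __future__ import annotations


def partitions_with_quorum(
    partitions: list[set[str]],
    votes: dict[str, int],
    forced: frozenset[str] = frozenset(),
) -> list[int]:
    """Return the indexes of partitions that would consider themselves to have quorum."""
    total = sum(votes.values())
    result = []
    for index, members in enumerate(partitions):
        cast = sum(votes.get(node, 0) for node in members)
        # A strict majority is required, unless quorum was forced inside this partition.
        if 2 * cast > total or members & forced:
            result.append(index)
    return result


votes = {node: 1 for node in ("A", "B", "C", "D", "E")}
split = [{"A", "B", "C"}, {"D", "E"}]
print(partitions_with_quorum(split, votes))                           # [0]
print(partitions_with_quorum(split, votes, forced=frozenset({"D"})))  # [0, 1]: split-brain
```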
To simplify your quorum configuration and increase uptime, you may want to adjust each node's NodeWeight setting (a value of 0 or 1); a node whose NodeWeight is set to 0 does not have its vote counted toward the quorum.
Recommended Adjustments to Quorum Voting
To determine the recommended quorum voting configuration for the cluster, apply these guidelines, in sequential order:
1. No vote by default. Assume that each node should not vote without explicit justification.
2. Include all primary nodes. Each node that hosts an AlwaysOn Availability Group primary replica or is the preferred owner of the AlwaysOn Failover Cluster Instance should have a vote.
3. Include possible automatic failover owners. Each node that could host a primary replica or FCI, as the result of an automatic failover, should have a vote.
4. Exclude secondary site nodes. In general, do not give votes to nodes that reside at a secondary disaster recovery site. You do not want nodes in the secondary site to contribute to a decision to take the cluster offline when there is nothing wrong with the primary site.
5. Odd number of votes. If necessary, add a witness file share, a witness node (with or without a SQL Server instance), or a witness disk to the cluster and adjust the quorum mode to prevent possible ties in the quorum vote.
6. Reassess vote assignments post-failover. You do not want to fail over into a cluster configuration that does not support a healthy quorum.
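Guidelines 1 through 5 can be expressed as a short procedure. The Python sketch below is illustrative only: the `Node` fields are hypothetical flags describing each node's role, guideline 4 is applied after guidelines 2 and 3 so that it overrides them (as the sequential order implies), and guideline 6 would amount to rerunning the function with the post-failover roles.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a cluster node's role."""
    name: str
    hosts_primary: bool = False              # primary replica or preferred FCI owner
    automatic_failover_target: bool = False  # could become primary automatically
    secondary_site: bool = False             # resides at the disaster recovery site


def recommend_votes(nodes: list[Node]) -> tuple[dict[str, int], bool]:
    """Return (suggested NodeWeight per node, whether to add a witness)."""
    weights: dict[str, int] = {}
    for node in nodes:
        weight = 0                                                 # 1. no vote by default
        if node.hosts_primary or node.automatic_failover_target:  # 2. and 3.
            weight = 1
        if node.secondary_site:                                    # 4. exclude DR-site nodes
            weight = 0
        weights[node.name] = weight
    add_witness = sum(weights.values()) % 2 == 0                   # 5. aim for an odd total
    return weights, add_witness


nodes = [
    Node("SQL1", hosts_primary=True),
    Node("SQL2", automatic_failover_target=True),
    Node("SQL3"),
    Node("DR1", secondary_site=True),
]
print(recommend_votes(nodes))
# ({'SQL1': 1, 'SQL2': 1, 'SQL3': 0, 'DR1': 0}, True)
```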
For more information on adjusting node votes, see Configure Cluster Quorum NodeWeight Settings (http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx).
You cannot adjust the vote of a file share witness. Instead, you must select a different quorum mode to include or exclude its vote.
Note: SQL Server exposes several system dynamic management views (DMVs) that can help you administer settings related to WSFC cluster configuration and node quorum voting.
For more information, see Monitor Availability Groups (http://msdn.microsoft.com/en-us/library/ff878305(SQL.110).aspx).
WSFC Disaster Recovery through Forced Quorum
Quorum failure is usually caused by a systemic disaster, or a persistent communications failure involving several nodes in the WSFC cluster. Remember that quorum failure causes all clustered services, SQL Server instances, and availability groups in the WSFC cluster to be set offline, because the cluster cannot ensure node-level fault tolerance. A quorum failure means that healthy voting nodes in the WSFC cluster no longer satisfy the quorum model. Some nodes may have failed completely, and some may have just shut down the WSFC service and are otherwise healthy, except for the loss of the ability to communicate with a quorum.
To bring the WSFC cluster back online, you must correct the root cause of the quorum failure on at least one node under the existing configuration. In a disaster scenario, you may need to reconfigure or identify alternative hardware to use. You may also want to reconfigure the remaining nodes in the WSFC cluster to reflect the surviving cluster topology.
You can use the forced quorum procedure on a WSFC cluster node to override the safety controls that took the cluster offline. This effectively tells the cluster to suspend the quorum voting checks, and lets you bring the WSFC cluster resources and SQL Server back online on any of the nodes in the cluster.
This type of disaster recovery process should include the following steps:
1) Determine the scope of the failure. Identify which availability groups or SQL Server instances are nonresponsive and which cluster nodes are online and available for post-disaster use, and then examine the Windows event logs and the SQL Server system logs. Where practical, you should preserve forensic data and system logs for later analysis.
2) Start the WSFC cluster by using forced quorum on a single node. On an otherwise healthy node, manually force the cluster to come online using the forced quorum procedure. To minimize potential data loss, select a node that was last hosting an availability group primary replica.
For more information, see Force a WSFC Cluster to Start Without a Quorum (http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx).
Note: If you use the forced quorum setting, quorum checks are blocked cluster-wide until the WSFC cluster achieves a majority of votes and automatically transitions to a regular quorum mode of operation.
3) Start the WSFC service normally on each otherwise healthy node, one at a time. You do not have to specify the forced quorum option when you start the cluster service on the other nodes.
As the WSFC service on each node comes back online, it negotiates with the other healthy nodes to synchronize the new cluster configuration state. Remember to do this one node at a time to prevent potential race conditions in resolving the last known state of the cluster.
Note: Ensure that each node that you start can communicate with the other newly online nodes, or you run the risk of creating more than one quorum node set; that is a split-brain scenario. If your findings in step 1 are accurate, this should not occur.
4) Apply new quorum mode and node vote configuration. If you successfully restarted all nodes in the cluster using the forced quorum procedure, and if you corrected the root cause of the quorum failure, you do not need to make changes to the original quorum mode and node vote configuration.
Otherwise, you should evaluate the newly recovered cluster node and availability replica topology, and change the quorum mode and vote assignments for each node as appropriate. Set the WSFC cluster service on unrecovered nodes offline, or set their node votes to zero.
Note: At this point, the nodes and SQL Server instances in the cluster may appear to be restored back to regular operation. However, a healthy quorum may still not exist. Using Failover Cluster Manager, the AlwaysOn Dashboard within SQL Server Management Studio, or the appropriate DMVs, verify that a healthy quorum has been restored.
5) Recover availability group database replicas as needed. Some databases may recover and come back online on their own as part of the regular SQL Server startup process. The recovery of other databases may require additional manual steps.
You can minimize potential data loss and recovery time for the availability group replicas by bringing them back online in this sequence, if possible: primary replica, synchronous secondary replicas, asynchronous secondary replicas.
6) Repair or replace failed components and revalidate the cluster. Now that you have recovered from the initial disaster and quorum failure, you should repair or replace the failed nodes and adjust related WSFC and AlwaysOn configurations accordingly. This can include dropping availability group replicas, evicting nodes from the cluster, or flattening and reinstalling software on a node.
Note: You must repair or remove all failed availability replicas. SQL Server 2012 does not truncate the transaction log past the last known point of the farthest-behind availability replica. If a failed replica is not repaired or removed from the availability group, the transaction logs will grow and you will run the risk of running out of transaction log space on the other replicas.
7) Repeat step 4 as needed. The goal is to re-establish the appropriate level of fault tolerance and high availability for healthy operations.
8) Conduct RPO/RTO analysis. You should analyze SQL Server system logs, database timestamps, and Windows event logs to determine the root cause of the failure, and to document actual Recovery Point and Recovery Time experiences.
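Two details in steps 5 and 6 reduce to simple ordering rules, sketched below in Python. Representing LSNs as plain integers and the `Replica` fields are simplifying assumptions for illustration. The point is that recovery proceeds from the most to the least up-to-date class of replica, and that the log truncation point is held back by the farthest-behind replica until that replica is repaired or removed.

```python
from __future__ import annotations

from dataclasses import dataclass

# Step 5: suggested recovery sequence, lowest value first.
RECOVERY_ORDER = {"primary": 0, "synchronous": 1, "asynchronous": 2}


@dataclass(frozen=True)
class Replica:
    name: str
    kind: str           # "primary", "synchronous", or "asynchronous"
    hardened_lsn: int   # simplified stand-in for a log sequence number


def recovery_sequence(replicas: list[Replica]) -> list[str]:
    """Primary first, then synchronous secondaries, then asynchronous secondaries."""
    return [r.name for r in sorted(replicas, key=lambda r: RECOVERY_ORDER[r.kind])]


def truncation_limit(replicas: list[Replica]) -> int:
    """The log cannot be truncated beyond the farthest-behind replica's hardened LSN."""
    return min(r.hardened_lsn for r in replicas)


replicas = [
    Replica("DR-ASYNC", "asynchronous", 900),
    Replica("PRIMARY", "primary", 1500),
    Replica("SYNC-2", "synchronous", 1500),
]
print(recovery_sequence(replicas))  # ['PRIMARY', 'SYNC-2', 'DR-ASYNC']
print(truncation_limit(replicas))   # 900
# Removing the failed replica from the group lets truncation advance.
print(truncation_limit([r for r in replicas if r.name != "DR-ASYNC"]))  # 1500
```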
SQL Server Instance-Level Protection
The next layer of protection in an AlwaysOn solution is the data platform itself; these are the capabilities and features offered by Microsoft SQL Server 2012 and its integration with Windows Server infrastructure components.
Availability Improvements: SQL Server Instances
These are new SQL Server 2012 instance-level features that enhance availability for both AlwaysOn Failover Cluster Instances and stand-alone instances that host AlwaysOn Availability Groups.
These improvements represent enhancements for managing and troubleshooting failover scenarios:
Flexible Failover Policy. The output of the new system stored procedure used for robust failure detection, sp_server_diagnostics, is evaluated against the FailureConditionLevel property, which conveys the severity of failure affecting the SQL Server instance that should trigger a response. A WSFC failover policy governs how this value affects the SQL Server instance, with settings ranging from relative tolerance of errors to sensitivity to any SQL Server internal component error.
You can configure failover to be triggered by any one of a range of error levels, including server down, server unresponsive, critical error, moderate error, or any qualified error. The FailureConditionLevel property can be used for FCI or availability group failover policies.
Prior to SQL Server 2012, there was no granularity of error conditions to govern failover; any service-level failure caused failover.
For more information, see Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx).
Enhanced instrumentation and logging. There are a number of AlwaysOn-specific system configuration views, DMVs, performance counters, and an extended event health session that capture and dump information needed to troubleshoot, tune, and monitor your AlwaysOn deployment. Many of these are exposed via new SQL Server Policy Management facets and policies.
For more information, see AlwaysOn Availability Groups Dynamic Management Views and Functions (http://msdn.microsoft.com/en-us/library/ff877943(SQL.110).aspx), and sys.dm_os_cluster_nodes (http://msdn.microsoft.com/en-us/library/ms187341(SQL.110).aspx).
SMB file share support. You can place database files on a Windows Server 2008 or later remote file share for both stand-alone and failover cluster instances, negating the need for a separate drive letter per FCI. This is a good option for storage consolidation, or for hosting database file storage on a physical server for a virtual machine guest operating system. With the right configuration, I/O performance can very nearly approximate that of direct-attached storage.
For more information, see SQL Databases on File Shares - It's time to reconsider the scenario (http://blogs.msdn.com/b/sqlserverstorageengine/archive/2011/10/18/sql-databases-on-file-shares-it-s-time-to-reconsider-the-scenario.aspx).
Note: In a WSFC cluster, you cannot add an SMB file share resource dependency to the SQL Server resource group; you must take separate measures to ensure the availability of the file share. If the file share becomes unavailable, SQL Server throws an I/O exception and goes offline.
WSFC interoperability with DNS. The virtual network name (VNN) for an FCI or availability group listener is registered with DNS only during VNN creation or during configuration changes. All virtual IP addresses, regardless of online or offline state, are registered with DNS under the same virtual network name. Client calls to resolve the virtual network name in DNS return all of the registered IP addresses in a varying round-robin sequence.
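The error levels listed under Flexible Failover Policy are cumulative: each setting reacts to its own condition and to every less severe one. In SQL Server 2012 the FailureConditionLevel property accepts values 0 through 5, where 0 disables automatic restart and failover and 3 (critical errors) is the default. The Python sketch below models only that comparison; the enum and function names are hypothetical and do not reflect how SQL Server or WSFC implement the check internally.

```python
from enum import IntEnum


class FailureCondition(IntEnum):
    """Failure conditions from the text, ordered from least to most sensitive."""
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_ERROR = 3
    MODERATE_ERROR = 4
    ANY_QUALIFIED_ERROR = 5


def triggers_action(condition: FailureCondition, failure_condition_level: int = 3) -> bool:
    """True if the configured level reacts (restart or failover) to the condition."""
    if not 0 <= failure_condition_level <= 5:
        raise ValueError("FailureConditionLevel must be between 0 and 5")
    # Level N covers conditions 1..N; level 0 covers none of them.
    return condition <= failure_condition_level


for level in (0, 1, 3, 5):
    covered = [c.name for c in FailureCondition if triggers_action(c, level)]
    print(level, covered)
```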
AlwaysOn Failover Cluster Instances
The primary purpose of an AlwaysOn SQL Server Failover Cluster Instance (FCI) is to enhance availability of a SQL Server instance hosted on local server and storage hardware within a single data center.
An FCI is a single logical SQL Server instance that is installed across nodes in a Windows Server Failover Clustering (WSFC) cluster, but is only active on one node at a time. Client applications connect to a virtual network name and virtual IP address that are owned by the active cluster node.
Each installed node has an identical configuration and set of SQL Server binaries. The WSFC cluster service also replicates relevant changes from the active instance's entries in the Windows registry to each installed node. Each node that the FCI is installed on is designated as a possible owner of the instance and its resources, within a preferred failover sequence.
Database files are stored on shared symmetrical storage volumes that are registered as resources with the WSFC cluster, and are owned by the node that currently hosts the FCI.
For more information, see AlwaysOn Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ms189134(SQL.110).aspx).
FCI Failover Process
If a dependent cluster resource fails, an AlwaysOn Failover Cluster Instance interacts with the WSFC cluster service using this high-level process to do a failover:
1) A restart is indicated. A periodic check of the WSFC or SQL Server Failover Policy configuration indicates a failed state. By default, a service restart is attempted before a failover to another node is initiated. A timeout in the restart attempt indicates a resource failure.
2) A failover is indicated. A Failover Policy check indicates the need for a node failover.
3) The SQL Server service is stopped. If currently running, an orderly shutdown of the SQL Server service is attempted.
4) The WSFC cluster resource is transferred. Ownership of the SQL Server cluster resource group and its dependent network and shared storage resources is transferred to the next preferred node owner of the FCI.
5) SQL Server is started on the new node. The SQL Server instance goes through its normal startup procedures. If it does not come back online within a pending timeout period, the cluster service puts the resource on this new node in a failed state.
6) User databases are recovered on the new node. Each user database is placed in recovery mode while transaction log redo operations are applied and uncommitted transactions are rolled back.
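The six steps can be traced with the following Python sketch. It is a simplified model, not the actual WSFC or SQL Server resource logic: `restart_succeeds` and `starts_on` are hypothetical inputs standing in for the in-place restart attempt and for SQL Server startup within the pending timeout, and wrapping around the end of the preferred-owner list is an assumption.

```python
from __future__ import annotations

from collections.abc import Callable


def fci_failover(
    current_owner: str,
    preferred_owners: list[str],
    restart_succeeds: bool,
    starts_on: Callable[[str], bool],
) -> tuple[str | None, list[str]]:
    """Return (node now hosting the FCI, or None if it failed; event log)."""
    log: list[str] = []
    # Step 1: by default, try restarting the service in place first.
    if restart_succeeds:
        log.append(f"restarted on {current_owner}")
        return current_owner, log
    # Step 2: the restart timed out, so the policy indicates a node failover.
    log.append("restart timed out; failover indicated")
    # Step 3: orderly shutdown of SQL Server on the current owner, if running.
    log.append(f"SQL Server stopped on {current_owner}")
    if len(preferred_owners) < 2:
        log.append("no other possible owner")
        return None, log
    # Step 4: move the resource group to the next preferred owner.
    i = preferred_owners.index(current_owner)
    target = preferred_owners[(i + 1) % len(preferred_owners)]
    log.append(f"resource group transferred to {target}")
    # Step 5: SQL Server must come online within the pending timeout.
    if not starts_on(target):
        log.append(f"resource failed on {target}")
        return None, log
    # Step 6: each user database runs recovery (redo, then rollback).
    log.append(f"user databases recovered on {target}")
    return target, log


owner, events = fci_failover("NODE1", ["NODE1", "NODE2", "NODE3"],
                             restart_succeeds=False, starts_on=lambda node: True)
print(owner)  # NODE2
print(*events, sep="\n")
```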
FCI Improvements
Previous versions of SQL Server have offered an FCI installation option; however, several feature enhancements in SQL Server 2012 improve availability robustness and serviceability:
Multi-subnet clustering. SQL Server 2012 supports WSFC cluster nodes that reside in more than one subnet. A given SQL Server instance that resides on a WSFC cluster node can start if any network interface is available; this is known as an OR cluster resource dependency.
Prior versions of SQL Server required that all network interfaces be functional for the SQL Server service to start or fail over, and that they all exist on the same subnet or VLAN.
Note: Storage-level replication between cluster nodes is not implicitly enabled with multi-subnet clustering. Your multi-subnet FCI solution must leverage a third-party SAN-based solution to replicate data and coordinate storage failover between cluster nodes.
For more information, see SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sql-server-2012-alwayson_3a00_-multisite-failover-cluster-instance.aspx).
Robust failure detection. The WSFC cluster service maintains a dedicated administrative connection to each SQL Server 2012 FCI on the node. On this connection, a periodic call to a special system stored procedure, sp_server_diagnostics, returns a rich array of system health diagnostic information.
Prior to SQL Server 2012, the primary health detection mechanism for an FCI was implemented as a simple one-way polling process. In this process, the WSFC cluster service periodically created a new SQL client connection to the instance, queried the server name, and then disconnected. A failure to connect, or a query timeout, for whatever reason, triggered a failover with very little available diagnostic information.
For more information, see sp_server_diagnostics (http://msdn.microsoft.com/en-us/library/ff878233(SQL.110).aspx).
There is now broader support for FCI storage scenarios:
Better mount point support. SQL Server setup now recognizes cluster disk mount point settings. The specified cluster disks and all disks mounted to them are automatically added to the SQL Server resource dependency during setup.
tempdb on local storage. FCIs now support placement of tempdb on local non-shared storage, such as a local solid-state drive, potentially offloading a significant amount of I/O from a shared SAN.
Prior to SQL Server 2012, FCIs required tempdb to be located on a symmetrical shared storage volume that failed over with other system databases.
Note: The location of tempdb is stored in the master database, which moves between nodes during failover. It must be on a valid symmetrical file path (drive, folders, and permissions) on all potential node owners, or else the SQL Server service will not start on some nodes.
Database Availability
The high availability capabilities offered by the infrastructure and SQL Server instance-level components work together to implicitly protect hosted databases. An AlwaysOn solution offers an additional set of options for explicitly protecting database data and data-tier applications.
AlwaysOn Availability Groups
An availability group is a set of user databases that fail over together from one SQL Server instance to another within the same WSFC cluster. Client applications can connect to the availability group's databases through a WSFC virtual network name, known as an availability group listener, which abstracts the underlying SQL Server instances.
AlwaysOn Availability Groups rely upon Windows Server Failover Clustering for health monitoring, failover coordination, and server connectivity. You must enable AlwaysOn support on a SQL Server instance that resides on a WSFC cluster node. However, that instance does not have to be an FCI, and it does not require the use of symmetrical shared storage.
For more information, see Overview of AlwaysOn Availability Groups (http://msdn.microsoft.com/en-us/library/ff877884(SQL.110).aspx).
Availability Replicas and Roles
Each SQL Server instance in the availability group hosts an availability replica that contains a copy of the user databases in the availability group. A SQL Server instance can host only one availability replica from a given availability group, but multiple availability groups may reside on the same instance. The SQL Server instance must have dedicated (non-shared) storage volumes.
One of the availability replicas serves in the role of primary replica. It is designated as the master copy of the availability group databases and is enabled for read/write operations.
An availability group can contain from one to four additional read-only availability replicas that each separately serve in the role of a secondary replica.
Availability Replica Synchronization
The contents of each database in an availability group are synchronized from the primary replica to each of the secondary replicas through a mechanism of SQL Server log-based data movement. For this reason, all databases in the availability group must be set to the full recovery model.
Secondary replicas are initialized with a full backup and restore of the primary replica's databases and transaction logs. As new transactions are committed on the primary replica, the corresponding portion of the transaction log is cached, queued, and then sent over the network to a database mirroring endpoint on each of the secondary replica nodes.
In this manner, new entries in the primary replica transaction log are appended onto each of the secondary replicas' transaction logs. Each secondary replica periodically communicates a log sequence number (LSN) back to the primary replica to indicate a watermark of how much of its transaction log has been hardened and flushed to the remote disk.
Note: Each availability replica has its own set of independent transaction log redo threads that are not part of the availability replica synchronization process. You may perceive delays in the log redo process on the secondary replicas as data latency.
In addition to having a role of primary or secondary, each availability replica also has an availability mode, which governs the coordination of hardening the transaction logs during a COMMIT TRAN statement:
Synchronous-commit mode. The primary replica commits a given transaction only after all synchronous-commit secondary replicas acknowledge that they have finished hardening their respective transaction logs past that transaction's LSN. An availability group can have up to two synchronous-commit secondary replicas.
Synchronous-commit mode introduces transaction latency on the primary replica databases, but it ensures that there is no data loss on the secondary replicas for committed transactions.
Asynchronous-commit mode. The primary replica commits transactions after hardening the local transaction log, but it does not wait for acknowledgement that an asynchronous-commit secondary replica has hardened its transaction log. An availability group can have up to four asynchronous-commit secondary replicas, but no more than a total of four secondary replicas of any type.
Asynchronous-commit mode minimizes transaction latency on the primary replica databases but allows the secondary replica transaction logs to lag behind, making some data loss possible.
For more information, see Availability Modes (http://msdn.microsoft.com/en-us/library/ff877931(SQL.110).aspx).
The overall health of the data flow between the availability replicas is indicated by the synchronization state of each replica. You will most likely experience data loss if you fail over to a secondary replica with a synchronization state of anything other than Synchronized or Synchronizing.
Each secondary replica's synchronization stream has a session timeout property. When a secondary replica configured for synchronous-commit availability mode fails with a session timeout, it is temporarily marked internally as asynchronous. This is done so that the secondary replica failure does not impact hardening of the transaction log on the primary replica. After that secondary replica is healthy and has caught back up with the primary replica, it automatically reverts to normal synchronous-commit mode operations.
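The commit rules above, including the session-timeout behavior, can be summarized in a small Python model. Integer LSNs, the `Secondary` fields, and the function names are illustrative assumptions; the limits of four secondary replicas and two synchronous-commit secondaries come from the text. The sketch assumes the primary has already hardened its own log record for the transaction.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_SECONDARIES = 4
MAX_SYNC_SECONDARIES = 2


@dataclass
class Secondary:
    name: str
    synchronous: bool       # configured availability mode
    hardened_lsn: int = 0   # watermark last reported back to the primary
    timed_out: bool = False # session timeout exceeded


def validate(secondaries: list[Secondary]) -> None:
    if len(secondaries) > MAX_SECONDARIES:
        raise ValueError("at most four secondary replicas")
    if sum(s.synchronous for s in secondaries) > MAX_SYNC_SECONDARIES:
        raise ValueError("at most two synchronous-commit secondary replicas")


def can_commit(txn_lsn: int, secondaries: list[Secondary]) -> bool:
    """True once every active synchronous secondary has hardened up to txn_lsn."""
    validate(secondaries)
    return all(
        s.hardened_lsn >= txn_lsn
        for s in secondaries
        # Asynchronous replicas are never waited on; a timed-out synchronous
        # replica is temporarily treated as asynchronous.
        if s.synchronous and not s.timed_out
    )


s1 = Secondary("SYNC-1", synchronous=True, hardened_lsn=100)
s2 = Secondary("ASYNC-1", synchronous=False, hardened_lsn=40)
print(can_commit(100, [s1, s2]))  # True: SYNC-1 has hardened LSN 100
print(can_commit(101, [s1, s2]))  # False: still waiting on SYNC-1
s1.timed_out = True               # session timeout: treated as asynchronous for now
print(can_commit(101, [s1, s2]))  # True
```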
Availability Group Failover
The availability group and a corresponding virtual network name are registered as resources in the WSFC cluster. An availability group fails over at the level of an availability replica, based upon the health and failover policy of the primary replica.
An availability group failover policy uses the FailureConditionLevel property, in conjunction with the sp_server_diagnostics system stored procedure, to indicate the severity tolerance level for a failure affecting the availability group. This same mechanism is used for FCI failover policies.
In the event of a failover, instead of transferring ownership of shared physical resources to another node, WSFC is leveraged to reconfigure a secondary replica on another SQL Server instance to take over the role of primary replica. The availability group's virtual network name resource is then transferred to that instance. All client connections to the involved availability replicas are reset.
Based upon the current health, synchronization state, and availability mode of the replicas, each replica has a composite failover readiness state that indicates the potential for data loss. This replica health information is viewable in the AlwaysOn Dashboard, or in the sys.dm_hadr_availability_replica_states system view.
Each availability replica also has a configured failover mode, which governs replica behavior when failover is indicated.
Automatic failover (without data loss). This allows for the fastest failover time of any AlwaysOn configuration because the secondary replica transaction log is already hardened and synchronized. Open transactions on the primary replica are rolled back, and the primary replica role is transferred to a secondary replica without any user intervention.
The primary and secondary replicas must be set to automatic failover mode, and both must be set to synchronous-commit availability mode. The synchronization state between the replicas must be Synchronized. Additionally, the WSFC cluster must have a healthy quorum.
Automatic failover is not supported if the primary or secondary replica resides on an FCI. This is blocked to prevent a potential race condition between availability group and FCI failovers.
Manual failover. This allows the administrator to assess the state of the primary replica, and decide whether or not to deliberately fail over to a secondary replica.
Depending upon the availability mode and synchronization state, you have these choices:
o Planned manual failover (without data loss). You can perform this type of failover only if both the primary and secondary replicas are healthy and in a Synchronized state. This is functionally equivalent to an automatic failover.
o Forced manual failover (allowing potential data loss). This is the only form of failover that is possible if the target secondary replica is in asynchronous-commit availability mode, or if it is not synchronized with the primary replica.
Warning: You should use this failover option in a disaster recovery situation only. If the primary replica is healthy and available, you should change the availability mode of the involved replicas to synchronous commit and then perform a planned manual failover.
For more information, see Perform a Forced Manual Failover of an Availability Group (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).
Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery 24
You must perform a manual failover if any of the following conditions are true about either the primary replica or the secondary replica that you want to fail over to:
Failover mode is set to manual.
Availability mode is set to asynchronous commit.
Replica resides on an FCI.
For more information, see Failover Modes (AlwaysOn Availability Groups) (http://msdn.microsoft.com/en-us/library/hh213151(SQL.110).aspx).
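The failover rules above can be summarized as a small decision function. This is a hedged conceptual model of the conditions described in this section, not SQL Server's internal logic; the `ReplicaConfig` type, its string labels, and `failover_options` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class ReplicaConfig:
    failover_mode: str      # hypothetical labels: "automatic" or "manual"
    availability_mode: str  # hypothetical labels: "synchronous" or "asynchronous"
    on_fci: bool            # True if the replica resides on an FCI

def failover_options(primary, secondary, synchronized, quorum_healthy, replicas_healthy=True):
    """Return the failover types that the rules in this section would permit."""
    pair = (primary, secondary)
    both_sync = all(r.availability_mode == "synchronous" for r in pair)
    both_auto = all(r.failover_mode == "automatic" for r in pair)
    any_fci = any(r.on_fci for r in pair)

    # Modeled as always available as a last resort; reserved for disaster recovery.
    options = {"forced_manual"}
    if both_sync and synchronized:
        if replicas_healthy:
            options.add("planned_manual")
        # Eligible for automatic failover only without FCIs and with a healthy quorum.
        if both_auto and quorum_healthy and not any_fci:
            options.add("automatic")
    return options
```

For example, an asynchronous commit secondary yields only `{"forced_manual"}`, which matches the statement that forced manual failover is the only option in that case.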
Note: After a failover, if the new primary replica is not set to the synchronous commit mode, the secondary replicas will indicate a Suspended synchronization state. No data will flow to the secondary replicas until the primary replica is set to synchronous commit mode.
Availability Group Listener
An availability group listener is a WSFC virtual network name (VNN) that clients can use to access a database in the availability group. The VNN cluster resource is owned by the SQL Server instance on which the primary replica resides.
The virtual network name is registered with DNS only during availability group listener creation or during configuration changes. All virtual IP addresses that are defined in the availability group listener are registered with DNS under the same virtual network name.
To use the availability group listener, a client connection request must specify the virtual network name as the server, and a database name that is in the availability group. By default, this should result in a connection to the SQL Server instance that is hosting the primary replica.
At run time, the client uses its local DNS resolver to get a list of IP addresses and TCP ports that map to the virtual network name. The client then attempts to connect to each of the IP addresses until it is successful or until it reaches the connection timeout. The client will attempt to make these connections in parallel if the MultiSubnetFailover parameter is set to true, enabling much faster client failovers.
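To see why parallel attempts matter, the timing of sequential versus parallel connection attempts can be modeled with simple arithmetic. This is a hedged sketch, not client library code: `time_to_connect` is a hypothetical helper, and the example borrows the 21-second TCP timeout figure from the legacy-client guidance later in this paper; the 0.5-second connect time is an arbitrary assumption.

```python
def time_to_connect(reachable, tcp_timeout_s, connect_s, connection_timeout_s, parallel):
    """Seconds until a connection succeeds, or None if it does not within the timeout.

    reachable: one bool per IP address returned by DNS, in the order tried;
    True means the address is currently online (owned by the primary replica).
    """
    if parallel:
        # MultiSubnetFailover-style: all addresses are tried at once.
        elapsed = connect_s if any(reachable) else None
    else:
        # Legacy style: each unreachable address costs a full TCP timeout.
        elapsed, waited = None, 0.0
        for ok in reachable:
            if ok:
                elapsed = waited + connect_s
                break
            waited += tcp_timeout_s
    if elapsed is None or elapsed > connection_timeout_s:
        return None
    return elapsed
```

With the primary replica's address listed second, `time_to_connect([False, True], 21.0, 0.5, 15.0, parallel=False)` returns `None` (the 15-second timeout expires first), while the parallel case returns `0.5`.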
In the event of a failover, client connections are reset on the server, ownership of the availability group listener moves with the primary replica role to a new SQL Server instance, and the VNN endpoint is bound to the new instance's virtual IP addresses and TCP ports.
For more information, see Client Connectivity and Application Failover (http://msdn.microsoft.com/en-us/library/hh213417(SQL.110).aspx).
Application Intent Filtering
While connecting through the availability group listener, the application can specify whether its intent is to both read and write data or to exclusively perform read-only operations. If not specified, the default application intent for the client is read-write.
For the primary role and secondary role of each availability replica, you can also specify a connection access property that is used as a connection-level filter on the client's application intent. By default, invalid combinations of application intent and connection access result in a refused connection. SQL Server filters client connection requests using the following rules.
While the availability replica is in the primary role, and connection access is equal to:
Allow any application intent. Do not filter any client connections for application intent.
Allow only explicit read/write intent. If the client specifies read-only, reject the connection.
While the availability replica is in the secondary role, and connection access is equal to:
No connections allowed. Refuse all connections; the replica is used only for disaster recovery.
Allow any application intent. Do not filter any client connections for application intent.
Read-only application intent. If the client does not specify read-only, reject the connection.
For more information, see Configure Connection Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002(SQL.110).aspx).
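The filtering rules above can be expressed as a single predicate. This is a minimal sketch under stated assumptions: `connection_allowed` is a hypothetical name, and the access labels ("all", "read_write", "no", "read_only") are illustrative strings that loosely mirror the ALLOW_CONNECTIONS options, not an API.

```python
def connection_allowed(role, connection_access, intent="read_write"):
    """Apply the connection access filter described above.

    role: "primary" or "secondary"; intent defaults to read-write, as for a
    client that does not specify application intent.
    """
    if role == "primary":
        if connection_access == "all":
            return True  # do not filter on application intent
        if connection_access == "read_write":
            return intent != "read_only"  # reject explicit read-only intent
    elif role == "secondary":
        if connection_access == "no":
            return False  # replica used only for disaster recovery
        if connection_access == "all":
            return True
        if connection_access == "read_only":
            return intent == "read_only"  # require explicit read-only intent
    raise ValueError(f"unsupported combination: {role!r}, {connection_access!r}")
```

Note how the default intent matters: a client that omits application intent is rejected by a secondary replica configured for read-only connections.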
Application Intent Read-Only Routing
A key value proposition for AlwaysOn Availability Groups is the ability to leverage your standby hardware infrastructure for purposes other than disaster recovery. By configuring one or more of your secondary replicas for read-only access, you can offload significant workloads from your primary replicas.
Workloads that can be readily adapted to run off of a read-only secondary replica include: reporting, database backups, database consistency checks, index fragmentation analysis, data pipeline extraction, operational support, and ad hoc queries.
For each availability replica, you can optionally configure a sequential read-only routing list of SQL Server instance endpoints to be applied while that replica is in the primary role. If present, this list is used to redirect client connection requests that specify read-only application intent to the first available secondary replica in the list that satisfies the application intent filters noted earlier.
Note: The read-only routing redirection is performed by the availability group listener, which is bound to the primary replica. If the primary replica is offline, client redirection will not function.
For more information, see Configure Read-Only Routing on an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh653924(SQL.110).aspx).
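The routing behavior described above amounts to a first-match scan of an ordered list. This hedged sketch models only that selection step; `route_read_only` and the endpoint names `NODE2`/`NODE3` are hypothetical, and what the client experiences when no target qualifies is outside its scope.

```python
def route_read_only(routing_list, available, readable, primary_online=True):
    """Pick the read-only routing target for a read-only intent connection.

    routing_list: ordered endpoints configured for the current primary replica.
    available: endpoints that are currently online.
    readable: endpoints whose secondary connection access admits read-only intent.
    """
    if not primary_online:
        # Redirection is performed by the listener, which is bound to the primary.
        return None
    for endpoint in routing_list:
        if endpoint in available and endpoint in readable:
            return endpoint  # first qualifying endpoint in list order wins
    return None
```

For example, `route_read_only(["NODE2", "NODE3"], {"NODE3"}, {"NODE2", "NODE3"})` skips the offline `NODE2` and returns `"NODE3"`.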
Availability Improvements - Databases
SQL Server 2012 has a number of feature enhancements that are specific to database configuration and capabilities.
The following improvement reduces recovery time:
Predictable Recovery Time. You can set a target recovery time interval per database, which is used to control the scheduling of a background CHECKPOINT command. This indirect checkpoint occurs periodically, based upon the estimated time needed to recover the transaction log in the event of a restart or failover. This has the effect of smoothing I/O out to roughly equal proportions for each checkpoint and increasing recovery time (RTO) predictability.
Prior to SQL Server 2012, background CHECKPOINT commands were issued on a fixed interval, irrespective of transaction volume or load, which could lead to unpredictable recovery times.
For more information, see Database Checkpoints (http://msdn.microsoft.com/en-us/library/ms189573(SQL.110).aspx).
These improvements mitigate common scenarios that can drive planned downtime:
Online index operations for LOB columns. Indexes that contain columns with varbinary(max), varchar(max), nvarchar(max), or XML data types can now be rebuilt or reorganized online.
Online schema modification for new NOT NULL columns. If a new NOT NULL column with a default value is added to a SQL Server 2012 database table, only a schema lock is required to update system metadata; existing rows do not have to be populated during the ALTER TABLE statement.
SQL Server physically persists the default column value only when a row is actually modified or reindexed. Queries return the default value from metadata unless an actual column value exists.
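The metadata-only column add can be pictured with a toy in-memory table. This is a conceptual sketch only, assuming nothing about SQL Server's storage format: the `Table` class and its methods are hypothetical and exist to show how a read falls back to a metadata default until a row is modified or rebuilt.

```python
class Table:
    """Toy model of a metadata-only NOT NULL column add."""

    def __init__(self, rows):
        self.rows = [dict(r) for r in rows]  # physically stored values per row
        self.defaults = {}                   # column defaults kept in metadata

    def add_not_null_column(self, name, default):
        # Only metadata changes; no existing row is rewritten.
        self.defaults[name] = default

    def update(self, i, column, value):
        # Modifying a row is when a value is physically stored.
        self.rows[i][column] = value

    def rebuild(self):
        # Rebuilding (reindexing) materializes defaults into every row.
        for row in self.rows:
            for column, default in self.defaults.items():
                row.setdefault(column, default)

    def read(self, i, column):
        # Prefer a stored value; otherwise fall back to the metadata default.
        return self.rows[i].get(column, self.defaults.get(column))
```

The cost of `add_not_null_column` here is independent of the number of rows, which is the property that removes the need for a long blocking ALTER TABLE.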
The following improvement broadens support for storage scenarios:
Automatic Page Repair. Certain types of storage subsystem errors can corrupt a data page, making it unreadable. AlwaysOn Availability Groups can detect and automatically recover from these types of errors by asynchronously requesting and applying a fresh copy of the affected data pages from a different availability replica.
Similar functionality existed prior to SQL Server 2012 for database mirroring, but it is now enhanced to support multiple replicas.
For more information, see Automatic Page Repair (http://msdn.microsoft.com/en-us/library/bb677167(SQL.110).aspx).
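The detect-then-repair flow can be sketched with checksummed pages. This is a hedged toy model, not the real mechanism: CRC-32 from `zlib` stands in for page checksums, and the `Replica` class with its `corrupt`, `read`, and `process_repairs` methods is hypothetical. The repair is queued by the failing read and applied in a separate step, to reflect that the fresh copy is requested asynchronously.

```python
import zlib
from collections import deque

class Replica:
    """Toy replica whose pages carry a CRC-32 checksum (stand-in for page checksums)."""

    def __init__(self, pages):
        self.pages = {pid: (data, zlib.crc32(data)) for pid, data in pages.items()}
        self.pending_repairs = deque()

    def corrupt(self, pid, bad_data):
        # Simulate a storage fault: data changes but the stored checksum does not.
        _, checksum = self.pages[pid]
        self.pages[pid] = (bad_data, checksum)

    def read(self, pid):
        data, checksum = self.pages[pid]
        if zlib.crc32(data) != checksum:
            # The failing read still errors; a repair request is queued instead.
            if pid not in self.pending_repairs:
                self.pending_repairs.append(pid)
            raise IOError(f"page {pid} failed checksum; repair requested")
        return data

    def process_repairs(self, partner):
        # Runs apart from the failing read: fetch fresh copies from a partner replica.
        while self.pending_repairs:
            pid = self.pending_repairs.popleft()
            fresh = partner.read(pid)
            self.pages[pid] = (fresh, zlib.crc32(fresh))
```

After `process_repairs` runs, reads of the repaired page succeed again without any administrator action.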
Client Connectivity Recommendations
Follow these guidelines to enable client applications to take full advantage of Microsoft SQL Server 2012 AlwaysOn technologies:
AlwaysOn-aware client library. Use a client library that supports the tabular data stream (TDS) protocol version 7.4 or newer. This should provide the desired client-side functionality for AlwaysOn features. Example client libraries include the Data Provider for SQL Server in .NET Framework 4.02 and SQL Native Client 11.0.
Connection provider property: MultiSubnetFailover=True. Use this keyword in your connection strings to enable client libraries to attempt to connect in parallel to all IP addresses that are registered for the availability group listener, or for an FCI that has IP addresses in multiple subnets.
Connection provider property: ApplicationIntent=ReadOnly. Where practical, specify this keyword for read-only workloads so that they can be offloaded from your primary replica onto the secondary replicas.
Legacy client connection timeout. Legacy client database libraries do not implement parallel connection attempts. When multiple IP addresses are present, they try to connect to each address sequentially, waiting for a TCP timeout on each address that does not respond, until they make a successful connection or the connection timeout expires.
To accommodate these potential sequential timeouts and retries, set the connection timeout on legacy clients to a value that is at least 15 seconds plus 21 seconds for every secondary replica.
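The recommendations above can be applied mechanically when generating connection settings. This is a minimal sketch, assuming a SqlClient-style `key=value;` connection string: `MultiSubnetFailover` and `ApplicationIntent` are the keywords named above, while the helper names, the listener name `AGListener`, and the database `Sales` are hypothetical, and authentication keywords are intentionally omitted.

```python
def build_connection_string(listener, database, read_only=False, multi_subnet=True):
    """Assemble a SqlClient-style connection string for an availability group listener.

    Authentication keywords are deliberately omitted from this sketch.
    """
    parts = {"Server": f"tcp:{listener}", "Database": database}
    if multi_subnet:
        parts["MultiSubnetFailover"] = "True"    # parallel connection attempts
    if read_only:
        parts["ApplicationIntent"] = "ReadOnly"  # eligible for read-only routing
    return "".join(f"{key}={value};" for key, value in parts.items())

def legacy_connection_timeout(secondary_replicas, base_s=15, per_secondary_s=21):
    """Minimum connection timeout (seconds) suggested above for legacy clients."""
    return base_s + per_secondary_s * secondary_replicas
```

For example, `build_connection_string("AGListener", "Sales", read_only=True)` yields `Server=tcp:AGListener;Database=Sales;MultiSubnetFailover=True;ApplicationIntent=ReadOnly;`, and a legacy client facing two secondary replicas needs at least `legacy_connection_timeout(2)`, that is 15 + 2 × 21 = 57 seconds.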
Conclusion
This white paper has established the baseline context for how to reduce planned and unplanned downtime, maximize application availability, and provide data protection using SQL Server 2012 AlwaysOn high availability and disaster recovery solutions.
Many of the business drivers and challenges of planning, managing, and measuring a highly available database environment can be quantified and expressed as Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
SQL Server 2012 AlwaysOn provides capabilities at the infrastructure, data platform, and database level that can help your organization address common high availability and disaster recovery scenarios in a manner that can be well justified using RPO and RTO goals.
For more information:
http://www.microsoft.com/sqlserver/: SQL Server Web site
http://technet.microsoft.com/en-us/sqlserver/: SQL Server TechCenter
http://msdn.microsoft.com/en-us/sqlserver/: SQL Server DevCenter
Did this paper help you? Please give us your feedback. Tell us on a scale of 1 (poor) to 5 (excellent), how would you rate this paper and why have you given it this rating? For example:
Are you rating it high due to having good examples, excellent screen shots, clear writing, or another reason?
Are you rating it low due to poor examples, fuzzy screen shots, or unclear writing?
This feedback will help us improve the quality of white papers we release.
Send feedback.
Version 1.1, 21 February 2012.