TRANSCRIPT
7/31/2019 Lokesh RAC
ORACLE RAC
Lokesh Aggarwal
7th Sept, 2011
2006 IBM Corporation
AGENDA
Introduction to RAC
Availability
Manageability
Voting Disk Crash Scenarios
Global Cache Services
ASM
Grid infrastructure
Patching and upgrades
ACFS
Patching in Data Guard
New features in 11gR2
RAC concepts
Eviction scenarios (by others)
INTRODUCTION
What is RAC ???
Multiple instances running on separate servers (nodes)
Single database on shared storage accessible to all nodes
Instances exchange information over an interconnect network
[Diagram: Node 1 (Instance 1) and Node 2 (Instance 2), each with a local disk, connected by an interconnect and attached to shared storage]
What is a RAC Database ???
Located on shared storage accessible by all instances
Includes
Control Files
Data Files
Online Redo Logs
Server Parameter File
May optionally include
Archived Redo Logs
Backups
Flashback Logs (Oracle 10.1 and above)
Change Tracking Writer files (Oracle 10.1 and above)
Contd
Contents similar to single instance database except
One redo thread per instance
ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 3 SIZE 51200K,
  GROUP 4 SIZE 51200K;
ALTER DATABASE ENABLE PUBLIC THREAD 2;
If using Automatic Undo Management, also requires one UNDO tablespace per instance
CREATE UNDO TABLESPACE "UNDOTBS2"
  DATAFILE SIZE 25600K AUTOEXTEND ON MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL;
Additional dynamic performance views (V$, GV$ but not X$)
created by $ORACLE_HOME/rdbms/admin/catclust.sql
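As a sketch of what the cluster-wide views created by catclust.sql look like in use: the GV$ views add an INST_ID column so one query can cover all instances (column names here are from the standard GV$INSTANCE view).

```sql
-- List all running RAC instances; GV$ views prefix each row with INST_ID
SELECT inst_id, instance_name, host_name, status
FROM   gv$instance
ORDER  BY inst_id;
```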
RAC Internal Structures and Services
Global Resource Directory (GRD)
Records current state and owner of each resource
Contains convert and write queues
Distributed across all instances in cluster
Global Cache Services (GCS)
Implements cache coherency for database
Coordinates access to database blocks for instances
Maintains GRD
Global Enqueue Services (GES)
Controls access to other resources (locks) including
library cache
dictionary cache
Why Do Users Deploy RAC ???
Users may deploy RAC to achieve
Increased availability
Increased scalability
Improved maintainability
Reduced total cost of ownership
Availability
What is Failover?
If one node or instance fails, the node detecting the failure will
Read redo log of failed instance from last checkpoint
Apply redo to datafiles including undo segments (roll forward)
Roll back uncommitted transactions
Cluster is frozen during part of this process
[Diagram: Node 1 (Instance 1) fails; Node 2 (Instance 2) recovers using the shared storage]
What are Database Services???
Database Services are logical groups of sessions
Can be configured using
DBCA
Enterprise Manager (10.2 and above)
Can also be configured using
SRVCTL (Oracle Cluster Registry only)
SQL*Plus (Data Dictionary only)
In Oracle 10.1 and above, each service has
Preferred Nodes (used by default)
Available Nodes (used if preferred node fails)
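A hedged sketch of defining such a service with SRVCTL (pre-11gR2 syntax; the database, service and instance names are made up): -r lists preferred instances, -a lists available ones.

```shell
# Create a service with RAC1 preferred and RAC2 available as fallback
srvctl add service -d RAC -s REPORTS -r RAC1 -a RAC2
srvctl start service -d RAC -s REPORTS
srvctl status service -d RAC -s REPORTS
```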
What is Oracle Clusterware???
Introduced in Oracle 10.1 (Cluster Ready Services - CRS)
Renamed in Oracle 10.2 to Oracle Clusterware
Cluster Manager providing
Node membership services
Global resource management
High availability functions
On Linux
Configured in /etc/inittab
Implemented using three daemons
CRS - Cluster Ready Service
CSS - Cluster Synchronization Service
EVM - Event Manager
In Oracle 10.2 includes High Availability framework
Allows non-Oracle applications to be managed
What is the OCR???
Oracle Cluster Registry (OCR)
Configuration information for Oracle Clusterware / CRS
Introduced in Oracle 10.1
Replaced Server Management (SRVM) disk/file
Similar to Windows Registry
Located on shared storage
In Oracle 10.2 and above can be mirrored
Maximum two copies
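The OCR state and its automatic backups can be inspected with the supplied utilities; a minimal sketch, run as root on a cluster node:

```shell
ocrcheck               # reports OCR version, total/used space, device locations and integrity
ocrconfig -showbackup  # lists the automatically taken OCR backups
```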
What is a Voting Disk???
Known as Quorum Disk / File in Oracle 9i
Located on shared storage accessible to all instances
Used to determine RAC instance membership
In the event of node failure, the voting disk is used to determine which instance takes control of the cluster
Avoids split brain
In Oracle 10.2 and above can be mirrored
Odd number of copies (1, 3, 5 etc)
What is VIP???
Node application introduced in Oracle 10.1
Allows Virtual IP address to be defined for each node
All applications connect using Virtual IP addresses
If a node fails, its Virtual IP address is automatically relocated to another node
Only applies to newly connecting sessions
VIP Failover ???
[Diagram: Node 1 with static IP x.x.x.101 and VIP x.x.x.201; Node 2 with static IP x.x.x.102 and VIP x.x.x.202; client tnsnames entry mydb = x.x.x.201, x.x.x.202]
VIP Failover ???
[Diagram: Node 1 fails; VIP x.x.x.201 relocates to Node 2, and a TCP reset is sent to clients still addressing x.x.x.201]
VIP Failover ???
[Diagram: Node 2 now hosts both VIPs x.x.x.201 and x.x.x.202; new client connections to mydb = x.x.x.201, x.x.x.202 succeed]
What is TAF???
TAF is Transparent Application Failover
Requires additional coding in client
Requires configuration in TNSNAMES.ORA
RAC_FAILOVER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = RAC)
      (SERVER = DEDICATED)
      (FAILOVER_MODE =
        (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )
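Whether a session has actually failed over with TAF can be checked from the data dictionary; a sketch using the standard V$SESSION failover columns:

```sql
-- FAILED_OVER becomes YES once TAF has moved the session to a surviving node
SELECT username, failover_type, failover_method, failed_over
FROM   v$session
WHERE  username IS NOT NULL;
```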
Does RAC Increase Availability???
Depends on definition of availability
May achieve less unplanned downtime
May have more time to respond to failures
Instance failover means any node can fail without total loss of service
Must have overcapacity in cluster to survive failover
Additional Oracle and RAC licenses
Load can be distributed over all running nodes
Can use Grid to provision additional nodes
Contd
Can still get data corruptions
Human errors / software errors
Only one logical copy of data
Only one logical copy of application / Oracle software
Lots of possibility for human errors
Power / network cabling / storage configuration
Upgrades and patches are more complex
Can upgrade software on subset of nodes
If database is affected then still need downtime
Manageability
Server Parameter File
Introduced in Oracle 9.0.1
Must reside on shared storage
Shared by all RAC instances
Binary (not text) file
Parameters can be changed using ALTER SYSTEM
Can be backed up using the Recovery Manager (RMAN)
Created using
CREATE SPFILE [ = SPFILE_NAME ] FROM PFILE [ = PFILE_NAME ];
init.ora file on each node must contain SPFILE parameter
SPFILE =
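A sketch of what the per-node init.ora entry might look like when the shared SPFILE lives on a cluster file system; the path and instance name are purely illustrative:

```
# $ORACLE_HOME/dbs/initRAC1.ora on node 1 (hypothetical path)
SPFILE='/u02/oradata/RAC/spfileRAC.ora'
```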
Parameters
Some parameters must be same on each instance including :
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
CLUSTER_DATABASE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
TRACE_ENABLED
UNDO_MANAGEMENT
Contd
Some parameters, if used, must be different on each instance including :
THREAD
INSTANCE_NUMBER
INSTANCE_NAME
UNDO_TABLESPACE
ROLLBACK_SEGMENTS
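Instance-specific parameters are normally set in the shared SPFILE with a SID qualifier, so each instance picks up only its own entries; a hedged sketch (instance names RAC1/RAC2 are assumptions):

```sql
-- Each instance reads only the entries tagged with its own SID
ALTER SYSTEM SET instance_number = 1 SCOPE=SPFILE SID='RAC1';
ALTER SYSTEM SET instance_number = 2 SCOPE=SPFILE SID='RAC2';
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS1' SID='RAC1';
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS2' SID='RAC2';
```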
DBCA
Can be used to
Create RAC database and instances
Create ASM instance
Manage ASM instance (10.2)
Add RAC instances
Create clone RAC database (10.2)
Create, Manage and Drop Services
Drop instances and database
What is SRVCTL?
Utility used to manage cluster database
Configured in Oracle Cluster Registry (OCR)
Controls
Database
Instance
ASM
Listener
Node Applications
Services
Options include
Start / Stop
Enable / Disable
Add / Delete
Show current configuration
Show current status
SRVCTL - Examples
Starting and Stopping a Database
srvctl start database -d RAC
srvctl stop database -d RAC
Starting and Stopping an Instance
srvctl start instance -d RAC -i RAC1
srvctl stop instance -d RAC -i RAC1
Starting and Stopping a Service
srvctl start service -d RAC -s SERVICE1
srvctl stop service -d RAC -s SERVICE1
Starting and Stopping ASM on a specified node
srvctl start asm -n node1
srvctl stop asm -n node1
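The "show configuration / show status" options mentioned earlier follow the same pattern; a sketch with the same made-up database and node names:

```shell
srvctl status database -d RAC      # status of all instances of the database
srvctl config database -d RAC      # configuration stored in the OCR
srvctl status nodeapps -n node1    # VIP, GSD, listener and ONS status on node1
```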
What is CLUVFY?
Introduced in Oracle 10.2
Supplied with Oracle Clusterware
Can be downloaded from OTN (Linux and Windows)
Written in Java - requires JRE (supplied)
Also works with 10.1 (specify -10gR1 option)
Checks cluster configuration
stages - verifies all steps for specified stage have been completed
components - verifies specified component has been correctly installed
CLUVFY
Stages include
-post hwos     post-check for hardware and operating system
-pre cfs       pre-check for CFS setup
-post cfs      post-check for CFS setup
-pre crsinst   pre-check for Oracle Clusterware installation
-post crsinst  post-check for Oracle Clusterware installation
-pre dbinst    pre-check for database installation
-pre dbcfg     pre-check for database configuration
contd..
Components include
nodereach Checks reachability between nodes
nodecon Checks node connectivity
cfs Checks CFS integrity
ssa Checks shared storage accessibility
space Checks space availability
sys Checks minimum system requirements
clu Checks cluster integrity
clumgr Checks cluster manager integrity
ocr Checks OCR integrity
crs Checks Oracle Clusterware (CRS) integrity
nodeapp Checks node applications exist
admprv Checks administrative privileges
peer Compares properties with peers
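Once Clusterware is installed, individual components from the list above can be verified directly; a sketch using two assumed node names:

```shell
cluvfy comp nodecon -n node1,node2 -verbose   # node connectivity
cluvfy comp ocr -n node1,node2                # OCR integrity
cluvfy comp ssa -n node1,node2                # shared storage accessibility
```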
contd..
For example, to check configuration before installing Oracle Clusterware on node1 and node2 use:
sh runcluvfy.sh stage -pre crsinst -n node1,node2
Checks:
node reachability
user equivalence
administrative privileges
node connectivity
shared storage accessibility
If any checks fail, append -verbose to display more information
Other Utilities
Additional RAC utilities and diagnostics include
OCRCONFIG
OCRCHECK
OCRDUMP
CRSCTL
CRS_STAT
Additional RAC diagnostics can be obtained using
ORADEBUG utility
DUMP option
LKDEBUG option
Events
Does RAC Improve Manageability?
Advantages
Fewer databases to manage
Easier to monitor
Easier to upgrade
Easier to control resource allocation
Resources can be shared between applications
Disadvantages
Upgrades potentially more complex
Downtime may affect more applications
Requires more experienced operational staff
Higher cost / harder to replace
Voting Disk Crash Scenarios
contd
Losing one Voting Disk
Voting disks are used in a RAC configuration for maintaining node membership. They are critical pieces in a cluster configuration. Starting with Oracle 10gR2, it is possible to mirror the OCR and the voting disks. Using the default mirroring template, the minimum number of voting disks necessary for normal functioning is two.
Scenario Setup
Identify voting disks:
crsctl query css votedisk
/dev/raw/raw1
/dev/raw/raw2
/dev/raw/raw3
contd
Corrupt one of the voting disks (as root):
dd if=/dev/zero of=/dev/raw/raw3 bs=1M
Recoverability Steps
Check the $CRS_HOME/log/[hostname]/alert[hostname].log file. The following message should be written there, which allows us to determine which voting disk became corrupted:
[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/ractest2/cssd/ocssd.log.
contd
According to the above listing, Voting1 is the corrupted disk.
Shutdown the CRS stack:
srvctl stop database -d fitstest -o immediate
srvctl stop asm -n ractest1
srvctl stop asm -n ractest2
srvctl stop nodeapps -n ractest1
srvctl stop nodeapps -n ractest2
crs_stat -t
On every node as root:
crsctl stop crs
contd
Pick a good voting disk from the remaining ones and copy it over the corrupted one:
dd if=/dev/raw/raw4 of=/dev/raw/raw3 bs=1M
Start CRS (on every node as root):
crsctl start crs
Check log file $CRS_HOME/log/[hostname]/alert[hostname].log. It should look like shown below:
[cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ractest1 ractest1.
2007-05-31 15:19:53.954
[crsd(14268)]CRS-1012:The OCR service started on node ractest1.
2007-05-31 15:19:53.987
[evmd(14228)]CRS-1401:EVMD started on node ractest1.
2007-05-31 15:19:55.861
[crsd(14268)]CRS-1201:CRSD started on node ractest1.
contd
After a couple of minutes, check the status of the whole CRS stack:
[oracle@ractest1 ~]$ crs_stat -t
Global Cache Services
Read with No Transfer
Instance 2 requests current read on block
[Diagram: four instances; Instance 3 is the Resource Master; block at SCN 1318]
1. Instance 2 sends request for the shared resource to the Resource Master
2. Request granted
3. Read request issued
4. Block returned to Instance 2
Read to Write Transfer
Instance 1 requests exclusive read on block
[Diagram: four instances; Instance 3 is the Resource Master; Instance 2 holds the block at SCN 1318, updated to 1320 by Instance 1; resource modes N (null) and X (exclusive)]
1. Instance 1 sends request for the exclusive resource to the Resource Master
2. Resource Master instructs Instance 2 to transfer the block to Instance 1 for exclusive access
3. Instance 2 transfers the block and resource status to Instance 1
4. Instance 1 sends resource status to the Resource Master
Write to Write Transfer
Instance 4 requests exclusive read on block
[Diagram: four instances; Instance 3 is the Resource Master; Instance 1 holds the dirty block at SCN 1320, updated to 1323 by Instance 4; resource modes N (null) and X (exclusive)]
1. Instance 4 sends request for the exclusive resource to the Resource Master
2. Resource Master instructs Instance 1 to transfer the block to Instance 4 in exclusive mode
3. Instance 1 transfers the block and resource status to Instance 4
4. Instance 4 sends resource status to the Resource Master
Note that Instance 1 will create a past image (PI) of the dirty block
Past Images
When an instance passes a dirty block to another instance it
Flushes redo buffer to redo log
Retains past image (PI) of block in buffer cache
PI is retained until another instance writes block to disk
Used to reduce recovery times
Recorded in V$BH.STATUS as PI
Based on X$BH.STATE (value 8 in Oracle 10.2)
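Past images can be observed directly in the buffer cache views; a sketch using GV$BH (on 10g the status value appears in lowercase as 'pi'):

```sql
-- Count past-image buffers per instance across the cluster
SELECT inst_id, COUNT(*) AS pi_buffers
FROM   gv$bh
WHERE  status = 'pi'
GROUP  BY inst_id;
```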
contd..
13281329UPDATE t1SET c1 = 1324;COMMIT;
UPDATE t1SET c1 = 1329;COMMIT;
1323
Instance 1
13231324132513261327
Buffer Cache
13241323
13251324
13261325
13271326
1328
13281327
Redo Log 1
Instance 2
Buffer Cache
13291328
UPDATE t1SET c1 = 1325;COMMIT;
UPDATE t1SET c1 = 1326;COMMIT;
UPDATE t1SET c1 = 1327;COMMIT;
UPDATE t1SET c1 = 1328;COMMIT; 1328
1323
Redo Log 2
1323
132813291329
1329
1329
Assume table t1 contains asingle row in block 42
Instance 1 updates column to1324
Block 42 is read from diskUndo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1325Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1326Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1327Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1328Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 2 updates column to
1329GCS transfers block fromInstance 1 to Instance 2
Instance 1 makes block 42a Past Image block
Undo/redo written toRedo Log 2
Block 42 is updated in buffercache
Instance 2 CrashesContents of buffer cache are lostDBWR has not written changes
to block 42 back to disk yetInstance 1 must performrecovery for Instance 2
Block 42 needs recoveryInstance 1 uses Past ImageUndo/redo is applied from
Redo Log 2Block 42 is subsequently written
back to disk by DBWR
What is the Interconnect???
Instances communicate with each other over the interconnect (network)
Information transferred between instances includes
data blocks
locks
SCNs
Typically 1Gb Ethernet
UDP protocol
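Which network Oracle is actually using as the interconnect can be checked from inside the instance; a sketch for 10g and above using the standard cluster interconnect view:

```sql
-- Shows the interconnect name, IP address and how Oracle discovered it
SELECT inst_id, name, ip_address, is_public, source
FROM   gv$cluster_interconnects;
```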
Why Use Shared Storage ???
Mandatory for
Database files
Control files
Online redo logs
Server Parameter file (if used)
Optional for
Archived redo logs (recommended)
Executables (Binaries)
Password files
Parameter files
Network configuration files
Administrative directories
Alert Log
Dump Files