TRANSCRIPT
7/31/2019 Lokesh RAC
ORACLE RAC
Lokesh Aggarwal
7th Sept, 2011
2006 IBM Corporation
AGENDA
Introduction to RAC
Availability
Manageability
Voting Disk Crash Scenarios
Global Cache Services
ASM
Grid infrastructure
Patching and upgrades
ACFS
Patching in Data Guard
New features in 11gR2
RAC concepts
Eviction scenarios (by others)
INTRODUCTION
What is RAC ???
Multiple instances running on separate servers (nodes)
Single database on shared storage accessible to all nodes
Instances exchange information over an interconnect network
[Diagram: Node 1 (Instance 1) and Node 2 (Instance 2), each with a local disk, connected by an interconnect and attached to shared storage]
What is a RAC Database ???
Located on shared storage accessible by all instances
Includes
Control Files
Data Files
Online Redo Logs
Server Parameter File
May optionally include
Archived Redo Logs
Backups
Flashback Logs (Oracle 10.1 and above)
Change Tracking Writer files (Oracle 10.1 and above)
Contd
Contents similar to single instance database except
One redo thread per instance
ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 3 SIZE 51200K,
  GROUP 4 SIZE 51200K;
ALTER DATABASE ENABLE PUBLIC THREAD 2;
If using Automatic Undo Management, also requires one UNDO tablespace per instance
CREATE UNDO TABLESPACE "UNDOTBS2"
  DATAFILE SIZE 25600K AUTOEXTEND ON MAXSIZE UNLIMITED
  EXTENT MANAGEMENT LOCAL;
Additional dynamic performance views (V$, GV$ but not X$)
created by $ORACLE_HOME/rdbms/admin/catclust.sql
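As a sketch of what the cluster-wide views created by catclust.sql look like in use: the GV$ views add an INST_ID column so one query can cover all instances (column names here are from the standard GV$INSTANCE view).

```sql
-- List all running RAC instances; GV$ views prefix each row with INST_ID
SELECT inst_id, instance_name, host_name, status
FROM   gv$instance
ORDER  BY inst_id;
```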
RAC Internal Structures and Services
Global Resource Directory (GRD)
Records current state and owner of each resource
Contains convert and write queues
Distributed across all instances in cluster
Global Cache Services (GCS)
Implements cache coherency for database
Coordinates access to database blocks for instances
Maintains GRD
Global Enqueue Services (GES)
Controls access to other resources (locks) including
library cache
dictionary cache
Why Do Users Deploy RAC ???
Users may deploy RAC to achieve
Increased availability
Increased scalability
Improved maintainability
Reduced total cost of ownership
Availability
What is Failover?
If one node or instance fails, the node detecting the failure will
Read redo log of failed instance from last checkpoint
Apply redo to datafiles including undo segments (roll forward)
Roll back uncommitted transactions
Cluster is frozen during part of this process
[Diagram: Node 1 (Instance 1) fails; Node 2 (Instance 2) recovers using the shared storage]
What are Database Services???
Database Services are logical groups of sessions
Can be configured using
DBCA
Enterprise Manager (10.2 and above)
Can also be configured using
SRVCTL (Oracle Cluster Registry only)
SQL*Plus (Data Dictionary only)
In Oracle 10.1 and above, each service has
Preferred Nodes (used by default)
Available Nodes (used if preferred node fails)
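A hedged sketch of defining such a service with SRVCTL (pre-11gR2 syntax; the database, service and instance names are made up): -r lists preferred instances, -a lists available ones.

```shell
# Create a service with RAC1 preferred and RAC2 available as fallback
srvctl add service -d RAC -s REPORTS -r RAC1 -a RAC2
srvctl start service -d RAC -s REPORTS
srvctl status service -d RAC -s REPORTS
```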
What is Oracle Clusterware???
Introduced in Oracle 10.1 (Cluster Ready Services - CRS)
Renamed in Oracle 10.2 to Oracle Clusterware
Cluster Manager providing
Node membership services
Global resource management
High availability functions
On Linux
Configured in /etc/inittab
Implemented using three daemons
CRS - Cluster Ready Service
CSS - Cluster Synchronization Service
EVM - Event Manager
In Oracle 10.2 includes High Availability framework
Allows non-Oracle applications to be managed
What is the OCR???
Oracle Cluster Registry (OCR)
Configuration information for Oracle Clusterware / CRS
Introduced in Oracle 10.1
Replaced Server Management (SRVM) disk/file
Similar to Windows Registry
Located on shared storage
In Oracle 10.2 and above can be mirrored
Maximum two copies
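The OCR state and its automatic backups can be inspected with the supplied utilities; a minimal sketch, run as root on a cluster node:

```shell
ocrcheck               # reports OCR version, total/used space, device locations and integrity
ocrconfig -showbackup  # lists the automatically taken OCR backups
```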
What is a Voting Disk???
Known as Quorum Disk / File in Oracle 9i
Located on shared storage accessible to all instances
Used to determine RAC instance membership
In the event of node failure, the voting disk is used to determine which instance takes control of the cluster
Avoids split brain
In Oracle 10.2 and above can be mirrored
Odd number of copies (1, 3, 5 etc)
What is VIP???
Node application introduced in Oracle 10.1
Allows Virtual IP address to be defined for each node
All applications connect using Virtual IP addresses
If a node fails, its Virtual IP address is automatically relocated to another node
Only applies to newly connecting sessions
VIP Failover ???
[Diagram: Node 1 with static IP x.x.x.101 and VIP x.x.x.201; Node 2 with static IP x.x.x.102 and VIP x.x.x.202; client tnsnames entry mydb = x.x.x.201, x.x.x.202]
VIP Failover ???
[Diagram: Node 1 fails; VIP x.x.x.201 relocates to Node 2, and a TCP reset is sent to clients still addressing x.x.x.201]
VIP Failover ???
[Diagram: Node 2 now hosts both VIPs x.x.x.201 and x.x.x.202; new client connections to mydb = x.x.x.201, x.x.x.202 succeed]
What is TAF???
TAF is Transparent Application Failover
Requires additional coding in client
Requires configuration in TNSNAMES.ORA
RAC_FAILOVER =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = RAC)
      (SERVER = DEDICATED)
      (FAILOVER_MODE =
        (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )
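Whether a session has actually failed over with TAF can be checked from the data dictionary; a sketch using the standard V$SESSION failover columns:

```sql
-- FAILED_OVER becomes YES once TAF has moved the session to a surviving node
SELECT username, failover_type, failover_method, failed_over
FROM   v$session
WHERE  username IS NOT NULL;
```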
Does RAC Increase Availability???
Depends on definition of availability
May achieve less unplanned downtime
May have more time to respond to failures
Instance failover means any node can fail without total loss of service
Must have overcapacity in cluster to survive failover
Additional Oracle and RAC licenses
Load can be distributed over all running nodes
Can use Grid to provision additional nodes
Contd
Can still get data corruptions
Human errors / software errors
Only one logical copy of data
Only one logical copy of application / Oracle software
Lots of possibility for human errors
Power / network cabling / storage configuration
Upgrades and patches are more complex
Can upgrade software on subset of nodes
If database is affected then still need downtime
Manageability
Server Parameter File
Introduced in Oracle 9.0.1
Must reside on shared storage
Shared by all RAC instances
Binary (not text) file
Parameters can be changed using ALTER SYSTEM
Can be backed up using the Recovery Manager (RMAN)
Created using
CREATE SPFILE [ = SPFILE_NAME ] FROM PFILE [ = PFILE_NAME ];
init.ora file on each node must contain SPFILE parameter
SPFILE =
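A sketch of what the per-node init.ora entry might look like when the shared SPFILE lives on a cluster file system; the path and instance name are purely illustrative:

```
# $ORACLE_HOME/dbs/initRAC1.ora on node 1 (hypothetical path)
SPFILE='/u02/oradata/RAC/spfileRAC.ora'
```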
Parameters
Some parameters must be same on each instance including :
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
CLUSTER_DATABASE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
TRACE_ENABLED
UNDO_MANAGEMENT
Contd
Some parameters, if used, must be different on each instance including :
THREAD
INSTANCE_NUMBER
INSTANCE_NAME
UNDO_TABLESPACE
ROLLBACK_SEGMENTS
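Instance-specific parameters are normally set in the shared SPFILE with a SID qualifier, so each instance picks up only its own entries; a hedged sketch (instance names RAC1/RAC2 are assumptions):

```sql
-- Each instance reads only the entries tagged with its own SID
ALTER SYSTEM SET instance_number = 1 SCOPE=SPFILE SID='RAC1';
ALTER SYSTEM SET instance_number = 2 SCOPE=SPFILE SID='RAC2';
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS1' SID='RAC1';
ALTER SYSTEM SET undo_tablespace = 'UNDOTBS2' SID='RAC2';
```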
DBCA
Can be used to
Create RAC database and instances
Create ASM instance
Manage ASM instance (10.2)
Add RAC instances
Create clone RAC database (10.2)
Create, Manage and Drop Services
Drop instances and database
What is SRVCTL?
Utility used to manage cluster database
Configured in Oracle Cluster Registry (OCR)
Controls
Database
Instance
ASM
Listener
Node Applications
Services
Options include
Start / Stop
Enable / Disable
Add / Delete
Show current configuration
Show current status
SRVCTL - Examples
Starting and Stopping a Database
srvctl start database -d RAC
srvctl stop database -d RAC
Starting and Stopping an Instance
srvctl start instance -d RAC -i RAC1
srvctl stop instance -d RAC -i RAC1
Starting and Stopping a Service
srvctl start service -d RAC -s SERVICE1
srvctl stop service -d RAC -s SERVICE1
Starting and Stopping ASM on a specified node
srvctl start asm -n node1
srvctl stop asm -n node1
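The "show configuration / show status" options mentioned earlier follow the same pattern; a sketch with the same made-up database and node names:

```shell
srvctl status database -d RAC      # status of all instances of the database
srvctl config database -d RAC      # configuration stored in the OCR
srvctl status nodeapps -n node1    # VIP, GSD, listener and ONS status on node1
```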
What is CLUVFY?
Introduced in Oracle 10.2
Supplied with Oracle Clusterware
Can be downloaded from OTN (Linux and Windows)
Written in Java - requires JRE (supplied)
Also works with 10.1 (specify -10gR1 option)
Checks cluster configuration
stages - verifies all steps for specified stage have been completed
components - verifies specified component has been correctly installed
CLUVFY
Stages include
-post hwos     post-check for hardware and operating system
-pre cfs       pre-check for CFS setup
-post cfs      post-check for CFS setup
-pre crsinst   pre-check for Oracle Clusterware installation
-post crsinst  post-check for Oracle Clusterware installation
-pre dbinst    pre-check for database installation
-pre dbcfg     pre-check for database configuration
contd..
Components include
nodereach Checks reachability between nodes
nodecon Checks node connectivity
cfs Checks CFS integrity
ssa Checks shared storage accessibility
space Checks space availability
sys Checks minimum system requirements
clu Checks cluster integrity
clumgr Checks cluster manager integrity
ocr Checks OCR integrity
crs Checks Oracle Clusterware (CRS) integrity
nodeapp Checks node applications exist
admprv Checks administrative privileges
peer Compares properties with peers
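Once Clusterware is installed, individual components from the list above can be verified directly; a sketch using two assumed node names:

```shell
cluvfy comp nodecon -n node1,node2 -verbose   # node connectivity
cluvfy comp ocr -n node1,node2                # OCR integrity
cluvfy comp ssa -n node1,node2                # shared storage accessibility
```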
contd..
For example, to check configuration before installing Oracle Clusterware on node1 and node2 use:
sh runcluvfy.sh stage -pre crsinst -n node1,node2
Checks:
node reachability
user equivalence
administrative privileges
node connectivity
shared storage accessibility
If any checks fail, append -verbose to display more information
Other Utilities
Additional RAC utilities and diagnostics include
OCRCONFIG
OCRCHECK
OCRDUMP
CRSCTL
CRS_STAT
Additional RAC diagnostics can be obtained using
ORADEBUG utility
DUMP option
LKDEBUG option
Events
Does RAC Improve Manageability?
Advantages
Fewer databases to manage
Easier to monitor
Easier to upgrade
Easier to control resource allocation
Resources can be shared between applications
Disadvantages
Upgrades potentially more complex
Downtime may affect more applications
Requires more experienced operational staff
Higher cost / harder to replace
Voting Disk Crash Scenarios
contd
Losing one Voting Disk
Voting disks are used in a RAC configuration for maintaining node membership. They are critical pieces in a cluster configuration. Starting with Oracle 10gR2, it is possible to mirror the OCR and the voting disks. Using the default mirroring template, the minimum number of voting disks necessary for normal functioning is two.
Scenario Setup
Identify voting disks:
crsctl query css votedisk
/dev/raw/raw1
/dev/raw/raw2
/dev/raw/raw3
contd
Corrupt one of the voting disks (as root):
dd if=/dev/zero of=/dev/raw/raw3 bs=1M
Recoverability Steps
Check the $CRS_HOME/log/[hostname]/alert[hostname].log file. The following message should be written there, which allows us to determine which voting disk became corrupted:
[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/ractest2/cssd/ocssd.log.
contd
According to the above listing, Voting1 is the corrupted disk.
Shutdown the CRS stack:
srvctl stop database -d fitstest -o immediate
srvctl stop asm -n ractest1
srvctl stop asm -n ractest2
srvctl stop nodeapps -n ractest1
srvctl stop nodeapps -n ractest2
crs_stat -t
On every node as root:
crsctl stop crs
contd
Pick a good voting disk from the remaining ones and copy it over the corrupted one:
dd if=/dev/raw/raw4 of=/dev/raw/raw3 bs=1M
Start CRS (on every node as root):
crsctl start crs
Check log file $CRS_HOME/log/[hostname]/alert[hostname].log. It should look like shown below:
[cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are ractest1 ractest1.
2007-05-31 15:19:53.954
[crsd(14268)]CRS-1012:The OCR service started on node ractest1.
2007-05-31 15:19:53.987
[evmd(14228)]CRS-1401:EVMD started on node ractest1.
2007-05-31 15:19:55.861
[crsd(14268)]CRS-1201:CRSD started on node ractest1.
contd
After a couple of minutes, check the status of the whole CRS stack:
[oracle@ractest1 ~]$ crs_stat -t
Global Cache Services
Read with No Transfer
Instance 2 requests current read on block
[Diagram: four instances; Instance 3 is the Resource Master; block at SCN 1318]
1. Instance 2 sends request for the shared resource to the Resource Master
2. Request granted
3. Read request issued
4. Block returned to Instance 2
Read to Write Transfer
Instance 1 requests exclusive read on block
[Diagram: four instances; Instance 3 is the Resource Master; Instance 2 holds the block at SCN 1318, updated to 1320 by Instance 1; resource modes N (null) and X (exclusive)]
1. Instance 1 sends request for the exclusive resource to the Resource Master
2. Resource Master instructs Instance 2 to transfer the block to Instance 1 for exclusive access
3. Instance 2 transfers the block and resource status to Instance 1
4. Instance 1 sends resource status to the Resource Master
Write to Write Transfer
Instance 4 requests exclusive read on block
[Diagram: four instances; Instance 3 is the Resource Master; Instance 1 holds the dirty block at SCN 1320, updated to 1323 by Instance 4; resource modes N (null) and X (exclusive)]
1. Instance 4 sends request for the exclusive resource to the Resource Master
2. Resource Master instructs Instance 1 to transfer the block to Instance 4 in exclusive mode
3. Instance 1 transfers the block and resource status to Instance 4
4. Instance 4 sends resource status to the Resource Master
Note that Instance 1 will create a past image (PI) of the dirty block
Past Images
When an instance passes a dirty block to another instance it
Flushes redo buffer to redo log
Retains past image (PI) of block in buffer cache
PI is retained until another instance writes block to disk
Used to reduce recovery times
Recorded in V$BH.STATUS as PI
Based on X$BH.STATE (value 8 in Oracle 10.2)
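Past images can be observed directly in the buffer cache views; a sketch using GV$BH (on 10g the status value appears in lowercase as 'pi'):

```sql
-- Count past-image buffers per instance across the cluster
SELECT inst_id, COUNT(*) AS pi_buffers
FROM   gv$bh
WHERE  status = 'pi'
GROUP  BY inst_id;
```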
contd..
13281329UPDATE t1SET c1 = 1324;COMMIT;
UPDATE t1SET c1 = 1329;COMMIT;
1323
Instance 1
13231324132513261327
Buffer Cache
13241323
13251324
13261325
13271326
1328
13281327
Redo Log 1
Instance 2
Buffer Cache
13291328
UPDATE t1SET c1 = 1325;COMMIT;
UPDATE t1SET c1 = 1326;COMMIT;
UPDATE t1SET c1 = 1327;COMMIT;
UPDATE t1SET c1 = 1328;COMMIT; 1328
1323
Redo Log 2
1323
132813291329
1329
1329
Assume table t1 contains asingle row in block 42
Instance 1 updates column to1324
Block 42 is read from diskUndo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1325Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1326Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1327Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 1 updates column to
1328Undo/Redo written to
Redo Log 1Block 42 is updated in buffer
cacheInstance 2 updates column to
1329GCS transfers block fromInstance 1 to Instance 2
Instance 1 makes block 42a Past Image block
Undo/redo written toRedo Log 2
Block 42 is updated in buffercache
Instance 2 CrashesContents of buffer cache are lostDBWR has not written changes
to block 42 back to disk yetInstance 1 must performrecovery for Instance 2
Block 42 needs recoveryInstance 1 uses Past ImageUndo/redo is applied from
Redo Log 2Block 42 is subsequently written
back to disk by DBWR
What is the Interconnect???
Instances communicate with each other over the interconnect (network)
Information transferred between instances includes
data blocks
locks
SCNs
Typically 1Gb Ethernet
UDP protocol
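Which network Oracle is actually using as the interconnect can be checked from inside the instance; a sketch for 10g and above using the standard cluster interconnect view:

```sql
-- Shows the interconnect name, IP address and how Oracle discovered it
SELECT inst_id, name, ip_address, is_public, source
FROM   gv$cluster_interconnects;
```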
Why Use Shared Storage ???
Mandatory for
Database files
Control files
Online redo logs
Server Parameter file (if used)
Optional for
Archived redo logs (recommended)
Executables (Binaries)
Password files
Parameter files
Network configuration files
Administrative directories
Alert Log
Dump Files