
AIX HACMP Operation Notes, 201101. Shared at tomroom.cublog.cn. Author: tomroom 环保男

- How to check the status of the AIX HA subsystem
- How to start HACMP
- How to stop HACMP
- Location of the HACMP log files
- Recommended way to work on the two AIX servers in the HA pair
- How to view the HACMP status
- How to determine which server a resource group is on
- Moving a res to another node in AIX HA
- Differences between Bring a Resource Group Online, Bring a Resource Group Offline, and Move a Resource Group to Another Node / Site

 

Operating environment: AIX OS 6.1

Details follow.

- How to check the status of the AIX HA subsystem

# lssrc -s clstrmgrES

Subsystem Group PID Status

clstrmgrES cluster 4326122 active


# set -o vi

# lssrc -ls clstrmgrES

Current state: ST_STABLE

sccsid = "@(#)36 1.135.5.2 src/43haes/usr/sbin/cluster/hacmprd/main.C, hacmp.pe, 53haes_r550, 0934B_hacmp550 8/8/09 14:48:23"

i_local_nodeid 0, i_local_siteid -1, my_handle 1

ml_idx[1]=0

There are 0 events on the Ibcast queue

There are 0 events on the RM Ibcast queue

CLversion: 10

local node vrmf is 5506

cluster fix level is "0"

The following timer(s) are currently active:

Current DNP values

DNP Values for NodeId - 0 NodeName - sc1prrhas01

PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

DNP Values for NodeId - 0 NodeName - sc1prrhas02

PgSpFree = 0 PvPctBusy = 0 PctTotalTimeIdle = 0.000000

Current state: ST_STABLE means the cluster is running normally.

Current state: ST_BARRIER means a startup operation is in progress.

Current state: ST_INIT appears when the cluster is stopped on both sides: the subsystem is started, but it reports this state.
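The state check above can be scripted. The following is a minimal sketch, not part of the original notes: the function names `cluster_state` and `describe_state` are hypothetical, and the descriptions only restate the three states listed above.

```shell
#!/bin/sh
# Sketch: read `lssrc -ls clstrmgrES` output on stdin and print the state token.
# Function names are illustrative; they are not HACMP utilities.
cluster_state() {
    # The line of interest looks like: "Current state: ST_STABLE"
    awk '$1 == "Current" && $2 == "state:" { print $3; exit }'
}

# Map a state token to the meaning recorded in these notes.
describe_state() {
    case "$1" in
        ST_STABLE)  echo "cluster is running normally" ;;
        ST_BARRIER) echo "startup is in progress" ;;
        ST_INIT)    echo "subsystem is started but the cluster is stopped" ;;
        *)          echo "state not covered by these notes" ;;
    esac
}

# Usage on a cluster node:
#   state=$(lssrc -ls clstrmgrES | cluster_state)
#   echo "$state: $(describe_state "$state")"
```

Reading from stdin keeps the parsing separate from the `lssrc` call, so the same function also works on saved output.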

 

- How to start HACMP

Run smit clstart to start HA.

 

- How to stop HACMP: run smitty clstop to stop the cluster. Stop the nodes one at a time. For example, with two AIX servers P1 and P2 in a cluster, confirm that HACMP on P1 has finished stopping before you stop HACMP on P2.
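One way to confirm that P1 has finished stopping is to poll the cluster manager state. The sketch below is an assumption-laden illustration: `wait_for_state` is a hypothetical helper, and it assumes that ST_INIT on a node is an acceptable "stopped" signal (the notes only report seeing ST_INIT when the cluster is stopped on both sides).

```shell
#!/bin/sh
# wait_for_state STATE TRIES DELAY CMD...
# Polls CMD (for example: lssrc -ls clstrmgrES) until its "Current state:"
# line reports STATE, or until TRIES attempts have been made.
# Hypothetical helper, not an HACMP command.
wait_for_state() {
    want=$1; tries=$2; delay=$3
    shift 3
    while [ "$tries" -gt 0 ]; do
        got=$("$@" | awk '$1 == "Current" && $2 == "state:" { print $3; exit }')
        [ "$got" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Usage on P1 after smitty clstop (ST_INIT as the stopped state is an assumption):
#   wait_for_state ST_INIT 60 5 lssrc -ls clstrmgrES &&
#       echo "P1 reports ST_INIT; continue with P2"
```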

 

- Location of the HACMP log files

# pwd

/var/hacmp/log

# ls hacmp.out

hacmp.out

 

The following files in this directory are backups that the system creates automatically:

hacmp.out.1

hacmp.out.2

hacmp.out.3

hacmp.out.4

hacmp.out.5

hacmp.out.6


hacmp.out.7

- Recommended way to work on the two AIX servers in the HA pair: open 4 windows, 2 of them running tail -f on the hacmp log and the other 2 for executing commands.
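In the two log windows it can help to show only event boundaries. This is a sketch under an assumption the notes do not state: that hacmp.out marks events with lines containing EVENT START, EVENT COMPLETED, or EVENT FAILED. The function name `hacmp_events` is hypothetical.

```shell
#!/bin/sh
# hacmp_events [FILE]: print only the event marker lines from a hacmp.out-style log.
# Assumption: events are marked with "EVENT START", "EVENT COMPLETED", "EVENT FAILED".
hacmp_events() {
    grep -E 'EVENT (START|COMPLETED|FAILED)' "${1:-/var/hacmp/log/hacmp.out}"
}

# Live view in a log window (same assumed markers):
#   tail -f /var/hacmp/log/hacmp.out | grep -E 'EVENT (START|COMPLETED|FAILED)'
```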

 

- How to view the HACMP status: run smit hacmp and select

 

Problem Determination Tools

View Current State

HACMP for AIX

 

Move cursor to desired item and press Enter.

 

Initialization and Standard Configuration

Extended Configuration

System Management (C-SPOC)

Problem Determination Tools

 

Problem Determination Tools

 

Move cursor to desired item and press Enter.

 


HACMP Verification

View Current State

HACMP Log Viewing and Management

Recover From HACMP Script Failure

Restore HACMP Configuration Database from Active Configurat

Release Locks Set By Dynamic Reconfiguration

Clear SSA Disk Fence Registers

HACMP Cluster Test Tool

HACMP Trace Facility

HACMP Error Notification

Manage RSCT Services

 

Open a SMIT Session on a Node 

COMMAND STATUS

 

Command: OK stdout: yes stderr: no

 

Before command completion, additional instructions may appear below.

 

[TOP]

 


Obtaining information via SNMP from Node: sc1prrhas02...

 

_____________________________________________________________________________

Cluster Name: SAP_p1p2_cluster

Cluster State: UP

Cluster Substate: STABLE

_____________________________________________________________________________

 

Node Name: sc1prrhas01 State: UP

 

Network Name: net_diskhb_01 State: UP

 

Address: Label: heartbeatp1 State: UP

 

Network Name: net_ether_01 State: UP

 

Address: 100.0.0.1 Label: sc1prrhas01_boot1 State: UP 
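If the View Current State output is saved to a file, the node states can be pulled out of it. A minimal sketch, assuming the "Node Name: ... State: ..." line layout shown above; `node_states` and `down_nodes` are hypothetical names.

```shell
#!/bin/sh
# node_states: read View Current State output on stdin, print "<node> <state>".
# Relies on lines shaped like: "Node Name: sc1prrhas01 State: UP"
node_states() {
    awk '$1 == "Node" && $2 == "Name:" && $4 == "State:" { print $3, $5 }'
}

# down_nodes: print only the nodes whose state is not UP.
down_nodes() {
    node_states | awk '$2 != "UP" { print $1 }'
}

# Usage with a saved copy of the screen output (file name is illustrative):
#   down_nodes < current_state.txt
```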

- How to determine which server a resource group is on: in the output below, the line sc1prrhas02 ONLINE (bold in the original) shows that the res is on server P2.

COMMAND STATUS

 


Command: OK stdout: yes stderr: no

 

Before command completion, additional instructions may appear below.

 

[MORE...49]

Fallover Policy: Fallover To Next Priority Node In The List

Fallback Policy: Never Fallback

Site Policy: ignore

Node Group State

---------------------------- ---------------

sc1prrhas01 OFFLINE

sc1prrhas02 ONLINE

 

Resource Group Name: sc1prrvglob_res

Startup Policy: Online On Home Node Only

Fallover Policy: Fallover To Next Priority Node In The List

Fallback Policy: Never Fallback

Site Policy: ignore

Node Group State

---------------------------- ---------------

sc1prrhas01 OFFLINE

sc1prrhas02 ONLINE
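The same lookup can be done on saved output. This sketch assumes the layout shown above (a "Resource Group Name:" line followed by "<node> ONLINE|OFFLINE" rows); `rg_online_nodes` is a hypothetical name. A block whose name line was scrolled off screen is reported as "(unknown)".

```shell
#!/bin/sh
# rg_online_nodes: read resource group status output on stdin and print
# "<resource group> <node>" for every node row marked ONLINE.
rg_online_nodes() {
    awk '
        $1 == "Resource" && $2 == "Group" && $3 == "Name:" { rg = $4 }
        NF == 2 && $2 == "ONLINE" {
            name = (rg == "") ? "(unknown)" : rg
            print name, $1
        }
    '
}

# Usage with a saved copy of the screen output (file name is illustrative):
#   rg_online_nodes < rg_state.txt
```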


 

- Boot IPs used in AIX HA. Below are the boot IPs defined on one of the company's AIX systems.

COMMAND STATUS

 

Command: OK stdout: yes stderr: no

 

Before command completion, additional instructions may appear belo

 

[MORE...16]

 

Address: 100.0.0.1 Label: sc1prrhas01_boot1 State: UP

Address: 100.0.1.1 Label: sc1prrhas01_boot2 State: UP

Address: 192.168.51.120 Label: sc1prrvdbin State: UP

Address: 192.168.51.121 Label: sc1prrvascs State: UP

Address: 192.168.51.122 Label: sc1prrvglob State: UP

 

Node Name: sc1prrhas02 State: DOWN

 

Network Name: net_diskhb_01 State: DOWN

 

Network Name: net_ether_01 State: DOWN


Address: 100.0.0.2 Label: sc1prrhas02_boot1 State: DOWN

Address: 100.0.1.2 Label: sc1prrhas02_boot2 State: DOWN

Above, Address: 100.0.0.1 Label: sc1prrhas01_boot1 is boot IP 1 of server P1.
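To list only the boot addresses from such output, the labels can be filtered by name. A minimal sketch, assuming this site's naming convention in which boot labels end in _boot1, _boot2, and so on; `boot_addresses` is a hypothetical name.

```shell
#!/bin/sh
# boot_addresses: read cluster state output on stdin and print "<label> <address>"
# for labels that follow the *_bootN naming seen in these notes (an assumption
# about this site, not an HACMP rule).
boot_addresses() {
    awk '$1 == "Address:" && $3 == "Label:" && $4 ~ /_boot[0-9]+$/ { print $4, $2 }'
}

# Usage with a saved copy of the screen output (file name is illustrative):
#   boot_addresses < current_state.txt
```

Lines without an address, such as the disk heartbeat entry, and service labels such as sc1prrvdbin are skipped.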


At the company, the two AIX servers P1 and P2 form the HA pair. The 3 virtual hosts correspond to 3 res groups in HA, as with the ECC Virtual Hosts in the figure below.

[Figure: ECC Virtual Hosts]

COMMAND STATUS

 

Command: OK stdout: yes stderr: no

 

Before command completion, additional instructions may appear below.

 

[MORE...32]

Address: 100.0.1.2 Label: sc1prrhas02_boot2 State: UP

Cluster Name: SAP_p1p2_cluster

Resource Group Name: sc1prrvdbin_res

Startup Policy: Online On Home Node Only

Fallover Policy: Fallover To Next Priority Node In The List

Fallback Policy: Never Fallback

Site Policy: ignore

Node Group State

---------------------------- ---------------

sc1prrhas01 ONLINE

sc1prrhas02 OFFLINE

Resource Group Name: sc1prrvascs_res


Startup Policy: Online On Home Node Only

Fallover Policy: Fallover To Next Priority Node In The List

COMMAND STATUS

 

Command: OK stdout: yes stderr: no

 

Before command completion, additional instructions may appear below.

 

[MORE...48]

Startup Policy: Online On Home Node Only

Fallover Policy: Fallover To Next Priority Node In The List

Fallback Policy: Never Fallback

Site Policy: ignore

Node Group State

---------------------------- ---------------

sc1prrhas01 ONLINE

sc1prrhas02 OFFLINE

 

Resource Group Name: sc1prrvglob_res

Startup Policy: Online On Home Node Only

Fallover Policy: Fallover To Next Priority Node In The List

Fallback Policy: Never Fallback


Site Policy: ignore

Node Group State

---------------------------- ---------------

sc1prrhas01 ONLINE

sc1prrhas02 OFFLINE

P1 and P2 host 3 res. If only one of them is switched over, use the per-res listing above to check on which node that res is online or offline.

 

- HA verification: run smit hacmp and choose HACMP Verification. Running the verification while HA is stopped is recommended.

Problem Determination Tools

 

Move cursor to desired item and press Enter.

 

HACMP Verification

View Current State

HACMP Log Viewing and Management

Recover From HACMP Script Failure

Restore HACMP Configuration Database from Active Configuration

Release Locks Set By Dynamic Reconfiguration

Clear SSA Disk Fence Registers


HACMP Cluster Test Tool

HACMP Trace Facility

HACMP Error Notification

Manage RSCT Services

 

Open a SMIT Session on a Node 

- If an HA switchover fails partway with an error, run Recover From HACMP Script Failure to make HA forcibly ignore the failed step, skip it, and continue.

Problem Determination Tools

 

Move cursor to desired item and press Enter.

 

HACMP Verification

View Current State

HACMP Log Viewing and Management

Recover From HACMP Script Failure

Restore HACMP Configuration Database from Active Configuration

Release Locks Set By Dynamic Reconfiguration

Clear SSA Disk Fence Registers

HACMP Cluster Test Tool

HACMP Trace Facility


HACMP Error Notification

Manage RSCT Services

Open a SMIT Session on a Node 

 

- Moving a res to another node in AIX HA (HA has already been configured and tested beforehand)

Run smit hacmp and select the menus:

System Management (C-SPOC)

HACMP Resource Group and Application Management

Move a Resource Group to Another Node / Site

Move Resource Groups to Another Node

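Then select the res to operate on.

The move res operation above is not restricted to a particular node; it can be run from any node in the cluster. Coordinate within the team so that only one person performs it at a time, to avoid interfering with each other.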

HACMP for AIX

 

Move cursor to desired item and press Enter.

 

Initialization and Standard Configuration

Extended Configuration


System Management (C-SPOC)

Problem Determination Tools

 

System Management (C-SPOC)

 

Move cursor to desired item and press Enter.

 

Manage HACMP Services

HACMP Communication Interface Management

HACMP Resource Group and Application Management

HACMP Log Viewing and Management

HACMP File Collection Management

HACMP Security and Users Management

HACMP Logical Volume Management

HACMP Concurrent Logical Volume Management

HACMP Physical Volume Management

Configure GPFS

 

Open a SMIT Session on a Node 

HACMP Resource Group and Application Management


 

Move cursor to desired item and press Enter.

 

Show the Current State of Applications and Resource Groups

Bring a Resource Group Online

Bring a Resource Group Offline

Move a Resource Group to Another Node / Site

Suspend/Resume Application Monitoring

Application Availability Analysis 

Move a Resource Group to Another Node / Site

 

Move cursor to desired item and press Enter.

 

Move Resource Groups to Another Node

Move Resource Groups to Another Site

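Next, select one of the res. After you select it and press Enter, the Select a Destination Node dialog appears. Because the company's P1/P2 HA has only two nodes, the only choice is the other node, sc1prrhas02. The following screen shows the res name and the node name; press Enter again to confirm, and only then does the move start.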

Move a Resource Group to Another Node / Site


 

Move cursor to desired item and press Enter.

 

Move Resource Groups to Another Node

Move Resource Groups to Another Site

+--------------------------------------------------------------------------+

| Select a Destination Node |

| |

| Move cursor to desired item and press Enter. |

| |

| # *Denotes Originally Configured Highest Priority Node |

| sc1prrhas02 |

| |

| F1=Help F2=Refresh F3=Cancel |

| F8=Image F10=Exit Enter=Do |

F1=H| /=Find n=Find Next |

F9=S+--------------------------------------------------------------------------+

After the operation completes, the following is displayed. Paging down through the output shows the current state on each server of every res in this cluster.

COMMAND STATUS

 

Command: OK stdout: yes stderr: no

 

Before command completion, additional instructions may appear below.

 

[TOP]

Attempting to move resource group sc1prrvascs_res to node sc1prrhas02.

 

Waiting for the cluster to process the resource group movement request....

 

Waiting for the cluster to stabilize.................

 

Resource group movement successful.

Resource group sc1prrvascs_res is online on node sc1prrhas02.

 

 

 

Cluster Name: SAP_p1p2_cluster


 

Resource Group Name: sc1prrvdbin_res

Node State

---------------------------- ---------------

sc1prrhas01 OFFLINE

sc1prrhas02 ONLINE

[MORE...13] 
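When the output of a move is captured to a file, the result can be checked without paging through the screen. A minimal sketch that relies only on the two message lines shown above; `move_succeeded` and `moved_to` are hypothetical names.

```shell
#!/bin/sh
# move_succeeded: exit 0 if stdin contains the success message shown above.
move_succeeded() {
    grep -q 'Resource group movement successful\.'
}

# moved_to: print "<resource group> <node>" from the line
# "Resource group <rg> is online on node <node>."
moved_to() {
    awk '$1 == "Resource" && $2 == "group" && $5 == "online" {
        node = $NF
        sub(/\.$/, "", node)   # drop the trailing period
        print $3, node
    }'
}

# Usage with a saved copy of the screen output (file name is illustrative):
#   move_succeeded < move_output.txt && moved_to < move_output.txt
```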

- The company's HA switchover scripts are located under /etc/hacmp

# pwd

/etc/hacmp

# ls

startsc1prrvascs.sh startsc1prrvglob.sh stopsc1prrvdbin.sh

startsc1prrvdbin.sh stopsc1prrvascs.sh stopsc1prrvglob.sh 
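The listing shows one start script and one stop script per res. A quick consistency check can flag a start script that has no matching stop script. This is a sketch assuming the start<name>.sh / stop<name>.sh naming seen in the listing; `unpaired_start_scripts` is a hypothetical name.

```shell
#!/bin/sh
# unpaired_start_scripts [DIR]: print every start*.sh in DIR (default /etc/hacmp)
# that has no matching stop*.sh with the same suffix.
unpaired_start_scripts() {
    dir=${1:-/etc/hacmp}
    for start in "$dir"/start*.sh; do
        [ -e "$start" ] || continue        # no start scripts at all
        name=${start##*/}                  # e.g. startsc1prrvascs.sh
        stop="$dir/stop${name#start}"      # e.g. /etc/hacmp/stopsc1prrvascs.sh
        [ -e "$stop" ] || echo "$name"
    done
}

# Usage:
#   unpaired_start_scripts /etc/hacmp
```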

 

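- Differences between Bring a Resource Group Online, Bring a Resource Group Offline, and Move a Resource Group to Another Node / Site:

A move always leaves the res online on one node of the HA cluster, whereas bringing a res offline takes it offline on all nodes. For example, if the res contains a vg, then after Bring a Resource Group Offline the vg specified in the res is varied off and unavailable on every server node in the HA cluster.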
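The difference is visible in the resource group status output: after a move, one node still shows ONLINE, while after Bring a Resource Group Offline no node does. The sketch below lists the res that are not ONLINE anywhere, assuming the status layout shown earlier in these notes; `rg_fully_offline` is a hypothetical name.

```shell
#!/bin/sh
# rg_fully_offline: read resource group status output on stdin and print the
# name of every resource group that has no node row marked ONLINE.
rg_fully_offline() {
    awk '
        $1 == "Resource" && $2 == "Group" && $3 == "Name:" { rg = $4; seen[rg] = 1 }
        NF == 2 && $2 == "ONLINE" && rg != "" { online[rg] = 1 }
        END { for (r in seen) if (!(r in online)) print r }
    '
}

# Usage with a saved copy of the screen output (file name is illustrative):
#   rg_fully_offline < rg_state.txt
```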

 


HACMP Resource Group and Application Management

 

Move cursor to desired item and press Enter.

 

Show the Current State of Applications and Resource Groups

Bring a Resource Group Online

Bring a Resource Group Offline

Move a Resource Group to Another Node / Site

Suspend/Resume Application Monitoring

Application Availability Analysis

The company does not use the last two items above:

Suspend/Resume Application Monitoring

Application Availability Analysis