© 2009 IBM Corporation
2009 Second-Half SMA Seminar
HACMP Best Practices
October 15, 2009. Jinhoon Baek (jhb@kr.ibm.com), MTS, GTS, IBM Korea
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
PowerHA Message
• HACMP is now PowerHA for AIX
Renamed as part of the Power software initiative; publications and the product binaries continue to use the name HACMP.
Overview and Issues
HACMP:
• First released in 1991; the latest version, 5.5, is the 20th release
• More than 60,000 references in production environments worldwide
• A powerful, stable high-availability product for AIX
• Supports many flexible configuration options
• Clusters often pass the verification step at configuration time and run in production, yet are still not optimally configured from the standpoint of actually delivering high availability
• What, then, must be considered to achieve a high level of availability?
Why do good clusters turn bad?
Common reasons why HACMP fails to work as expected:
• Poor cluster design and insufficiently thorough planning
  - design, planning, test
• Basic TCP/IP and LVM configuration problems
• HACMP cluster topology and resource configuration problems
• Lack of change-management discipline (control) in operating the cluster
• Insufficient training of cluster administrators
• Performance/capacity problems
Designing High Availability
• “…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs) through appropriate design, planning, selection of hardware, configuration of software, and carefully controlled change management discipline.…”
Fault resilience vs. Fault tolerance
High availability does not mean no interruption to the application; that is why we say fault resilient rather than fault tolerant.
Eliminating single points of failure
Cluster object       Eliminated as a single point of failure by...
Node                 Using multiple nodes
Power source         Using multiple circuits or uninterruptible power supplies
Network adapter      Using redundant network adapters
Network              Using multiple networks to connect nodes
TCP/IP subsystem     Using non-IP networks to connect adjoining nodes and clients
Disk adapter         Using redundant disk adapters or multipath hardware
Disk                 Using multiple disks with mirroring or RAID
Application          Adding a node for takeover; configuring an application monitor
Administrator        Adding a backup administrator or a very detailed operations guide
Site                 Adding an additional site
• “…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs).…”
HACMP Cluster Configuration – Step by Step
[Diagram: Cluster 1. Node A and Node B share a storage subsystem. RG1 (NodeA, NodeB) holds a service IP, a volume group, and the production app; RG2 (NodeB) holds a service IP, a volume group, and the development app. Networks: net_ether0, rs232_net, diskhb_net]

Topology components:
• Cluster name
• Node names
• IP network
• Interfaces
• Serial network

Types of resources:
• Service IP
• Volume group(s)
• Application server

Resource components:
• Resource group(s)
• Policies: startup, fallover, fallback
• Dependencies: parent/child, location
HACMP Location Dependencies
[Diagram: the same Cluster 1: RG1 (NodeA, NodeB) with the production app and RG2 (NodeB) with the development app on networks net_ether0, rs232_net, and diskhb_net]

Location dependencies:
- RGs can coexist on the same node
- RGs can coexist on different nodes
- Priorities can also be set: High, Intermediate, Low

Diagram assumptions: RG1 has High priority, RG2 has Low priority.
On fallover: RG2 goes offline and RG1 moves to Node B.
HACMP Fallover Scenarios: Mutual Fallover

[Diagram: Cluster 1 on networks net_ether0, rs232_net, and diskhb_net. RG1 (NodeA, NodeB) holds a service IP, a volume group, and Application 1; RG2 (NodeB, NodeA) holds a service IP, a volume group, and Application 2. Both RGs are ONLINE, one per node]

Environment:
- Each machine runs its own production application
- Node A fails over to Node B
- Node B fails over to Node A

Fallover behavior:
- On fallover, the target machine needs enough CPU and memory resources to handle the load of both applications
Resource Group Tips
RG decisions beyond startup, fallover, and fallback behavior

[Diagram: four resource groups across two nodes: RG1 (NodeA, NodeB) with VG1 and App Server 1, RG3 (NodeA, NodeB) with VG3 and App Server 3, RG2 (NodeB, NodeA) with VG2 and App Server 2, RG4 (NodeB, NodeA) with VG4 and App Server 4, each with its own service IP]

Further options:
• One RG vs. multiple RGs
  - Selective fallover behavior (VG / IP)
• RG processing
  - Parallel vs. sequential
• Delayed fallback timer
• RG dependencies
  - Parent / child
  - Location dependencies

Best practice: always try to keep it simple, but stay current with new features and take advantage of existing functionality to avoid added manual customization.
Cluster Components
Nodes
• Up to 32 nodes, in any combination of active and standby nodes
• All-active configurations are possible (mutual takeover)
• Reliable clusters have at least one standby node
• Nodes must not share a common power supply, e.g. a power supply within a single rack
• Do not build cluster nodes as LPARs within a single frame
• Each node needs enough I/O slots to install redundant network and disk adapters
• That is, twice as many slots as a single node would need
• Every cluster resource should have a backup; each node's rootvg should be mirrored or placed on a RAID device
Cluster Components(Cont.)
Nodes
• Enough CPU cycles and I/O bandwidth must remain for HACMP to operate normally even while the production application runs at peak load
• With takeover in mind, keep resource utilization at or below 40%
• If a single standby node backs up multiple active nodes, it must have enough capacity to run all possible workloads
• On DLPAR-capable hardware, HACMP must be able to allocate and configure processors and memory before the application starts on the takeover node. All resources (CPU, memory) must either be available or obtainable through CoD
DLPAR/CoD configuration
• HACMP on the primary node detects the failure
• Running in a partition on another server, HACMP grows the backup partition, activates the required inactive processors, and restarts the application

[Diagram: a production database server fails over to a DLPAR/CoD server that is running Web Server and Order Entry applications on its active processors; HACMP activates the inactive processors and restarts the database server against the shared disk]
Configuration Requirements
• The HMC IP address and the managed system name must be configured on each DLPAR
• All DLPAR nodes must be able to communicate with the HMC over SSH
  - clverify checks the SSH connectivity
• When CoD is used, the key must be activated manually through the HMC
• The maximum resources HACMP can configure are bounded by the maximum values in the DLPAR profile
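Since clverify checks the SSH connectivity, it can be useful to run the same check by hand before cluster verification. A hedged sketch follows; the HMC hostname (hmc01), user (hscroot), and the lssyscfg probe command are assumptions to adapt to your site.

```shell
# Verify passwordless SSH from a DLPAR node to the HMC, the same path
# that clverify exercises. BatchMode makes ssh fail fast instead of
# prompting for a password, so a missing key shows up immediately.
HMC_HOST=${HMC_HOST:-hmc01}
HMC_USER=${HMC_USER:-hscroot}

hmc_ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$HMC_USER@$1" \
        'lssyscfg -r sys -F name' >/dev/null 2>&1
}

# Run this on every DLPAR node; any FAILED line must be fixed before
# verification will pass.
if hmc_ssh_ok "$HMC_HOST"; then
    echo "SSH to HMC $HMC_HOST: OK"
else
    echo "SSH to HMC $HMC_HOST: FAILED"
fi
```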
HACMP with Micro-Partitions
• Takeover increases the workload, so the node needs more CPU resource
• The hypervisor monitors each partition's CPU utilization and allocates more CPU to the partitions that need it

[Diagram: CPU-utilization gauges (0 to 100) for a WAS server, Test server 1, Production server 1 (active), a development server, Test server 2, and Production server 2 sharing the processor pool behind a shared disk. When Production server 1 fails and falls over, the hypervisor in the micro-partitioned environment automatically and dynamically shifts CPU resources to balance the load]
Infrastructure Considerations
• Power redundancy
• I/O drawers
• SCSI backplane
• SAN HBAs
• Virtualized environments
• Application fallover protection

Real customer scenarios:
[Diagram: I/O drawer layouts illustrating two anti-patterns: (1) SCSI adapters for rootvg on the same bus, and (2) two nodes sharing an I/O drawer]

Moral of the story: high availability goes beyond just installing the cluster software.
Single Point of Failure on Internal SCSI Disks
# lsdev -Cc disk
hdisk0 Available 0A-08-00-8,0  16 Bit LVD SCSI Disk Drive
hdisk1 Available 0A-08-00-9,0  16 Bit LVD SCSI Disk Drive
hdisk2 Available 0A-08-00-10,0 16 Bit LVD SCSI Disk Drive
#
# lsdev -Cc adapter
...
scsi0 Available 0A-08 Wide/Ultra-3 SCSI I/O Controller
...
All three disks hang off the single scsi0 controller: a single point of failure.
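A short script can flag this pattern automatically. This is a sketch using the canned lsdev output from the slide; on a live node you would feed it `lsdev -Cc disk` instead, and the location-code parsing is a simplifying assumption.

```shell
# Detect whether every disk sits behind a single SCSI adapter by
# comparing the adapter portion (e.g. 0A-08) of each location code.
LSDEV_OUT='hdisk0 Available 0A-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 0A-08-00-9,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 0A-08-00-10,0 16 Bit LVD SCSI Disk Drive'

adapters=$(echo "$LSDEV_OUT" | awk '{split($3, loc, "-00-"); print loc[1]}' | sort -u | wc -l)
if [ "$adapters" -eq 1 ]; then
    echo "WARNING: all disks sit behind one SCSI adapter"
fi
```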
Single Point of Failure Example (I/O Drawer)

[Diagram: internal disk drives (front) and PCI adapter slots (rear) across Planar 1 and Planar 2 of two I/O drawers, slots 1 through 10 each, populated with Ethernet adapters (ent4 through ent44), Fibre Channel adapters (fcs0 through fcs19), and some empty slots. The rootvg disks and their adapters all sit in the same drawer, making the drawer itself a single point of failure]
Benefit of Boot from SAN
• Configuration flexibility
  - Overcomes the LPAR configuration limits imposed by internal adapters and bays on Power systems
• High performance
  - Exploits external storage I/O performance, far superior to internal disks
  - Exploits the external storage disk cache
• High reliability
  - System persistence regardless of DDM failures
  - Leverages the high availability of external storage
  - Internal and remote replication solutions can be used
• Capacity efficiency
  - Multiple OS and backup images can be placed on external storage
  - rootvg can be allocated flexibly (e.g. 20 GB or 40 GB) instead of in whole-DDM units
• Easy AIX backup: FlashCopy replaces mksysb
  - Multiple point-in-time OS backups can be kept
  - An OS backup (the FlashCopy target) is portable and can be used directly on another system

[Diagram: a 16-core p595 with two I/O drawers boots from SAN: a 20 GB rootvg LUN plus 30 GB, 30 GB, and 60 GB LUNs for datavg and appvg]
Networks
• Networks consist of IP and non-IP networks. Nodes communicate and exchange heartbeats over both
• A non-IP network must be configured so that a network failure can be distinguished from a node failure
• HACMP handles only these three types of failure:
  - Network interface card (NIC) failures
  - Node failures
  - Network failures
Partitioned Cluster (Split Brain)
• When IP network problems prevent heartbeat checks, each node concludes that the other node is down even though both are alive, and each attempts takeover.
• As a result, the RG comes online on both nodes at the same time.
• The applications using the shared disk then start concurrently, which can cause data corruption.
• Proving that node isolation caused the problem:
  - /tmp/clstrmgr.debug file
  - AIX error log entry GS_DOM_MERGE_ER
Disk Heartbeating vs. RS232 Links
• Introduced in HACMP 5.1: uses the SAN environment for non-IP heartbeating
• Reasons to use it:
  - Distance limits of RS232 cables
  - Shortage of integrated serial ports
  - Some models restrict the use of integrated ports for heartbeating
  - Clusters with more than two nodes may require an async adapter with a RAN
• Requirements:
  - A small disk or LUN (e.g. 1 GB)
  - The bos.clvm.enh fileset installed
  - An Enhanced Concurrent Mode VG (it does not need to be defined in an RG)
Checking for Fast Failure Detection (FFD)
# lssrc -ls topsvcs | grep Fast
Fast Failure Detection enabled

# odmget -q name=diskhb HACMPnim
HACMPnim:
        name = "diskhb"
        desc = "Disk Heartbeat Serial protocol"
        addrtype = 1
        path = "/usr/sbin/rsct/bin/hats_diskhb_nim"
        para = "FFD_ON"
        grace = 60
        hbrate = 3000000
        cycle = 8
        gratarp = 0
        entry_type = "adapter_type"
        next_generic_type = "transport"
        next_generic_name = ""
        src_routing = 0
Note: this is a new feature starting with HACMP 5.4. The change will not take effect until the NIM is recycled during a cluster restart.

Reason to set this option: in the event of a crash, takeover starts immediately instead of waiting for the failure detection rate timeout to pass.
Important network best practices for high availability
• Be careful with network-level changes to IP addresses, subnet masks, switch port settings, VLANs, and so on
  - Failure detection requires at least two physical adapters per node on the same physical network/VLAN
• Configure at least one non-IP network
• Configuring EtherChannel under HACMP is a useful way to improve network availability
• Include a backup adapter connected to a secondary switch in the configuration
• HACMP treats an EtherChannel as a single-adapter network. Configure a netmon.cf file to help diagnose adapter failures: adapter failure is judged by sending ICMP echo requests (ping) to an interface outside the cluster
• Configure a persistent IP on each node
  - Useful for remote administration and monitoring
HACMP Topology Considerations
• IPAT via Replacement vs. IPAT via Aliasing
Considerations:
- Maximum number of service IPs within an HACMP network
- Hardware Address Takeover (HWAT)
- Speed of takeover
- Firewall issues

[Diagram: IPAT via Replacement on network net_ether_0. Node A: en0 9.19.10.1 (boot), replaced by 9.19.10.28 (service IP) when the service address is acquired; en1 192.168.11.1 (standby). Node B: en0 9.19.10.2 (boot); en1 192.168.11.2 (standby)]
HACMP Topology Considerations
• Contrast between Replacement and Aliasing
Considerations:
- Maximum number of service IPs within an HACMP network
- Speed of the swap
- Hardware Address Takeover (HWAT)
- Firewall issues

[Diagram: IPAT via Aliasing on network net_ether_0. Node A: en0 192.168.10.1 (base1) with aliases 9.19.10.28 (persistent a) and 9.19.10.51 (service IP); en1 192.168.11.1 (base2) with alias 9.19.10.50 (service IP). Node B: en0 192.168.10.2 (base1) with alias 9.19.10.29 (persistent b); en1 192.168.11.2 (base2)]
Etherchannel with HACMP
• Example of an EtherChannel (backup) configuration

[Diagram: Node A (primary) and Node B (secondary), each aggregating ent0 and ent1 into ent2 across redundant network switches, on network net_ether_0. Node A: en2 10.10.100.1 (boot), 192.168.10.1 (persistent a), 192.168.10.2 (service IP). Node B: en2 10.10.101.1 (boot), 192.168.10.4 (persistent b)]

Verification messages:
For nodes with a single network interface card per logical network configured, it is recommended to include the file '/usr/es/sbin/cluster/netmon.cf' with a "pingable" IP address, as described in the 'HACMP Planning Guide'.

WARNING: File 'netmon.cf' is missing or empty on the following nodes: Node A, Node B
Netmon.cf
• HACMP can have difficulty judging the failure of a single logical adapter such as an EtherChannel accurately
• This is because RSCT Topology Services cannot force packets through a single adapter to confirm that it is working
• In single-adapter network configurations such as EtherChannel, create a netmon.cf file
  - Typically the default gateway IP address is used
  - If no heartbeat packets arrive from the other node, the preconfigured gateway is pinged
• Example /usr/es/sbin/cluster/netmon.cf:
  180.146.181.119
  steamerchowder
  180.146.181.121
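Populating netmon.cf with the default gateway can be scripted. This is a sketch with canned routing-table output; on a live node you would pipe in `netstat -rn` and write to /usr/es/sbin/cluster/netmon.cf instead of /tmp.

```shell
# Pick the default-gateway address out of netstat-style routing output
# and write it as the netmon.cf ping target.
NETSTAT_OUT='Destination      Gateway           Flags
default          180.146.181.1     UG'

GW=$(echo "$NETSTAT_OUT" | awk '$1 == "default" {print $2; exit}')
printf '%s\n' "$GW" > /tmp/netmon.cf   # real path: /usr/es/sbin/cluster/netmon.cf
echo "netmon.cf ping target: $GW"
```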
Topology environments with a Firewall
If multiple IPs on the same subnet are configured on the same interface, AIX uses the first one configured for outbound traffic.

Tip: if you only need to manage one service IP per HACMP network, consider using IPAT via Replacement to avoid having multiple IPs on the same interface.

[Diagram: Node X behind a firewall, network set to use IPAT via Aliasing. en0: 10.10.10.1 (boot), 9.19.51.1 (persistent IP), 9.19.51.2 (service IP1); en1: 10.10.11.1 (boot). Clients on the LAN can talk to the cluster nodes, but node-initiated traffic from the 9.19.51.x network looks like it is coming from the persistent IP, not the service IP]
Topology environments with a Firewall
Overriding default AIX behavior sometimes requires some creativity.

Workaround 1: within the application start script, perform an ifconfig down of the persistent alias followed by an immediate ifconfig up, making it the second IP on the list.

[Diagram: Node X behind a firewall, IPAT via Aliasing. en0 now lists 10.10.10.1 (boot), 9.19.51.2 (service IP), 9.19.51.1 (persistent IP); en1: 10.10.11.1 (boot). Clients still reach the cluster nodes via both IPs, but node-initiated traffic from the 9.19.51.x network now looks like it is coming from the service IP]
Topology environments with a Firewall
IPAT via Replacement hosts only one IP on an interface at any given time, hence avoiding multiple IPs within the same subnet.

Workaround 2: if you only need to manage one service IP per HACMP network, consider using IPAT via Replacement to avoid having multiple IPs on the same interface.

[Diagram: Node X behind a firewall, network set to use IPAT via Replacement. en0: 9.19.51.1 (boot); en1: 10.10.11.1 (standby). When HACMP is activated, the base address on en0 is replaced by the service IP 9.19.51.2 that the clients use to connect]
Enhanced Concurrent VG
• First introduced in AIX 5L V5.1
• Can be implemented on any disk usable with HACMP
• Supports JFS and JFS2 filesystems
  - File systems are mounted on only one node at a time
• Replaces the older classic concurrent volume groups
• Enhanced concurrent VGs are required for:
  - Heartbeat over disk for a non-IP network
  - Fast disk takeover
Converting VG to ECM
• Stop the cluster
• One node at a time, perform:
  - varyonvg <VGNAME>
  - chvg -C <VGNAME>
  - varyoffvg <VGNAME>
• On every node, check the VG attributes with lsattr -El <VGNAME>:
  auto_on      n  N/A  True
  conc_auto_on n  N/A  True
  conc_capable y  N/A  True
• Verification and synchronization
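The per-node steps above can be wrapped in a small script. This is a sketch only: the VG name is a placeholder, and the AIX commands (varyonvg, chvg, varyoffvg, lsattr) exist only on AIX, so the functions are defined but not invoked here.

```shell
# Sketch of the ECM conversion steps; run convert_to_ecm on one node at
# a time with the cluster stopped, then verify_ecm on every node.
convert_to_ecm() {
    vg=$1
    varyonvg  "$vg"        # bring the VG online on this node
    chvg -C   "$vg"        # -C makes the VG Enhanced Concurrent capable
    varyoffvg "$vg"        # release it so the next node can repeat
}

verify_ecm() {
    # conc_capable must report "y" after the conversion
    lsattr -El "$1" | grep -q "^conc_capable  *y"
}
```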
Adapters
• With multi-port adapters, do not configure the backup on a spare port of the same adapter
• When using a built-in Ethernet adapter, the node must have a separate backup adapter
• Wherever possible, place adapters in separate I/O drawers, or on different backplanes or buses, to achieve redundancy
Applications & Application Scripts
• Automation
  - No manual intervention by the administrator
• Some applications tend to be tightly coupled to specific OS characteristics such as uname, serial number, or IP address (e.g. SAP)
• Check whether the application is already running
  - When an RG is in the unmanaged state, the default startup option causes HACMP to re-run the application start script
• Check the state of the data. Is recovery required?
• Correct coding:
  - Start by declaring a shell (e.g. #!/usr/bin/ksh)
  - Exit with RC=0
  - Include a check that the application has really stopped
  - fuser
• Smart Assist : DB2, Websphere, Oracle
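The correct-coding bullets can be sketched as a minimal, re-runnable start script. APP_NAME, APP_CMD, and the pid-file location are placeholder assumptions; a real script would also verify data state before starting.

```shell
#!/bin/sh
# Minimal HACMP-style start script sketch. HACMP re-runs start scripts
# when an unmanaged RG is reacquired, so the script first checks whether
# the application is already up instead of blindly starting it again.
APP_NAME=${APP_NAME:-myapp}
APP_CMD=${APP_CMD:-"sleep 60"}     # placeholder for the real start command
PIDFILE=/tmp/${APP_NAME}.pid

already_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start_app() {
    if already_running; then
        echo "$APP_NAME already running; nothing to do"
        return 0
    fi
    $APP_CMD &
    echo $! > "$PIDFILE"
    echo "$APP_NAME started, pid $(cat "$PIDFILE")"
}

start_app            # a real script ends here with: exit 0
```

Running it twice is safe: the second call reports that the application is already up, which is exactly the behavior needed when HACMP re-drives the script.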
Application Monitoring
Three types of monitoring:
• Startup monitors: run one time
• Process monitors: check for the specified process instances in the process table
• Custom monitors: run your specified script at a recurring interval

Resource monitors can be configured either to run a simple notify event when a problem occurs, or to fall the service over to a healthy peer node.

Don't stop at just the base configuration; with thorough testing these can be great tools to automate recovery and save admin time.
Testing Best Practices
• Thoroughly test application scripts and application monitoring before moving to production
• Test fallover in every direction
• Test cluster
  - LPARs within the same frame
  - Virtual resources
• Use the available tools: the Cluster Test Tool
  - Further customization enhancements in HACMP 5.4
• Establish a regular test schedule, e.g. node fallover and fallback tests
  - At least once per half year
Maintenance
• A controlled administrative environment
  - The most important factors in achieving high availability: strict change control, managed change procedures, and testing
• Take an HACMP snapshot before making any change to a cluster node
• Generate HTML reports with the OLPW (Online Planning Worksheets)
  - http://www-03.ibm.com/systems/power/software/availability/aix/apps/download.html
• Use C-SPOC operations (Cluster Single Point of Control)
  - No TIPing (Testing In Production): never test in the production environment
• Maintain a test cluster identical to production, e.g. a test environment built with PowerVM
• Documented recovery procedures
  - Document the recovery procedures and expected results from testing, and use them
Documenting the Environment with OLPW
• Online Planning Worksheets: HTML report file
What about PowerHA and distance?
• Two options: Remote PowerHA or PowerHA/XD
• Remote PowerHA
  - Systems see the same copy of data (shared access to the same LUNs)
  - LVM mirroring (with the optional but recommended cross-site LVM mirroring configured) keeps copies in sync across sites
  - A system-down condition is a site-down condition
  - Cross-site LVM mirroring facilitates integration/reintegration, managing LVM mirror copy consistency between sites
• PowerHA/XD
  - Systems see their "own" copy of data, which is replicated
  - A data replication mechanism keeps copies in sync across sites:
    • GLVM (Geographic Logical Volume Manager)
    • Metro Mirror in DS8000 or SVC
  - A local fallover option can be employed to keep a system-down condition local to the site
  - PowerHA/XD facilitates integration/reintegration between sites, including replication role reversal
• Again, networking (IP/non-IP) is used for monitoring in both cases
Case #1 - Company A

[Diagram: Node A (production) and Node B (standby) on a shared storage subsystem; RG1 (NodeA, NodeB) with a service IP, volume group, and the production app; networks net_ether0, rs232_net, diskhb_net]

- 2-node cluster (active + standby)
- IP aliasing
- EtherChannel: (A + B)
- RS232/MNDHB: both used
- Application monitoring: used
- Takeover test: not performed

The application monitoring timeout was too short, so the cluster attempted a fallback while the fallover was still in progress (fixed by increasing the duration time).
Case #2 - Company B

[Diagram: Node A, Node B, and Node C on a shared storage subsystem; RG1 (NodeA, NodeB), RG2 (NodeB, NodeA), and RG3 (NodeC, NodeB), each with a service IP, volume group, and a production app; networks net_ether0, rs232_net, diskhb_net]

- 3-node cluster (active + active + active)
- IP aliasing
- EtherChannel: (A + B)
- RS232/MNDHB: both used
- Application monitoring: not used
- Takeover test: periodic
Case #3 - Company C

[Diagram: Node A, Node B, Node C, and Node D on a shared storage subsystem; RG1 through RG4, each with a service IP, volume group, and a production app; networks net_ether0 and rs232_net]

- 4-node cluster (active + active + active + active)
- IP aliasing
- EtherChannel: (A + B), with two active adapters
- RS232: used
- Application monitoring: not used
- Takeover test: irregular
Case #4 - Company D

Site 1 (IDC):
- 1 x 64-way IBM System p6 595: 13 dedicated partitions for DB, 9 dedicated partitions for App, 8 VIOS with production clients, NIM/CSM partition; no physical adapters except in the VIOS
- 1 x 16-way IBM System p6 570: 2 VIOS with 20 VIO clients for Tivoli applications
- TotalStorage DS8300

Site 2 (Juneau), 40 miles away:
- 1 x 64-way IBM System p5 595: 13 dedicated partitions for failover, 9 dedicated partitions for App, 8 VIOS with production clients, NIM partition; no physical adapters except in the VIOS
- 1 x 16-way IBM System p6 570: 2 VIOS with 13 VIO clients for Dev/QA servers
- TotalStorage DS8100

Redundant HMCs; 1,400 business applications are planned to run.
VSCSI General Diagram
[Diagram: Frame 1 with a VIO Server and two AIX client LPARs. From client_lunsvg on a 36 GB disk, the VIO Server exports client1_rootvg_lv (5 GB) and client1_datavg_lv through vhost0 to LPAR 1 (vscsi0: hdisk0 rootvg, hdisk1 datavg1). A whole 72 GB disk (hdisk2, no_reserve) is mapped through vhost1 to LPAR 2 (vscsi0: hdisk1 datavg2), which boots from its own hdisk0 rootvg on scsi0]

Two ways to export a LUN to a VIO client:
• Logical volumes within a VG
• Whole-disk mapping, required when sharing across VIO Servers
  (Enhanced Concurrent VGs are required on the VIO clients)
VSCSI General Diagram – (Mapping entire Disk)
[Diagram: Frame 1 with VIO Server 1 and VIO Server 2, each mapping the same 72 GB LUN (hdisk1, no_reserve) from the storage subsystem as a whole disk through its vhost0. AIX client LPAR 2 sees the LUN via vscsi0 and vscsi1 as a single MPIO hdisk1 (datavg), alongside its own hdisk0 rootvg]
Virtual SCSI & HACMP Configuration Procedure
• On the storage device
  - Assign the LUNs to both corresponding VIO Servers
• On the HMC
  - Define the mappings (vhost and vscsi)
• On VIO Server 1
  - Set the "no_reserve" attribute:
    chdev -l <hdisk#> -a reserve_policy=no_reserve -a algorithm=round_robin
  - Export the LUNs to each client:
    mkvdev -vdev hdisk# -vadapter vhost0
    mkvdev -f -vdev hdisk# -vadapter vhost1
• On VIO Server 2
  - Set the "no_reserve" attribute:
    chdev -l <hdisk#> -a reserve_policy=no_reserve
  - Export the LUNs to each client:
    mkvdev -vdev hdisk# -vadapter vhost0
    mkvdev -f -vdev hdisk# -vadapter vhost1
Virtual SCSI & HACMP Configuration Procedure (Cont.)
• On the clients
  - Install MPIO SDDPCM
  - On the first client, create the shared VG (volume group) as an ECVG (Enhanced Concurrent VG)
    (the bos.clvm.enh fileset is required)
  - varyoffvg on client 1
  - importvg on client 2
  - Define it to HACMP as a shared resource
VSCSI Disks & HACMP (Same Frame - Single HBA on VIO Servers)
[Diagram: Frame 1 with VIOS 1 and VIOS 2, each holding a single HBA to the storage subsystem. HACMP Node A sees the shared disks (sharedvg) only through VIOS 1 (vhost0 to vscsi0) and HACMP Node B only through VIOS 2, with no_reserve set]

What is wrong with this picture? The configuration works initially because HACMP has visibility to the shared disk from both servers. However, it does NOT provide VIO Server redundancy: if VIO Server 1 fails or goes down for maintenance, HACMP Node A can no longer see or use the disks.
VSCSI Disks & HACMP (Same Frame - Single HBA on VIO Servers)
[Diagram: the corrected single-frame configuration: each VIO Server exports the shared LUN to both HACMP nodes (vhost0 and vhost1), so each node reaches sharedvg through MPIO over vscsi0 and vscsi1, with no_reserve set]

For HACMP clients within the same frame you need an additional path connection from each VIO Server. This additional VSCSI LUN export is accomplished by using the mkvdev command with the "-f" flag.
VSCSI Disks & HACMP (2 Frames - Single HBA on VIO Servers)
[Diagram: Frame 1 hosts HACMP Node A and Frame 2 hosts HACMP Node B; each frame has its own VIOS 1 and VIOS 2, each with a single HBA to the shared storage subsystem. Each node reaches sharedvg through MPIO over vscsi0 and vscsi1, one path per local VIO Server, with no_reserve set on all VIOS]
VSCSI Disks & HACMP (2 Frames - Dual HBAs on VIO Servers)
[Diagram: the same two-frame layout, but each VIO Server now has dual HBAs and runs MPIO itself, removing the HBA as a single point of failure; the client nodes still reach sharedvg through MPIO over vscsi0 and vscsi1, with no_reserve set on all VIOS]
HACMP View of Virtual SCSI
[Diagram: HACMP's view: Node A and Node B each see the VSCSI shared disk as an hdisk with the same PVID, held in Enhanced Concurrent sharedvg volume groups under Resource Group 1 and Resource Group 2. Underneath, VIOS 1 and VIOS 2 in Frame X export the same no_reserve LUN through their vhost0 adapters, and the node reaches it via MPIO over vscsi0 and vscsi1]
Virtual SCSI Summary
• LUNs are assigned to the VIO Servers
  - Use multiple (at least two) VIO Servers per frame
• The "no_reserve" attribute must be set
• Use ECM VGs on the HACMP clients
Virtual I/O Network Terms
[Diagram: a Virtual I/O Server bridging two AIX client LPARs to an external Ethernet switch. The clients' virtual Ethernet adapters (PVID=10) connect through the hypervisor to the VIOS virtual adapter ent2, which the Shared Ethernet Adapter ent4 bridges onto ent3, a Link Aggregation adapter combining the physical adapters ent0 and ent1 into a channel group on the switch]

Terms:
• Port VLAN ID (PVID)
• IEEE 802.1Q Virtual LAN (VLAN)
• IEEE 802.3ad Link Aggregation (LA)
• Cisco EtherChannel (EC)
• Virtual Ethernet adapter
• Shared Ethernet Adapter (SEA): acts as a layer-2 bridge
• Physical Ethernet adapter
• Link Aggregation adapter: combines physical adapters
Virtual Ethernet & HACMP (No Link Aggregation / Same Frame)
[Diagram: Frame 1 with VIOS1 and VIOS2, each bridging a physical adapter (ent0) on its own Ethernet switch to an SEA (ent4) over virtual adapter ent2 (PVID 10); a control channel over ent5/ent6 (PVID 99) links the two SEAs. Client LPARs 1 and 2 each use a single virtual adapter ent0]

This is the configuration required for SEA fallover across VIO Servers. Note that Ethernet traffic will not be load-balanced across the VIO Servers; the lower trunk priority on the ent2 virtual adapter designates the primary VIO Server to use.
Virtual Ethernet & HACMP (Link Aggregation / Same Frame)
[Diagram: the same SEA fallover layout, but each VIOS now aggregates two physical adapters (ent0, ent1) into a Link Aggregation adapter ent3 beneath its SEA ent4; the control channel runs on PVID 99 and client traffic on PVID 10]

Note that Ethernet traffic will not be load-balanced across the VIO Servers. The lower trunk priority on the ent2 virtual adapter designates the primary VIO Server to use.
Virtual Ethernet & HACMP (Independent Frames)

[Diagram: Frame 1 and Frame 2, each with its own VIOS1/VIOS2 pair in the SEA fallover with Link Aggregation configuration, each frame attached to its own Ethernet switch; AIX client LPAR 1 runs in Frame 1 and AIX client LPAR 2 in Frame 2]
HACMP View of Virtual Ethernet (IPAT via Aliasing)

[Diagram: HACMP Node 1 in Frame 1 and HACMP Node 2 in Frame 2 on network net_ether_0. Each node's en0 carries a base address (192.168.100.1 and 192.168.100.2) plus persistent and service aliases (9.19.51.10 and 9.19.51.20 on Node 1; 9.19.51.11 and 9.19.51.21 on Node 2). Topology Services heartbeats run over the IP network and over serial_net1. Underneath, each node is a VIO client in the dual-VIOS SEA fallover configuration]
Virtual Ethernet – Additional considerations
• Always configure a netmon.cf file for single-adapter networks: /usr/es/sbin/cluster/netmon.cf

In virtualized environments:

Typical file:
  9.12.4.11
  9.12.4.13

With !REQD entries:
  9.12.4.11
  !REQD en2 100.12.7.9
  9.12.4.13
  !REQD en2 100.12.7.10

Most adapters will use netmon in the traditional manner, pinging 9.12.4.11 and 9.12.4.13 along with other local adapters or known remote adapters, and will only consider the interface's inbound byte count for results. Interface en2, however, will only be considered up if it can ping either 100.12.7.9 or 100.12.7.10.

Note: additional !REQD formats that may be used within the netmon.cf file are outlined in the description of APAR IZ01332.
Do’s
• Use IPAT via Aliasing and enhanced concurrent VGs where possible
• Give all shared LVM components unique names
• Take regular cluster snapshots and system backups
• Plan thoroughly and test thoroughly (e.g. application scripts)
• Configure application monitoring to increase availability and support self-healing
• Build a test environment so changes can be tested properly
• Build a reliable heartbeat network including at least one non-IP network
• Set up SNMP alerts or SMS/email notification for cluster problems
• Exploit the available HACMP features: application monitoring, extended cluster verification methods, 'automated' cluster testing (in TEST only), file collections, fast disk takeover, fast failure detection
Don’ts
• Changing one node without synchronizing the others
  - Always synchronize immediately after every change
  - If one node is up and the others are down, make and synchronize the change from the active node
• Making changes outside the HACMP menus; use C-SPOC
• Overly complex HACMP configurations that are hard to test
• Bare-bones application start and stop scripts
  - Without prerequisite checks and error recovery routines
  - Always have the scripts write detailed logs of standard output and error
• Filesystem layouts that drag out failover time
  - They create interdependencies, extra processing steps, and wait times
Don’ts
• Giving root authority to untrained administrators who do not know the cluster
• Changing the network failure detection rate without careful consideration
• Stopping an application with "# kill `ps -ef | grep appname | awk '{print $2}'`": this can also kill the HACMP application monitor
• Putting a database on raw LVs inside a standard AIX VG (original VG)
  - Use a big or scalable VG instead
  - A copy of the LVCB is then also kept in the VGDA, and when the system halts the DB can write to the first 4 KB block that AIX reserves; with an original VG, a halt can corrupt the DB
• Managing application high availability with manual procedures
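The danger of the grep-based kill above is easy to demonstrate. This sketch uses canned ps-style output; the process names (appserver, and a hypothetical monitor script appmon) are illustrative only.

```shell
# Column 2 is the PID, column 4 the command, mimicking `ps -ef` output.
PS_OUT='root 101 1 appserver -instance prod
root 102 1 /usr/es/sbin/cluster/appmon appserver
root 103 1 grep appserver'

# The naive pattern also matches the monitor (102) and the grep itself (103):
naive=$(echo "$PS_OUT" | grep appserver | awk '{print $2}')
echo "naive kill list: $naive"

# Anchoring on the command column spares everything but the application:
safe=$(echo "$PS_OUT" | awk '$4 == "appserver" {print $2}')
echo "safe kill list: $safe"
```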
Parameter Tuning - Network
• Failure detection parameters
• The time needed to detect a failure: (heartbeat rate) x (cycles to fail) x 2
  - Cycles to fail (cycle): the number of heartbeats missed before detecting a failure
  - Heartbeat rate (hbrate): the number of seconds between heartbeats

• For IP networks:

  Setting  Seconds between heartbeats  Failure cycle  Failure detection rate (s)
  Slow     2                           12             48
  Normal   1                           10             20
  Fast     1                           5              10

• For non-IP networks:

  Setting  Seconds between heartbeats  Failure cycle  Failure detection rate (s)
  Slow     3                           8              48
  Normal   2                           5              20
  Fast     1                           5              10
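A worked example of the formula, using the Slow IP-network setting (2 seconds between heartbeats, failure cycle of 12):

```shell
# (heartbeat rate) x (cycles to fail) x 2, for the Slow IP setting
hbrate=2
cycle=12
fdr=$((hbrate * cycle * 2))
echo "failure detection time: $fdr seconds"   # 48, matching the table
```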
Parameter Tuning – Network(Cont.)
• The default setting for the failure detection rate is usually optimal
• Be careful when changing it: fast or too-low custom values can cause false takeovers
Parameter Tuning – DMS checklist
• To prevent false takeovers, the following settings are recommended:

I/O pacing
  Check: lsattr -El sys0 (maxpout and minpout)
  POWER4: high water mark 33, low water mark 24
  POWER5 and later (AIX 5.3 or 6.1): high water mark 8193, low water mark 4096

Increase the syncd frequency
  Check: ps -ef | grep syncd, and /sbin/rc.boot
  The AIX default value is 60; 10 is recommended for most clusters

Failure detection rate
  (Note: you must stop HACMP services on all cluster nodes before changing this attribute)
  1. If your network is always busy, SLOW is recommended to prevent DMS (deadman switch) timeouts
  2. Synchronize the cluster topology and resources from the node on which the change was made to the other nodes in the cluster
  Recommended: FDR for ether networks set to SLOW; FDR for rs232 networks set to SLOW
Pre-Change Checklist
• Is the change necessary?
• How urgent is the change?
• How important is the change?
• What impact does the change have on the cluster?
• What is the impact if the change is not allowed?
• Are all the steps of the change clearly understood and documented?
• How has the change been tested?
• Is there a back-out plan in case the change must be reversed?
• When is the change scheduled?
• Have the users been notified?
• Does the planned maintenance window include time for a full backup before the change, and enough time to recover if the change fails?
Test your cluster before going live! (Checklist)
• Careful testing of your production cluster before going live reduces the risk of problems later.
Test item (how to test / checked):
• Node fallover
• Network adapter swap
• IP network failure
• Storage adapter failure
• Disk failure
• clstrmgr killed
• Serial network failure
• SCSI adapter for rootvg failure
• Application failure
• Node reintegration
• Partitioned cluster
Lifecycle of HACMP
• Support life cycle for HACMP (typically a 3-year lifecycle)
Version      Release Date   End of Support Date
HACMP 5.1    Jul 11, 2003   Sep 1, 2006
HACMP 5.2    Jul 13, 2004   Sep 30, 2007
HACMP 5.3    Aug 12, 2005   Sep 30, 2009
HACMP 5.4    Nov 09, 2007   N/A
PowerHA 5.5  Nov 14, 2008   N/A
HACMP Version Compatibility Matrix
                AIX 4.3.3  AIX 5.1  AIX 5.1 (64-bit)  AIX 5.2  AIX 5.3  AIX 6.1
HACMP 4.5       No         Yes      No                Yes      No       No
HACMP/ES 4.5    No         Yes      Yes               Yes      No       No
HACMP/ES 5.1    No         Yes      Yes               Yes      Yes      No
HACMP/ES 5.2    No         Yes      Yes               Yes      Yes      No
HACMP/ES 5.3    No         No       No                Yes      Yes      Yes
HACMP/ES 5.4.0  No         No       No                TL8+     TL4+     No
HACMP/ES 5.4.1  No         No       No                TL8+     TL4+     Yes
PowerHA 5.5     No         No       No                No       TL9+     TL2 SP1+