© 2009 IBM Corporation
2009 Second-Half SMA Seminar
HACMP Best Practices
October 15, 2009. Jinhoon Baek (jhb@kr.ibm.com), MTS, GTS, IBM Korea
Contents
1. HACMP Overview
2. Considerations by Component
3. Real Customer Configurations
4. HACMP with PowerVM
5. Do's and Don'ts
6. Appendix
PowerHA Message
• HACMP is now PowerHA for AIX
Renamed as part of the Power software initiative; publications and the product binaries continue to use the name HACMP.
Overview and Issues
HACMP:
• First released in 1991; the latest version, 5.5, is the 20th release
• More than 60,000 references in production environments worldwide
• A powerful, stable high-availability product for AIX
• Supports many flexible configuration options
• Clusters often pass the verification step at configuration time and run in production, yet are still not optimally configured from the standpoint of actually delivering high availability
• What, then, must be considered to achieve a high level of availability?
Why do good clusters turn bad?
Common reasons why HACMP fails to work as expected:
• Poor cluster design and insufficiently thorough planning
  - design, planning, test
• Basic TCP/IP and LVM configuration problems
• HACMP cluster topology and resource configuration problems
• Lack of change-management discipline (control) in operating the cluster
• Insufficient training of cluster administrators
• Performance/capacity problems
Designing High Availability
• “…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs) through appropriate design, planning, selection of hardware, configuration of software, and carefully controlled change management discipline.…”
Fault resilience vs. Fault tolerance
High availability does not mean no interruption to the application; that is why we say fault resilient rather than fault tolerant.
Eliminating single points of failure
Cluster object       Eliminated as a single point of failure by...
Node                 Using multiple nodes
Power source         Using multiple circuits or uninterruptible power supplies
Network adapter      Using redundant network adapters
Network              Using multiple networks to connect nodes
TCP/IP subsystem     Using non-IP networks to connect adjoining nodes and clients
Disk adapter         Using redundant disk adapters or multipath hardware
Disk                 Using multiple disks with mirroring or RAID
Application          Adding a node for takeover; configuring an application monitor
Administrator        Adding a backup administrator or a very detailed operations guide
Site                 Adding an additional site
• “…A fundamental design goal of (successful) cluster design is the elimination of single points of failure (SPOFs).…”
HACMP Cluster Configuration – Step by Step
[Diagram: Cluster 1. Node A and Node B share a storage subsystem. RG1 (NodeA, NodeB) holds a service IP, a volume group, and the production app; RG2 (NodeB) holds a service IP, a volume group, and the development app. Networks: net_ether0, rs232_net, diskhb_net]

Topology components:
• Cluster name
• Node names
• IP network
• Interfaces
• Serial network

Types of resources:
• Service IP
• Volume group(s)
• Application server

Resource components:
• Resource group(s)
• Policies: startup, fallover, fallback
• Dependencies: parent/child, location
HACMP Location Dependencies
[Diagram: the same Cluster 1: RG1 (NodeA, NodeB) with the production app and RG2 (NodeB) with the development app on networks net_ether0, rs232_net, and diskhb_net]

Location dependencies:
- RGs can coexist on the same node
- RGs can coexist on different nodes
- Priorities can also be set: High, Intermediate, Low

Diagram assumptions: RG1 has High priority, RG2 has Low priority.
On fallover: RG2 goes offline and RG1 moves to Node B.
HACMP Fallover Scenarios: Mutual Fallover

[Diagram: Cluster 1 on networks net_ether0, rs232_net, and diskhb_net. RG1 (NodeA, NodeB) holds a service IP, a volume group, and Application 1; RG2 (NodeB, NodeA) holds a service IP, a volume group, and Application 2. Both RGs are ONLINE, one per node]

Environment:
- Each machine runs its own production application
- Node A fails over to Node B
- Node B fails over to Node A

Fallover behavior:
- On fallover, the target machine needs enough CPU and memory resources to handle the load of both applications
Resource Group Tips
RG decisions beyond startup, fallover, and fallback behavior

[Diagram: four resource groups across two nodes: RG1 (NodeA, NodeB) with VG1 and App Server 1, RG3 (NodeA, NodeB) with VG3 and App Server 3, RG2 (NodeB, NodeA) with VG2 and App Server 2, RG4 (NodeB, NodeA) with VG4 and App Server 4, each with its own service IP]

Further options:
• One RG vs. multiple RGs
  - Selective fallover behavior (VG / IP)
• RG processing
  - Parallel vs. sequential
• Delayed fallback timer
• RG dependencies
  - Parent / child
  - Location dependencies

Best practice: always try to keep it simple, but stay current with new features and take advantage of existing functionality to avoid added manual customization.
Cluster Components
Nodes
• Up to 32 nodes, in any combination of active and standby nodes
• All-active configurations are possible (mutual takeover)
• Reliable clusters have at least one standby node
• Nodes must not share a common power supply, e.g. a power supply within a single rack
• Do not build cluster nodes as LPARs within a single frame
• Each node needs enough I/O slots to install redundant network and disk adapters
• That is, twice as many slots as a single node would need
• Every cluster resource should have a backup; each node's rootvg should be mirrored or placed on a RAID device
Cluster Components(Cont.)
Nodes
• Enough CPU cycles and I/O bandwidth must remain for HACMP to operate normally even while the production application runs at peak load
• With takeover in mind, keep resource utilization at or below 40%
• If a single standby node backs up multiple active nodes, it must have enough capacity to run all possible workloads
• On DLPAR-capable hardware, HACMP must be able to allocate and configure processors and memory before the application starts on the takeover node. All resources (CPU, memory) must either be available or obtainable through CoD
DLPAR/CoD configuration
• HACMP on the primary node detects the failure
• Running in a partition on another server, HACMP grows the backup partition, activates the required inactive processors, and restarts the application

[Diagram: a production database server fails over to a DLPAR/CoD server that is running Web Server and Order Entry applications on its active processors; HACMP activates the inactive processors and restarts the database server against the shared disk]
Configuration Requirements
• The HMC IP address and the managed system name must be configured on each DLPAR
• All DLPAR nodes must be able to communicate with the HMC over SSH
  - clverify checks the SSH connectivity
• When CoD is used, the key must be activated manually through the HMC
• The maximum resources HACMP can configure are bounded by the maximum values in the DLPAR profile
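Since clverify checks the SSH connectivity, it can be useful to run the same check by hand before cluster verification. A hedged sketch follows; the HMC hostname (hmc01), user (hscroot), and the lssyscfg probe command are assumptions to adapt to your site.

```shell
# Verify passwordless SSH from a DLPAR node to the HMC, the same path
# that clverify exercises. BatchMode makes ssh fail fast instead of
# prompting for a password, so a missing key shows up immediately.
HMC_HOST=${HMC_HOST:-hmc01}
HMC_USER=${HMC_USER:-hscroot}

hmc_ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$HMC_USER@$1" \
        'lssyscfg -r sys -F name' >/dev/null 2>&1
}

# Run this on every DLPAR node; any FAILED line must be fixed before
# verification will pass.
if hmc_ssh_ok "$HMC_HOST"; then
    echo "SSH to HMC $HMC_HOST: OK"
else
    echo "SSH to HMC $HMC_HOST: FAILED"
fi
```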
HACMP with Micro-Partitions
• Takeover increases the workload, so the node needs more CPU resource
• The hypervisor monitors each partition's CPU utilization and allocates more CPU to the partitions that need it

[Diagram: CPU-utilization gauges (0 to 100) for a WAS server, Test server 1, Production server 1 (active), a development server, Test server 2, and Production server 2 sharing the processor pool behind a shared disk. When Production server 1 fails and falls over, the hypervisor in the micro-partitioned environment automatically and dynamically shifts CPU resources to balance the load]
Infrastructure Considerations
• Power redundancy
• I/O drawers
• SCSI backplane
• SAN HBAs
• Virtualized environments
• Application fallover protection

Real customer scenarios:
[Diagram: I/O drawer layouts illustrating two anti-patterns: (1) SCSI adapters for rootvg on the same bus, and (2) two nodes sharing an I/O drawer]

Moral of the story: high availability goes beyond just installing the cluster software.
Single Point of Failure on Internal SCSI Disks
# lsdev -Cc disk
hdisk0 Available 0A-08-00-8,0  16 Bit LVD SCSI Disk Drive
hdisk1 Available 0A-08-00-9,0  16 Bit LVD SCSI Disk Drive
hdisk2 Available 0A-08-00-10,0 16 Bit LVD SCSI Disk Drive
#
# lsdev -Cc adapter
...
scsi0 Available 0A-08 Wide/Ultra-3 SCSI I/O Controller
...
All three disks hang off the single scsi0 controller: a single point of failure.
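A short script can flag this pattern automatically. This is a sketch using the canned lsdev output from the slide; on a live node you would feed it `lsdev -Cc disk` instead, and the location-code parsing is a simplifying assumption.

```shell
# Detect whether every disk sits behind a single SCSI adapter by
# comparing the adapter portion (e.g. 0A-08) of each location code.
LSDEV_OUT='hdisk0 Available 0A-08-00-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 0A-08-00-9,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 0A-08-00-10,0 16 Bit LVD SCSI Disk Drive'

adapters=$(echo "$LSDEV_OUT" | awk '{split($3, loc, "-00-"); print loc[1]}' | sort -u | wc -l)
if [ "$adapters" -eq 1 ]; then
    echo "WARNING: all disks sit behind one SCSI adapter"
fi
```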
Single Point of Failure Example (I/O Drawer)

[Diagram: internal disk drives (front) and PCI adapter slots (rear) across Planar 1 and Planar 2 of two I/O drawers, slots 1 through 10 each, populated with Ethernet adapters (ent4 through ent44), Fibre Channel adapters (fcs0 through fcs19), and some empty slots. The rootvg disks and their adapters all sit in the same drawer, making the drawer itself a single point of failure]
Benefit of Boot from SAN
• Configuration flexibility
  - Overcomes the LPAR configuration limits imposed by internal adapters and bays on Power systems
• High performance
  - Exploits external storage I/O performance, far superior to internal disks
  - Exploits the external storage disk cache
• High reliability
  - System persistence regardless of DDM failures
  - Leverages the high availability of external storage
  - Internal and remote replication solutions can be used
• Capacity efficiency
  - Multiple OS and backup images can be placed on external storage
  - rootvg can be allocated flexibly (e.g. 20 GB or 40 GB) instead of in whole-DDM units
• Easy AIX backup: FlashCopy replaces mksysb
  - Multiple point-in-time OS backups can be kept
  - An OS backup (the FlashCopy target) is portable and can be used directly on another system

[Diagram: a 16-core p595 with two I/O drawers boots from SAN: a 20 GB rootvg LUN plus 30 GB, 30 GB, and 60 GB LUNs for datavg and appvg]
Networks
• Networks consist of IP and non-IP networks. Nodes communicate and exchange heartbeats over both
• A non-IP network must be configured so that a network failure can be distinguished from a node failure
• HACMP handles only these three types of failure:
  - Network interface card (NIC) failures
  - Node failures
  - Network failures
Partitioned Cluster (Split Brain)
• When IP network problems prevent heartbeat checks, each node concludes that the other node is down even though both are alive, and each attempts takeover.
• As a result, the RG comes online on both nodes at the same time.
• The applications using the shared disk then start concurrently, which can cause data corruption.
• Proving that node isolation caused the problem:
  - /tmp/clstrmgr.debug file
  - AIX error log entry GS_DOM_MERGE_ER
Disk Heartbeating vs. RS232 Links
• Introduced in HACMP 5.1: uses the SAN environment for non-IP heartbeating
• Reasons to use it:
  - Distance limits of RS232 cables
  - Shortage of integrated serial ports
  - Some models restrict the use of integrated ports for heartbeating
  - Clusters with more than two nodes may require an async adapter with a RAN
• Requirements:
  - A small disk or LUN (e.g. 1 GB)
  - The bos.clvm.enh fileset installed
  - An Enhanced Concurrent Mode VG (it does not need to be defined in an RG)
Checking for Fast Failure Detection (FFD)
# lssrc -ls topsvcs | grep Fast
Fast Failure Detection enabled

# odmget -q name=diskhb HACMPnim
HACMPnim:
        name = "diskhb"
        desc = "Disk Heartbeat Serial protocol"
        addrtype = 1
        path = "/usr/sbin/rsct/bin/hats_diskhb_nim"
        para = "FFD_ON"
        grace = 60
        hbrate = 3000000
        cycle = 8
        gratarp = 0
        entry_type = "adapter_type"
        next_generic_type = "transport"
        next_generic_name = ""
        src_routing = 0
Note: this is a new feature starting with HACMP 5.4. The change will not take effect until the NIM is recycled during a cluster restart.

Reason to set this option: in the event of a crash, takeover starts immediately instead of waiting for the failure detection rate timeout to pass.
Important network best practices for high availability
• Be careful with network-level changes to IP addresses, subnet masks, switch port settings, VLANs, and so on
  - Failure detection requires at least two physical adapters per node on the same physical network/VLAN
• Configure at least one non-IP network
• Configuring EtherChannel under HACMP is a useful way to improve network availability
• Include a backup adapter connected to a secondary switch in the configuration
• HACMP treats an EtherChannel as a single-adapter network. Configure a netmon.cf file to help diagnose adapter failures: adapter failure is judged by sending ICMP echo requests (ping) to an interface outside the cluster
• Configure a persistent IP on each node
  - Useful for remote administration and monitoring
HACMP Topology Considerations
• IPAT via Replacement vs. IPAT via Aliasing
Considerations:
- Maximum number of service IPs within an HACMP network
- Hardware Address Takeover (HWAT)
- Speed of takeover
- Firewall issues

[Diagram: IPAT via Replacement on network net_ether_0. Node A: en0 9.19.10.1 (boot), replaced by 9.19.10.28 (service IP) when the service address is acquired; en1 192.168.11.1 (standby). Node B: en0 9.19.10.2 (boot); en1 192.168.11.2 (standby)]
HACMP Topology Considerations
• Contrast between Replacement and Aliasing
Considerations:
- Maximum number of service IPs within an HACMP network
- Speed of the swap
- Hardware Address Takeover (HWAT)
- Firewall issues

[Diagram: IPAT via Aliasing on network net_ether_0. Node A: en0 192.168.10.1 (base1) with aliases 9.19.10.28 (persistent a) and 9.19.10.51 (service IP); en1 192.168.11.1 (base2) with alias 9.19.10.50 (service IP). Node B: en0 192.168.10.2 (base1) with alias 9.19.10.29 (persistent b); en1 192.168.11.2 (base2)]
Etherchannel with HACMP
• Example of an EtherChannel (backup) configuration

[Diagram: Node A (primary) and Node B (secondary), each aggregating ent0 and ent1 into ent2 across redundant network switches, on network net_ether_0. Node A: en2 10.10.100.1 (boot), 192.168.10.1 (persistent a), 192.168.10.2 (service IP). Node B: en2 10.10.101.1 (boot), 192.168.10.4 (persistent b)]

Verification messages:
For nodes with a single network interface card per logical network configured, it is recommended to include the file '/usr/es/sbin/cluster/netmon.cf' with a "pingable" IP address, as described in the 'HACMP Planning Guide'.

WARNING: File 'netmon.cf' is missing or empty on the following nodes: Node A, Node B
Netmon.cf
• HACMP can have difficulty judging the failure of a single logical adapter such as an EtherChannel accurately
• This is because RSCT Topology Services cannot force packets through a single adapter to confirm that it is working
• In single-adapter network configurations such as EtherChannel, create a netmon.cf file
  - Typically the default gateway IP address is used
  - If no heartbeat packets arrive from the other node, the preconfigured gateway is pinged
• Example /usr/es/sbin/cluster/netmon.cf:
  180.146.181.119
  steamerchowder
  180.146.181.121
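Populating netmon.cf with the default gateway can be scripted. This is a sketch with canned routing-table output; on a live node you would pipe in `netstat -rn` and write to /usr/es/sbin/cluster/netmon.cf instead of /tmp.

```shell
# Pick the default-gateway address out of netstat-style routing output
# and write it as the netmon.cf ping target.
NETSTAT_OUT='Destination      Gateway           Flags
default          180.146.181.1     UG'

GW=$(echo "$NETSTAT_OUT" | awk '$1 == "default" {print $2; exit}')
printf '%s\n' "$GW" > /tmp/netmon.cf   # real path: /usr/es/sbin/cluster/netmon.cf
echo "netmon.cf ping target: $GW"
```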
Topology environments with a Firewall
If multiple IPs on the same subnet are configured on the same interface, AIX uses the first one configured for outbound traffic.

Tip: if you only need to manage one service IP per HACMP network, consider using IPAT via Replacement to avoid having multiple IPs on the same interface.

[Diagram: Node X behind a firewall, network set to use IPAT via Aliasing. en0: 10.10.10.1 (boot), 9.19.51.1 (persistent IP), 9.19.51.2 (service IP1); en1: 10.10.11.1 (boot). Clients on the LAN can talk to the cluster nodes, but node-initiated traffic from the 9.19.51.x network looks like it is coming from the persistent IP, not the service IP]
Topology environments with a Firewall
Overriding default AIX behavior sometimes requires some creativity.

Workaround 1: within the application start script, perform an ifconfig down of the persistent alias followed by an immediate ifconfig up, making it the second IP on the list.

[Diagram: Node X behind a firewall, IPAT via Aliasing. en0 now lists 10.10.10.1 (boot), 9.19.51.2 (service IP), 9.19.51.1 (persistent IP); en1: 10.10.11.1 (boot). Clients still reach the cluster nodes via both IPs, but node-initiated traffic from the 9.19.51.x network now looks like it is coming from the service IP]
Topology environments with a Firewall
IPAT via Replacement hosts only one IP on an interface at any given time, hence avoiding multiple IPs within the same subnet.

Workaround 2: if you only need to manage one service IP per HACMP network, consider using IPAT via Replacement to avoid having multiple IPs on the same interface.

[Diagram: Node X behind a firewall, network set to use IPAT via Replacement. en0: 9.19.51.1 (boot); en1: 10.10.11.1 (standby). When HACMP is activated, the base address on en0 is replaced by the service IP 9.19.51.2 that the clients use to connect]
Enhanced Concurrent VG
• First introduced in AIX 5L V5.1
• Can be implemented on any disk usable with HACMP
• Supports JFS and JFS2 filesystems
  - File systems are mounted on only one node at a time
• Replaces the older classic concurrent volume groups
• Enhanced concurrent VGs are required for:
  - Heartbeat over disk for a non-IP network
  - Fast disk takeover
Converting VG to ECM
• Stop the cluster
• One node at a time, perform:
  - varyonvg <VGNAME>
  - chvg -C <VGNAME>
  - varyoffvg <VGNAME>
• On every node, check the VG attributes with lsattr -El <VGNAME>:
  auto_on      n  N/A  True
  conc_auto_on n  N/A  True
  conc_capable y  N/A  True
• Verification and synchronization
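The per-node steps above can be wrapped in a small script. This is a sketch only: the VG name is a placeholder, and the AIX commands (varyonvg, chvg, varyoffvg, lsattr) exist only on AIX, so the functions are defined but not invoked here.

```shell
# Sketch of the ECM conversion steps; run convert_to_ecm on one node at
# a time with the cluster stopped, then verify_ecm on every node.
convert_to_ecm() {
    vg=$1
    varyonvg  "$vg"        # bring the VG online on this node
    chvg -C   "$vg"        # -C makes the VG Enhanced Concurrent capable
    varyoffvg "$vg"        # release it so the next node can repeat
}

verify_ecm() {
    # conc_capable must report "y" after the conversion
    lsattr -El "$1" | grep -q "^conc_capable  *y"
}
```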
Adapters
• With multi-port adapters, do not configure the backup on a spare port of the same adapter
• When using a built-in Ethernet adapter, the node must have a separate backup adapter
• Wherever possible, place adapters in separate I/O drawers, or on different backplanes or buses, to achieve redundancy
Applications & Application Scripts
• Automation
  - No manual intervention by the administrator
• Some applications tend to be tightly coupled to specific OS characteristics such as uname, serial number, or IP address (e.g. SAP)
• Check whether the application is already running
  - When an RG is in the unmanaged state, the default startup option causes HACMP to re-run the application start script
• Check the state of the data. Is recovery required?
• Correct coding:
  - Start by declaring a shell (e.g. #!/usr/bin/ksh)
  - Exit with RC=0
  - Include a check that the application has really stopped
  - fuser
• Smart Assist : DB2, Websphere, Oracle
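The correct-coding bullets can be sketched as a minimal, re-runnable start script. APP_NAME, APP_CMD, and the pid-file location are placeholder assumptions; a real script would also verify data state before starting.

```shell
#!/bin/sh
# Minimal HACMP-style start script sketch. HACMP re-runs start scripts
# when an unmanaged RG is reacquired, so the script first checks whether
# the application is already up instead of blindly starting it again.
APP_NAME=${APP_NAME:-myapp}
APP_CMD=${APP_CMD:-"sleep 60"}     # placeholder for the real start command
PIDFILE=/tmp/${APP_NAME}.pid

already_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start_app() {
    if already_running; then
        echo "$APP_NAME already running; nothing to do"
        return 0
    fi
    $APP_CMD &
    echo $! > "$PIDFILE"
    echo "$APP_NAME started, pid $(cat "$PIDFILE")"
}

start_app            # a real script ends here with: exit 0
```

Running it twice is safe: the second call reports that the application is already up, which is exactly the behavior needed when HACMP re-drives the script.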
Application Monitoring
Three types of monitoring:
• Startup monitors: run one time
• Process monitors: check for the specified process instances in the process table
• Custom monitors: run your specified script at a recurring interval

Resource monitors can be configured either to run a simple notify event when a problem occurs, or to fall the service over to a healthy peer node.

Don't stop at just the base configuration; with thorough testing these can be great tools to automate recovery and save admin time.
Testing Best Practices
• Thoroughly test application scripts and application monitoring before moving to production
• Test fallover in every direction
• Test cluster
  - LPARs within the same frame
  - Virtual resources
• Use the available tools: the Cluster Test Tool
  - Further customization enhancements in HACMP 5.4
• Establish a regular test schedule, e.g. node fallover and fallback tests
  - At least once per half year
Maintenance
• A controlled administrative environment
  - The most important factors in achieving high availability: strict change control, managed change procedures, and testing
• Take an HACMP snapshot before making any change to a cluster node
• Generate HTML reports with the OLPW (Online Planning Worksheets)
  - http://www-03.ibm.com/systems/power/software/availability/aix/apps/download.html
• Use C-SPOC operations (Cluster Single Point of Control)
  - No TIPing (Testing In Production): never test in the production environment
• Maintain a test cluster identical to production, e.g. a test environment built with PowerVM
• Documented recovery procedures
  - Document the recovery procedures and expected results from testing, and use them
Documenting the Environment with OLPW
• Online Planning Worksheets: HTML report file
What about PowerHA and distance?
• Two options: Remote PowerHA or PowerHA/XD
• Remote PowerHA
  - Systems see the same copy of data (shared access to the same LUNs)
  - LVM mirroring (with the optional but recommended cross-site LVM mirroring configured) keeps copies in sync across sites
  - A system-down condition is a site-down condition
  - Cross-site LVM mirroring facilitates integration/reintegration, managing LVM mirror copy consistency between sites
• PowerHA/XD
  - Systems see their "own" copy of data, which is replicated
  - A data replication mechanism keeps copies in sync across sites:
    • GLVM (Geographic Logical Volume Manager)
    • Metro Mirror in DS8000 or SVC
  - A local fallover option can be employed to keep a system-down condition local to the site
  - PowerHA/XD facilitates integration/reintegration between sites, including replication role reversal
• Again, networking (IP/non-IP) is used for monitoring in both cases
Case #1 - Company A

[Diagram: Node A (production) and Node B (standby) on a shared storage subsystem; RG1 (NodeA, NodeB) with a service IP, volume group, and the production app; networks net_ether0, rs232_net, diskhb_net]

- 2-node cluster (active + standby)
- IP aliasing
- EtherChannel: (A + B)
- RS232/MNDHB: both used
- Application monitoring: used
- Takeover test: not performed

The application monitoring timeout was too short, so the cluster attempted a fallback while the fallover was still in progress (fixed by increasing the duration time).
Case #2 - Company B

[Diagram: Node A, Node B, and Node C on a shared storage subsystem; RG1 (NodeA, NodeB), RG2 (NodeB, NodeA), and RG3 (NodeC, NodeB), each with a service IP, volume group, and a production app; networks net_ether0, rs232_net, diskhb_net]

- 3-node cluster (active + active + active)
- IP aliasing
- EtherChannel: (A + B)
- RS232/MNDHB: both used
- Application monitoring: not used
- Takeover test: periodic
Case #3 - Company C

[Diagram: Node A, Node B, Node C, and Node D on a shared storage subsystem; RG1 through RG4, each with a service IP, volume group, and a production app; networks net_ether0 and rs232_net]

- 4-node cluster (active + active + active + active)
- IP aliasing
- EtherChannel: (A + B), with two active adapters
- RS232: used
- Application monitoring: not used
- Takeover test: irregular
Case #4 - Company D

Site 1 (IDC):
- 1 x 64-way IBM System p6 595: 13 dedicated partitions for DB, 9 dedicated partitions for App, 8 VIOS with production clients, NIM/CSM partition; no physical adapters except in the VIOS
- 1 x 16-way IBM System p6 570: 2 VIOS with 20 VIO clients for Tivoli applications
- TotalStorage DS8300

Site 2 (Juneau), 40 miles away:
- 1 x 64-way IBM System p5 595: 13 dedicated partitions for failover, 9 dedicated partitions for App, 8 VIOS with production clients, NIM partition; no physical adapters except in the VIOS
- 1 x 16-way IBM System p6 570: 2 VIOS with 13 VIO clients for Dev/QA servers
- TotalStorage DS8100

Redundant HMCs; 1,400 business applications are planned to run.
VSCSI General Diagram
[Diagram: Frame 1 with a VIO Server and two AIX client LPARs. From client_lunsvg on a 36 GB disk, the VIO Server exports client1_rootvg_lv (5 GB) and client1_datavg_lv through vhost0 to LPAR 1 (vscsi0: hdisk0 rootvg, hdisk1 datavg1). A whole 72 GB disk (hdisk2, no_reserve) is mapped through vhost1 to LPAR 2 (vscsi0: hdisk1 datavg2), which boots from its own hdisk0 rootvg on scsi0]

Two ways to export a LUN to a VIO client:
• Logical volumes within a VG
• Whole-disk mapping, required when sharing across VIO Servers
  (Enhanced Concurrent VGs are required on the VIO clients)
VSCSI General Diagram – (Mapping entire Disk)
[Diagram: Frame 1 with VIO Server 1 and VIO Server 2, each mapping the same 72 GB LUN (hdisk1, no_reserve) from the storage subsystem as a whole disk through its vhost0. AIX client LPAR 2 sees the LUN via vscsi0 and vscsi1 as a single MPIO hdisk1 (datavg), alongside its own hdisk0 rootvg]
Virtual SCSI & HACMP Configuration Procedure
• On the storage device
  - Assign the LUNs to both corresponding VIO Servers
• On the HMC
  - Define the mappings (vhost and vscsi)
• On VIO Server 1
  - Set the "no_reserve" attribute:
    chdev -l <hdisk#> -a reserve_policy=no_reserve -a algorithm=round_robin
  - Export the LUNs to each client:
    mkvdev -vdev hdisk# -vadapter vhost0
    mkvdev -f -vdev hdisk# -vadapter vhost1
• On VIO Server 2
  - Set the "no_reserve" attribute:
    chdev -l <hdisk#> -a reserve_policy=no_reserve
  - Export the LUNs to each client:
    mkvdev -vdev hdisk# -vadapter vhost0
    mkvdev -f -vdev hdisk# -vadapter vhost1
Virtual SCSI & HACMP Configuration Procedure (Cont.)
• On the clients
  - Install MPIO SDDPCM
  - On the first client, create the shared VG (volume group) as an ECVG (Enhanced Concurrent VG)
    (the bos.clvm.enh fileset is required)
  - varyoffvg on client 1
  - importvg on client 2
  - Define it to HACMP as a shared resource
VSCSI Disks & HACMP (Same Frame - Single HBA on VIO Servers)
[Diagram: Frame 1 with VIOS 1 and VIOS 2, each holding a single HBA to the storage subsystem. HACMP Node A sees the shared disks (sharedvg) only through VIOS 1 (vhost0 to vscsi0) and HACMP Node B only through VIOS 2, with no_reserve set]

What is wrong with this picture? The configuration works initially because HACMP has visibility to the shared disk from both servers. However, it does NOT provide VIO Server redundancy: if VIO Server 1 fails or goes down for maintenance, HACMP Node A can no longer see or use the disks.
VSCSI Disks & HACMP (Same Frame - Single HBA on VIO Servers)
[Diagram: the corrected single-frame configuration: each VIO Server exports the shared LUN to both HACMP nodes (vhost0 and vhost1), so each node reaches sharedvg through MPIO over vscsi0 and vscsi1, with no_reserve set]

For HACMP clients within the same frame you need an additional path connection from each VIO Server. This additional VSCSI LUN export is accomplished by using the mkvdev command with the "-f" flag.
VSCSI Disks & HACMP (2 Frames - Single HBA on VIO Servers)
[Diagram: Frame 1 hosts HACMP Node A and Frame 2 hosts HACMP Node B; each frame has its own VIOS 1 and VIOS 2, each with a single HBA to the shared storage subsystem. Each node reaches sharedvg through MPIO over vscsi0 and vscsi1, one path per local VIO Server, with no_reserve set on all VIOS]
VSCSI Disks & HACMP (2 Frames - Dual HBAs on VIO Servers)
[Diagram: the same two-frame layout, but each VIO Server now has dual HBAs and runs MPIO itself, removing the HBA as a single point of failure; the client nodes still reach sharedvg through MPIO over vscsi0 and vscsi1, with no_reserve set on all VIOS]
HACMP View of Virtual SCSI
[Diagram: HACMP's view: Node A and Node B each see the VSCSI shared disk as an hdisk with the same PVID, held in Enhanced Concurrent sharedvg volume groups under Resource Group 1 and Resource Group 2. Underneath, VIOS 1 and VIOS 2 in Frame X export the same no_reserve LUN through their vhost0 adapters, and the node reaches it via MPIO over vscsi0 and vscsi1]
Virtual SCSI Summary
• LUNs are assigned to the VIO Servers
  - Use multiple (at least two) VIO Servers per frame
• The "no_reserve" attribute must be set
• Use ECM VGs on the HACMP clients
Virtual I/O Network Terms
[Diagram: a Virtual I/O Server bridging two AIX client LPARs to an external Ethernet switch. The clients' virtual Ethernet adapters (PVID=10) connect through the hypervisor to the VIOS virtual adapter ent2, which the Shared Ethernet Adapter ent4 bridges onto ent3, a Link Aggregation adapter combining the physical adapters ent0 and ent1 into a channel group on the switch]

Terms:
• Port VLAN ID (PVID)
• IEEE 802.1Q Virtual LAN (VLAN)
• IEEE 802.3ad Link Aggregation (LA)
• Cisco EtherChannel (EC)
• Virtual Ethernet adapter
• Shared Ethernet Adapter (SEA): acts as a layer-2 bridge
• Physical Ethernet adapter
• Link Aggregation adapter: combines physical adapters
Virtual Ethernet & HACMP (No Link Aggregation / Same Frame)
[Diagram: Frame 1 with VIOS1 and VIOS2, each bridging a physical adapter (ent0) on its own Ethernet switch to an SEA (ent4) over virtual adapter ent2 (PVID 10); a control channel over ent5/ent6 (PVID 99) links the two SEAs. Client LPARs 1 and 2 each use a single virtual adapter ent0]

This is the configuration required for SEA fallover across VIO Servers. Note that Ethernet traffic will not be load-balanced across the VIO Servers; the lower trunk priority on the ent2 virtual adapter designates the primary VIO Server to use.
Virtual Ethernet & HACMP (Link Aggregation / Same Frame)
[Diagram: the same SEA fallover layout, but each VIOS now aggregates two physical adapters (ent0, ent1) into a Link Aggregation adapter ent3 beneath its SEA ent4; the control channel runs on PVID 99 and client traffic on PVID 10]

Note that Ethernet traffic will not be load-balanced across the VIO Servers. The lower trunk priority on the ent2 virtual adapter designates the primary VIO Server to use.
Virtual Ethernet & HACMP (Independent Frames)

[Diagram: Frame 1 and Frame 2, each with its own VIOS1/VIOS2 pair in the SEA fallover with Link Aggregation configuration, each frame attached to its own Ethernet switch; AIX client LPAR 1 runs in Frame 1 and AIX client LPAR 2 in Frame 2]
HACMP View of Virtual Ethernet (IPAT via Aliasing)

[Diagram: HACMP Node 1 in Frame 1 and HACMP Node 2 in Frame 2 on network net_ether_0. Each node's en0 carries a base address (192.168.100.1 and 192.168.100.2) plus persistent and service aliases (9.19.51.10 and 9.19.51.20 on Node 1; 9.19.51.11 and 9.19.51.21 on Node 2). Topology Services heartbeats run over the IP network and over serial_net1. Underneath, each node is a VIO client in the dual-VIOS SEA fallover configuration]
Virtual Ethernet – Additional considerations
• Always configure a netmon.cf file for single-adapter networks: /usr/es/sbin/cluster/netmon.cf

In virtualized environments:

Typical file:
  9.12.4.11
  9.12.4.13

With !REQD entries:
  9.12.4.11
  !REQD en2 100.12.7.9
  9.12.4.13
  !REQD en2 100.12.7.10

Most adapters will use netmon in the traditional manner, pinging 9.12.4.11 and 9.12.4.13 along with other local adapters or known remote adapters, and will only consider the interface's inbound byte count for results. Interface en2, however, will only be considered up if it can ping either 100.12.7.9 or 100.12.7.10.

Note: additional !REQD formats that may be used within the netmon.cf file are outlined in the description of APAR IZ01332.
Do’s
• Use IPAT via Aliasing and enhanced concurrent VGs where possible
• Give all shared LVM components unique names
• Take regular cluster snapshots and system backups
• Plan thoroughly and test thoroughly (e.g. application scripts)
• Configure application monitoring to increase availability and support self-healing
• Build a test environment so changes can be tested properly
• Build a reliable heartbeat network including at least one non-IP network
• Set up SNMP alerts or SMS/email notification for cluster problems
• Exploit the available HACMP features: application monitoring, extended cluster verification methods, 'automated' cluster testing (in TEST only), file collections, fast disk takeover, fast failure detection
Don’ts
• Changing one node without synchronizing the others
  - Always synchronize immediately after every change
  - If one node is up and the others are down, make and synchronize the change from the active node
• Making changes outside the HACMP menus; use C-SPOC
• Overly complex HACMP configurations that are hard to test
• Bare-bones application start and stop scripts
  - Without prerequisite checks and error recovery routines
  - Always have the scripts write detailed logs of standard output and error
• Filesystem layouts that drag out failover time
  - They create interdependencies, extra processing steps, and wait times
Don’ts
• Giving root authority to untrained administrators who do not know the cluster
• Changing the network failure detection rate without careful consideration
• Stopping an application with "# kill `ps -ef | grep appname | awk '{print $2}'`": this can also kill the HACMP application monitor
• Putting a database on raw LVs inside a standard AIX VG (original VG)
  - Use a big or scalable VG instead
  - A copy of the LVCB is then also kept in the VGDA, and when the system halts the DB can write to the first 4 KB block that AIX reserves; with an original VG, a halt can corrupt the DB
• Managing application high availability with manual procedures
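The danger of the grep-based kill above is easy to demonstrate. This sketch uses canned ps-style output; the process names (appserver, and a hypothetical monitor script appmon) are illustrative only.

```shell
# Column 2 is the PID, column 4 the command, mimicking `ps -ef` output.
PS_OUT='root 101 1 appserver -instance prod
root 102 1 /usr/es/sbin/cluster/appmon appserver
root 103 1 grep appserver'

# The naive pattern also matches the monitor (102) and the grep itself (103):
naive=$(echo "$PS_OUT" | grep appserver | awk '{print $2}')
echo "naive kill list: $naive"

# Anchoring on the command column spares everything but the application:
safe=$(echo "$PS_OUT" | awk '$4 == "appserver" {print $2}')
echo "safe kill list: $safe"
```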
Parameter Tuning - Network
• Failure detection parameters
• The time needed to detect a failure: (heartbeat rate) x (cycles to fail) x 2
  - Cycles to fail (cycle): the number of heartbeats missed before detecting a failure
  - Heartbeat rate (hbrate): the number of seconds between heartbeats

• For IP networks:

  Setting  Seconds between heartbeats  Failure cycle  Failure detection rate (s)
  Slow     2                           12             48
  Normal   1                           10             20
  Fast     1                           5              10

• For non-IP networks:

  Setting  Seconds between heartbeats  Failure cycle  Failure detection rate (s)
  Slow     3                           8              48
  Normal   2                           5              20
  Fast     1                           5              10
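A worked example of the formula, using the Slow IP-network setting (2 seconds between heartbeats, failure cycle of 12):

```shell
# (heartbeat rate) x (cycles to fail) x 2, for the Slow IP setting
hbrate=2
cycle=12
fdr=$((hbrate * cycle * 2))
echo "failure detection time: $fdr seconds"   # 48, matching the table
```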
Parameter Tuning – Network(Cont.)
• The default setting for the failure detection rate is usually optimal
• Be careful when changing it: fast or too-low custom values can cause false takeovers
Parameter Tuning – DMS checklist
• To prevent false takeovers, the following settings are recommended:

I/O pacing
  Check: lsattr -El sys0 (maxpout and minpout)
  POWER4: high water mark 33, low water mark 24
  POWER5 and later (AIX 5.3 or 6.1): high water mark 8193, low water mark 4096

Increase the syncd frequency
  Check: ps -ef | grep syncd, and /sbin/rc.boot
  The AIX default value is 60; 10 is recommended for most clusters

Failure detection rate
  (Note: you must stop HACMP services on all cluster nodes before changing this attribute)
  1. If your network is always busy, SLOW is recommended to prevent DMS (deadman switch) timeouts
  2. Synchronize the cluster topology and resources from the node on which the change was made to the other nodes in the cluster
  Recommended: FDR for ether networks set to SLOW; FDR for rs232 networks set to SLOW
Pre-Change Checklist
• Is the change necessary?
• How urgent is the change?
• How important is the change?
• What impact does the change have on the cluster?
• What is the impact if the change is not allowed?
• Are all the steps of the change clearly understood and documented?
• How has the change been tested?
• Is there a back-out plan in case the change must be reversed?
• When is the change scheduled?
• Have the users been notified?
• Does the planned maintenance window include time for a full backup before the change, and enough time to recover if the change fails?
Test your cluster before going live! (Checklist)
• Careful testing of your production cluster before going live reduces the risk of problems later.
Test item (how to test / checked):
• Node fallover
• Network adapter swap
• IP network failure
• Storage adapter failure
• Disk failure
• clstrmgr killed
• Serial network failure
• SCSI adapter for rootvg failure
• Application failure
• Node reintegration
• Partitioned cluster
Lifecycle of HACMP
• Support life cycle for HACMP (typically a 3-year lifecycle)
Version      Release Date   End of Support Date
HACMP 5.1    Jul 11, 2003   Sep 1, 2006
HACMP 5.2    Jul 13, 2004   Sep 30, 2007
HACMP 5.3    Aug 12, 2005   Sep 30, 2009
HACMP 5.4    Nov 09, 2007   N/A
PowerHA 5.5  Nov 14, 2008   N/A
HACMP Version Compatibility Matrix
                AIX 4.3.3  AIX 5.1  AIX 5.1 (64-bit)  AIX 5.2  AIX 5.3  AIX 6.1
HACMP 4.5       No         Yes      No                Yes      No       No
HACMP/ES 4.5    No         Yes      Yes               Yes      No       No
HACMP/ES 5.1    No         Yes      Yes               Yes      Yes      No
HACMP/ES 5.2    No         Yes      Yes               Yes      Yes      No
HACMP/ES 5.3    No         No       No                Yes      Yes      Yes
HACMP/ES 5.4.0  No         No       No                TL8+     TL4+     No
HACMP/ES 5.4.1  No         No       No                TL8+     TL4+     Yes
PowerHA 5.5     No         No       No                No       TL9+     TL2 SP1+