Event Monitoring Service
EMS Hardware Monitors
2002. 5. 16
Instructor: 공 용섭, Manager
HPCS/SDO/MC
Agenda
•EMS fundamentals
•Installing EMS Monitors
•Operating EMS HW Monitors
•HW Monitor configuration files
•PSM – Peripheral Status Monitor
•Review of Operation and Basic Troubleshooting Guidance
Objectives
Learn how to use the EMS HW Monitors to resolve HW-related problems: how to use them, how to install and configure them, and how to apply them to a system.
Learn several methods that help with basic troubleshooting of the EMS HW Monitors.
SECTION 1: EMS Fundamentals
•Benefits of EMS HW Monitors
•What is EMS?
•Monitors currently used in the EMS environment
•Configurable notification methods
Event Monitoring Architecture
[Figure: Events generated by HP-UX servers are detected by monitors (EMS Hardware Monitors, EMS HA Monitors*, Third Party Monitors, User Developed Monitors), relayed to the Event Monitoring Service Framework*, and distributed to notification devices: Customer, HP Response Center, Enterprise Mgmt Applications* (HP OpenView, CA TNG), MC/ServiceGuard*, HP Predictive*.]
Benefits of EMS HW Monitors
•Reduced system downtime
•Shorter problem analysis and repair time
•Default monitoring configuration
•Common tools for monitoring HW resources
•Multiple notification methods
•Integration with other applications
•Maximum benefit with minimal administration
Monitors currently utilizing EMS
• AutoRAID Disk Array (armmon)
• Chassis Monitor (dm_chassis, June 2001)
• CMC Monitor (cmc_em, June 2001)
• Core Hardware (dm_core_hw)
• Disk (disk_em)
• Disk Array FC60 (fc60mon)
• Fast Wide SCSI Disk Array (fw_disk_array)
• Fibre Channel Adapter (dm_FCMS_adapter)
• Fibre Channel Adapter A5158 (dm_TL_adapter)
• Fibre Channel Arbitrated Loop Hub (dm_fc_hub)
• Fibre Channel SCSI Multiplexer (dm_fc_scsi_mux)
• Fibre Channel Switch (dm_fc_sw)
• High Availability Disk Array (ha_disk_array)
• High Availability Storage System (See SES Enclosure Monitor)
• Itanium Core Hardware Monitor (ia64_corehw, June 2001)
• Kernel Resource (krmond)
• LPMC (lpmc_em)
• Memory (dm_memory)
• Remote (RemoteMonitor)
• SCSI Card (scsi123_em)
• SCSI Tape Devices (dm_stape)
• SES Enclosure Monitor (ses_enclosure)
• System Status (sysstat_em)
• UPS (dm_ups)
http://www.docs.hp.com/hpux/onlinedocs/diag/ems/ems_prod.htm
What is EMS?
• Event Monitoring Services is a framework that provides the following:
• Common tools for configuring resource monitoring
• Notification through multiple channels when an event or critical value occurs on a given resource
• Easy integration of new resource monitors through a standard API
[Figure: EMS building blocks: Resources, EMS Monitors, Configuration Clients, Configuration Core/Framework, Notification Methods, Target Apps]
HW Event Monitoring Components
• EMS
•Framework for event notification
•Monitoring request manager (monconfig)
• Hardware Monitors
•Configuration files and tools
• Support Tools Manager
•Low-level error handling components used to log and display events.
•STM provides a map that the HW monitors use to determine which devices need to be monitored.
EMS HW Monitoring: Communication Between Components
[Figure: components shown: HW Device Driver, Diag2 Pseudo-Driver, diaglogd, Raw Error Log, Memory Subsystem, OS Memory Access*, memlogd, Raw Memory Log, STM (Decoder, Logtool, Formatted Logs), HW Monitors, EMS Framework, Event Monitoring Notify]
*OS Memory Access really means hooks in the kernel to access the memory subsystem (i.e. read registers).
EMS Monitor Architecture
[Figure: layered diagram with the following labels]
RESOURCES: Hardware, System
EMS MONITORS: HW Monitors, HA Monitors, PSM (Poll, Notify)
CONFIGURATION CORE/FRAMEWORK: Registrar, DictDB, P-Client
CONFIGURATION CLIENTS: Monconfig, EMS GUI, SAM, MC/SG and MC/LM Package Configuration
NOTIFICATION METHODS: TCP, UDP, SNMP, eMail, OPC, syslog, textlog, emslog, console
TARGET APPS: IT/O, NNM, MC/SG, MC/LM, OnSite Pred. (emsscan), RC Predictive
USER
Values: Info-Min-Maj-Ser-Crit, Up/DN
Base Path to Monitor Logs: /etc/opt/resmon/log/ (registrar.log, client.log, api.log, armmon.log, fc60mon.log), emslog
EMS Event Notification Methods
• Messages written to the system:
•SYSLOG
•TEXTLOG: event.log
•CONSOLE
•Predictive Text File
• Messages sent via various protocols:
•EMAIL
•TCP
•UDP
•SNMP
•OPC (OpenView Messaging)
• Notification integrated with MC/ServiceGuard and MC/LockManager
Target Applications
Any system management application that supports SNMP traps or TCP or UDP message services can receive event notifications.
HP OpenView IT/O and HP Network Node Manager message templates for EMS are available at no charge as contributed software from the external web page: http://www.software.hp.com
http://www.software.hp.com/products/EMS/index.html
Notification Targets and End User Actions
METHOD | NOTIFICATION TARGET | END USER ACTION
Write to syslog | /var/adm/syslog/syslog.log | Sys Admin reads syslog
Write to console | System console | Sys Admin views console msg on display
Write to text log | User defined text log (default: /var/opt/resmon/log/event.log) | Sys Admin reads text log
Write to Predictive log | /var/opt/pred/emslog; Onsite Predictive scanner (emsscan) consults rule set and notifies RC if necessary | RCE reads Predictive message and takes appropriate action
MC/SG, MC/LM | MC/SG, MC/LM | MC/SG, MC/LM performs package fail-over
Notification Targets and End User Actions (cont'd)
METHOD | NOTIFICATION TARGET | END USER ACTION
Send via eMail | eMail address | Sys Admin (or addressee) invokes eMail application and reads message
Send via TCP | User written socket program, host and port specified | Application dependent; Sys Admin runs receiving application
Send via UDP | User written socket program, host and port specified | Application dependent; Sys Admin runs receiving application
Send via SNMP | Any application configured to receive SNMP msgs; templates provided for integration with HP NNM | Application dependent; Sys Admin runs Network Node Manager, receives message. Visual change in HW icon
Send OPC format | Templates provided for integration with HP OpenView IT/O | Sys Admin runs OpenView IT/O, receives message. Visual change in HW icon
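The TCP method relies on a "user written socket program" listening on the configured host and port. A minimal sketch of such a receiver follows. The slides do not describe the payload EMS sends, so the message text in the demo is invented test data, and the names `receive_one_message` and `demo` are hypothetical.

```python
import socket
import threading


def receive_one_message(server_sock: socket.socket) -> str:
    """Accept a single connection and return everything the sender wrote."""
    conn, _addr = server_sock.accept()
    chunks = []
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # sender closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def demo() -> str:
    # Bind to an ephemeral loopback port for the demo. A real target would
    # listen on the host and port named in the monitoring request.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    worker = threading.Thread(
        target=lambda: result.update(msg=receive_one_message(server))
    )
    worker.start()

    # Stand-in for the sender; this text is invented, not a real EMS payload.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"Severity: CRITICAL\nMonitor: disk_em\n")

    worker.join(timeout=5)
    server.close()
    return result["msg"]
```

What the receiver does with the text (log it, page someone, open a ticket) is application dependent, as the table notes.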
Section 2: Installing EMS Monitors
•Overview of the steps involved
•Installing EMS Hardware Monitors
EMS HW Monitors: Product Structure
[Figure: OnlineDiag Support Tools Bundle for Series 800 and for Series 700 (HP-UX 10.20, HP-UX 11.00). Labels shown for each bundle: B4708AA, EMS-Core, EMS-Config, Contrib-Tools, LIF-LOAD, Sup-Tool-Mgr-800 or Sup-Tool-Mgr-700, B4708AA-1000 EMS HW Event Monitors, B7609BA. Predictive appears only with the Series 800 bundle.]
The Steps Involved
1. Install STM from the latest Support Plus Media.
2. Check the supported products list for any special requirements for the devices you want to monitor.
3. Run the Monitoring Request Manager:
/etc/opt/resmon/lbin/monconfig
4. Enable hardware event monitoring if release of media is earlier than June 1999.
5. Determine whether the default monitoring requests are adequate.
6. Add or modify monitoring requests as necessary.
7. Verify monitor operation (optional).
NOTE: More information on special requirements can be found in the EMS Hardware Monitors User's Guide, pages 30-34, or at http://www.docs.hp.com/hpux/onlinedocs/diag/ems/ems_prod.htm
Installing EMS Hardware Monitors
•Software components installed for HW monitoring:
•All hardware event monitors
•Monitor configuration files
•Monitoring Request Manager
•EMS framework, including EMS graphical interface
Supported System Configuration
•HP 9000 Series 700 or 800 Computer
•HP-UX 10.20, 11.00, or 11i
•Support Plus Media or http://www.software.hp.com
•If you are using MC/ServiceGuard (optional), you must have
•Version A.10.11 for HP-UX 10.20
•Version A.11.04 for HP-UX 11.0
Online Diagnostic Products
OnlineDiag Support Tools Bundle
HP-UX 10.20: B.10.20.22.xx
HP-UX 11.00: B.11.00.17.xx
HP-UX 11.11: B.11.11.03.xx
HP-UX 11.20: B.11.20.01.xx
Supported on HP-UX 10.20 and 11.X
EMS-Core: EMS Framework (B7609BA)
EMS-Config: SAM interface to EMS (B7609BA)
Sup-Tool-Mgr: STM, EMS HW Monitors, PSM, monconfig (B4708AA)
Predictive: Onsite Predictive, emsscan
Monitor-Specific Installation Tasks (Example)
• Special requirements for individual monitors:
•URL: http://www.docs.hp.com/hpux/onlinedocs/diag/ems/ems_prod.htm
• AutoRAID Disk Arrays - ARMserver
• Fibre Channel SCSI Multiplexers - F/W ver. 3840 or later
• Fibre Channel Adapters
• Fibre Channel Arbitrated Loop Hubs
• Fibre Channel Switches
• System Chassis Code Monitor (11i only)
• HP UPSs - ups_mond
Enable Monitoring
• IPR9906 - EMS HW Monitors are active by default after installation
•E-Mail message for enabling HW monitors is sent to root@system*
•HW monitors enabled from character-based configuration client called the “Monitoring Request Manager”
– /etc/opt/resmon/lbin/monconfig
• Default monitoring requests provided for HW event monitoring
•Users may accept these defaults or customize their monitoring configuration using monconfig
Enable Monitoring (cont'd)
• Default monitoring requests NOT provided for HW status monitoring. (PSM)
• Status monitoring configured using the EMS GUI
•EMS HW resources will show up in EMS GUI configuration client after specific HW monitoring has been added using monconfig
• IPR0006 - CONSOLE logging not enabled by default.
EMS HW Monitors and EMS HA Monitors - Differences
EMS HW MONITORS | EMS HA MONITORS
Monitor HW resources such as I/O devices, interface cards, and memory | Monitor disk, cluster, network, and system resources
All HP 9000 systems running HP-UX 10.20 or 11.x | Only HP 9000/800 systems running HP-UX 10.20 or 11.x
Distributed "free" on the Support Plus Media | Available from HP at extra cost
- | Works best in a high availability environment
Event monitoring via Monitoring Request Manager (monconfig) | -
Status Monitoring via a SAM GUI | Status Monitoring via SAM GUI
Section 3: Operating EMS HW Monitors
•Default settings
•Listing
•Viewing
•Adding
•Modifying
•Verifying
•Checking status
•Retrieving and interpreting event messages
•Removing/deleting/disabling
What Is a Monitoring Request?
•Which hardware should be monitored? - Select the monitor associated with the HW resource.
•Which events should be reported? - Select the severity level of the events to report.
•How should notification be delivered? - Select the notification method.
Monitoring Request Manager
•Monitoring is enabled by default
•Opening screen indicates if monitoring is currently enabled or disabled
•Must be logged on as root
•Type /etc/opt/resmon/lbin/monconfig
Monitoring Request Manager: Opening Screen
EMS Version : A.03.20
STM Version : A.26.00
Enabling Hardware Event Monitoring
•Enabled automatically when the Support Tools bundle is installed
•Run the Hardware Monitoring Request Manager
•/etc/opt/resmon/lbin/monconfig
•Enter “E” from the main menu selection prompt
HW Monitoring Request Manager
Default Monitoring Requests
SEVERITY LEVELS | NOTIFICATION METHOD
All (>= INFORMATION) | TEXTLOG file: /var/opt/resmon/log/event.log
Major Warning, Serious, Critical | SYSLOG: /var/adm/syslog/syslog.log
Major Warning, Serious, Critical | EMAIL, address: root
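As an illustration of how the default requests route an event, the sketch below returns the notification targets that would fire for a given severity. It assumes the numeric severity values used elsewhere in this deck (Information = 1 through Critical = 5). The function name and the target strings are hypothetical labels, not EMS identifiers.

```python
# Numeric severity values as given in the deck (Information = 1 ... Critical = 5).
SEVERITY = {
    "INFORMATION": 1,
    "MINOR_WARNING": 2,
    "MAJOR_WARNING": 3,
    "SERIOUS": 4,
    "CRITICAL": 5,
}

# (lowest severity that triggers the request, illustrative target label)
DEFAULT_REQUESTS = [
    ("INFORMATION", "TEXTLOG /var/opt/resmon/log/event.log"),
    ("MAJOR_WARNING", "SYSLOG /var/adm/syslog/syslog.log"),
    ("MAJOR_WARNING", "EMAIL root"),
]


def default_targets(event_severity: str) -> list:
    """Return the default notification targets for an event of this severity."""
    level = SEVERITY[event_severity]
    return [
        target
        for threshold, target in DEFAULT_REQUESTS
        if level >= SEVERITY[threshold]
    ]
```

For example, a MINOR_WARNING event reaches only the text log, while a CRITICAL event reaches all three targets.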
Listing Monitor Descriptions
•Use this to select the appropriate monitor for a particular HW resource.
•Use this to check which HW resources each monitor supports.
•Lists descriptions of the available monitors.
•Run the Hardware Monitoring Request Manager
– /etc/opt/resmon/lbin/monconfig
•Enter 'L' from the main menu selection prompt
http://www.docs.hp.com/hpux/onlinedocs/diag/ems/emd_summ.htm
Viewing Current Monitoring Requests
•Use this to review monitoring requests before adding or modifying them
•Use this to decide which additional requests are needed to implement your monitoring and notification strategy.
•Run the Hardware Monitoring Request Manager
•/etc/opt/resmon/lbin/monconfig
•Enter ‘S’ from the main selection prompt
Current Monitoring Requests
Adding a Monitoring Request
• Use this to add and configure a notification method for a particular monitor.
• Run the Monitoring Request Manager
• /etc/opt/resmon/lbin/monconfig
• Enter “A” at the main selection prompt, then select
• Monitors to which this configuration can apply
• Criteria Thresholds (INFORMATION, CRITICAL, …)
• Criteria Operator (>=, =, …)
• Notification Method Prompt (EMAIL, SYSLOG, …)
• User Comment (Any desired comments)
• Client Configuration File Prompt
– (C)lear – default client configuration file
– (A)dd – specify a client configuration file
• Save request when prompted
Hardware Monitoring Requests
Hardware Event Monitor: specifies which HW to monitor. Multiple monitors can be selected for each request.
Severity Level and Operator: specify which events to report. One severity/operator pair can be selected for each request.
Severity Level: Critical = 5, Serious = 4, Major Warning = 3, Minor Warning = 2, Information = 1
Operator: =, >, <, >=, <=, !
Notification Method: specifies how to notify when an event occurs. Only one notification method can be selected for each request.
Event Severity Levels & Interaction with MC/SG
Critical: An event that will or has already caused data loss, system down time, or other loss of service. System operation will be impacted and normal use of the HW should not continue until the problem is corrected. Immediate action is required to correct the problem. If MC/ServiceGuard is installed and this is a critical component, a package fail-over WILL occur.
Serious: An event that may cause data loss, system down time, or other loss of service if left uncorrected. System operation and normal use of the HW may be impacted. The problem should be repaired as soon as possible. If MC/ServiceGuard is installed and this is a critical component, a package fail-over WILL occur.
Major Warning: An event that could escalate to a Serious condition if not corrected. System operation should not be impacted and normal use of the HW can continue. The problem should be repaired at a convenient time. If MC/ServiceGuard is installed and this is a critical component, a package fail-over WILL NOT occur.
Minor Warning: An event that will not likely escalate to a more severe condition if left uncorrected. System operation will not be interrupted and normal use of the hardware can continue. The problem can be repaired at a convenient time. If MC/ServiceGuard is installed and this is a critical component, a package fail-over WILL NOT occur.
Information: An event that occurs as part of the normal operation of the hardware. No action is required. If MC/ServiceGuard is installed and this is a critical component, a package fail-over WILL NOT occur.
Example: Adding a Monitoring Request (AutoRaid)
Modifying Monitoring Requests
•Run the Hardware Monitoring Request Manager
•/etc/opt/resmon/lbin/monconfig
•Enter “M” from the main selection prompt
•Enter the number of the request you want to modify
•Change the setting(s)
•Save request when prompted
Remove/Disable EMS Hardware Monitoring
•Remove STM and/or EMS
– Run swremove
– Select B7609BA or OnlineDiag bundle
•Disable
– Run Hardware Monitoring Request Manager
– /etc/opt/resmon/lbin/monconfig
–Enter “K” from main menu selection prompt
–Confirm when prompted
Delete EMS HW Monitor Requests
•Delete
– Run Hardware Monitoring Request Manager
– /etc/opt/resmon/lbin/monconfig
– Enter "D" from main menu selection prompt
– Enter the number assigned to request to delete
– Delete when prompted
Verifying Hardware Event Monitoring
•Run the command send_test_event (introduced September 2000)
•Simulate a hardware failure or event
•Remove a disk from an array
•Unplug a cable
•Turn off the hardware resource
•Use known defective media
•Event messages will be generated
Checking Detailed Monitoring Status
•Lists all monitoring requests currently in effect.
•Shows only the currently active requests.
•Inactive monitors are shown as "NOT MONITORING".
•Any monitor that has no resources to monitor is inactive.
Predictive
• Predictive adds an additional default request for FC monitors
• List of Predictive-Enabled Monitors as of June 2001 for 11.00:
•dm_core_hw
•disk_em
•dm_FCMS_adapter
•dm_TL_adapter
•dm_fc_scsi_mux
•ha_disk_array
•dm_ses_enclosure
•lpmc_em
•dm_memory
•RemoteMonitor
•dm_stape
•scsi123_em
•sysstat_em
•dm_ups
• Events with severity >= INFORMATION are written to /var/opt/pred/emslog and TEXTLOG
http://www.docs.hp.com/hpux/onlinedocs/diag/ems/ems_pred.htm
Retrieving & Interpreting Event Messages
•The email and text file notification methods deliver the full content of the message.
•For messages received through other notification methods, use the "resdata" utility to view the message.
Sample Event Message
Event Monitoring Service Event Notification %<
Notification Time: Wed Sep 9 10:48:30 2000
Hpbs8684 sent Event Monitor notification information:
/storage/events/disks/default/10_4_4.0.0 is >= 1.
Its current value is CRITICAL(5)
Event data from monitor:
Event time: Wed Sep 9 10:48:30 2000
Hostname: hpbs8684.boi.hp.com IP Address : 15.62.120.25
Event Id: 0x00356B15e00000000 Monitor : disk_em
Event # : 100037 Event Class: I/O
Severity: CRITICAL
Disk at hardware path 10/4/4.0.0 : Media Failure
Associated OS error log entry id(s):
000000000000000000
Description of Error:
The device was unsuccessful in reading data for the current I/O request due to an error on the medium. The data
could not be recovered. The request was likely processed in a way which could cause damage to or loss of data.
Probable Cause / Recommended Action:
The medium in the device is flawed. If the medium is removable, replace the medium with a fresh one. Alternatively,
if the medium is not removable, the device has experienced a hardware failure. Repair or replace the device, as necessary.
Information Contained in an Event Message
• Standard information:
• Notification time
• Value that triggered event
• Event data from monitor
– Event time, hostname, event #, severity, IP address, etc.
• Description of Error
• Probable Cause
• Recommended Action
• HW Resource Information
• Product Information
– Path, FRU, ID
• SCSI Status and Sense Data
Section 4: HW Monitor Configuration Files
•Types of configuration files associated with each HW monitor.
•Configuration settings.
•Settings that can be modified.
•Changing configuration parameters.
•When changes take effect
Types of Configuration Files
• Control the operation of each HW event monitor
•Global.cfg
• Monitor Specific files
•monitor_name.cfg
•default_monitor_name.clcfg (multiple-view)
• Start up Specific files
•monitor_name.sapcfg
• PSM monitor specific files
•monitor_name.psmcfg
HW Monitor Configuration Files
• Global monitor configuration file. Settings defined in this file apply to all monitors and have lower precedence than the monitor-specific file.
/var/stm/config/tools/monitor/Global.cfg
• Monitor-specific configuration file. Each monitor keeps its optimized settings in this file. These settings take precedence over the global configuration file.
/var/stm/config/tools/monitor/monitor_name.cfg
• As of the June 2000 release, several hardware monitors were converted to "multiple-view" (Predictive-enabled). These monitors use a different configuration file, the Client Configuration File.
/var/stm/config/tools/monitor/default_monitor_name.clcfg
• These files share the following common operating parameters:
– Polling Interval, Repeat Frequency, Severity Actions, and Event Definition
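The precedence rule (monitor-specific settings override Global.cfg) can be pictured as a simple dictionary merge. This is only a sketch of the rule as stated on the slide, not of how the monitors actually load their files. The override value of 15 is a hypothetical example.

```python
def effective_settings(global_cfg: dict, monitor_cfg: dict) -> dict:
    """Merge settings so that monitor-specific values win over global ones."""
    merged = dict(global_cfg)    # start from the Global.cfg values
    merged.update(monitor_cfg)   # monitor_name.cfg values take precedence
    return merged


# Defaults quoted in this deck for Global.cfg.
global_cfg = {"POLL_INTERVAL": 60, "REPEAT_FREQUENCY": 1440}
# Hypothetical monitor-specific override, for illustration only.
monitor_cfg = {"POLL_INTERVAL": 15}

settings = effective_settings(global_cfg, monitor_cfg)
```

Here `settings` keeps the global REPEAT_FREQUENCY but uses the monitor's own POLL_INTERVAL.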
HW Monitor Configuration Files (cont.)
•Monitor Startup Specific files
– /var/stm/config/tools/monitor/monitor_name.sapcfg
– Default information for that specific monitor
•PSM Monitor Specific files
– /var/stm/config/tools/monitor/monitor_name.psmcfg
– Optimized operating settings for specific monitors
Operating Parameters: Polling Interval
• Sets the polling interval for checking HW status.
•Choose a value with system performance in mind.
•Default in Global.cfg and monitor_name.cfg:
POLL_INTERVAL 60 # in minutes (one hour)
•The interval is measured from the time the monitor is enabled.
• Reason to change:
•To reduce potential HW-related problems.
•Modify the value in the individual monitor configuration file rather than in Global.cfg.
Operating Parameters: Repeat Frequency
• How often should the same event be reported?
•Limits the system load caused by the same message being generated repeatedly.
•Default in Global.cfg:
REPEAT_FREQUENCY 1440 # in minutes (one day)
– Default is once per day
• Reason to change:
•When an event needs to be reported more frequently.
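One way to picture the repeat frequency is as a per-event suppression window. The sketch below is an interpretation of the slide, not the monitors' actual implementation. The class name, the event key, and the use of plain minute counters are all assumptions made for illustration.

```python
class RepeatFilter:
    """Report a given event at most once per repeat window (in minutes)."""

    def __init__(self, repeat_minutes: int = 1440):
        self.repeat_minutes = repeat_minutes
        self._last_reported = {}  # event key -> minute of the last report

    def should_report(self, event_key: str, now_minutes: int) -> bool:
        last = self._last_reported.get(event_key)
        if last is not None and now_minutes - last < self.repeat_minutes:
            return False  # same event seen inside the window: suppress it
        self._last_reported[event_key] = now_minutes
        return True
```

With the default of 1440 minutes, an event first reported at minute 0 is suppressed at minute 600 and reported again at minute 1440. A different event key is tracked independently.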
Operating Parameters: Severity Actions
• Determines whether events of a given severity are reported to EMS or ignored.
• Defaults in Global.cfg:
– SEVERITY_ACTION CRITICAL NOTIFY
– SEVERITY_ACTION SERIOUS NOTIFY
– SEVERITY_ACTION MAJOR_WARNING NOTIFY
– SEVERITY_ACTION MINOR_WARNING NOTIFY
– SEVERITY_ACTION INFORMATION NOTIFY
• Reason to change:
• Change the action to "IGNORE" so that less important events are ignored.
Operating Parameters: Events
• Defines the events handled by the monitor
• Defines the severity level
• Defines the action taken when the event occurs
• Format in Global.cfg and monitor_name.cfg:
config-verb event no. severity action #description
DEFINE_EVENT 10001 CRITICAL DEFAULT #comments here
• Format in default_monitor_name.clcfg:
EQ:event_number:severity:enable flag:suppression_time:time_window:threshold:value threshold1:operator1:operator2:value threshold2
EQ:3:CRITICAL:TRUE:1440:ANY:1:NONE:NO_OP:NO_OP:NONE
• Reason to change:
•The severity level does not reflect the importance of an event in every environment.
•To ignore a particular event.
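To make the two event-definition formats concrete, the sketch below splits each sample line into named fields. The field names follow the format lines on the slide. The function names are hypothetical, and real configuration files may contain constructs this sketch does not handle.

```python
EQ_FIELDS = (
    "event_number", "severity", "enable_flag", "suppression_time",
    "time_window", "threshold", "value_threshold1", "operator1",
    "operator2", "value_threshold2",
)


def parse_define_event(line: str) -> dict:
    """Parse 'DEFINE_EVENT <no.> <severity> <action> #description'."""
    body, _, comment = line.partition("#")
    verb, number, severity, action = body.split()
    if verb != "DEFINE_EVENT":
        raise ValueError("not a DEFINE_EVENT line: " + line)
    return {
        "event_number": int(number),
        "severity": severity,
        "action": action,
        "comment": comment.strip(),
    }


def parse_eq(line: str) -> dict:
    """Parse a colon-separated 'EQ:...' line from a .clcfg file."""
    parts = line.strip().split(":")
    if parts[0] != "EQ" or len(parts) != len(EQ_FIELDS) + 1:
        raise ValueError("not an EQ line: " + line)
    return dict(zip(EQ_FIELDS, parts[1:]))
```

Applied to the sample lines above, `parse_define_event` yields event number 10001 with action DEFAULT, and `parse_eq` yields an enable flag of TRUE and a suppression time of 1440.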
Startup Configuration Files
• Contain monitoring request definitions for each monitor
• /var/stm/config/tools/monitor/monitor_name.sapcfg
• Format:
– MONITOR: /storage/events/disk_arrays/FW_SCSI
– Criteria Threshold: INFORMATION
– Criteria Operator: >=
– Target Type: TEXTLOG
– Target TEXTLOG File: /var/opt/resmon/log/event.log
• The contents of this file are used at initial startup and whenever ioscan or monconfig is run.
• Reason to change:
• To tailor monitoring requests to the customer environment.
Startup Configuration File Entries
KEYWORD | VALUES | DESCRIPTION
MONITOR (required) | A valid event monitor resource path | Identifies HW event monitor to which entry applies. Entries must use resource path for monitor being configured. Note: This must be the first keyword in each entry.
Criteria Threshold (required) | Critical, Serious, Major_Warning, Minor_Warning, Informational | Defines severity level used as notification criteria threshold.
Criteria Operator (required) | < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to), ! (not equal to) | Identifies arithmetic operator used with criteria threshold to control what events are reported. Operator treats each severity level as a numeric value assigned as follows: Critical = 5, Serious = 4, Major warning = 3, Minor warning = 2, Informational = 1. Event severity received is the left operand. Criteria Threshold value is the right operand.
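The operand rule (event severity on the left, criteria threshold on the right) can be sketched as follows. The numeric values and operators are those listed in the table. The function name `matches` is hypothetical, and accepting both "INFORMATION" and "INFORMATIONAL" is an assumption made because the deck uses both spellings.

```python
import operator

SEVERITY_VALUE = {
    "INFORMATIONAL": 1,
    "INFORMATION": 1,   # the deck uses both spellings
    "MINOR_WARNING": 2,
    "MAJOR_WARNING": 3,
    "SERIOUS": 4,
    "CRITICAL": 5,
}

OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!": operator.ne,
}


def matches(event_severity: str, criteria_operator: str, criteria_threshold: str) -> bool:
    """True when '<event severity> <operator> <threshold>' holds numerically."""
    left = SEVERITY_VALUE[event_severity.upper()]       # left operand
    right = SEVERITY_VALUE[criteria_threshold.upper()]  # right operand
    return OPERATORS[criteria_operator](left, right)
```

For example, a SERIOUS event satisfies ">= MAJOR_WARNING" (4 >= 3), while a MINOR_WARNING event does not satisfy ">= SERIOUS" (2 >= 4 is false).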
Startup Configuration File Entries (cont'd)
KEYWORD | VALUES | DESCRIPTION
Target Type (required) | UDP, TCP, OPC, SNMP, TEXTLOG, SYSLOG, EMAIL, CONSOLE | Identifies the method of notification used.
Target Type Modifier (required for the following target types) | UDP | Target UDP Host: hostname of the machine to which UDP event messages will be sent. Target UDP Port: port number on the host that will be used for the network connection.
Target Type Modifier | TCP | Target TCP Host: hostname of the machine to which TCP event messages will be sent. Target TCP Port: port number on the host that will be used for the network connection.
Target Type Modifier | TEXTLOG | Target TEXTLOG: name of the log file to which event messages will be sent.
Target Type Modifier | EMAIL | Target EMAIL Address: email address of the recipient of the event messages.
Comment (optional) | Any text string | Optional field presented as user data in each event meeting this criteria.
Default File Entries
Entry to send all events to textlog
MONITOR: /storage/events/disk_arrays/FW_SCSI
Criteria Threshold: INFORMATION
Criteria Operator: >=
Target Type: TEXTLOG
Target TEXTLOG FILE: /var/opt/resmon/log/event.log
Entry to send SERIOUS and CRITICAL events to syslog
MONITOR: /storage/events/disk_arrays/FW_SCSI
Criteria Threshold: SERIOUS
Criteria Operator: >=
Target Type: SYSLOG
Entry to send SERIOUS and CRITICAL events to email
MONITOR: /storage/events/disk_arrays/FW_SCSI
Criteria Threshold: SERIOUS
Criteria Operator: >=
Target Type: EMAIL
Target EMAIL address: root
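The entries above follow a simple "Keyword: value" layout in which MONITOR starts each entry. A minimal sketch of reading such text into a list of dictionaries is shown below. It assumes exactly that layout; real .sapcfg files may contain comments or other constructs not handled here, and the function name is hypothetical.

```python
def parse_startup_entries(text: str) -> list:
    """Group 'Keyword: value' lines into entries, each starting at MONITOR."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and description text
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key.upper() == "MONITOR":
            entries.append({})  # MONITOR opens a new entry
        if not entries:
            raise ValueError("MONITOR must be the first keyword in an entry")
        entries[-1][key] = value
    return entries


SAMPLE = """\
MONITOR: /storage/events/disk_arrays/FW_SCSI
Criteria Threshold: INFORMATION
Criteria Operator: >=
Target Type: TEXTLOG
Target TEXTLOG FILE: /var/opt/resmon/log/event.log
MONITOR: /storage/events/disk_arrays/FW_SCSI
Criteria Threshold: SERIOUS
Criteria Operator: >=
Target Type: SYSLOG
"""

entries = parse_startup_entries(SAMPLE)
```

The sample yields two entries: the first targets TEXTLOG and the second targets SYSLOG with a SERIOUS threshold.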
PSM Monitor Configuration Files
• The interaction between the PSM and the HW event monitors is controlled by the following PSM configuration file.
• /var/stm/config/tools/monitor/monitor_name.psmcfg
• Format:
– MONITOR_RESOURCE_NAME: /storage/events/disks/default
– PSM_RESOURCE_NAME (valid PSM resource path)
– MONITOR_STATE_HANDLING (type of state handling)
– DOWN_SEVERITY_THRESHOLD: CRITICAL
– DOWN_SEVERITY_OPERATOR: =
• Defines which severity levels cause the "Down" state, the resulting actions, and what is required to return the resource to the "Up" state.
• The PSM checks the configuration file every 10 minutes.
• Reason to change:
• To redefine which severity level moves a particular resource to the "Down" state.
Section 5: PSM Peripheral Status Monitors
•What are Peripheral Status Monitors?
•Configuration with MC/ServiceGuard
When to Use EMS HW Status Monitoring
• Peripheral Status Monitoring
•To determine whether HW is operational
•To integrate with system management programs such as OpenView IT/O
•To integrate with MC/ServiceGuard by creating packages that depend on HW resources
Overview
•Converts HW events into changes in device/resource status
•Reports states that affect the use of data
•Configured using the EMS GUI in SAM.
•Used by MC/ServiceGuard for package fail-over after a status change.
•Acts as the interface between the HW event monitors and MC/ServiceGuard.
How PSM Works
[Figure: Hardware Event Monitor -> Peripheral Status Monitor (PSM) -> Event Monitoring Service (EMS) -> EMS Notification and MC/ServiceGuard]
•The HW event monitor assigns a severity level to each event and passes it to the PSM.
•The PSM converts the event's severity level into a device status (UP or DOWN) and passes that status to EMS.
•If a PSM monitoring request has been created for the resource, notification is sent by the configured notification method.
•If the resource is associated with an MC/ServiceGuard package, EMS notifies MC/SG of the status change. If the resource status changes to "Down", MC/SG fails over the package.
PSM Components
•psmctd – Peripheral Status Client/Target daemon
•Used to monitor the status of HW resources.
•psmmon – Peripheral Status Monitor
•Utility that monitors the status of the resources recognized by psmctd.
•set_fixed – Utility that manually changes the status of a HW resource from "Down" to "Up".
•Applies only to monitors that cannot do this automatically.
Example) List HW resources in the "DOWN" state:
/opt/resmon/bin/set_fixed -L
Change from "DOWN" to "UP":
/opt/resmon/bin/set_fixed -n resource_name
PSM States
Condition | Interpretation
Up | HW is operating normally
Down | An event has occurred that indicates a failure with the HW
Unknown | Cannot determine the state of the HW. This state is treated as DOWN by the PSM
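The two rules that govern PSM state, the DOWN_SEVERITY_THRESHOLD and DOWN_SEVERITY_OPERATOR pair from the .psmcfg file and "Unknown is treated as DOWN", can be sketched as below. This is a simplification under stated assumptions: it ignores MONITOR_STATE_HANDLING and how a resource returns to UP, and the function names are hypothetical.

```python
import operator

SEVERITY_VALUE = {
    "INFORMATION": 1,
    "MINOR_WARNING": 2,
    "MAJOR_WARNING": 3,
    "SERIOUS": 4,
    "CRITICAL": 5,
}

OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "!": operator.ne,
}


def event_causes_down(event_severity: str,
                      down_threshold: str = "CRITICAL",
                      down_operator: str = "=") -> bool:
    """True if an event of this severity would mark the resource DOWN."""
    compare = OPERATORS[down_operator]
    return compare(SEVERITY_VALUE[event_severity], SEVERITY_VALUE[down_threshold])


def effective_state(state: str) -> str:
    """Collapse the three PSM states: anything other than UP counts as DOWN."""
    return "UP" if state.upper() == "UP" else "DOWN"
```

With the sample values from the configuration slide (CRITICAL and =), only CRITICAL events cause the Down state. Changing the pair to SERIOUS and >= would include SERIOUS events as well.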
Configuring MC/ServiceGuard Package Dependencies with the PSM
•How to configure one or more PSM-available resources with an MC/SG package.
•Create an EMS monitoring request that monitors the status of a particular resource.
•Notify MC/SG when the status of that resource changes.
•Two ways to configure a PSM package dependency
•SAM
•Editing package configuration file (/etc/cmcluster/pkg.ascii)
MC/ServiceGuard Packages
[Figure: two-node cluster. Labels shown: Node 1, Node 2, VG01 (Data, Mirror), Exclusive VG Activation, Package 1 (pkg.ascii) with Applic 1 and Package Dependency: VG01, IP Addr - Pkg 1, IP Addr - Node 1, IP Addr - Node 2]
If the status of VG01 becomes "DOWN" on Node1, the package is started on another system where VG01 is seen as "UP", that is, Node2.
GUI Monitoring Request – Example
SECTION 6: Basic Troubleshooting Guidance
•Collecting information
•Disable a Monitored Resource
•How to Test Online Diagnostics
•How to Completely Disable EMS
Collecting Information
The following information is useful when troubleshooting EMS, although not all of it is needed for every problem.
•EMS and STM version (monconfig, cstm, swlist)
•System type (uname -a, model)
•swlist -l bundle and swlist -l product
•/var/opt/resmon/log/event.log
•/etc/opt/resmon/log/api.log, client.log, registrar.log
•/var/adm/sw/swagent.log
•/var/stm/logs/sys
•Persistence files /etc/opt/resmon/persistence
Checking EMS and STM versions: http://www.docs.hp.com/hpux/onlinedocs/diag/stm/stm_upd.htm#table
Disable a Monitored Resource
• As of the September 2000 (IPR 0009) release, startmon_client consults the following file:
/var/stm/data/tools/monitor/disabled_instances
• Entries in this file are listed one per line.
• Wildcards can also be used:
/storage/events/disks/default/*
• As user root:
1. Add/Delete/Modify instances in disabled_instances file
2. Execute monconfig (E)nable Monitoring
3. Wait for monitoring to be re-enabled
4. Select (C)heck detailed monitoring status
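To illustrate how a wildcard entry in disabled_instances might cover several resource instances, the sketch below uses shell-style matching from Python's fnmatch module. This is a stand-in only: the slides do not specify the exact wildcard rules startmon_client applies, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def load_patterns(text: str) -> list:
    """Read one pattern per line, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_disabled(instance: str, patterns: list) -> bool:
    """True if the resource instance matches any disabled pattern."""
    return any(fnmatchcase(instance, pattern) for pattern in patterns)


patterns = load_patterns("/storage/events/disks/default/*\n")
```

Under this assumption, the pattern above would cover an instance such as /storage/events/disks/default/10_4_4.0.0 but not resources under /storage/events/disk_arrays.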
Disable an Individual Event
• For Multiple-View Predictive-Enabled monitors, edit the client configuration file
/var/stm/config/tools/monitor/default_monitor_name.clcfg
• Change the "enabled flag" from TRUE to FALSE:
EQ:3:CRITICAL:FALSE:1440:ANY:1:NONE:NO_OP:NO_OP:NONE
• For other monitors, edit the monitor configuration file
/var/stm/config/tools/monitor/monitor_name.cfg
• Change the action flag to IGNORE:
DEFINE_EVENT 5 CRITICAL IGNORE # power supply fault
• Changes take effect immediately; monitoring does not need to be restarted.
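The TRUE to FALSE edit for a multiple-view monitor can be sketched as a small text transformation. It assumes the 11-field EQ layout shown on the Events slide, with the enable flag in the fourth field. The function name is hypothetical, and in practice the file is edited by hand as described above.

```python
def set_event_enabled(clcfg_text: str, event_number: int, enabled: bool) -> str:
    """Return the text with the enable flag of one EQ event rewritten."""
    out = []
    for line in clcfg_text.splitlines():
        parts = line.split(":")
        # EQ lines have 11 colon-separated fields; field 1 is the event
        # number and field 3 is the enable flag.
        if len(parts) == 11 and parts[0] == "EQ" and parts[1] == str(event_number):
            parts[3] = "TRUE" if enabled else "FALSE"
            line = ":".join(parts)
        out.append(line)
    return "\n".join(out)
```

Lines for other events, and any line that does not have the EQ layout, pass through unchanged. A trailing newline in the input is not preserved by this sketch.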
How to Test Online Diagnostics
1. Hardware monitoring requires three daemons: diagmond, diaglogd, and memlogd. Check with the ps -ef command.
2. List all currently active HW monitors:
ps -ef | grep stm
3. Run /etc/opt/resmon/lbin/monconfig to (C)heck detailed monitoring status. The initial screen should show event monitoring enabled.
4. To test by sending a test event through EMS, use the send_test_event command:
/opt/resmon/bin/send_test_event -v monitor_name
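Step 1 can be automated by scanning saved ps -ef output for the three daemon names. The sketch below works on text rather than calling ps itself. The sample output, including the process path and PIDs, is invented for illustration, and the function name is hypothetical.

```python
REQUIRED_DAEMONS = ("diagmond", "diaglogd", "memlogd")


def missing_daemons(ps_output: str) -> list:
    """Return the required daemons that do not appear in the ps output."""
    running = set()
    for line in ps_output.splitlines():
        for token in line.split():
            name = token.rsplit("/", 1)[-1]  # basename of a command path
            if name in REQUIRED_DAEMONS:
                running.add(name)
    return [daemon for daemon in REQUIRED_DAEMONS if daemon not in running]


# Invented sample; real ps -ef output and daemon paths will differ.
SAMPLE_PS = """\
root  1234     1  0 10:00:00 ?  0:00 /example/path/diagmond
root  1240  1234  0 10:00:01 ?  0:00 memlogd
"""

missing = missing_daemons(SAMPLE_PS)
```

Matching is by token, so a line such as a grep command that mentions a daemon name would also count as a match. Filter such lines out before passing the text in.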
How to Completely Disable EMS
If you need to temporarily disable all of EMS, perform the following steps in order:
1. Run monconfig (K)ill (disable) monitoring
2. Edit /etc/inittab using vi, comment out the 4 lines labeled ems1, ems2, ems3 and ems4
3. Reread /etc/inittab by running init q
4. Change EMS_ENABLED to 0 in /etc/rc.config.d/ems
5. Change AUTOSTART_EMSAGT to 0 in /etc/rc.config.d/emsagtconf
6. Kill emsagent and p_client processes (if still running)
7. Verify monitors stopped using ps -ef|grep stm
To re-enable, reverse the steps above and then run:
/sbin/init.d/ems start
/sbin/init.d/emsa start
Q & A
Thanks