IBM Software Group - FKII · • Parallel processing and optimization for large data volumes •...
TRANSCRIPT
© 2003 IBM Corporation
IBM Software Group
2003. 11. 13 이지은, IBM Information Management team
IBM Software Group | DB2 Information Management Software
Agenda
Part 1
Data warehousing trends: present and future
1. Data Warehouse – Reengineering or Rebuilding – Rigorous Measurement of ROI
2. Data Warehouse is essential in Customer Relationship Management
3. Integration
4. Proliferation of Data Sources
5. Growing number of end users
6. More Complex Queries
7. Exploding Data Volumes
8. Data Warehouse for real time analysis and actions
9. Increased Analytics
Trends in Data Warehousing – large volume + usability + real time
Mid-1980s: Data Warehousing – “Consistency”
Business data warehouse
• Reconciling disparate data
• Single version of the truth
• Creates historical record
Characteristics
• Historical data
• Separation of informational & operational needs
• Structured data
• Unidirectional data flow
• Trusted sources
• Basic technical metadata
Diagram labels: Operational systems, Business data warehouse, Data marts, Metadata
Late-1990s: Operational data stores and data marts – “Immediate information”
Operational data store (ODS)
• Near real-time
• Integrating related data
• Relative/partial truth
Characteristics
• Recent and historical data
• Merging of informational & operational needs
• Structured data (largely)
• Bidirectional data flow
• Trusted sources
• Comprehensive technical metadata
Diagram labels: Operational systems, Operational data store, Business data warehouse, (Operational) Data marts, Metadata
The vision: Comprehensive integration of information
Integrated information
• Real-time knowledge
• Integrating all information
• Complete truth
Characteristics
• Immediate and historical data
• Fully merged informational & operational needs
• Structured and unstructured data
• Bidirectional data flow and access
• Intelligent caching
• Trusted and untrusted sources
• Complete business & technical metadata
Diagram labels: Operational systems, Untrusted & unstructured sources (e.g. Internet), Operational data store, Business data warehouse, (Operational) Data marts, Information integration, Metadata
Three ways of extending a data warehouse
1. Accessing and joining real-time data: from databases, applications and queues
2. Accessing and joining unstructured data: from content stores, web pages and e-mails
3. Combining data from multiple data marts: from disparate data warehouses
Extending DW: Federation allows access to and joining of real-time data with an existing warehouse.
Diagram labels: Client, BI Tool, DB2 Information Integrator, Existing Operational Systems, Application, Database, DBMS, DB2, Operational systems, ODS, BDW, Data marts
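The federation idea on this slide can be sketched in a few lines. This is not DB2 Information Integrator: as a stand-in it uses Python's sqlite3 module and SQLite's ATTACH DATABASE so that one SQL statement joins a "warehouse" table with a "live operational" table held in a separate database. The table names, columns, and sample rows are all hypothetical.

```python
import sqlite3

# The "warehouse" is the main database; the "operational system" is a second,
# separately attached database. Both are in-memory stand-ins (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ops")

conn.execute("CREATE TABLE main.sales_history (customer_id INTEGER, total_sales REAL)")
conn.execute("CREATE TABLE ops.orders_today (customer_id INTEGER, amount REAL)")

conn.executemany("INSERT INTO main.sales_history VALUES (?, ?)",
                 [(1, 1200.0), (2, 300.0)])
conn.executemany("INSERT INTO ops.orders_today VALUES (?, ?)",
                 [(1, 50.0), (1, 25.0), (3, 10.0)])


def federated_view(connection):
    """Join historical totals with today's live orders in a single query."""
    sql = """
        SELECT h.customer_id, h.total_sales, SUM(o.amount) AS today_amount
        FROM main.sales_history AS h
        JOIN ops.orders_today AS o ON o.customer_id = h.customer_id
        GROUP BY h.customer_id, h.total_sales
        ORDER BY h.customer_id
    """
    return connection.execute(sql).fetchall()


rows = federated_view(conn)
print(rows)
```

The BI tool only sees one connection and one query; where each table physically lives is hidden behind the federation layer, which is the point the slide makes.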
Extending DW: Federation allows access to XML and unstructured content.
Diagram labels: Client, BI Tool, DB2 Information Integrator, Existing Content Systems, XML, Content, DBMS, Operational systems, ODS, BDW, Data marts
Extending DW: Federation enables joining existing data marts (and data warehouses) together.
Diagram labels: Client, BI Tool, DB2 Information Integrator, Second data warehouse, Data Mart, RDBMS, BDW, Operational systems, ODS, Data marts
Data warehousing has evolved beyond static data. IBM Information Integration delivers information
• Mid-80s: Data warehouse. Point-in-time structured data, integrated within the business intelligence environment
• Mid-90s: Operational data store. Near real-time structured data, integrated within a limited internal environment
• Today: Information integration => Extending DW. Real-time structured and unstructured information, integrated wherever required
Part 2
Real-time business intelligence: Embedded analysis
Everywhere: Demands of real-time
Diagram labels: Operational BI, Balanced Scorecard, Key Performance Indicators, Real Time Enterprise, Enterprise Application Integration, Information Integration, Business Performance Management, Zero Latency Enterprise, Integration Brokers, SCM, ERP, CRM, Executive Dashboards, Real Time BI
Real-time BI and lifespan of data
Diagram labels: Operational System, EAI / EII, ETL & Data Warehouse, Tape, Real Time BI
Timeline: This Minute, Tonight, 12-36 months, Archives
Real time BI – Major factors
Diagram labels: Parallel ETL Engines, MQSeries queues, Replication, Web services, Continuous Loading, DB2 Warehouse, ODS, Information Integration, Concurrent User Queries, Corporate Dashboard, Consumers
• Embedded Analysis with data mining, rules, campaigns
• Alerts, triggers, KPIs, Analytics
Real time ETL needs Parallelism & Non-stop Loading
Diagram labels: new data (www, MQ, Call Center), Import, Transforms, Aggregate, Load, DB2 (four partitions), BI users
Real time BI – Federation
Diagram labels: Client, BI Tool, Monitors & Dashboards, Information integration, Federation, DBMS wrapper, Application wrapper, DB2 Database, Data Warehouse, ODS, Data marts, Operational systems
Real-time BI with Embedded analysis
This means that BI logic is embedded in the application. Data is analyzed in real time, and users see the results inside their own applications.
Diagram labels: BI Reports, Portal KPIs, Call Center, Web Site, Data Warehouse, Production Database, Data Mining
Real time BI – things to consider
Diagram labels: change data capture, MQSeries, parallel transform & load, DB2 Warehouse
• Style of application & use: OLTP integration or BI; 10 records integrated or 10M
• Determine level of detail required; determine latency required
• How quickly can the LOB use the knowledge? Will business processes have to change? Is the price worth the performance?
• Take only what you need. Real time is expensive. LOB must be able to take action in real time.
Part 3
BI platform for the Real Time Enterprise: IBM Data Warehouse Edition
The DB2 Framework for Business Intelligence
Diagram labels: Administration, Information Integration, OLAP, Mining, Statistics, ETL, SQL, XML, PMML, DB2
DB2 DWE (Data Warehouse Edition): the BI Framework
Diagram labels:
• Client & ISV Applications
• Web Services, XML, SQL, Java SP
• DB2: OLAP & RDBMS
• Metadata: DB2 Catalog, Cube View
• MQT on MDC
• Detail Data
• Federated data access: VSAM, IDS, Oracle, SQL Server, IMS, Sybase, DB2, etc.
• Hyperion
DB2 Data Warehouse Edition Components
• DB2 Enterprise Server Edition: Foundation of BI platform
• Database Partitioning Feature: Shared-nothing scalability
• DB2 Cube Views: Unified engine for OLAP
• Intelligent Miner Scoring, Modeling & Visualization: Data mining inside DB2
• Warehouse Center & Manager: ETL transforms & scheduling
• Office Connect Enterprise Edition: Excel connectivity to DB2
• Query Patroller: Workload management
• Information Integrator: Real time DW
IBM DWE for Extending DW
Diagram labels: Extending DW, DB2 UDB, marts
• Data integration: DB2 Information Integrator
• ETL: DataStage Parallel Extender / DB2 Warehouse Manager
DB2 UDB : Optimized platform for BI
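Performance, Stability, Scalability, Integration
• Parallel processing and optimization for large data volumes
• Fast response times even for complex queries and for queries from many users
• Indexing techniques designed for data warehouses
• A variety of partitioning techniques for large data volumes
• Transparent access to heterogeneous data sources, a transparent architecture for data exchange, and ease of management
• Linear scalability up to 1000 nodes using a shared-nothing architecture
• Efficient data management through autonomic computing
• Self-management through resource optimization
• Stable service through high-availability fail-over support, with strengths demonstrated in customer cases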
DB2 Information Integrator: heterogeneous data federation technology for structured data and content
Information Integration extends the warehouse
• Engine enhancements for federation: Optimizer, Rewrite, Runtime
• Wrapper architecture
• Data sources and client connections (diagram labels): DB2 390, DB2 400, DB2 Windows, DB2 UNIX (DRDA Driver, DB2 LAN Driver; TCP/IP, APPC, NetBIOS); Oracle (Oracle SQL*Net); Informix (Informix client); Sybase (Sybase Open Client); MS SQL Server (MS SQL Server ODBC Client); any ODBC data source (ODBC); Blast; Flat file; Excel; Documentum; XML
DB2 Warehouse Center
extract, transform, load, schedule, administrate
Screenshot: web browser showing http://the_call_center/customer/lookup (WPS v1.2 -- The Cutting Edge)
Diagram: field mapping from a source record (cust-nbr, acct-code, first name, last name, street, city, zip, country) to a target record (cust-number, acct_type, F-name, L-name, street, city, postal_code, country)
DB2 Data Warehouse Manager and Center ETL
DB2 Data Warehouse Center ETL
• Basic DB2 administration console for ETL
• Database schema & user maintenance
• Access to most RDBMS's & flat files
• Schedules and monitors database tasks
• 150+ data transformations
• Loads data into DB2 data warehouses
• CWMI Standards adherence
DB2 Warehouse Manager adds:
• Extracts & transformations on remote servers via agents
• Transformation library
Diagram labels: DB2 Warehouse Manager, Metadata, ETL agents, Information Catalog
IBM DWE for Embedded Analysis
Diagram labels: Embedded Analysis, Real-time BI, DB2, DB2 OLAP Server, DB2 Intelligent Miner, Mining Extenders, Cube Views
Intelligent Miner Modeling, Scoring, Visualization: DB2 UDB can do data mining, even real-time mining
Diagram labels: DB2 Data Warehouse, SQL Modeling, Intelligent Miner, PMML Model, SQL Scoring, visualize
• Algorithms: Neural Nets, Associations, Time Series, Radial Basis, Decision Trees, Regression
• Outputs: predictions, clusters, scores, outliers, Answers, Decisions
Figure: model visualization over fields such as T_rev98, multi_lines, wireless_Rm, wireless_3T, ISP_amt, ISP_usa, call_card, PM_usage, MM_usage, Call_waiting, Sys_rx_ft, rural_zip, res_bldg, with values from .47 down to .19
IBM Embedded Analysis Strategy : Database-centric
Diagram labels:
• Database: Data, Task, Model, Mining result
• IM Modeling (Stored Procedure): Training data, Mining task stored as LOB, Mining model stored as LOB
• IM Scoring (Apply Function): Data to which the model is applied, Results of applying the model
• IM Visualization (show, web): ...stand-alone, ...as applet
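A minimal sketch of the database-centric pattern on this slide: the model is stored in a table as a LOB and applied row by row through a function called from SQL. It is not the Intelligent Miner API. SQLite (via Python's sqlite3) stands in for DB2, the model is a hypothetical linear formula serialized as JSON rather than PMML, and the function name apply_model and all table and column names are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, body BLOB)")
conn.execute("CREATE TABLE customers (id INTEGER, monthly_usage REAL, calls REAL)")

# Hypothetical linear model stored as a BLOB; a real model would be a PMML document.
model = {"intercept": 0.1, "weights": {"monthly_usage": 0.002, "calls": 0.01}}
conn.execute("INSERT INTO models VALUES (?, ?)",
             ("churn", json.dumps(model).encode("utf-8")))
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 100.0, 5.0), (2, 400.0, 20.0)])


def apply_model(model_blob, monthly_usage, calls):
    """Deserialize the stored model and score one row of input data."""
    m = json.loads(bytes(model_blob).decode("utf-8"))
    w = m["weights"]
    return m["intercept"] + w["monthly_usage"] * monthly_usage + w["calls"] * calls


# Register the Python function so that plain SQL can invoke it.
conn.create_function("apply_model", 3, apply_model)

scores = conn.execute("""
    SELECT c.id, apply_model(m.body, c.monthly_usage, c.calls) AS score
    FROM customers AS c, models AS m
    WHERE m.name = 'churn'
    ORDER BY c.id
""").fetchall()
print(scores)
```

Because scoring is an ordinary SQL expression, any application that can issue a query (CLI, JDBC, ODBC) can obtain scores without moving data out of the database.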
IBM Embedded Analysis Strategy – Background: Data Mining architecture
Diagram labels: data warehouse, extract, DB2, Workbench (Analyst), Packaged Applications (Consumer), algorithm, Application Embedded (Programmer), RDBMS Extenders, DB2 Instance (four instances), SQL invokes extender, PMML Advocate
Figure: the same model visualization as on the Intelligent Miner slide (fields such as T_rev98, multi_lines, ISP_amt, PM_usage; values from .47 down to .19)
IBM Embedded Analysis Strategy – 2 way fits all
Diagram labels:
• Modeling Environment: Analytical Data Mart, Mining WorkBench, Applications with embedded Mining, IM Modeling API, Mining Run, Segmentation, Model Calibration, Scheduler, Job, Business Objects, Type attributes, Type player
• Model Transportation
• Scoring Environment: Operational Data Store, Data, Models, Scores, IM Scoring API, Applications with embedded scoring, Segmentation, Model Calibration, Scheduler, Job, Type attributes, Type player
• Application Integration Layer: CLI/JDBC/SQLJ/ODBC
IBM Embedded Analysis Strategy: Workbench + DB-centric
Diagram labels: Data Analyst, Data Mining Workbench, Historical Data, Data Warehouse, mining model, PMML model, IM Scoring, DB2 UDF, SQL, Scored Data
Process steps: Select, Transform, Mine, Assimilate (Selected Data, Transformed Data, Extracted Information, Assimilated Information)
• Models from a consultant, solution provider, or central support group within an enterprise.
• Models can be exchanged among data mining tools from compliant vendors.
• Added value: could merge purchased data (such as demographic or industry-specific data) with internal data.
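Use case 1: Customer characteristic analysis and its application
Business issues
• Need to understand customer characteristics and develop target groups for marketing strategy
• Need to apply what is known about existing customers to new customers and recommend products that suit each profile
• Need to respond to changes in the behavior of existing customers
Approach
• Perform clustering based on customer behavior data
• Identify groups with distinct characteristics
• Send the customer segmentation model to the operational system
• Score new customers
• Score existing customers
• Show the scoring results in reports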
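The approach above can be sketched as a two-stage flow: build a segmentation model from existing customers, then score new and existing customers against it. The slide's tool uses Demographic Clustering; this sketch swaps in plain k-means (Lloyd's algorithm) with nearest-centroid scoring purely for illustration. The behavior attributes and sample values are hypothetical.

```python
import math


def assign(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Hypothetical behavior data: (monthly spend, visits per month).
existing = [(10.0, 1.0), (12.0, 2.0), (11.0, 1.0),
            (90.0, 9.0), (95.0, 10.0), (88.0, 8.0)]

# Modeling environment: build the segmentation model.
segment_model = kmeans(existing, centroids=[existing[0], existing[3]])

# Scoring environment: apply the transferred model to new and existing customers.
new_customer = (85.0, 7.0)
new_segment = assign(new_customer, segment_model)
existing_segments = [assign(p, segment_model) for p in existing]
print(new_segment, existing_segments)
```

Only the list of centroids needs to travel from the modeling side to the operational side, which mirrors the "send the segmentation model to the operational system" step.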
Use case 1: Customer characteristic analysis and its application, concept diagram
Diagram labels:
• Modeling Environment: Data Warehouse, Analytical Data Mart, OLAP, Mining WorkBench (mines the data using Demographic Clustering), Cluster model, Segmentation, Model Calibration, Scheduler, Job, Type attributes, Type player
• xfer model
• Scoring Environment: Operational Data Store, Data, Models, Scores, IM Scoring API, Applications with embedded scoring
• Application Integration Layer: CLI/JDBC/SQLJ/ODBC
Use case 2: Real-time promotion with embedded mining
Association analysis of related products for specific customer groups is run in real time and used in CRM activities.
Diagram labels:
• Modeling Environment: Data Warehouse, Analytical Data Mart, Transactional Data, IM Modeling API (mines the data), IM Visualization, Model Calibration, Scheduler, Job
• Applications with embedded mining: applications that search for association rules
• Business Objects
• Application Integration Layer: CLI/JDBC/SQLJ or ODBC
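To make "applications that search for association rules" concrete, here is a small sketch that computes support and confidence for product pairs. It is a brute-force pair count, not the IM Modeling API, and the baskets, product names, and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs that pass both thresholds."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets that contain both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical transactions for one customer group.
baskets = [
    ["mobile", "isp"],
    ["mobile", "isp", "call_card"],
    ["mobile", "call_card"],
    ["isp"],
]
rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A promotion application could look up the rules whose left-hand side matches what a customer already holds and offer the right-hand side.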
DB2 Cube Views : DB2 UDB can recognize OLAP
Diagram labels: Model & ETL Tool Metadata, BI Tool Metadata, Meta Data Bridge, Hyperion, Business Objects, DML, DDL, OLAP Metadata, DB2 Data Warehouse, RDBMS Metadata, DATA, Optimization Advisor, MQTs
DB2 Cube Views: OLAP Center Architecture
Diagram labels: OLAP Center, XML Output File, XML Import File, Metadata Access (XML), Data Access (SQL), OLAP Metadata, DB2 Catalog, MQT & index, Detail Data, Custom App, Excel w/ OfficeConnect, JDBC
DB2 Cube Views Performance Advisor
Diagram labels: Administrator, Performance Advisor, OLAP Metadata, Cube Views Model, Base Tables, Catalog Tables, Statistics, Model Information, Data Samples, Time & Space constraints, Optimization hints, MQTs
Optimization Advisor
DB2 Cube Views: Using MQTs
When a query asks for data that is already held in an MQT, the DB2 optimizer rewrites the query so that the pre-aggregated data is read directly from the MQT.

Query:
    select s.REGION, SUM(f.SALES)
    from FACT f, STORE s
    where f.STORE_KEY=s.STORE_KEY
    and s.REGION='Pacific'
    group by s.REGION

MQT:
    create table MYMQT as (
    select s.REGION, SUM(f.SALES) as SALES
    from FACT f, STORE s
    where f.STORE_KEY=s.STORE_KEY
    group by s.REGION
    ) …

    REGION         SALES
    Eastern        155,037.12$
    Mid Atlantic    37,534.96$
    Mid West       233,134.89$
    Mountain        74,019.26$
    Pacific        116,960.98$
    South East      81,799.49$
    South West      82,916.89$

Rewritten Query (DB2 Optimizer):
    select REGION, SALES
    from MYMQT
    where REGION='Pacific'
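The effect of the rewrite can be checked with a small experiment. SQLite has neither MQTs nor automatic query rewrite, so in this sketch the summary table is built by hand and the choice between the base query and the rewritten query is an explicit flag; in DB2 the optimizer makes that choice. The table names follow the slide, while the sample rows and the sales_by_region helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STORE (STORE_KEY INTEGER PRIMARY KEY, REGION TEXT);
    CREATE TABLE FACT (STORE_KEY INTEGER, SALES REAL);
    INSERT INTO STORE VALUES (1, 'Pacific'), (2, 'Pacific'), (3, 'Eastern');
    INSERT INTO FACT VALUES (1, 100.0), (1, 50.0), (2, 25.0), (3, 70.0);
    -- Hand-built stand-in for the MQT: pre-aggregated sales per region.
    CREATE TABLE MYMQT AS
        SELECT s.REGION AS REGION, SUM(f.SALES) AS SALES
        FROM FACT f, STORE s
        WHERE f.STORE_KEY = s.STORE_KEY
        GROUP BY s.REGION;
""")

BASE_QUERY = """
    SELECT s.REGION, SUM(f.SALES)
    FROM FACT f, STORE s
    WHERE f.STORE_KEY = s.STORE_KEY AND s.REGION = ?
    GROUP BY s.REGION
"""
REWRITTEN_QUERY = "SELECT REGION, SALES FROM MYMQT WHERE REGION = ?"


def sales_by_region(connection, region, use_summary=True):
    """Answer the same question from the base tables or from the summary table."""
    sql = REWRITTEN_QUERY if use_summary else BASE_QUERY
    return connection.execute(sql, (region,)).fetchall()


from_base = sales_by_region(conn, "Pacific", use_summary=False)
from_summary = sales_by_region(conn, "Pacific", use_summary=True)
print(from_base, from_summary)
```

Both paths return the same answer; the summary path reads one pre-aggregated row instead of joining and grouping the fact table, which is where the response-time gain comes from.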
DW systems and DB2 Cube Views – Benefit for Everybody
Audiences (diagram labels): multidimensional modeling, end users and analysts, information systems staff
Improved query performance
• When end users query the database through OLAP tools and similar front ends, the queries run against MQTs that already exist, which improves query response times.
• Users run the same queries in the same environment as before; they are unaffected by whether an MQT exists.
Metadata sharing with a variety of tools
• Metadata managed in Cube Views can be imported from ETL and modeling tools and exported to each OLAP tool for further work.
Automatic creation of summary tables optimized for the analysis environment
• Automatic summary tables can be built in optimized form, taking user query patterns and OLAP metadata into account, so administrators do not have to spend separate effort designing summary tables.
Realizing true BI: Integrated Approach with IBM DWE
Diagram labels: Data Warehouse Edition (Extending data warehouse, Real-time BI, Embedded Analysis), DB2 UDB for EDW, Technology, IBM methodology, Experience