Data Management
© people & data | www.weigend.com Andreas S. Weigend, Ph.D. 韦思岸教授
The Networked Economy (8): Information Management, Strategy, and Innovation
Data Management
© people & data | www.weigend.com | +1 650 906-5906 | +49 174 906-5906 | +86 138 1818 3800
Agenda
Databases vs. data warehouses
Four properties of data warehouses
  Subject oriented, integrated, time variant, non-volatile
Getting data into the warehouse
  ETL: Extract, Transform, Load
Accessing data and reporting
  OLAP: On-Line Analytical Processing
Process of designing and building a warehouse
Databases vs. data warehouses
• Databases
  Many operational databases in the organization
  Hold operational, day-to-day data
  Example: database of current addresses for catalog mailings
• Data warehouse
  A single data warehouse in the organization
  Keeps historical information
  Embodies business knowledge
  Needs systems integration skills
Raw data vs. information and insights
• Raw data
  What are the total sales for region A?
  Which salesperson earned the highest commission this month?
• Information and insights
  How have the sales for region A changed over the past five years?
  Which products should sell best next year?
  Tell me something I did not know.
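The difference between the two kinds of question can be sketched in a few lines of Python. The sales records, region names, and function names below are made up for illustration and are not from the lecture. The first function answers a raw-data question from a single total, while the second needs several years of history.

```python
from collections import defaultdict

# Hypothetical yearly sales records: (region, year, amount). Figures are made up.
sales = [
    ("A", 2004, 100), ("A", 2005, 120), ("A", 2006, 150),
    ("A", 2007, 140), ("A", 2008, 180), ("B", 2008, 90),
]

def total_sales(records, region):
    """Raw-data question: what are the total sales for a region?"""
    return sum(amount for r, _, amount in records if r == region)

def yearly_change(records, region):
    """Insight question: how did sales change from one year to the next?"""
    by_year = defaultdict(int)
    for r, year, amount in records:
        if r == region:
            by_year[year] += amount
    years = sorted(by_year)
    return {y: by_year[y] - by_year[prev] for prev, y in zip(years, years[1:])}

print(total_sales(sales, "A"))
print(yearly_change(sales, "A"))
```

The second question can only be answered if the historical rows were kept, which is one reason a warehouse stores history rather than only the current state.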
From extract to warehouse DSS (decision support system)
Controlled
Reliable
Quality information
Single source of truth
[Diagram labels: Internal and external systems; Data warehouse; Decision makers]
The data warehouse
Central repository
Historical data
Many data sources (internal and external)
Single version of truth
Separate from operational systems
Properties of a data warehouse
Subject oriented
Integrated
Time variant
Non-volatile
[Diagram label: Data warehouse]
Subject oriented
Operational systems: Savings, Shares, Loans, Insurance, Equity plans
Data warehouse subject area: Customer financial information
Data is categorized and stored by business subject rather than by application.
Integrated
Operational environment: Savings application, Current accounts application, Loans application
Data warehouse: Subject = Customer
No specific application
Data on a given subject is defined and stored once.
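As a minimal sketch of what "defined and stored once" can mean in practice, the Python below merges records from three applications into one record per customer. The customer IDs, field names, balances, and the `integrate` helper are all hypothetical and chosen only for illustration.

```python
# Hypothetical extracts from three operational applications, keyed by customer ID.
savings = {"C1": {"name": "Ann Lee", "savings_balance": 5000}}
current_accounts = {
    "C1": {"name": "Ann Lee", "current_balance": 1200},
    "C2": {"name": "Bo Chen", "current_balance": 300},
}
loans = {"C2": {"name": "Bo Chen", "loan_balance": 10000}}

def integrate(*applications):
    """Merge per-application records into one record per customer."""
    customers = {}
    for app in applications:
        for customer_id, record in app.items():
            customers.setdefault(customer_id, {}).update(record)
    return customers

customers = integrate(savings, current_accounts, loans)
print(customers["C1"])
print(customers["C2"])
```

A real integration step would also have to reconcile conflicting names, codes, and units across applications; this sketch assumes the sources already agree.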
Time variant
Time      Data (in the data warehouse)
01/2008   Data for January
02/2008   Data for February
03/2008   Data for March
Data is stored as a series of snapshots, each representing a period of time.
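The snapshot idea can be sketched as follows. The period labels, the balance figures, and the `store_snapshot` helper are assumptions made for this example, not part of the lecture.

```python
# Hypothetical monthly snapshots of one customer's balance.
snapshots = {}

def store_snapshot(period, data):
    """Store a copy of the data under a period label such as '2008-01'."""
    if period in snapshots:
        raise ValueError(f"snapshot for {period} already exists")
    snapshots[period] = dict(data)

balance = {"customer": "C1", "balance": 5000}
store_snapshot("2008-01", balance)
balance["balance"] = 5200          # the operational value changes...
store_snapshot("2008-02", balance)
balance["balance"] = 4900
store_snapshot("2008-03", balance)

# ...but earlier snapshots keep the value they had at the time.
history = [snapshots[p]["balance"] for p in sorted(snapshots)]
print(history)
```

Copying the data at each period is what lets later analysis compare January with March even though the operational system only holds the latest value.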
Non-volatile
Operational databases: read, insert, update, delete
Warehouse database: load, read
Typically, data in the data warehouse is not updated or deleted.
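One way to illustrate the load-and-read-only behavior is with SQLite triggers that reject updates and deletes. This is only a sketch under assumptions: the table name `sales_fact`, its columns, and the use of triggers are choices made for the example, and real warehouses typically enforce this through permissions and load processes instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (period TEXT, region TEXT, amount INTEGER)")
# Triggers reject UPDATE and DELETE, leaving only load (INSERT) and read (SELECT).
for action in ("UPDATE", "DELETE"):
    conn.execute(
        f"CREATE TRIGGER no_{action.lower()} BEFORE {action} ON sales_fact "
        "BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END"
    )

conn.execute("INSERT INTO sales_fact VALUES ('2008-01', 'A', 100)")  # load

def try_change(sql):
    """Return True if the statement was applied, False if it was rejected."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False

updated = try_change("UPDATE sales_fact SET amount = 0")
deleted = try_change("DELETE FROM sales_fact")
rows = conn.execute("SELECT * FROM sales_fact").fetchall()  # read
print(updated, deleted, rows)
```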
ETL: Extract, Transform, Load
Detect changes
Move data
Clean data
Index and summarize
Create and maintain metadata
Load data
[Diagram labels: Operational systems; ETL (programs, gateways, tools); Warehouse]
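The ETL tasks listed above can be sketched as three small Python functions. The row layout, the `changed` flag, and the cleaning rules are hypothetical; production ETL tools do far more (indexing, summarizing, richer metadata).

```python
# Hypothetical operational rows; field names and values are illustrative.
operational_rows = [
    {"id": 1, "name": " ann lee ", "city": "berlin", "changed": True},
    {"id": 2, "name": "Bo Chen", "city": "Shanghai", "changed": False},
    {"id": 3, "name": "", "city": "Palo Alto", "changed": True},
]

def extract(rows):
    """Detect changes: keep only rows flagged as changed since the last run."""
    return [row for row in rows if row["changed"]]

def transform(rows):
    """Clean data: trim whitespace, normalize case, drop rows without a name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if name:
            cleaned.append({"id": row["id"], "name": name, "city": row["city"].title()})
    return cleaned

def load(rows, warehouse, metadata):
    """Load data and record simple metadata about the run."""
    warehouse.extend(rows)
    metadata["rows_loaded"] = metadata.get("rows_loaded", 0) + len(rows)

warehouse, metadata = [], {}
load(transform(extract(operational_rows)), warehouse, metadata)
print(warehouse, metadata)
```

Keeping the three stages as separate functions mirrors the slide's pipeline and makes each stage testable on its own.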
Data access and reporting
Tools that retrieve data for business analysis
More than a single tool may be required
Requirements
  Ease of use
  Intuitive
  Metadata
  Training
Types of access to the warehouse
  Simple queries
  Drill-down
  Forecasting
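Drill-down means moving from a summary to the detail behind it. A minimal sketch in Python, assuming a hypothetical region/city hierarchy and made-up amounts:

```python
from collections import defaultdict

# Hypothetical sales rows: (region, city, amount).
rows = [("A", "Berlin", 100), ("A", "Munich", 50), ("B", "Shanghai", 70)]

def summarize(rows, level):
    """Aggregate amounts at the 'region' level or the finer 'city' level."""
    totals = defaultdict(int)
    for region, city, amount in rows:
        key = region if level == "region" else (region, city)
        totals[key] += amount
    return dict(totals)

by_region = summarize(rows, "region")
# Drill down: take region A and break its total into cities.
region_a = {k[1]: v for k, v in summarize(rows, "city").items() if k[0] == "A"}
print(by_region, region_a)
```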
Multidimensional OLAP
Data is represented and extracted by dimensions
Matrix calculations are carried out quickly
Results are displayed as
  Matrix reports
  Graphs
[Diagram: cube with dimensions Product, Time, Customer]

January 2008   Tennis Shoes   Tennis Balls   Tennis Nets
---------------------------------------------------------
Customer A              200            300           350
Customer B             1000           1000           700
Customer C              500           1500           250
Customer D              250            200           200
---------------------------------------------------------
Total                  1950           3000          1500
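The matrix report above can be reproduced from a small cube keyed by (time, customer, product). The figures come from the slide; the dictionary layout and the `roll_up` helper are assumptions for this sketch, not how any particular OLAP engine stores data.

```python
from collections import defaultdict

# Facts from the matrix report above: (time, customer, product) -> amount.
report = {
    "Customer A": (200, 300, 350),
    "Customer B": (1000, 1000, 700),
    "Customer C": (500, 1500, 250),
    "Customer D": (250, 200, 200),
}
products = ("Tennis Shoes", "Tennis Balls", "Tennis Nets")
cube = {}
for customer, amounts in report.items():
    for product, amount in zip(products, amounts):
        cube[("January 2008", customer, product)] = amount

def roll_up(cube, axis):
    """Sum the cube over all dimensions except the one at index `axis`."""
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)

by_product = roll_up(cube, 2)   # reproduces the Total row of the report
by_customer = roll_up(cube, 1)  # a different view of the same facts
print(by_product)
print(by_customer)
```

Rolling up along a different axis gives a different report from the same facts, which is the point of representing data by dimensions.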
Gathering user requirements
Obtain from the user
  What types of questions will be asked?
  What information is needed (internal, external)?
  What tasks and processes need to be documented? E.g., compliance (Sarbanes-Oxley)
Educate the user
  Data models and metadata
  Reporting requirements
  Business rules
  Tools and their selection
User expectations and service level agreements (SLAs)
Key: manage expectations
Avoid creating unrealistic hopes
Set achievable targets for response to queries
Define service level agreements (SLAs)
Educate the users
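A query-response target can be written down as a simple check. The thresholds below (90% of queries within 5 seconds) and the measured times are purely hypothetical; the lecture does not specify any numbers.

```python
# Hypothetical SLA: 90% of queries answered within 5 seconds.
SLA_SECONDS = 5.0
SLA_FRACTION = 0.90

def sla_met(response_times, limit=SLA_SECONDS, fraction=SLA_FRACTION):
    """Return True if enough queries finished within the time limit."""
    if not response_times:
        return True  # no queries, nothing violated
    within = sum(1 for t in response_times if t <= limit)
    return within / len(response_times) >= fraction

measured = [0.8, 1.2, 2.5, 3.0, 4.1, 4.9, 1.1, 0.7, 2.2, 12.0]
print(sla_met(measured))
```

Stating the target as a fraction of queries rather than a hard maximum is one way to keep it achievable, since a single slow query does not break the agreement.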
Pilot the warehouse
Target a specific application area
Involve relevant groups
  Designers
  Users
    Check ease of use of tool
    Test data and query performance
    Identify training requirements
  Developers
    Test security and access levels, monitor performance
Data warehouse champion
Articulates the vision
Maintains inter-group communication
Settles conflicts
Identifies and solves issues
Brings in business expertise
Organizes and supports the team
Manages outside consultants
Brings the data warehouse to life
Steering committee
Tasks
  Provides direction
  Decides upon implementation issues
  Sets priorities
  Assists with resource allocation
  Communicates effectively to all levels
Constituencies
  Business executives
  Managers
  IT representatives
  Knowledge workers
  Business analysts
  Business planners
  Research and development
What makes the warehouse successful?
Start with strategic vision
Project driven by the business
Focus on objectives
Clear value added to the business
Use of the warehouse is key!
Delivers good data
Performs well
Summary: Benefits
Intangible benefits (45%)
  Respond to changing business conditions
  Remain competitive
Improved decision making (25%)
  Increased transparency
  Shorter response time
  Better reporting
Increased productivity (30%)
  For internal users
  For external users