Case Study: Data Warehouse, 遠傳電信 (Far EasTone Telecom, FET) CDR DW POC

Upload: evelyn

Post on 29-Jan-2016

Page 1: Case Study –Data Warehouse 遠傳電信  CDR DW POC

Situation overview
• The customer currently uses Teradata, which carries high maintenance costs.
• Complex procedures must be processed automatically.
• Competing against Oracle.

Strategy & solution
Provide a stable, scalable, and reliable Operational Data Store (ODS) solution that takes over part of the CDR (Call Detail Record) storage and heavy processing from the Teradata EDW system, in order to improve the performance of the FET EDW.

Page 2: Solution Design Consideration: Import/Export Strategy

[Data-flow diagram. Labels recoverable from the slide:]
• CDR sources: Voice CDR, SMS CDR, GPRS CDR, WAP CDR
• CDR ODS (MS SQL 2008)
• Stages and databases: Loading, PSTAGE, PDATA, PMART
• Teradata
• Transfer mechanisms: Bulk Insert, SSIS, FastLoad, FastExport, Bulk Export, Bulk Copy, Transform
• Fact Table Data Mart (MS SQL 2008): Subscriber Profile, CDR Summary
• SSAS cubes

Page 3: Our Solution Architecture

[Architecture diagram. Labels recoverable from the slide:]
• Teradata: 12 nodes, capacity 11 TB, current data volume ~9 TB (FET+KGT)
• CDRs: 750 GB in 874 text files
• Target server: HP DL580 + EMC CX3-20, SQL 2008
• Loading: multi-threaded Bulk Insert, ADO.NET Provider
• SQL 2008 new features in this POC: Data Partitioning, Page Compression, Improved Query Performance
• SQL Query Analyzer
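The slide names multi-threaded Bulk Insert as the loading approach but does not show how the work was divided. The following is only a minimal sketch of the general idea, assuming one file per task on a thread pool. `load_files_in_parallel`, `fake_bulk_insert`, the file names, and the per-file row count are all hypothetical and are not taken from the POC.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def load_files_in_parallel(
    paths: Iterable[str],
    load_one: Callable[[str], int],
    workers: int = 8,
) -> int:
    """Run load_one over every path on a thread pool and return the total rows loaded."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order and re-raises a worker's exception.
        return sum(pool.map(load_one, paths))


def fake_bulk_insert(path: str) -> int:
    """Hypothetical stand-in for a real bulk-insert call; pretends each file holds 1,000 rows."""
    return 1000


if __name__ == "__main__":
    # 874 text files, matching the file count on the slide; the names are made up.
    files = [f"cdr_{i:03d}.txt" for i in range(874)]
    print(load_files_in_parallel(files, fake_bulk_insert))
```

In a real loader, `load_one` would issue the bulk-insert statement for one file over its own database connection, so the worker count becomes the knob for trading load time against server contention.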

Page 4: ODS POC Final Result

Columns: Item | Description | KPI | Data Source Row Count | Data Source Size | MSFT

Item 1
• Description: Case I: load 8/1 ~ 8/30 CDR data into the database with a pre-defined schema, perform the EOD process, then split into 6 modules. Case II: load 8/31 CDR data and split it into 6 modules.
• KPI: Case I 2.0 hrs; Case II 15 mins
• Data source size: 755 GB of CDR text files
• MSFT: Case I 3 hr 4 min; Case II 6 min 58 s

Item 2
• Description: Replicate PMART.CUS_SUBSCR_CURR and PMART.CUS_SUBSCR_CURR_PP data from Teradata via ETL automation tools.
• KPI: Yes or No
• Data source size: 45 GB in Teradata
• MSFT: YES

Item 3
• Description: Case I: generate 8/30 ~ 8/31 CDR_VOICE_DLY data into the database with a pre-defined schema. Case II: generate 2008/08 CDR_VOICE_MLY monthly CDR aggregation data into the database with a pre-defined schema.
• KPI: Case I 24 mins; Case II 30 mins
• Data source row count: MO 389,400,000; MT 370,040,000
• Data source size: MO 140 GB (58 GB after compression); MT 100 GB (54 GB after compression)
• MSFT: Case I 22 min; Case II 90 min

Item 4
• Description: Write back 2008/08 CDR_VOICE_MLY monthly and daily CDR aggregation data into Teradata with a pre-defined schema via ETL automation tools.
• KPI: Yes or No
• Data source: N/A
• MSFT: YES
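The figures in the result table can be turned into rates with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the Item 1 Case I time covers loading plus the EOD process and the split into modules, so the rate is an end-to-end figure rather than a pure load speed.

```python
def gb_per_hour(size_gb: float, hours: int, minutes: int, seconds: int = 0) -> float:
    """Average throughput in GB per hour for a job of the given duration."""
    elapsed_hours = hours + minutes / 60 + seconds / 3600
    return size_gb / elapsed_hours


def compression_saving(before_gb: float, after_gb: float) -> float:
    """Fraction of space saved by compression (0.0 means none, 1.0 means all)."""
    return 1 - after_gb / before_gb


if __name__ == "__main__":
    # Item 1, Case I: 755 GB processed in 3 hr 4 min.
    print(f"End-to-end rate: {gb_per_hour(755, 3, 4):.1f} GB/hour")
    # Item 3: page compression on the MO and MT data.
    print(f"MO saving: {compression_saving(140, 58):.1%}")
    print(f"MT saving: {compression_saving(100, 54):.1%}")
```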

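Page 5: SAN Storage

For storage, Microsoft recommends expanding FET's existing EMC CX3-80 SAN by adding disks to provide the required capacity. IT staff can then manage the system effectively without learning new operations and maintenance skills. The recommended SAN layout is shown below, with data placed in separate RAID groups according to how it is used.

Data type | Data stored | Capacity | RAID configuration | Capacity per disk
High Performance | F1/F2 Voice CDR, SMS CDR, GPRS CDR | 4 TB | RAID 0+1 | 146 GB
Ordinary Performance | PMart data extension | 3 TB | RAID 5 | 300 GB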
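The slide gives usable capacity and disk size but not disk counts. As a rough illustration only, the sketch below estimates the minimum number of disks under simplifying assumptions that are not in the source: 1 TB is taken as 1000 GB, each tier is a single RAID group, there are no hot spares, and the nominal disk size is fully usable. Real arrays cap the size of a RAID group, so an actual layout would likely need more disks.

```python
import math


def disks_raid10(usable_gb: float, disk_gb: float) -> int:
    """RAID 0+1 mirrors every data disk, so the disk count is twice the data disks."""
    return 2 * math.ceil(usable_gb / disk_gb)


def disks_raid5(usable_gb: float, disk_gb: float) -> int:
    """A single RAID 5 group gives up one disk's worth of capacity to parity."""
    return math.ceil(usable_gb / disk_gb) + 1


if __name__ == "__main__":
    print("High Performance tier (4 TB, 146 GB disks):", disks_raid10(4000, 146))
    print("Ordinary Performance tier (3 TB, 300 GB disks):", disks_raid5(3000, 300))
```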

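Page 6: Solution Design Consideration: High Availability

• The two database servers handle different workloads. Outside peak periods such as EOD/EOM processing, CPU utilization is estimated to stay below 40%.
• The two SQL database servers are planned as an Active-Active failover cluster, each backing up the other. If one server fails, the other can take over the entire workload.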
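The 40% figure matters because an Active-Active pair only fails over cleanly if one node can carry both workloads. A minimal sketch of that check follows, assuming identical servers and CPU load that adds linearly; both are simplifications not stated in the source, and the function names are hypothetical.

```python
def cpu_after_failover(node_a: float, node_b: float) -> float:
    """CPU utilization on the surviving node once it carries both workloads."""
    return node_a + node_b


def survives_failover(node_a: float, node_b: float, ceiling: float = 1.0) -> bool:
    """True if the combined load still fits under the given utilization ceiling."""
    return cpu_after_failover(node_a, node_b) <= ceiling


if __name__ == "__main__":
    # Off-peak estimate from the slide: each server stays below 40%.
    print(cpu_after_failover(0.40, 0.40))
    print(survives_failover(0.40, 0.40))
```

Under these assumptions the surviving node would run at no more than 80% off-peak. Peak EOD/EOM periods are excluded from the 40% estimate and would need their own check.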

Page 7: Case Study: High Performance, National Police Agency (警政署), 4~5 second search over 15 billion call records

Situation overview
• CIB ran a pure Oracle database environment.
• The customer needs to handle 15 billion records.
• On Oracle, insert and index building took 15 hours.

Strategy & solution
• The partner came to MTC for POC support.
• The POC achieved a preparation time of 6 hours and a 5-second search across 15 billion records, exceeding CIB's expectations.
• Several options for VLDB backup and synchronization were also provided to CIB.
• Microsoft provides enterprise-class, mission-critical systems and support.

Page 8: National Police Agency call-record analysis platform architecture

[Architecture diagram. Labels recoverable from the slide:]
• QM
• Node1, Node2
• Linked servers
• Distribution Link Architecture
• On each node: SSIS (Bulk Insert) + Create 5 Indexes

Page 9: Testing Result: Size / Records

Description | Size | Records
1 | 1008 GB | 15,000,000,000
2 | 1.008 TB | 15 billion records

Page 10: Testing Result: Demo

Action | Time duration in different flash tests | Overall (hr) | Oracle (hr)
SSIS (loading data) | 1:54 ~ 2:07 | 2 | 5.5
Create 5 Indexes | 3:53 ~ 4:20 | 4.2 | 10.3
Search | 3 ~ 6 sec | 5 sec |
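The table lists hours for the POC next to hours for Oracle, so the relative speedup follows directly. The sketch below only restates the table's numbers; the `speedup` helper is hypothetical.

```python
def speedup(baseline_hours: float, new_hours: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_hours / new_hours


if __name__ == "__main__":
    # (Oracle hours, POC overall hours) taken from the table.
    load = (5.5, 2.0)
    index = (10.3, 4.2)
    total = (load[0] + index[0], load[1] + index[1])

    print(f"Load:        {speedup(*load):.2f}x")
    print(f"Index build: {speedup(*index):.2f}x")
    print(f"Combined:    {speedup(*total):.2f}x")
```

The combined totals (about 15.8 hours on Oracle against about 6.2 hours in the POC) line up with the roughly 15 hours and 6 hours quoted in the case overview.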

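Page 11: Agricultural Production and Marketing Information Integration Platform (農業產銷資訊整合平台)

• Breadth
The Agriculture and Food Agency (農糧署) is planned as the data provider. The data consumers are the Agriculture and Food Agency, the Statistics Office of the Council of Agriculture (農委會), and the Economic Research Section of the Planning Department. This year's interview plan schedules three sessions each with the Statistics Office and the Agriculture and Food Agency and two with the Economic Research Section. Interview topics include selecting key agricultural products, monitoring baselines, data sources, data-cleansing rules, data-mining models, and presentation formats.
• Depth
The main analysis dimensions are time, region, production volume, and price.
• Operational functions:
    1. Latest news browsing
    2. Document management
    3. Quantitative indicators

Criteria for selecting the product scope: (subject to this year's schedule)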

Page 12: Agricultural Production and Marketing Information Integration Platform (農業產銷資訊整合平台)

[Diagram. Recoverable label: IBM HS22]

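Page 13: Council of Agriculture (農委會) BI project scope

(1) Platform software
• Basic database functions
• Data ETL (Extract, Transform, Load)
• Multidimensional analysis
• KPI management and dashboards
• Reporting services
• Web presentation
• Decision-support presentation

(2) Application scope validation
• Data interfacing
• Filtering and configuring key agricultural product items
• Setting monitoring baselines for agricultural products
• Time-series data comparison
• Alerts
• Drill down and drill up
• Chart presentation
• Sorting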

Page 14: Personalized home page

Page 15: Agricultural Conditions BI Center (農情 BI 中心)

Page 16: Agricultural product price analysis

Page 17: Time-region decomposition tree analysis

Page 18: Region-time decomposition tree analysis

Page 19: Price forecast analysis platform

Page 20: Green onion production and price analysis

Page 21: The Common, non-AMO pitfalls

• Too large dimensions
• Big Distinct Count Measure Groups
    ... with bad partitioning
• Many-to-Many dimensions
    ... with large intermediate fact tables
• Parent/Child dimension
    ... with too many members
• Near Real Time
    ... with a constant, high-throughput flow of data
• Partitioning
    ... with too many partitions

See a pattern?

Page 22: Excel tool integration: time-series forecasting analysis

Page 23: Excel tool integration: scenario analysis (What-if analysis)

Page 24: Change Data Capture (CDC)

• Provides a history of the changes made to a table
• Monitors data state effectively
• Provides an efficient way to consolidate data
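To make the CDC bullets concrete, the sketch below shows the consumer side of the idea: a list of recorded changes is replayed against a target copy, so only changed rows are touched. This is a conceptual illustration only, not SQL Server's implementation (which captures changes from the transaction log). The `Change` record and `apply_changes` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Change:
    """One captured change: the operation, the row key, and the new row image."""
    op: str               # "insert", "update", or "delete"
    key: int
    row: Optional[dict]   # None for deletes


def apply_changes(target: Dict[int, dict], changes: List[Change]) -> Dict[int, dict]:
    """Replay captured changes in order against the target, keyed by row id."""
    for change in changes:
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            # Inserts and updates both store the latest row image.
            target[change.key] = change.row
    return target


if __name__ == "__main__":
    target = {1: {"plan": "prepaid"}}
    captured = [
        Change("update", 1, {"plan": "postpaid"}),
        Change("insert", 2, {"plan": "prepaid"}),
        Change("delete", 1, None),
    ]
    print(apply_changes(target, captured))
```

Because the change list carries the history of each row, the same records can serve both auditing and incremental loading into a downstream store.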