OpenCloud2012 - Building a Telco Big Data System with Open Source


Page 1: OpenCloud2012 - Building a Telco Big Data System with Open Source

A Telco Big Data Case Built with Open Source

2012-05-31

KT cloudware Big Data Division

Data Analysis Platform Team Project Manager 정구범

[email protected]

Page 2

KT Big Data Problem & Mission

Across all of KT, more than tens of TB of service logs are generated daily.

Usage of the wireless domain (3G/LTE/Wibro/Wifi) in particular is growing explosively.

The wireless domain loads several TB of logs per day, retained for at least 3 months (daily aggregation/analysis).

Processing volume and procedures keep increasing.

Systems were built per department as silos, centered on high-end hardware + commercial RDBMS products.

Conversions!

Integration cost? Estimated at tens of billions of KRW or more.

Yet revenue and profit are falling. How should it be built?

Internalize and productize Open Source-based Cloud & Big Data.

Challenge!!!

Page 3

Legacy Platform Architecture

Network System

Service System

High Scale-up UNIX Machine

ODS ETL

DW ETL

Data Mart

OLAP

Business systems

Bottleneck

Only vertical (scale-up) expansion is possible

Mostly single-node structures

As data grows, performance degrades and cost rises

Limited options for improving performance

Incremental expansion is impossible (wholesale replacement and bulk migration)

Excessive build/operations spending

Source data keeps growing; demands for more diverse data delivery

Processing procedures and results grow more complex

Page 4

Requirements

Secure cost-effectiveness

Cost rationality: phased, just-in-time investment matching data and performance growth targets

Cost efficiency: able to run on commodity hardware

Secure identical processing results

Compatibility: reuse existing SQL as much as possible

Produce results identical to the existing business processing

Secure scalability and real-time performance

Storage scalability to absorb continuously growing data

Linear scaling of processing performance as machines are added

Performance to look up data matching desired conditions in real time

Secure capacity for new data / analysis techniques

Unstructured-data capacity: integrate with existing systems and accept additional data formats

Analysis capacity: easily add new analysis techniques/algorithms

Page 5

Requirement Analysis

Phased investment, Scale-out

Runs on commodity / low-spec HW

SQL compatibility

Identical processing results

Unstructured-data capacity

Capacity for new analysis features

Hadoop

Hive

Distributed Search

R

Open Source | Detailed requirements | Key requirements | Reference

Storage scalability

Linear performance scalability

Near real-time search

Secure cost-effectiveness

Secure identical processing results

Secure scalability & performance

Accept new data and new analyses

Page 6

Conceptual Architecture

Data Storing Data Processing

Data Sources

Workflow Control Scheduling

Coordination

Data Integration Query / Script

Processing

Data Collection Indexing Searching

Data Mart

Access HDFS

Access HDFS

Map-Reduce Execution

Scheduled Querying

Integration Executing

Storing

Data Import/Export

Storing

Page 7

Open Source-based Realization Architecture

Query Tool

Apache Flume Apache Chukwa Facebook Scribe

Apache Hadoop

Apache Hive Apache Pig

Apache Lucene Apache Solr ElasticSearch

Storing

Scheduled Querying

Log / Data Collection

Searching

Querying

Apache Sqoop

Data Import/Export

Ad-hoc Querying

Map-Reduce Execution

Apache Oozie LinkedIn Azkaban

Cascading Hamake

Access HDFS

Access HDFS

Integration Executing

Log Repository

DBMS

Business systems

OLAP

Apache Zookeeper

Storing

Page 8

NDAP Workflow

NDAP (NexR Data Analysis Platform) based Architecture

NDAP Data Store (enhanced from Apache Hadoop)

NDAP Enterprise Hive

NDAP Search (based on ElasticSearch)

NDAP Collector (based on Apache Flume)

NDAP Hive (enhanced from Apache Hive)

Data Integrator (based on Apache Sqoop)

Log Collection

Storing Indexing

Access HDFS

Access HDFS

Workflow Engine (enhanced from Apache Oozie)

Workflow Manager

Integration Executing

Scheduled Querying

Hive Performance Analyzer

Hive Workbench

Map-Reduce Execution

Data Import/Export

Manager System

Searching

OLAP (MSTR) Data Mart Querying

NDAP Coordinator (enhanced from Apache Zookeeper) Log Repository

Page 9

NDAP Workflow NDAP Admin Center

NDAP Provisioning (using Fabric)

NDAP Full Stack

NDAP Data Store (enhanced from Apache Hadoop)

NDAP Enterprise Hive

NDAP Search (based on ElasticSearch)

NDAP Dashboard

NDAP System Monitor (using Collectd/RabbitMQ)

NDAP Hadoop Manager

NDAP Search Manager

NDAP RHive (open source)

NDAP Collector (based on Apache Flume)

NDAP Hive (enhanced from Apache Hive)

NDAP Workbench

Hive Performance Analyzer

NDAP Coordinator (enhanced from Apache Zookeeper)

Workflow Manager

Workflow Engine (enhanced from Apache Oozie)

Data Integrator (based on Apache Sqoop)

Page 10

Chukwa Chukwa Agent

Open Source Collector

Data Source Adaptor

Queue Sender

Check Point

Agent Driver

Chukwa Collector

Collector Servlet Pipeline

Stage Writer

Writer

Servlet Container (jetty)

Writer

Writer

HDFS

Storage

Scribe Scribe Node

Thrift Listener

Buffer Store

Primary Store

Hash based Load Balancer

HDFS Store

Thrift File Store

Network Store

HDFS

Storage

Flume

Application

(Thrift Client) File Store

Scribe Node Secondary Store

Data

Flume Node

Driver

Event Source

Decorator

Event Sink

HDFS

Storage

Flume Node

Flume Master Meta Repository

Page 11

Open Source Collector Comparison

Item                                        Flume  Chukwa  Scribe  Notes
Add log sources                             ○      ○       ○
Add log relays                              ○      ○       X
Ease of log transformation                  ○      △       X       • Chukwa: batch-oriented, lacks real-time processing
Multiple transport protocols supported      ○      X       X       • Chukwa supports only HTTP; Scribe only RPC
Log Agent failure recovery                  X      ○       X       • Chukwa: supports CheckPoint
Collector failure fail-over                 ○      ○       ○
Data deduplication                          X      ○       X
Log Agent monitoring                        △      X       X       • Flume: status monitoring only
Collector monitoring                        △      ○       ○       • Flume: status monitoring only
Collector state notification                ○      X       X       • Flume: notifies Agents when collector state changes
Implementation language                     Java   Java    C++
Documentation                               △      ○       X       • Scribe is nearly undocumented; source-level analysis required
Open-source activity (issues/improvements)  ○      △       X       • Flume: pushed by Cloudera; expected to become an Apache TLP

Page 12

Flume-based Data Collector Development

Flume node

Driver

EventSource

Decorator

EventSink

Data

Flume node

HDFS

Storage

Flume Master

META Repository

RPC

Source

Log File Log File Data

Added an Event Source that meets the project specifications

Added a Check Point feature

• Agent / Collector state management • Enhanced status monitoring • Open API development • Log merging

• Log deduplication

Developed project-specific Decorators

Added an Event Sink targeting the search engine
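The Check Point feature listed above can be sketched as follows. This is a minimal, dependency-free illustration, not the NDAP code; the class name `CheckPointFile` and the file layout are hypothetical. The idea is Chukwa-style: the agent persists the byte offset of the last event the collector acknowledged, so a restarted agent resumes tailing without resending already-delivered data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of an agent-side checkpoint: persist the byte offset of
// the last acknowledged event so a restarted agent can resume from it.
public class CheckPointFile {
    private final Path path;

    public CheckPointFile(Path path) {
        this.path = path;
    }

    // Write the offset to a temp file first, then move it into place, so a
    // crash mid-write never leaves a truncated checkpoint behind.
    public void save(long offset) throws IOException {
        Path tmp = Paths.get(path.toString() + ".tmp");
        Files.write(tmp, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, path, StandardCopyOption.REPLACE_EXISTING);
    }

    // On restart: resume from the saved offset, or from 0 on a fresh start.
    public long load() throws IOException {
        if (!Files.exists(path)) return 0L;
        String s = new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim();
        return s.isEmpty() ? 0L : Long.parseLong(s);
    }
}
```

The temp-file-then-move step matters: a checkpoint that can be half-written is worse than none, because the agent would resume from a garbage offset.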

Page 13

NDAP Collector

based on Apache Flume - distributed, scalable data collection with rapid failure recovery.

NDAP Data Store (enhanced from Apache Hadoop)

NDAP Coordinator (enhanced from Apache Zookeeper)

Node Membership

Flume Master

Page 14

Hadoop Name Node Single-Point-Of-Failure Mitigation

Image Mirroring applied: the Active node's edit-log is duplicated, with one copy exported via NFS

The Standby periodically loads the edit-log

Hot-Standby applied: Data Nodes send block reports to both Active and Standby

If the Standby has not yet loaded the edit-log, the report is retransmitted

HDFS Client fail-over: automatic Active-Standby switchover via Zookeeper events

based on CDH3 + Avatar Node; Active-Standby HA structure

Integrated with Zookeeper

Name Node HA

R&D

Name Node Crash: Critical Issue

Name Node SPOF Mitigation

NDAP Data Store enhanced from Apache Hadoop
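The client-side fail-over mechanism described above can be sketched as a small state holder. In the real system the switch is driven by a Zookeeper watch event; here a plain method call stands in for that callback, and all class, field, and host names are illustrative rather than the actual NDAP/HDFS client code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of Active-Standby Name Node fail-over as seen from an HDFS
// client. onActiveLost() stands in for the Zookeeper watcher callback that
// fires when the active Name Node's znode disappears.
public class NameNodeFailover {
    private final String active;
    private final String standby;
    private final AtomicReference<String> current;

    public NameNodeFailover(String active, String standby) {
        this.active = active;
        this.standby = standby;
        this.current = new AtomicReference<>(active);
    }

    // Address the HDFS client should talk to right now.
    public String currentNameNode() {
        return current.get();
    }

    // Stand-in for the Zookeeper event: promote the standby exactly once,
    // even if several client threads observe the same event.
    public void onActiveLost() {
        current.compareAndSet(active, standby);
    }

    public static void main(String[] args) {
        NameNodeFailover f = new NameNodeFailover("nn1:8020", "nn2:8020");
        System.out.println(f.currentNameNode()); // nn1:8020
        f.onActiveLost();                        // Zookeeper reports nn1 gone
        System.out.println(f.currentNameNode()); // nn2:8020
    }
}
```

The `compareAndSet` makes the promotion idempotent, which matters because Zookeeper watches can be observed by many client threads at roughly the same time.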

Page 15

Hadoop Name Node SPOF Mitigation Architecture

Page 16

NDAP Search

Based on ElasticSearch

Page 17

NDAP Enterprise Hive

Provides accessible, convenient access to large-scale data, supporting an environment for faster business processing.

Developer DBA

NDAP Data Store (enhanced from Apache Hadoop)

NDAP Enterprise Hive

NDAP Workbench

NDAP Hive (enhanced from Apache Hive)

Hive Performance Analyzer

HDFS Access

Map-Reduce Execution

Page 18

NDAP Enterprise Hive

NDAP Workbench – Hive metadata management / ad-hoc query execution / HDFS browsing from a Web UI

Support Pure Web UI

- No client installation

- No Active-X (all browsers supported)

Support Connection Management

- Can connect to multiple Hive Servers

Support Hive Metadata Management

Support Multi-Querying

- View execution results independently per tab

Support Execution History

- Review previously executed queries & results

Support Query Snippet

- Save/load frequently used queries

- Organize and manage by category

Support result save to Hadoop

Support result download (CSV)

Page 19

Why Hive?

Reuse legacy queries, cut development effort & time: Apache Hive Enhancement = NDAP Enterprise Hive

How do we migrate the thousands of existing Oracle queries to a Hadoop base?

Rewrite as Map-Reduce

Apache Pig – script-based

Apache Hive – query-based

Manual Oracle-to-Hive query conversion

+ Hive UDF add-on (Oracle compatible) + enhanced Hive patch + query conversion guide

Apache Hive

An Apache Top-Level Project led by Facebook

Used in Facebook's data warehousing

SQL is dynamically translated into Map-Reduce and executed

Supports ANSI-SQL-style queries (≠ the style commonly used with Oracle); manual conversion needed

Few default functions; extendable via User-Defined Function add-ons
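As a concrete illustration of the UDF add-on idea, the sketch below shows the Oracle NVL and DECODE semantics that such an add-on has to reproduce, written as plain static methods so it stays dependency-free. A real Hive UDF would wrap this logic in a class extending Hive's UDF base class and expose it as evaluate() methods; the class and method names here are hypothetical, not the actual NDAP add-on.

```java
// Dependency-free sketch of Oracle-compatible function semantics. In a real
// Hive UDF add-on, each of these would become an evaluate() method on a class
// extending org.apache.hadoop.hive.ql.exec.UDF.
public class OracleCompat {
    // Oracle NVL(expr, default): return default when expr is NULL.
    public static String nvl(String expr, String dflt) {
        return expr == null ? dflt : expr;
    }

    // Oracle DECODE(expr, search1, result1, ..., default): pairwise comparison
    // using NULL-safe equality (Oracle treats NULL == NULL as a match here).
    public static String decode(String expr, String... pairs) {
        int i = 0;
        for (; i + 1 < pairs.length; i += 2) {
            String search = pairs[i];
            boolean eq = (expr == null) ? (search == null) : expr.equals(search);
            if (eq) return pairs[i + 1];
        }
        // an odd trailing element is the default value, if present
        return (i < pairs.length) ? pairs[i] : null;
    }

    public static void main(String[] args) {
        System.out.println(nvl(null, "N/A"));                        // N/A
        System.out.println(decode("3G", "3G", "wireless", "wired")); // wireless
    }
}
```

The NULL-safe equality in DECODE is exactly the kind of subtle Oracle behavior that forces "identical processing results" work beyond a mechanical SQL rewrite.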

Page 20

A Simple Comparison: Oracle SQL / Hive QL / Pig Script / Map-Reduce

Hive QL

select E.empno, E.ename, D.dname, E.sal

from emp E join dept D on (E.deptno = D.deptno);

Oracle SQL

select E.empno, E.ename, D.dname, E.sal

from emp E, dept D

where E.deptno = D.deptno;

Pig Script

EMP = LOAD 'input/hive/tab/emp.txt' AS (empno, ename, sal, deptno);

DEPT = LOAD 'input/hive/tab/dept.txt' AS (deptno, dname);

RESULT = JOIN EMP BY deptno, DEPT BY deptno;

DUMP RESULT;

import java.io.IOException;
import java.util.Iterator;

import kr.devpub.nexr.common.DepartmentTuple;
import kr.devpub.nexr.common.EmploeeTuple;
import kr.devpub.nexr.common.JoinKey;
import kr.devpub.nexr.common.Tuple;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

@SuppressWarnings("deprecation")
public class JoinV1 extends Configured implements Tool {

    private static final Log LOG = LogFactory.getLog(JoinV1.class);

    public static class EmploeeMapper extends Mapper<LongWritable, Text, JoinKey, Tuple> {
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, JoinKey, Tuple>.Context context)
                throws IOException, InterruptedException {
            Tuple tuple = EmploeeTuple.parse(JoinKey.RIGHT, value.toString());
            JoinKey joinKey = new JoinKey(tuple.getField(LongWritable.class, EmploeeTuple.DEPTNO).get(), JoinKey.RIGHT);
            context.write(joinKey, tuple);
        }
    }

    public static class DepartmentMapper extends Mapper<LongWritable, Text, JoinKey, Tuple> {
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, JoinKey, Tuple>.Context context)
                throws IOException, InterruptedException {
            Tuple tuple = DepartmentTuple.parse(JoinKey.LEFT, value.toString());
            JoinKey joinKey = new JoinKey(tuple.getField(LongWritable.class, DepartmentTuple.DEPTNO).get(), JoinKey.LEFT);
            context.write(joinKey, tuple);
        }
    }

    public static class JoinReducer extends Reducer<JoinKey, Tuple, LongWritable, Tuple> {
        protected void reduce(JoinKey key, Iterable<Tuple> values,
                Reducer<JoinKey, Tuple, LongWritable, Tuple>.Context context)
                throws IOException, InterruptedException {
            Iterator<Tuple> i = values.iterator();
            Tuple department = i.next();
            Text departmentName = department.getField(Text.class, DepartmentTuple.NAME);
            LOG.info("department: " + department);
            if (department.getPlacing() == JoinKey.LEFT) {
                while (i.hasNext()) {
                    Tuple emploee = i.next();
                    LOG.info("emploee: " + emploee);
                    context.write(emploee.getField(LongWritable.class, EmploeeTuple.EMPNO), new Tuple(0,
                            emploee.getField(Text.class, EmploeeTuple.NAME),
                            departmentName,
                            emploee.getField(LongWritable.class, EmploeeTuple.SAL)));
                }
            }
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(JoinV1.class);
        job.setJobName("Join");
        job.setMapOutputKeyClass(JoinKey.class);
        job.setMapOutputValueClass(Tuple.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Tuple.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setGroupingComparatorClass(JoinKey.GroupComparator.class);
        job.setSortComparatorClass(JoinKey.SortComparator.class);
        job.setReducerClass(JoinReducer.class);
        MultipleInputs.addInputPath(job, new Path("/nexr/input/emp"), TextInputFormat.class, EmploeeMapper.class);
        MultipleInputs.addInputPath(job, new Path("/nexr/input/dept"), TextInputFormat.class, DepartmentMapper.class);
        FileOutputFormat.setOutputPath(job, new Path("/nexr/output/join/v1"));
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new JoinV1(), args);
        System.exit(exitCode);
    }
}

• Employee Object • Department Object

• JoinKey

Mapper

Reducer

Driver

Map-Reduce

Page 21

Contribution to Apache Hive

-------- Patched : 34 --------

Split thrift_client#execute to compile + run for early access to execution plan HIVE-2684

HiveServer should provide per session configuration HIVE-2503

Support per-session function registry HIVE-2573

Bug fix on merging join tree HIVE-2253

Not using map aggregation, fails to execute group-by after cluster-by with same key HIVE-2329

If all of the parameters of distinct functions are exists in group by columns, query fails in runtime HIVE-2332

optimize order-by followed by a group-by with same key HIVE-2340

Implement BETWEEN operator HIVE-2005

Hive server is SHUTTING DOWN when invalid queries beeing executed HIVE-2264

Implement lateral float/double type and make possible to compare HIVE-2586

Optimize Group By + Order By with the same keys for TPC-H 3.4.2(Q1) HIVE-1772

Fix test failure on ppr_pushdown.q HIVE-2686

Hive CLI should let you specify database on the command line HIVE-2172

Semantic Analysis failed for GroupBy query with aliase HIVE-2709

Does not throw Ambiguous column reference exception in sub query with select star/pattern HIVE-2723

Improve optimization of DPAL-592 and fix test failures HIVE-482

Use name of original expression for name of CAST output HIVE-2477

Implement head sampler for each blocks HIVE-2780

Fail on table sampling HIVE-2778

Select columns by index instead of name HIVE-494

Support auto completion for hive configs in CliDriver HIVE-2796

Add cleanup stages for UDFs HIVE-2261

Implement NULL-safe equality operator <=> HIVE-2810

SUBSTR(CAST(<string> AS BINARY)) produces unexpected results HIVE-2792

Reduce Sink deduplication fails if the child reduce sink is followed by a join HIVE-2732

Make timestamp accessible in the hbase KeyValue HIVE-2781

Implement NULL-safe equi-join HIVE-2827

Invalid tag is used for MapJoinProcessor HIVE-2820

Filters on outer join with mapjoin hint is not applied correctly HIVE-2839

Support between filter pushdown for key ranges in hbase HIVE-2854

Support eventual constant expression for filter pushdown for key ranges in hbase HIVE-2861

Add validation to HiveConf ConfVars HIVE-2848

Ambiguous table name or column reference message displays when table and column names are the same HIVE-2863

Remove redundant key comparing in SMBMapJoinOperator HIVE-2882

-------- Applied : 16 --------

Bug fix on merging join tree HIVE-2253

Not using map aggregation, fails to execute group-by after cluster-by with same key HIVE-2329

Implement BETWEEN operator HIVE-2005

SUBSTR(CAST(<string> AS BINARY)) produces unexpected results HIVE-2792

Implement NULL-safe equality operator HIVE-2810

Implement NULL-safe equi-join HIVE-2827

Fail on table sampling HIVE-2778

HiveServer should provide per session configuration HIVE-2503

Remove redundant key comparing in SMBMapJoinOperator HIVE-2881

Support eventual constant expression for filter pushdown for key ranges in hbase HIVE-2861

Ambiguous table name or column reference message displays when table and column names are the same HIVE-2863

Hive union with NULL constant and string in same column returns all null HIVE-2901

TestHiveServerSessions hangs when executed directly HIVE-2937

GROUP BY causing ClassCastException LazyDioInteger cannot be cast LazyInteger HIVE-2958

Provide error message when using UDAF in the place of UDF instead of throwing NPE HIVE-2956

Reduce Sink deduplication fails if the child reduce sink is followed by a join HIVE-2732

Page 22

NDAP Workflow

Supports more convenient design and systematic execution of regular but complex batch work, with a variety of processing tools.

Hive

RHive

Sqoop

Pig

Remote Shell

JDBC

HDFS

NDAP Workflow

Workflow Engine (enhanced from Apache Oozie)

Data Integrator (based on Apache Sqoop)

Workflow Manager

Developer DBA

Page 23

NDAP Workflow

Workflow Manager – GUI-based job creation/execution/scheduling/monitoring

Support Pure Web UI

- No client installation

- No Active-X (all browsers supported)

- GUI-based Action Diagram

Support Hive/RHive/Sqoop/Pig/RemoteShell/JDBC/HDFS

Support dynamic decision

- dynamic check & flow control

- Action Multi-Fork & Join

Support job dependency

Support Expression Language

Support Execution Log Viewer

Support repeat scheduling (cron type)

Support Integrated Dashboard

Page 24

NDAP Admin Center

Page 25

Processing Performance Test

Execution time was measured and compared for each of 7 ETL tasks.

AS-IS (RDBMS) total processing time: HP Superdome (44 cores, 98 GB RAM, separate storage)

NDAP Hive total processing time: 8 Data Nodes (dual Xeon E5640, 48 GB RAM, internal SATA)

ETL Task ID   AS-IS (RDBMS)   NDAP (Hive)
104757931     0:34:30         0:05:12
100501188     0:14:30         0:02:21
104757932     0:13:30         0:03:13
100441247     0:14:32         0:02:06
104757884     0:19:29         0:04:22
107586407     0:25:31         0:03:05
100434825     0:15:30         0:01:01
sum           2:17:32         0:21:20
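As a sanity check of the totals above: converting the h:mm:ss sums to seconds gives 8252 s on the legacy RDBMS versus 1280 s on NDAP Hive, roughly a 6.4x aggregate speedup. The class below is just this arithmetic, not part of the test harness the slide describes.

```java
// Sanity check of the performance table: convert h:mm:ss strings to seconds
// and compute the aggregate speedup of NDAP Hive over the legacy RDBMS.
public class EtlSpeedup {
    // parse "h:mm:ss" into total seconds
    static long seconds(String hms) {
        String[] p = hms.split(":");
        return Long.parseLong(p[0]) * 3600 + Long.parseLong(p[1]) * 60 + Long.parseLong(p[2]);
    }

    public static void main(String[] args) {
        long asIs = seconds("2:17:32"); // HP Superdome total
        long ndap = seconds("0:21:20"); // total on 8 commodity data nodes
        System.out.printf("%d s vs %d s -> %.1fx speedup%n", asIs, ndap, (double) asIs / ndap);
        // prints: 8252 s vs 1280 s -> 6.4x speedup
    }
}
```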

Page 26

Analysis Example: Call-Quality Anomaly Signs

• The call-drop rate is highest during the 21:00 hour

• The drop rate is high in the area around Konkuk University (Kondae)

Page 27

Deployment & Expansion Plan

olleh ucloud VPC (Virtual Private Cloud)

KT Platforms

3G Voice/data

LTE data

SMS / MMS

olleh Wibro

olleh Wifi

Wifi Call / VoIP

olleh TV

Wired Internet

Opening in 2013 at roughly 200 nodes / 1 PB of raw disk; data coverage and scale to be expanded in phases

KT Subscriber Analysis System

Integrated in 2013

Expansion

The largest Hadoop-based analysis platform in Korea

M2M
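A rough capacity reading of the 2013 target of about 200 nodes and 1 PB of raw disk: assuming HDFS's default replication factor of 3 (an assumption; the slide does not state the replication setting), raw disk per node and usable capacity work out as follows.

```java
// Rough sizing for the stated 2013 target: ~200 nodes, 1 PB raw disk.
// Replication factor 3 is an assumed HDFS default, not given on the slide.
public class CapacityPlan {
    // raw disk per node in TB (taking 1 PB = 1024 TB)
    static double rawPerNodeTB(double rawPB, int nodes) {
        return rawPB * 1024 / nodes;
    }

    // usable capacity after n-way replication
    static double usablePB(double rawPB, int replication) {
        return rawPB / replication;
    }

    public static void main(String[] args) {
        System.out.printf("raw per node: %.2f TB, usable: %.2f PB%n",
                rawPerNodeTB(1.0, 200), usablePB(1.0, 3));
        // prints: raw per node: 5.12 TB, usable: 0.33 PB
    }
}
```

So the plan amounts to about 5 TB of raw disk per commodity node, with roughly a third of the raw petabyte usable once replication is accounted for.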

Page 28

The End.