Hadoop & Neptune, Feb. 2009, 김형준

Page 1

Hadoop & Neptune

Feb. 2009

http://www.openneptune.com

http://www.jaso.co.kr

김형준

Page 2

The Data 'Tsunami'

Page 3

More CPU

Faster Disk

Program Tuning

More Memory

Page 4

Uninstall

Page 5

Where? Distributed file system

How? Distributed/parallel computing

Page 6

Hadoop DFS
• Unlimited storage
• No backup needed, self-healing
• Thousands of nodes

But:
• No POSIX
• No random writes
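The "no backup needed, self-healing" point rests on replication: HDFS keeps each block on several DataNodes and re-replicates blocks when a node dies. The toy model below illustrates that idea in one process. It is a minimal sketch, not Hadoop code: the class name `ToyNameNode`, the replication factor of 3, and the least-loaded placement rule are assumptions for illustration (real HDFS placement is also rack-aware).

```python
"""Toy model of HDFS-style block replication and self-healing.

Assumptions (not Hadoop's code or API): 3 replicas per block, replicas
placed on the least-loaded live nodes, re-replication when a node dies.
"""
from collections import defaultdict

REPLICATION = 3  # assumed replication factor


class ToyNameNode:
    def __init__(self, nodes):
        self.live = set(nodes)
        self.locations = defaultdict(set)  # block id -> set of node names

    def _load(self, node):
        # Number of block replicas currently stored on a node.
        return sum(node in locs for locs in self.locations.values())

    def _pick_targets(self, block, count):
        # Least-loaded live nodes that do not already hold this block.
        candidates = sorted(self.live - self.locations[block],
                            key=lambda n: (self._load(n), n))
        return candidates[:count]

    def add_block(self, block):
        self.locations[block].update(self._pick_targets(block, REPLICATION))

    def node_failed(self, node):
        # Self-healing: forget the dead node, then top up every block
        # that dropped below the target replica count.
        self.live.discard(node)
        for block, locs in self.locations.items():
            locs.discard(node)
            missing = REPLICATION - len(locs)
            if missing > 0:
                locs.update(self._pick_targets(block, missing))


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
nn.add_block("blk_1")
nn.add_block("blk_2")
nn.node_failed("dn1")
print({b: sorted(locs) for b, locs in nn.locations.items()})
```

After `dn1` fails, both blocks are back to three replicas on the surviving nodes without any operator action, which is what makes a separate backup unnecessary in this model.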

Page 7

[Figure: Hadoop cluster architecture; the legend distinguishes machines from daemon processes.]

• Masters: NameNode (DFS master), JobTracker (job master), SecondaryNameNode
• Each slave machine: DataNode (DFS slave), TaskTracker (task management), local disk
• Client (API), connected to the cluster by control and data links
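In HDFS the client asks the NameNode only for metadata (which blocks make up a file and which DataNodes hold them) and then reads block contents directly from the DataNodes, so bulk data never flows through the master. A minimal sketch of that control/data split, assuming hypothetical classes `SketchNameNode` and `SketchDataNode` that are not part of the Hadoop client API:

```python
"""Sketch of the control path vs. data path in an HDFS-style read.

Hypothetical classes for illustration; not the Hadoop client API.
"""


class SketchDataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes on this node's local disk

    def read_block(self, block_id):
        return self.blocks[block_id]  # KeyError if this node lacks the block


class SketchNameNode:
    def __init__(self):
        self.files = {}  # path -> list of (block id, [replica DataNodes])

    def get_block_locations(self, path):
        # Control path: returns metadata only, never file contents.
        return self.files[path]


def read_file(namenode, path):
    chunks = []
    for block_id, replicas in namenode.get_block_locations(path):
        # Data path: read the block directly from a DataNode,
        # falling back to the next replica if one cannot serve it.
        for dn in replicas:
            try:
                chunks.append(dn.read_block(block_id))
                break
            except KeyError:
                continue
        else:
            raise IOError(f"no replica could serve {block_id}")
    return b"".join(chunks)


dn1, dn2 = SketchDataNode("dn1"), SketchDataNode("dn2")
dn1.blocks["blk_1"] = b"hello "
dn2.blocks["blk_1"] = b"hello "
dn2.blocks["blk_2"] = b"world"
nn = SketchNameNode()
nn.files["/demo.txt"] = [("blk_1", [dn1, dn2]), ("blk_2", [dn1, dn2])]
print(read_file(nn, "/demo.txt"))  # b'hello world'
```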

Page 8

Hadoop MapReduce: 1 TB group-by -> 10 minutes

More machines -> 1 minute

Page 9

• map (k1, v1) → list(k2, v2)
• reduce (k2, list(v2)) → result value

[Figure: word count on a distributed/parallel Map & Reduce execution platform.]

Input: This is a book. That book is on the desk. I like that book.

Split: "This is a book. That book is on the desk." | "I like that book."

map(): (This, 1) (book, 1) (That, 1) (book, 1) … | (I, 1) (that, 1) (book, 1) …

Partition, merge, sort: (book, [1, 1, 1]) … (is, [1, 1]) … (This, [1])

reduce(): (book, 3) … (is, 2) … (This, 1)
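The word-count flow can be traced end to end in a single process. This is a minimal sketch of the data flow only, not the Hadoop MapReduce API: the tokenisation (split on whitespace, strip trailing periods) and the hash-based partitioner are assumptions, and keys are case-sensitive, so "This", "That" and "that" are counted separately as on the slide.

```python
"""Single-process word count following map -> partition/sort/merge -> reduce.

Illustrates the data flow only; not the Hadoop API. Tokenisation and
the hash partitioner are assumptions.
"""
from collections import defaultdict


def map_fn(_k1, line):
    # map(k1, v1) -> list(k2, v2): emit (word, 1) for every word.
    return [(word.strip("."), 1) for word in line.split()]


def reduce_fn(word, counts):
    # reduce(k2, list(v2)) -> result value
    return sum(counts)


def run_job(splits, num_reducers=2):
    # Partition: each key goes to exactly one reducer; values are merged per key.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for k1, split in enumerate(splits):
        for key, value in map_fn(k1, split):
            partitions[hash(key) % num_reducers][key].append(value)
    # Sort: each reducer processes its keys in sorted order.
    result = {}
    for part in partitions:
        for key in sorted(part):
            result[key] = reduce_fn(key, part[key])
    return result


splits = ["This is a book. That book is on the desk.", "I like that book."]
counts = run_job(splits)
print(counts["book"], counts["is"], counts["This"])  # 3 2 1
```

Because every occurrence of a key lands in the same partition, the final counts do not depend on how many reducers are used; only the division of work changes.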

Page 10

[Figure: Hadoop cluster architecture, repeated from Page 7.]

Page 11

A piece of cake

Page 12

Neptune
• Database running on a DFS (Hadoop)
• Unlimited structured data
• No backup needed

But:
• No JOIN, no SQL
• No multi-row operations
• No aggregate functions

Page 13

Operations
• Create/Drop Table
• put/get
• like/between
• scan/merge scan (join)
• MapReduce
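Assuming, as in Bigtable, that rows are kept sorted by row key, most of these operations reduce to lookups and sequential scans over sorted keys, and a join becomes a merge of two sorted scans. A minimal in-memory sketch; the `SortedTable` class and `merge_scan` function are hypothetical (not Neptune's client API), and `like` is modelled as a prefix match:

```python
"""In-memory sketch of put/get, like (prefix), between (range), scan and
merge-scan join over row-key-sorted tables. Hypothetical, not Neptune's API.
"""
import bisect


class SortedTable:
    def __init__(self, name):
        self.name = name
        self.keys = []  # row keys, kept sorted
        self.rows = {}  # row key -> value

    def put(self, key, value):
        if key not in self.rows:
            bisect.insort(self.keys, key)
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)

    def between(self, start, end):
        # Rows with start <= key <= end, in key order.
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        return [(k, self.rows[k]) for k in self.keys[lo:hi]]

    def like(self, prefix):
        # Prefix match (LIKE 'prefix%'): matching keys are contiguous.
        lo = bisect.bisect_left(self.keys, prefix)
        out = []
        for k in self.keys[lo:]:
            if not k.startswith(prefix):
                break
            out.append((k, self.rows[k]))
        return out

    def scan(self):
        return [(k, self.rows[k]) for k in self.keys]


def merge_scan(left, right):
    # Join on equal row keys by walking both sorted scans once.
    a, b = left.scan(), right.scan()
    i = j = 0
    joined = []
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            joined.append((a[i][0], a[i][1], b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return joined


users = SortedTable("users")
for key, name in [("u1", "kim"), ("u2", "lee"), ("u3", "park")]:
    users.put(key, name)
orders = SortedTable("orders")
for key, item in [("u2", "book"), ("u3", "pen"), ("u9", "cup")]:
    orders.put(key, item)
print(users.between("u2", "u3"))  # [('u2', 'lee'), ('u3', 'park')]
print(merge_scan(users, orders))  # [('u2', 'lee', 'book'), ('u3', 'park', 'pen')]
```

The merge join touches each row of each table once, which is why a sorted layout makes "scan/merge scan (join)" practical even without SQL.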

Page 14

Why Neptune?

[Figure: running MapReduce over Neptune tables.]

• Make Map & Reduce functions.
• Run them on the Map & Reduce framework (JobTracker), which gets the tablet list of TableA (Tablet A-1, A-2, A-3, …, A-N) from the META table.
• Assign tasks to each node: Map tasks on several TaskTrackers, Reduce tasks on others, with TableB (Tablet B-1, B-2) on the reduce side.

Distributed/parallel processing: speed, scalability
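The first steps of the figure (get the tablet list from META, then hand tasks to each node) can be sketched as building one map task per tablet and spreading those tasks over the TaskTrackers. This is a minimal sketch under stated assumptions: the `META` layout is hypothetical, and one-task-per-tablet with round-robin assignment are illustrative choices, not necessarily what Neptune or the JobTracker actually do.

```python
"""Sketch: turn a table's tablets into map tasks and spread them over
TaskTrackers. META layout, one task per tablet and round-robin are assumptions.
"""
from itertools import cycle

# Hypothetical META table: table name -> [(tablet id, start key, end key)].
META = {
    "TableA": [("A-1", "", "g"), ("A-2", "g", "n"),
               ("A-3", "n", "t"), ("A-4", "t", None)],
}


def make_map_tasks(table):
    # One map task per tablet; each task scans only its tablet's key range.
    return [{"tablet": tid, "range": (start, end)}
            for tid, start, end in META[table]]


def assign(tasks, trackers):
    # Round-robin the tasks so every TaskTracker works in parallel.
    plan = {t: [] for t in trackers}
    for task, tracker in zip(tasks, cycle(trackers)):
        plan[tracker].append(task["tablet"])
    return plan


plan = assign(make_map_tasks("TableA"), ["tt1", "tt2", "tt3"])
print(plan)  # {'tt1': ['A-1', 'A-4'], 'tt2': ['A-2'], 'tt3': ['A-3']}
```

Because tablets split the table by key range, the map tasks never overlap, so adding tablets and TaskTrackers adds parallelism directly. A real scheduler would also prefer to run each task on a node that already stores its data.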

Page 15

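[Figure: Neptune architecture.]

• User applications
• Neptune (large-scale distributed data store): NeptuneMaster; TabletServer #1, TabletServer #2, …, TabletServer #n; Cluster Management System
• Distributed/parallel computing platform (Hadoop)
• Distributed file system (Hadoop or other)
• Logical table (Neptune) vs. physical storage (distributed file system)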
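One way to picture the step from logical table to physical storage is a two-step lookup: find the tablet whose key range contains a row key, then the TabletServer currently serving that tablet. A minimal sketch, assuming Bigtable-style tablets described by exclusive end keys; the tablet names, key boundaries and server assignments are hypothetical, and Neptune's own META-based lookup is not modelled here.

```python
"""Sketch: map a logical row key to a tablet and its TabletServer.
Tablet boundaries and assignments are hypothetical.
"""
import bisect

# Tablet i covers [END_KEYS[i-1], END_KEYS[i]); the last tablet covers ["t", +inf).
END_KEYS = ["g", "n", "t"]
TABLETS = ["T-1", "T-2", "T-3", "T-4"]
SERVING = {"T-1": "TabletServer #1", "T-2": "TabletServer #2",
           "T-3": "TabletServer #1", "T-4": "TabletServer #3"}


def locate(row_key):
    # bisect_right returns the index of the first end key greater than
    # row_key, i.e. the tablet whose half-open range contains it.
    i = bisect.bisect_right(END_KEYS, row_key)
    tablet = TABLETS[i]
    return tablet, SERVING[tablet]


print(locate("apple"))  # ('T-1', 'TabletServer #1')
print(locate("g"))      # ('T-2', 'TabletServer #2')
```

Keeping the tablet-to-server map separate from the key ranges is what lets a master move a tablet to another server (for example after a failure) without changing the logical table.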

Page 16

When to use Neptune
• Large data
• Online put/get and analysis
• Less complex

Examples: Google Personalized Search, Google Analytics

Page 17

Looking for developers

Page 18

Principles of Google Infra

Cheap Hardware and Smart Software
• Use cheap commodity hardware, which fails frequently
• Develop smart software to reduce the cost of failure

Easy Management
• High scalability through automatic discovery of new servers and racks
• High redundancy against failure of servers, racks, even data centers

Speed and Then More Speed
• High speed at low cost
• Rapid development and deployment of new products

Use Existing Technologies
• Use techniques from the leading edge of computer science
• Use open-source code as a starting point

Page 19

Google Infra

[Figure: Google's infrastructure stack.]

• Applications: batch applications, online services
• Software: Google Linux, GFS, Bigtable, Map & Reduce, Client API, Chubby, Cluster Mgmt.
• Hardware: low-end commodity servers, 40 or more pizza-box servers per rack

Google's core competency: Google's software stack

Page 20

Q&A