Hadoop Inside
Posted on 06-May-2015
TC Data Platform Division, GFIS Team
Eunjo Lee
What is Hadoop
Hadoop is a framework & system for parallel processing of large amounts of data in a distributed computing environment
(http://searchbusinessintelligence.techtarget.in/tutorial/Apache-Hadoop-FAQ-for-BI-professionals)
Apache project
open source
Java based
clone of Google's systems:
GFS -> HDFS
MapReduce -> MapReduce
Distributed Processing System
How to process data in a distributed environment
how to read/write data
how to control nodes
load balancing
Monitoring
node status
task status
Fault tolerance
error detection
process errors, network errors, hardware errors, …
error handling
temporary error: retry -> risk of duplication, data corruption, …
permanent error: fail over (to which node?)
process hang: timeout & retry
• timeout too long -> long response time
• timeout too short -> endless retry loop
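The timeout-and-retry trade-off above can be sketched as a bounded retry loop. This is an illustration only, not Hadoop code; `call_with_retry` and `slow_task` are hypothetical names:

```python
def call_with_retry(task, timeout=2.0, max_retries=4):
    """Retry a task that may hang. The task is told its deadline and
    raises TimeoutError when it cannot finish in time. A timeout that is
    too long delays failure detection (long response time); one that is
    too short keeps retrying work that would have succeeded, and
    max_retries bounds that loop so it cannot spin forever."""
    for attempt in range(1, max_retries + 1):
        try:
            return task(timeout)
        except TimeoutError:
            pass  # treat as a temporary error: retry
    raise RuntimeError(f"task failed after {max_retries} attempts")
```

Bounding the retries converts the "too short -> endless retries" failure mode into a detectable permanent error.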
Hadoop System Architecture
[Diagram: a master runs the Job Tracker and Name Node; each of three slave nodes runs a Task Tracker and a Data Node; a Secondary Name Node runs alongside the master. Legend: node, process, heart beat, data read/write.]
HDFS + MapReduce
HDFS
vs. a local filesystem
inode – namespace
cylinder / track – data node
blocks (bytes) – blocks (MBytes)
Features
very large files
write once, read many times
supports the usual file system operations: ls, cp, mv, rm, chmod, chown, put, cat, …
no support for multiple writers or arbitrary modifications
Block Replication & Rack Awareness
[Diagram: a file of four blocks (1–4) is replicated three times across servers on two racks, so every block has replicas on at least two racks. Legend: server, rack, file, block.]
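The rack-aware placement shown above (first replica on the writer's node, second on a node in a different rack, third on another node in that remote rack) can be sketched as follows. A minimal illustration under those assumptions; `place_replicas` and its data layout are hypothetical, not Hadoop's API:

```python
import random

def place_replicas(writer, nodes_by_rack):
    """Sketch of rack-aware placement for 3 replicas:
    1st on the writer's node, 2nd on a node in a different rack,
    3rd on another node in the 2nd replica's rack.
    nodes_by_rack: {rack: [node, ...]}; writer: (rack, node)."""
    writer_rack, writer_node = writer
    replicas = [writer_node]                      # 1st: local to the writer
    other_racks = [r for r in nodes_by_rack if r != writer_rack]
    remote_rack = random.choice(other_racks)      # 2nd: a different rack
    second = random.choice(nodes_by_rack[remote_rack])
    replicas.append(second)
    third_candidates = [n for n in nodes_by_rack[remote_rack] if n != second]
    replicas.append(random.choice(third_candidates))  # 3rd: same rack as 2nd
    return replicas
```

Placing two replicas on one remote rack keeps cross-rack traffic low while still surviving the loss of a whole rack.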
HDFS - Read
1. Read Request (client -> Name Node)
2. Response (block locations)
3. Request Data (client -> Data Node)
4. Read Data
[Diagram: Client, Name Node, three Data Nodes. Legend: node, data block, data I/O, operation message.]
HDFS - Write
1. Write Request (client -> Name Node)
2. Response (target Data Nodes)
3. Write Data (client -> first Data Node)
4. Write Replica (pipelined to the next Data Nodes)
5. Write Done
[Diagram: Client, Name Node, three Data Nodes. Legend: node, data block, data I/O, operation message.]
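The pipelined write (and the failure handling on the next slides) can be sketched in a few lines. An in-memory illustration only; `pipeline_write` and its arguments are hypothetical:

```python
def pipeline_write(block, pipeline, failed=()):
    """Sketch of an HDFS-style pipelined write: the client streams the
    block to the first data node, which forwards it down the pipeline.
    If a node fails mid-write, the pipeline continues with the surviving
    nodes, any partial block on the failed node is discarded, and the
    name node later re-replicates to restore the target replica count."""
    stored = {}
    for node in pipeline:
        if node in failed:
            continue  # partial block deleted; survivors carry on
        stored[node] = block
    return stored  # {node: block} for every node holding a replica
```

With no failures all pipeline nodes end up holding the block; with a failed node the write still succeeds on the survivors.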
HDFS – Write (Failure)
If a Data Node in the pipeline fails during a write:
the remaining Data Nodes complete the write (Write Replica)
the partial block on the failed node is deleted (Delete Partial Block)
the Name Node re-arranges replicas to restore the replication factor (Replica Arrangement)
[Diagram: Client, Name Node, Data Nodes, with one pipeline node failing mid-write. Legend: node, data block, data I/O, operation message.]
MapReduce
Definition
map: (+1) [ 1, 2, 3, 4, …, 10 ] -> [ 2, 3, 4, 5, …, 11 ]
reduce: (+) [ 2, 3, 4, 5, …, 11 ] -> 65
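The definition above in runnable form, using Python's built-in `map` and `functools.reduce`:

```python
from functools import reduce

nums = list(range(1, 11))                    # [1, 2, 3, ..., 10]
mapped = list(map(lambda x: x + 1, nums))    # (+1) -> [2, 3, ..., 11]
total = reduce(lambda a, b: a + b, mapped)   # (+)  -> 65
print(mapped, total)
```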
Programming Model for processing data sets in Hadoop
projection, filter -> map task
aggregation, join -> reduce task
sort -> partitioning
Job Tracker & Task Trackers
master / slave
job = many tasks
# of map tasks = # of file splits (default: # of blocks)
# of reduce tasks = user configuration
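The model above (one map task per split, user-configured reduce count, partitioning for sort, aggregation in reduce) can be simulated in memory with the classic word count. A sketch only; `run_mapreduce` and its arguments are illustrative names, not Hadoop's API:

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn, num_reducers=2):
    """Minimal in-memory sketch of the MapReduce model: one map task per
    split, hash partitioning of map output keys across reducers,
    per-partition key sort, then one reduce call per key."""
    # map phase: one task per split, emitting (key, value) pairs
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in map_fn(record):
                partitions[hash(key) % num_reducers][key].append(value)
    # shuffle & sort + reduce phase: each partition is one reduce task
    output = {}
    for part in partitions:
        for key in sorted(part):
            output[key] = reduce_fn(key, part[key])
    return output

# word count: map emits (word, 1) per word; reduce sums the counts
counts = run_mapreduce(
    splits=[["a b a"], ["b a"]],
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda k, vs: sum(vs),
)
```

Here the two input lines play the role of two file splits, so the map phase runs as two independent "tasks", matching the rule above.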
MapReduce
[Diagram: input splits on the distributed file system feed input data records to map tasks; map output records (key/value pairs) are partitioned, shuffled & sorted, and consumed by reduce tasks, whose output records (key/value pairs) are written back to the distributed file system. Legend: split, partition, map / reduce task.]
Mapper - partitioning
double-indexed structure
Spill Thread
data sorting: 2nd index (quicksort)
spill file generation: spill data file & index file
flush
merge sort (by key) per partition
[Buffer layout: serialized key/value pairs in the output buffer (default: 100 MB); 1st index: (partition, key offset, value offset) per record; 2nd index: record offsets reordered by the sort.]
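The spill mechanics above can be sketched as a sort over (partition, key) followed by a per-partition write-out. A simplification: real mappers sort offset indexes over a byte buffer rather than the records themselves, and `spill` here is a hypothetical name:

```python
def spill(records, num_partitions):
    """Sketch of the map-side sort & spill: each buffered record gets a
    partition number (hash of its key), then a single sort on
    (partition, key) groups records per reducer and key-orders them,
    mimicking the 2nd-index quicksort before the spill file is written."""
    indexed = [(hash(k) % num_partitions, k, v) for k, v in records]
    indexed.sort(key=lambda t: (t[0], t[1]))  # partition first, then key
    spill_files = {p: [] for p in range(num_partitions)}
    for p, k, v in indexed:
        spill_files[p].append((k, v))
    return spill_files  # one key-sorted run per partition
```

Because every spill run comes out key-sorted per partition, the final flush only needs a merge sort, as the slide notes.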
Reducer - fetching
GetMapEventsThread
map completion event listener
MapOutputCopier
fetches data from completed mappers (via HTTP)
several copier threads run concurrently
Merger
key sorting (heap sort)
[Diagram: TaskTrackers running map tasks report completion events through the Job Tracker; the reduce-side TaskTracker's Copier threads fetch map output with HTTP GET.]
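The Merger step above can be illustrated with a k-way heap merge: each fetched map output is already key-sorted, so `heapq.merge` yields a single sorted stream without re-sorting everything. `merge_map_outputs` is an illustrative name:

```python
import heapq

def merge_map_outputs(fetched):
    """Sketch of the reduce-side Merger: 'fetched' is a list of
    key-sorted (key, value) lists, one per completed map task; a k-way
    heap merge produces one key-sorted stream for the reduce function."""
    return list(heapq.merge(*fetched, key=lambda kv: kv[0]))
```

A heap merge is O(n log k) for k runs, which is why the reducer merges sorted runs instead of sorting the concatenated input from scratch.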
Job Flow
1. runJob (MapReduce Program -> Job Client)
2. copy job resources (Job Client -> Shared File System)
3. submit job (Job Client -> Job Tracker)
4. retrieve input splits (Job Tracker <- Shared File System)
5. add job (to the job queue)
6. heartbeat (Task Tracker -> Job Tracker)
7. assign task (Job Tracker -> Task Tracker)
8. retrieve job resources (Task Tracker <- Shared File System)
9. launch (Task Tracker spawns a Child JVM)
10. run (Child runs the Map/Reduce Task)
11. read data / write result
[Diagram spans the Client Node, JobTracker Node, and TaskTracker Node. Legend: node, JVM, class, job queue, method call, I/O, job, task.]
Monitoring
Heart beat
task tracker status checking
task request / assignment
other commands (restart, shutdown, kill task, …)
Cluster Status
Job / Task Status
JobInProgress
TaskInProgress
Reporter & Metrics
Black list
Monitoring (Cluster Info)
Monitoring (Job Info)
Monitoring (Task Info)
Task Scheduler
job queue
red-black tree (java.util.TreeMap)
sorted by priority & job id (request time)
load factor
remaining tasks / capacity
task assignment order (highest priority first)
new task > speculative execution task > dummy splits task
map task (local) > map task (non-local) > reduce task
padding
padding = MIN(total tasks * pad fraction, task capacity)
reserves capacity for speculative execution
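The padding formula above in runnable form. The 1% pad fraction used here is illustrative, not a confirmed Hadoop default, and `scheduler_padding` is a hypothetical name:

```python
def scheduler_padding(total_tasks, task_capacity, pad_fraction=0.01):
    """padding = MIN(total tasks * pad fraction, task capacity):
    slots held back from normal assignment so speculative tasks can be
    launched without waiting, capped at the cluster's task capacity."""
    return min(int(total_tasks * pad_fraction), task_capacity)
```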
Error Handling
Retry
configurable (default: 4 attempts)
Timeout
configurable
Speculative Execution
launched when (current time – start time) >= 1 minute
and (average progress – task progress) > 20%
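The two speculative-execution conditions above combine into a single predicate; `should_speculate` is an illustrative name for this check:

```python
def should_speculate(now, start_time, progress, average_progress):
    """Sketch of the speculative-execution trigger from the slide: a task
    becomes a candidate once it has been running for at least one minute
    AND its progress (0.0-1.0) lags the average of its sibling tasks by
    more than 20 percentage points."""
    return (now - start_time) >= 60 and (average_progress - progress) > 0.20
```

Requiring both conditions avoids duplicating tasks that are merely young or only slightly behind.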
Distributed Processing System (recap)
how to read/write data -> HDFS client
how to control nodes -> master / slave
load balancing -> replication / rack awareness, job scheduler
monitoring -> heart beat, job/task status, reporter / metrics
fault tolerance -> black list, timeout & retry, speculative execution
Limitations
map -> reduce network overhead
iterative processing
full (or theta) join
data with small size but many splits
Low latency
polling & pulling
job initialization
optimized for throughput
job scheduling
data access
Q&A