HDFS Erasure Coding Explained with Diagrams

Kai Sasaki, Treasure Data Inc.

Upload: kai-sasaki

Post on 21-Apr-2017



Who am I

佐々木 海(Kai Sasaki)

Software Engineer at Treasure Data Inc. http://www.treasuredata.com

Hadoop, Spark, DL4J

Agenda

• Erasure Coding

• Under the Namespace

• Writing Side

• Reading Side

Erasure Coding

Replication

[Figure sequence: a Block is stored in HDFS as several replica Blocks.]

Capacity overhead: x3

Redundancy: 2
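
The two numbers on the Replication slides follow from simple arithmetic. A minimal sketch, assuming a replication factor of 3; the function names are illustrative and are not HDFS APIs:

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data with n-way replication."""
    return float(replicas)


def replication_redundancy(replicas: int) -> int:
    """Number of replicas that can be lost while one copy still survives."""
    return replicas - 1


print(replication_overhead(3))    # 3.0
print(replication_redundancy(3))  # 2
```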

Erasure Coding

[Figure sequence: the Blocks written to HDFS under erasure coding, with the labels below.]

RS-6-3

6 out of 9

Redundancy: 3

Capacity overhead: x1.5

BlockGroup
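
The "6 out of 9" property can be tried out with a toy Reed-Solomon style code. The sketch below is an illustration only: it treats the 6 data values as points on a polynomial over the small prime field GF(257) and derives 3 parity values by Lagrange interpolation. The field, the function names, and the sample data are all assumptions made for this example; the codec inside HDFS works differently and is heavily optimized.

```python
from itertools import combinations

P = 257  # small prime field, chosen for illustration only


def interpolate(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, parity_units=3):
    """Return the data values followed by `parity_units` parity values."""
    k = len(data)
    points = list(enumerate(data))
    parity = [interpolate(points, x) for x in range(k, k + parity_units)]
    return list(data) + parity


def decode(surviving, k=6):
    """Rebuild the k data values from any k surviving {position: value} pairs."""
    points = list(surviving.items())[:k]
    return [interpolate(points, x) for x in range(k)]


data = [10, 20, 30, 40, 50, 60]
units = encode(data)  # 6 data units + 3 parity units

# Try every way of keeping only 6 of the 9 units.
recovered_ok = all(
    decode({pos: units[pos] for pos in kept}) == data
    for kept in combinations(range(9), 6)
)
print(len(units) / len(data), recovered_ok)  # 1.5 True
```

Losing any 3 of the 9 units still leaves enough to rebuild the data, which matches the "Redundancy: 3" label, while the stored volume is only 9/6 = 1.5 times the original.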

Under the Namespace

INode and BlockInfo

[Figure sequence: an INode points to a list of BlockInfo entries (BlockInfo, BlockInfo, …); a BlockInfo covers the Blocks of one BlockGroup.]

BlockInfo

[Figure sequence: the BlockId is a 64-bit long (bit positions 0 to 64) split into a 60-bit GroupId and a 4-bit index; the Blocks of a BlockGroup are labeled index 0, index 1, index 2.]

Saving memory
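
The BlockId layout from the slides can be sketched with a few bit operations. This is a minimal illustration, assuming the 4-bit index occupies the low bits of the ID and the GroupId is the same value with those bits cleared; the helper names are hypothetical and are not taken from the HDFS source.

```python
INDEX_BITS = 4
INDEX_MASK = (1 << INDEX_BITS) - 1  # 0xF: room for up to 16 blocks per group


def make_block_id(group_id: int, index: int) -> int:
    """Combine a GroupId (low 4 bits zero) with a block index."""
    if group_id & INDEX_MASK:
        raise ValueError("group_id must have its low 4 bits cleared")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("index must fit in 4 bits")
    return group_id | index


def group_id_of(block_id: int) -> int:
    """Clear the index bits to get back the GroupId."""
    return block_id & ~INDEX_MASK


def index_of(block_id: int) -> int:
    """Keep only the index bits."""
    return block_id & INDEX_MASK


group = 0x1230                    # sample GroupId, made up for this example
ids = [make_block_id(group, i) for i in range(9)]  # 9 blocks under RS-6-3
print([hex(b) for b in ids[:3]])  # ['0x1230', '0x1231', '0x1232']
print(all(group_id_of(b) == group for b in ids))   # True
```

Because every block's ID can be derived from the GroupId and its index, one reading of the "Saving memory" slide is that a single entry per BlockGroup is enough, rather than one entry per Block.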

Writing Side

[Figure sequence: the Data, starting at offset 0, is divided into 64KB pieces that are placed into a BlockGroup. The BlockGroup consists of Data Blocks and Parity Blocks, and a Stripe is marked across them.]

Saving disk space
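
The striped layout in the figures can be expressed as a mapping from a logical file offset to a position inside the BlockGroup. The sketch below assumes 64KB cells (the size shown on the slide), 6 Data Blocks as in RS-6-3, and round-robin placement of cells; `locate` is a hypothetical helper written for this example.

```python
CELL_SIZE = 64 * 1024  # 64KB, as shown on the slide
DATA_BLOCKS = 6        # data blocks per BlockGroup under RS-6-3


def locate(offset: int):
    """Map a logical byte offset to (stripe, data block index, offset in block)."""
    cell = offset // CELL_SIZE            # which 64KB cell of the data
    stripe, block_index = divmod(cell, DATA_BLOCKS)
    offset_in_block = stripe * CELL_SIZE + offset % CELL_SIZE
    return stripe, block_index, offset_in_block


print(locate(0))                  # (0, 0, 0)
print(locate(CELL_SIZE))          # (0, 1, 0)
print(locate(6 * CELL_SIZE + 5))  # (1, 0, 65541)
```

Consecutive cells land on consecutive Data Blocks, and after the sixth cell the layout wraps around to the first block and starts a new Stripe.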

Reading Side

[Figure sequence: a BlockGroup with the positions 200kb and 500kb marked.]

Saving reading time
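
The 200kb and 500kb markers suggest a read of that byte range. As a sketch under that assumption, the code below splits the logical range into the per-block pieces a striped reader would have to fetch. It assumes 64KB cells, 6 Data Blocks, and 1kb = 1024 bytes; `plan_read` is a hypothetical helper and not the logic of `DFSStripedInputStream`.

```python
CELL_SIZE = 64 * 1024  # assumed cell size
DATA_BLOCKS = 6        # data blocks per BlockGroup under RS-6-3


def plan_read(start: int, end: int):
    """Split the logical range [start, end) into (block index, offset in block, length)."""
    reads = []
    pos = start
    while pos < end:
        cell = pos // CELL_SIZE
        cell_end = (cell + 1) * CELL_SIZE
        length = min(end, cell_end) - pos   # stay inside the current cell
        stripe, block_index = divmod(cell, DATA_BLOCKS)
        offset_in_block = stripe * CELL_SIZE + pos % CELL_SIZE
        reads.append((block_index, offset_in_block, length))
        pos += length
    return reads


for read in plan_read(200 * 1024, 500 * 1024):
    print(read)
# (3, 8192, 57344)
# (4, 0, 65536)
# (5, 0, 65536)
# (0, 65536, 65536)
# (1, 65536, 53248)
```

The 300KB range touches five different Data Blocks, so the pieces can in principle be fetched from several DataNodes at once, which is one plausible reading of "Saving reading time".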

Summary

• Namespace -> Saving memory (BlockInfoStriped, BlockIdManager)

• Writing Side -> Saving disk space (INodeFile)

• Reading Side -> Saving reading time (DFSStripedInputStream)

Thank you