Alluxio

Upload: christophe-marchal

Post on 24-Jan-2017

Berkeley Data Analytics Stack

Supported Storages and Framework

Memory-Centric distributed storage system

Storage (HDFS, S3, ...)

Alluxio: Block 1, Block 2, Block 3

Spark Job A (Spark Memory): Block 1, Block 2

Spark Job B (Spark Memory): Block 1, Block 3

Architecture

Master
● Metadata
● Workflow Manager

Worker Worker Worker

Tiered Storage

Master

Worker Worker Worker

Memory SSD HDD
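
The Memory / SSD / HDD tiering above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Alluxio's API: the class name TieredStore, the capacities counted in blocks, the least-recently-used eviction, and the promotion of a block to the top tier on read are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model (not the Alluxio API): blocks enter the fastest tier
    and spill to the next tier when a tier is over capacity."""

    def __init__(self, capacities):
        # capacities: list of (tier name, max blocks), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, block_id, data, level):
        if level >= len(self.tiers):
            return  # evicted from the last tier: the block leaves the cache
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        if len(blocks) > cap:
            # Assumed policy: evict the least recently inserted/used block.
            victim_id, victim = blocks.popitem(last=False)
            self._insert(victim_id, victim, level + 1)

    def _remove(self, block_id):
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                return blocks.pop(block_id)
        return None

    def put(self, block_id, data):
        self._remove(block_id)           # avoid duplicate copies across tiers
        self._insert(block_id, data, 0)  # new blocks start in the top tier

    def get(self, block_id):
        data = self._remove(block_id)
        if data is not None:
            self._insert(block_id, data, 0)  # assumed: promote on read
        return data

    def location(self, block_id):
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ("block1", "block2", "block3"):
    store.put(block, b"payload-" + block.encode())
```

After the three writes, block1 has been pushed out of memory into the SSD tier; reading it again brings it back to memory and pushes block2 down instead.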

Unified and Transparent Namespace
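
One way to picture a unified namespace is a mount table that maps paths in a single tree onto different underlying storages. The sketch below is a toy model, not Alluxio's implementation: the MountTable class, the longest-prefix rule, and the example URIs (hdfs://namenode:9000/alluxio, s3://example-bucket/logs) are hypothetical and chosen only for illustration.

```python
class MountTable:
    """Toy model (not the Alluxio API): one namespace, several storages."""

    def __init__(self):
        self.mounts = {}  # mount point in the unified namespace -> storage URI

    def mount(self, mount_point, storage_uri):
        key = mount_point.rstrip("/") or "/"
        self.mounts[key] = storage_uri.rstrip("/")

    def resolve(self, path):
        """Translate a namespace path to a storage URI (longest prefix wins)."""
        best = None
        for mount_point in self.mounts:
            prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
            if path == mount_point or path.startswith(prefix):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise KeyError("no mount point covers " + path)
        rest = path[len(best):].lstrip("/")
        base = self.mounts[best]
        return base + "/" + rest if rest else base


table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")   # hypothetical HDFS root
table.mount("/logs", "s3://example-bucket/logs")   # hypothetical S3 bucket
```

A client only ever sees paths such as /logs/2017/01.log or /data/a.csv; which storage actually holds the bytes is decided by the table.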

Resiliency: Master

Active Master: writes the Journal

Passive Master: reads the Journal

Worker Worker Worker
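
The active/passive arrangement with a shared journal can be sketched as follows. This is a toy model under stated assumptions, not Alluxio code: the Journal and Master classes, the (operation, path, size) entry format, and the promote step are hypothetical names used only to show the idea that the active master logs every metadata change and the passive master replays the log before taking over.

```python
class Journal:
    """Toy shared log of metadata changes (assumed to live on durable storage)."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)

    def read_from(self, index):
        return self.entries[index:]


class Master:
    def __init__(self, journal):
        self.journal = journal
        self.files = {}     # path -> size: the metadata this master holds
        self.applied = 0    # number of journal entries already applied
        self.active = False

    def _apply(self, entry):
        op, path, size = entry
        if op == "create":
            self.files[path] = size
        elif op == "delete":
            self.files.pop(path, None)

    def write(self, op, path, size=0):
        if not self.active:
            raise RuntimeError("only the active master may write")
        entry = (op, path, size)
        self.journal.append(entry)  # log the change first, then apply it
        self._apply(entry)
        self.applied += 1

    def catch_up(self):
        # The passive master replays entries it has not seen yet.
        for entry in self.journal.read_from(self.applied):
            self._apply(entry)
            self.applied += 1

    def promote(self):
        self.catch_up()
        self.active = True


journal = Journal()
active, passive = Master(journal), Master(journal)
active.active = True
active.write("create", "/a", 10)
active.write("create", "/b", 20)
active.write("delete", "/a")

# Simulated failover: the passive master replays the journal and takes over.
passive.promote()
passive.write("create", "/c", 5)
```

After the failover the former passive master holds the same metadata the active master had, and continues appending to the same journal.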

Resiliency: Lineage

FileSet A

FileSet C

Spark Job

FileSet B

FileSet D

Spark Job

MapReduce Job

FileSet E X
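
Lineage-based recovery can be sketched with a toy model: each output remembers the job and the inputs that produced it, so a lost output is recomputed instead of being restored from a replica. Everything here is an assumption for illustration and not Alluxio's API: the LineageStore class, the placeholder job functions, and in particular the dependency graph (A to C and B to D via Spark jobs, then C and D to E via a MapReduce job), which the slide does not spell out.

```python
class LineageStore:
    """Toy model (not the Alluxio API): lost outputs are rebuilt from lineage."""

    def __init__(self):
        self.data = {}        # file set name -> content currently available
        self.lineage = {}     # file set name -> (job function, input names)
        self.recomputed = []  # names rebuilt from lineage, in order

    def add_source(self, name, content):
        self.data[name] = content

    def run(self, output, job, inputs):
        # Record how the output is produced before producing it.
        self.lineage[output] = (job, list(inputs))
        self.data[output] = job(*[self.read(name) for name in inputs])

    def lose(self, name):
        self.data.pop(name, None)  # simulate losing the in-memory copy

    def read(self, name):
        if name not in self.data:
            if name not in self.lineage:
                raise KeyError(name + " is lost and has no lineage")
            job, inputs = self.lineage[name]
            # Recursively recover any lost inputs, then rerun the job.
            self.data[name] = job(*[self.read(i) for i in inputs])
            self.recomputed.append(name)
        return self.data[name]


store = LineageStore()
store.add_source("A", [1, 2, 3])
store.add_source("B", [10, 20])
store.run("C", lambda a: [x * 2 for x in a], ["A"])        # stands in for a Spark job
store.run("D", lambda b: [x + 1 for x in b], ["B"])        # stands in for a Spark job
store.run("E", lambda c, d: sum(c) + sum(d), ["C", "D"])   # stands in for a MapReduce job

store.lose("E")
store.lose("C")
result = store.read("E")  # rebuilds C first, then E
```

Only the lost file sets are recomputed; D is still available and is read as-is.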

Code

Bigger Case

● Memory + HDD
● 100+ nodes
● 1 PB+ managed space
● 30x performance improvement

Where is the code?

Thanks!