hadoop - past, present and future - v1.2

7/12/14

Prepared for: Orange County Java Users Group

Presented by: “Big Data Joe” Rossi, @bigdatajoerossi

Hadoop Past, Present and Future

Roadmap

~1 hour

1. What Makes Up Hadoop 1.x?

2. What’s New In Hadoop 2.x?

3. The Future Of Hadoop …

What Makes Up Hadoop 1.x?

Hadoop 1.0: HDFS + MapReduce

[Diagram, two slides: a Client, the NameNode, the JobTracker, and DataNode / TaskTracker nodes. File blocks (1-1, 1-2, 1-3 through 4-1, 4-2, 4-3) are spread across the DataNodes, and Map and Reduce tasks run on the TaskTrackers.]
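The Map and Reduce boxes in the diagram can be illustrated with a minimal in-memory sketch. It is not Hadoop code and uses no Hadoop API; the word-count task and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are assumptions chosen for illustration only.

```python
from collections import defaultdict

def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key between the Map and Reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(values) for word, values in grouped.items()}

def word_count(blocks):
    pairs = []
    # In a real cluster each block would be mapped on a node that stores it;
    # here the blocks are simply processed one after another.
    for block in blocks:
        pairs.extend(map_phase(block))
    return reduce_phase(shuffle(pairs))

# Two hypothetical input blocks standing in for blocks of one file.
blocks = ["hadoop stores data", "hadoop processes data"]
counts = word_count(blocks)
```

The point of the split is that `map_phase` needs only its own block, so map work can run wherever the data already lives, while `reduce_phase` sees every value for a given key.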

MapReduce v1 Limitations

Scalability: Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000.

Availability: JobTracker failure kills all queued and running jobs.

Resources Partitioned into Map and Reduce: Hard partitioning of Map and Reduce slots led to low resource utilization.

No Support for Alternate Paradigms / Services: Only MapReduce batch jobs, nothing else.
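The utilization problem caused by hard slot partitioning can be shown with a little arithmetic. The sketch below is a toy model, not a scheduler: the slot counts, the task counts, and the assumption that every task needs exactly one slot are all made up for illustration.

```python
def slots_in_use(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Hard partitioning: map tasks may use only map slots,
    reduce tasks may use only reduce slots."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)

def shared_in_use(total_slots, pending_maps, pending_reduces):
    """Shared pool: any pending task may take any free slot."""
    return min(total_slots, pending_maps + pending_reduces)

# Hypothetical node: 6 map slots and 4 reduce slots, 10 in total.
map_slots, reduce_slots = 6, 4
total_slots = map_slots + reduce_slots

# Hypothetical map-heavy phase: 10 map tasks waiting, no reduce tasks yet.
pending_maps, pending_reduces = 10, 0

partitioned_utilization = slots_in_use(
    map_slots, reduce_slots, pending_maps, pending_reduces) / total_slots
shared_utilization = shared_in_use(
    total_slots, pending_maps, pending_reduces) / total_slots
```

With these numbers the partitioned node leaves its four reduce slots idle while map tasks wait, so it runs at 60% utilization, whereas a shared pool of the same size would be fully used.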

HADOOP 1.0

Single Use System Batch Apps

Apache Hadoop 1.0: Single Use System

HDFS (redundant, reliable storage)

MapReduce (cluster resource management and data processing)

Pig Hive

What’s New In Hadoop 2.x?

YARN Replaces MapReduce

Yet Another Resource Negotiator

YARN will be the de facto distributed operating system for Big Data

YARN: Taking Hadoop Beyond Batch

Store DATA in one place

Interact with that data in MULTIPLE WAYS with Predictable Performance and Quality of Service

Applications Run Natively IN Hadoop

HDFS2 (redundant, reliable storage)

YARN (cluster resource management)

BATCH (MapReduce)

INTERACTIVE (Tez)

ONLINE (HBase)

STREAMING (DataTorrent)

GRAPH (Giraph)

Running all on the same Hadoop cluster to give applications access to all the same source data!

YARN: Applications

MapReduce v2

Stream Processing

Master-Worker Online

In-Memory

Apache Storm

YARN: Moving Quickly

Conceived at Yahoo!

100,000+ nodes, 400,000+ jobs daily, 10 million+ hours of compute daily

Alpha Releases – 2.0

Beta Releases – 2.1

GA Released – 2.2

Version 2.3

Version 2.4

YARN: Dr. Evil Approved

YARN: How It Works

[Diagram: a Client, the ResourceManager with its Scheduler, and four NodeManagers. The NodeManagers host Containers, and one hosts the ApplicationMaster.]
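The interaction between the boxes in this diagram can be sketched as a small simulation. This is a simplified model under stated assumptions, not the YARN API: the class and method names, the fixed per-node capacity, and the "most free capacity first" placement rule are all hypothetical. In particular, `submit` requests the task containers on the ApplicationMaster's behalf, which collapses several real steps into one.

```python
from dataclasses import dataclass, field

@dataclass
class NodeManager:
    """One worker node that can host a fixed number of containers."""
    name: str
    capacity: int
    containers: list = field(default_factory=list)

    def free(self):
        return self.capacity - len(self.containers)

    def launch(self, label):
        self.containers.append(label)
        return f"{self.name}:{label}"

class ResourceManager:
    """Cluster-wide arbiter; its scheduler decides where containers go."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, label):
        # Toy scheduling rule (an assumption): pick the node with the most
        # free capacity; ties go to the first such node in the list.
        node = max(self.nodes, key=lambda n: n.free())
        if node.free() == 0:
            return None  # cluster is full
        return node.launch(label)

    def submit(self, app_name, num_tasks):
        # Step 1: the first container hosts the application's ApplicationMaster.
        am = self.allocate(f"{app_name}-AM")
        if am is None:
            raise RuntimeError("no capacity for the ApplicationMaster")
        # Step 2: containers for the application's tasks are requested
        # (done here directly, standing in for the ApplicationMaster).
        tasks = []
        for i in range(num_tasks):
            placed = self.allocate(f"{app_name}-task{i}")
            if placed is not None:
                tasks.append(placed)
        return am, tasks

# Four hypothetical NodeManagers with two container slots each.
nodes = [NodeManager(f"nm{i}", capacity=2) for i in range(1, 5)]
rm = ResourceManager(nodes)
am, tasks = rm.submit("app1", 3)
```

Note that the containers are generic: nothing in the model says whether a container runs a map task, a reduce task, or something that is not MapReduce at all.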

YARN: What Has Changed?

YARN                                             MRv1
RM (ResourceManager) + AM (ApplicationMaster)    JT (JobTracker)
Scheduler                                        Scheduler
NM (NodeManager)                                 TT (TaskTracker)
Container                                        Map / Reduce

[Diagram: on the YARN side, a ResourceManager with its Scheduler above NodeManagers hosting an ApplicationMaster and Containers; on the MRv1 side, a JobTracker with its Scheduler above TaskTrackers hosting Map and Reduce.]

6 Benefits of YARN

1. Scale

2. New programming models and services

3. Improved cluster utilization

4. Agility

5. Backwards compatible with MapReduce v1

6. Mixed workloads on the same source of data

The Future of Hadoop: Projects and Roadmap

SQL on Hadoop

Speed: Deliver interactive query performance.

SQL: Support an array of SQL semantics for analytic applications running against Hadoop.

Scale: SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.

Next Gen SQL on Hadoop

Hive on Apache Tez (Hortonworks)

Hive on Apache Spark (Cloudera)

Cloudera Impala (Cloudera)

Apache Drill (MapR)

HOYA: HBase (NoSQL) on YARN

Dynamic Scaling: On-demand cluster size. Increase and decrease the size with load.

Easier Deployment: APIs to create, start, stop and delete HBase clusters.

Availability: Recover from Region Server loss with a new container.

Microsoft REEF

Retainable Evaluator Execution Framework

Machine Learning: Framework well suited for building machine learning jobs.

Scalable / Fault Tolerant: Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.

Maintain State: Users can build jobs that utilize data from where it’s needed and also maintain state after jobs are done.

Heterogeneous Storages in HDFS

[Diagram: a NameNode over a single "Storage" box, contrasted with a NameNode over SATA, SSD, and Fusion IO storage.]

Hadoop Roadmap

Q3 2014: Apache Hadoop 2.5
    NodeManager Restart w/o disruption
    Dynamic Resource Configuration

Q4 2014: Apache Hadoop 2.6
    Memory As Storage Tier
    Support For Docker Containers

I Know You Have Questions …

No such thing as a stupid question.

One Last Thing …

OC Big Data Meetup

meetup.com/ocbigdata
3rd Wednesday of the month
Next: July 16th @ 5:45 PM

Thank You!

Big Data Joe Rossi
http://bigdatajoe.io/
@bigdatajoerossi
