
Uploaded by: skillspeed

Posted on: 07-Aug-2015

Category: Data & Analytics

Introduction to Hadoop 2.0 & YARN | Hadoop 2.0 & YARN Fundamentals | Hadoop 2.0 & YARN Architecture

© 2015 BlueCamphor Technologies (P) Ltd.

Hadoop 2.0 & Yarn

Slide 2© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Session Objectives

ᗍ Introduction to Big Data and Hadoop

ᗍ Understanding Hadoop 2.0 and its features

ᗍ Understanding the differences between Hadoop 1.x and 2.x

ᗍ Understanding YARN

Big Data and its Challenges

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Systems and enterprises generate huge amounts of data, from terabytes to even petabytes of information.

Managing data at this scale with traditional tools is very difficult.

Get Started with BIG Data & Hadoop

Who Generates Big Data?

Have you ever wondered how Google, Facebook, or LinkedIn manage to store and use such huge volumes of data? Today, managing big data has become a problem for all of us.

Hadoop can be used to process such huge data easily. How? Before answering that, let's understand what Hadoop is.

Hadoop and its Characteristics

Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

It is an open-source data management technology with scale-out storage and distributed processing.
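The "simple programming model" mentioned above is MapReduce. The following single-process Python sketch only illustrates its three phases (map, shuffle/sort, reduce) on a word count; it does not use Hadoop's Java API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted for the same key, sorted by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores big data", "YARN schedules big jobs"]
    print(word_count(docs))
```

On a real cluster the framework runs many mapper and reducer instances in parallel on different nodes and performs the shuffle over the network; the sketch keeps everything in one process to show only the data flow.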

Hadoop Characteristics

Flexible

Reliable

Economical

Scalable

Hadoop Ecosystem

ᗍ Flume (unstructured or semi-structured data) and Sqoop (structured data): data import or export

ᗍ HDFS (Hadoop Distributed File System): storage

ᗍ YARN: cluster resource management

ᗍ MapReduce Framework, HBase, and other YARN frameworks (MPI, GIRAPH)

ᗍ Pig Latin: data analysis

ᗍ Hive: DW system

ᗍ Apache Oozie: workflow

Next Generation Hadoop

Hadoop 1.x

ᗍ Client

ᗍ HDFS: NameNode and Secondary NameNode; Data Nodes store the data blocks

ᗍ MapReduce: Job Tracker; a Task Tracker on each Data Node runs Map and Reduce tasks
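To make the Hadoop 1.x picture concrete, here is a simplified Python sketch of how a file is cut into data blocks stored on Data Nodes, and how a Job Tracker can favour data locality when handing map tasks to the Task Trackers running on those nodes. It assumes the Hadoop 1.x default block size of 64 MB and the default replication factor of 3; the round-robin replica placement and the `slots` dictionary are hypothetical simplifications, not HDFS's rack-aware placement policy or the Job Tracker's actual scheduler.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 64  # Hadoop 1.x default block size (configurable)
REPLICATION = 3     # HDFS default replication factor (configurable)


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Sizes of the blocks a file occupies; only the last one may be partial."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    remainder = file_size_mb % block_size_mb
    if n_blocks and remainder:
        sizes[-1] = remainder
    return sizes


def place_replicas(n_blocks: int, data_nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Hypothetical round-robin placement (NOT HDFS's rack-aware policy)."""
    r = min(replication, len(data_nodes))
    return {b: [data_nodes[(b + i) % len(data_nodes)] for i in range(r)]
            for b in range(n_blocks)}


def assign_map_tasks(placement: Dict[int, List[str]],
                     slots: Dict[str, int]) -> Dict[int, str]:
    """Give each block's map task to a free Task Tracker, preferring data-local ones."""
    free = dict(slots)  # node (Task Tracker + Data Node) -> free map slots
    assignment: Dict[int, str] = {}
    for block, holders in placement.items():
        local = [n for n in holders if free.get(n, 0) > 0]
        remote = [n for n, s in free.items() if s > 0]
        candidates = local or remote  # read remotely only if no local slot is free
        if not candidates:
            break  # remaining tasks would wait for a slot to free up
        node = candidates[0]
        free[node] -= 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    blocks = split_into_blocks(200)  # a 200 MB file
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    placement = place_replicas(len(blocks), nodes)
    print(blocks)
    print(assign_map_tasks(placement, {n: 1 for n in nodes}))
```

Running the map task where a replica already lives avoids moving the block over the network, which is why each Data Node in the diagram also hosts a Task Tracker.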

Challenges for Hadoop 1.x

| Problem | Description |
|---|---|
| NameNode: no horizontal scalability | A single NameNode and a single namespace, limited by NameNode RAM |
| NameNode: no high availability (HA) | The NameNode is a single point of failure; after a failure, recovery is manual, using the checkpoint kept by the Secondary NameNode |
| Job Tracker: overburdened | Spends a significant amount of time and effort managing the life cycle of applications |
| MRv1: only Map and Reduce tasks | The huge amount of data stored in HDFS remains unused, because it cannot serve other workloads such as graph processing |
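Why does RAM limit a single NameNode? The NameNode keeps the whole namespace (every file, directory and block) in memory. A widely quoted rule of thumb is roughly 150 bytes of heap per such object; the Python sketch below treats that figure as an assumption rather than an exact measurement, and its function names are hypothetical.

```python
# Rule-of-thumb assumption: ~150 bytes of NameNode heap per namespace object
# (file, directory or block). Real usage varies with path lengths and version.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(n_files: int, n_dirs: int, blocks_per_file: int = 1) -> int:
    """Estimate the heap one NameNode needs to hold the namespace."""
    objects = n_files + n_dirs + n_files * blocks_per_file
    return objects * BYTES_PER_OBJECT


def max_files_for_heap(heap_bytes: int, blocks_per_file: int = 1) -> int:
    """Roughly how many files fit in a given heap (directories ignored)."""
    return heap_bytes // (BYTES_PER_OBJECT * (1 + blocks_per_file))


if __name__ == "__main__":
    need = namenode_heap_bytes(n_files=100_000_000, n_dirs=1_000_000)
    print(f"~{need / 1e9:.0f} GB of heap for 100 million single-block files")
```

Under this assumption, 100 million single-block files need about 30 GB of heap no matter how small the files are, so the namespace hits the limit of one machine's RAM long before the Data Nodes run out of disk. Federation addresses this by spreading the namespace across several NameNodes.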

Hadoop 2.x Features

| Property | Hadoop 1.0 | Hadoop 2.0 |
|---|---|---|
| Federation | One NameNode and namespace | Multiple NameNodes and namespaces |
| High Availability | Not present | Highly available |
| YARN: processing control and multi-tenancy | JobTracker, TaskTracker | Resource Manager, Node Manager, App Master, Capacity Scheduler |

Other Important Hadoop 2.0 Features

ᗍ HDFS Snapshots

ᗍ NFSv3 access to data in HDFS

ᗍ Support for running Hadoop on MS Windows

ᗍ Binary Compatibility for MapReduce applications built on Hadoop 1.0

ᗍ Substantial integration testing with the rest of the Hadoop ecosystem projects (such as Pig and Hive)

HDFS 1.x Vs 2.x

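Hadoop 1.0: a single NameNode holds the Namespace layer (NS) and Block Management, while the Datanodes provide the physical Storage. Block Management and Storage together form the Block Storage layer.

Hadoop 2.0 (Federation): multiple independent NameNodes (NN-1 … NN-k … NN-n), each managing its own namespace (NS 1 … NS k … NS n) and its own block pool (Pool 1 … Pool k … Pool n). DataNode 1, DataNode 2, … DataNode m act as common storage: every DataNode stores blocks for all the block pools.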
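Federation can be pictured as a mount table that routes each path to the NameNode owning that part of the namespace, while all DataNodes store blocks for every block pool. The Python sketch below is a toy model under that assumption: the class names, mount points and block-id format are hypothetical, and block placement is simple round-robin. Hadoop itself offers a client-side mount table called ViewFs for this routing; the sketch does not reproduce its API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NameNode:
    name: str
    block_pool_id: str
    # Namespace owned by this NameNode: file path -> list of block ids.
    namespace: Dict[str, List[str]] = field(default_factory=dict)
    next_block: int = 0


@dataclass
class DataNode:
    name: str
    # Common storage: block pool id -> block ids stored for that pool.
    pools: Dict[str, Set[str]] = field(default_factory=dict)


class FederatedCluster:
    def __init__(self, mounts: Dict[str, NameNode], datanodes: List[DataNode]):
        self.mounts = mounts        # mount point -> owning NameNode
        self.datanodes = datanodes
        self._rr = 0                # round-robin cursor over DataNodes

    def resolve(self, path: str) -> NameNode:
        """Longest-prefix match of the path against the mount table."""
        matches = [m for m in self.mounts
                   if path == m or path.startswith(m.rstrip("/") + "/")]
        if not matches:
            raise KeyError(f"no NameNode mounted for {path}")
        return self.mounts[max(matches, key=len)]

    def create(self, path: str, n_blocks: int) -> str:
        """Create a file: metadata in one NameNode, blocks in its own pool."""
        nn = self.resolve(path)
        block_ids = []
        for _ in range(n_blocks):
            block_id = f"{nn.block_pool_id}:blk_{nn.next_block}"
            nn.next_block += 1
            dn = self.datanodes[self._rr % len(self.datanodes)]
            self._rr += 1
            dn.pools.setdefault(nn.block_pool_id, set()).add(block_id)
            block_ids.append(block_id)
        nn.namespace[path] = block_ids
        return nn.name


if __name__ == "__main__":
    nn1, nn2 = NameNode("NN-1", "BP-1"), NameNode("NN-2", "BP-2")
    dns = [DataNode("DataNode 1"), DataNode("DataNode 2")]
    cluster = FederatedCluster({"/user": nn1, "/data": nn2}, dns)
    print(cluster.create("/user/alice/notes.txt", 2))  # NN-1 owns /user
    print(cluster.create("/data/logs/day1.log", 3))    # NN-2 owns /data
    for dn in dns:
        print(dn.name, sorted(dn.pools))
```

Each NameNode sees only its own namespace, so losing or overloading one affects only its subtree, while the DataNodes remain a single shared pool of storage.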

Hadoop 2.x – High Availability

HDFS (NameNode High Availability):

ᗍ Active NameNode: all namespace edits are logged to shared NFS storage, with only a single writer allowed (fencing)

ᗍ Standby NameNode: reads the edit logs and applies them to its own namespace

ᗍ Client and Data Nodes

ᗍ It is not necessary to configure a Secondary NameNode, because the Standby NameNode also performs checkpoints

YARN (Next Generation MapReduce):

ᗍ Resource Manager

ᗍ A Node Manager on each Data Node, hosting Containers and App Masters
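The HA flow above (one writer to a shared edit log, a standby that replays it, and fencing of the old writer) can be sketched as a toy Python model. The slide describes NFS-based shared storage; here fencing is modelled with a writer "epoch" counter, a common fencing technique chosen only for illustration, whereas a real deployment relies on the fencing methods configured for the cluster. All class and method names are hypothetical.

```python
from typing import List, Optional, Set, Tuple


class FencedError(Exception):
    """Raised when a NameNode whose epoch is stale tries to write."""


class SharedEditLog:
    """Stand-in for the shared edit-log storage (NFS in the slide)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str, str]] = []  # (txid, op, path)
        self.writer_epoch = 0

    def new_epoch(self) -> int:
        # Bumping the epoch fences every earlier writer (single writer).
        self.writer_epoch += 1
        return self.writer_epoch

    def append(self, epoch: int, op: str, path: str) -> int:
        if epoch != self.writer_epoch:
            raise FencedError(f"epoch {epoch} is stale (current: {self.writer_epoch})")
        txid = len(self.entries) + 1
        self.entries.append((txid, op, path))
        return txid

    def read_after(self, txid: int) -> List[Tuple[int, str, str]]:
        # txids are 1-based, so entries[txid:] are the edits after `txid`.
        return self.entries[txid:]


class NameNode:
    def __init__(self, name: str, log: SharedEditLog) -> None:
        self.name = name
        self.log = log
        self.namespace: Set[str] = set()
        self.last_applied = 0
        self.epoch: Optional[int] = None  # None means standby

    def catch_up(self) -> None:
        """Standby behaviour: read the edit log and apply it to own namespace."""
        for txid, op, path in self.log.read_after(self.last_applied):
            if op == "mkdir":
                self.namespace.add(path)
            self.last_applied = txid

    def become_active(self) -> None:
        self.epoch = self.log.new_epoch()  # fence the previous active first
        self.catch_up()                    # then apply any edits not yet seen

    def mkdir(self, path: str) -> None:
        if self.epoch is None:
            raise RuntimeError(f"{self.name} is standby and cannot accept writes")
        txid = self.log.append(self.epoch, "mkdir", path)
        self.namespace.add(path)
        self.last_applied = txid


if __name__ == "__main__":
    log = SharedEditLog()
    nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
    nn1.become_active()
    nn1.mkdir("/user")
    nn2.catch_up()        # the standby now also knows about /user
    nn2.become_active()   # failover: nn1 is fenced
    try:
        nn1.mkdir("/tmp")
    except FencedError as err:
        print("old active rejected:", err)
```

Because the standby keeps replaying the log, it already holds an almost current namespace when failover happens, and fencing guarantees the two NameNodes never both write.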

Hadoop 1.x Vs 2.x Ecosystem

Hadoop 1.x:

ᗍ Apache Oozie (Workflow)

ᗍ Hive (DW System) and Pig Latin (Data Analysis)

ᗍ MapReduce Framework and HBase

ᗍ HDFS (Hadoop Distributed File System)

Hadoop 2.x:

ᗍ Apache Oozie (Workflow)

ᗍ Hive (DW System) and Pig Latin (Data Analysis)

ᗍ MapReduce Framework, HBase, and Other YARN Frameworks (MPI, GIRAPH)

ᗍ YARN (Cluster Resource Management)

ᗍ HDFS (Hadoop Distributed File System)

YARN Flow

YARN = Yet Another Resource Negotiator

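Components: Client, Resource Manager, JobHistoryServer, Node Manager, App Master, Container

ᗍ Job Submission: the Client submits a job to the Resource Manager

ᗍ Node Status: each Node Manager reports its status to the Resource Manager

ᗍ Resource Request: each App Master requests resources from the Resource Manager

ᗍ MapReduce Status: Containers report task status to their App Master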

Resource Manager

ᗍ Cluster-level resource manager

ᗍ Long-lived; runs on high-quality hardware

Node Manager

ᗍ One per Data Node

ᗍ Monitors resources on the Data Node

Application Master

ᗍ One per application

ᗍ Short-lived

ᗍ Manages the application's tasks and their scheduling
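The YARN interactions above (Node Status, Job Submission, Resource Request) can be traced with a toy Python model. It tracks memory only and places each container on the Node Manager with the most free memory; that placement rule is an assumption for illustration, since the real Resource Manager delegates allocation to a pluggable scheduler such as the Capacity Scheduler. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Container:
    container_id: str
    node: str
    memory_mb: int


@dataclass
class NodeManager:
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


class ResourceManager:
    def __init__(self) -> None:
        self.nodes: Dict[str, NodeManager] = {}
        self._seq = 0

    def node_heartbeat(self, nm: NodeManager) -> None:
        """'Node Status': a Node Manager reports the resources on its node."""
        self.nodes[nm.name] = nm

    def _allocate(self, memory_mb: int) -> Optional[Container]:
        # Assumed placement rule: the node with the most free memory wins.
        for nm in sorted(self.nodes.values(), key=lambda n: n.free_mb(), reverse=True):
            if nm.free_mb() >= memory_mb:
                nm.used_mb += memory_mb
                self._seq += 1
                return Container(f"container_{self._seq:03d}", nm.name, memory_mb)
        return None

    def submit_application(self, am_memory_mb: int) -> "ApplicationMaster":
        """'Job Submission': the RM launches the App Master in a first container."""
        container = self._allocate(am_memory_mb)
        if container is None:
            raise RuntimeError("no node can host the Application Master")
        return ApplicationMaster(self, container)

    def resource_request(self, count: int, memory_mb: int) -> List[Container]:
        """'Resource Request' from an App Master; may be only partly granted."""
        granted: List[Container] = []
        for _ in range(count):
            container = self._allocate(memory_mb)
            if container is None:
                break
            granted.append(container)
        return granted

    def release(self, container: Container) -> None:
        self.nodes[container.node].used_mb -= container.memory_mb


class ApplicationMaster:
    """One per application and short-lived: asks for containers, then frees them."""

    def __init__(self, rm: ResourceManager, am_container: Container) -> None:
        self.rm = rm
        self.am_container = am_container
        self.task_containers: List[Container] = []

    def run_tasks(self, n_tasks: int, memory_mb: int) -> int:
        self.task_containers = self.rm.resource_request(n_tasks, memory_mb)
        return len(self.task_containers)

    def finish(self) -> None:
        for container in self.task_containers + [self.am_container]:
            self.rm.release(container)
        self.task_containers = []


if __name__ == "__main__":
    rm = ResourceManager()
    rm.node_heartbeat(NodeManager("nm1", 4096))
    rm.node_heartbeat(NodeManager("nm2", 4096))
    am = rm.submit_application(am_memory_mb=1024)
    print("task containers granted:", am.run_tasks(n_tasks=10, memory_mb=1024))
    am.finish()
```

With two 4096 MB nodes and a 1024 MB App Master, only 7 of the 10 requested 1024 MB task containers fit; a real App Master would keep asking as containers are released. Keeping per-application logic in the App Master is what relieves the Resource Manager of the job life-cycle work that overburdened the Hadoop 1.x Job Tracker.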

Job Trends – Hadoop

Course Topics

Module 1: Introduction to Big Data and Hadoop

Module 2: HDFS Internals, Hadoop Configurations and Data Loading

Module 3: Introduction to Map Reduce

Module 4: Advanced Map Reduce Concepts

Module 5: Introduction to Pig

Module 6: Advanced Pig and Introduction to Hive

Module 7: Advanced Hive Concepts

Module 8: Extending Hive and HBase Introduction

Module 9: Advanced HBase and Oozie Introduction

Module 10: Project Set-up Discussion

Why SkillSpeed?

ᗍ Course Curriculum from Industry Experts

ᗍ Instructor-Led Live Virtual Sessions

ᗍ Lifetime Access to Course Content via LMS

ᗍ 100% Placement Assistance

ᗍ 24x7 Support

Corporate Partners

Contact us

To know more about the course, please contact:

IND: +91-90660-20904

USA: 1866-607-6547 (Toll Free)

Lines open 24/7.

Or reach us at [email protected]
