2. Hadoop Cluster

Upload: riteshaladdin

Post on 01-Jun-2018


  • 8/9/2019 2.Hadoop Cluster

    1/45

    http://www.excelonlineclasses.co.nr [email protected]

    Hadoop Cluster & Setup


    Excel Online Classes offers the following services:

    Online Training, Development, Testing, Job support, Technical Guidance, Job Consultancy, any needs of the IT sector

    HADOOP CLUSTER

    - Nagarjuna K

    Agenda

    Building blocks of a Hadoop cluster

    Few big clusters across the globe

    Cluster set-up: Single Node

    Different components of the cluster

    Building Blocks

    NameNode

    DataNode

    Secondary NameNode

    Job Tracker

    Task Tracker

    Hadoop Server roles

    3 different types of machines:

    1. Masters

    2. Slaves

    3. Clients

    Master/slave architecture for both distributed storage and distributed computation.

    Name Node

    Distributed storage: HDFS

    NameNode is the master of HDFS.

    Directs DataNodes to perform low-level I/O tasks.

    Name Node

    Keeps track of how files are broken down into blocks and where they are stored.

    This is memory and I/O intensive.

    To reduce the workload, this node doesn't store any user data or perform any computation tasks.
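The bookkeeping described on this slide can be pictured as two lookup tables. The following is a minimal Python sketch, not Hadoop's actual implementation; the class name `ToyNameNode`, the block IDs, and the node names are all hypothetical.

```python
from collections import defaultdict


class ToyNameNode:
    """Illustrative only: keeps metadata, never the file contents."""

    def __init__(self):
        self.file_to_blocks = {}                # file path -> ordered block IDs
        self.block_to_nodes = defaultdict(set)  # block ID -> DataNodes holding it

    def add_file(self, path, placements):
        """placements: list of (block_id, [datanode, ...]) in file order."""
        self.file_to_blocks[path] = [block_id for block_id, _ in placements]
        for block_id, nodes in placements:
            self.block_to_nodes[block_id].update(nodes)

    def locate(self, path):
        """Return [(block_id, sorted datanodes)] so a client knows where to read."""
        return [(b, sorted(self.block_to_nodes[b])) for b in self.file_to_blocks[path]]


nn = ToyNameNode()
nn.add_file("/logs/a.txt", [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])])
print(nn.locate("/logs/a.txt"))
```

Note that the sketch stores only names and locations, which mirrors the point that the NameNode holds no file data itself.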

    Name Node

    Without the name node, the file system can't be used.

    A single point of failure.

    Name node obliterated → all files lost.

    Need to make the Name Node resilient enough to withstand failures.

    Constantly engages with data nodes to know their health.

    Data Node

    Each slave machine of the cluster will be a data node.

    Client request (write) → Name Node → divide the data into blocks and write them to different data nodes.

    Client request (read) → Name Node → inform the client that the data is present in so & so nodes.

    Client communicates with the datanode.
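The write and read paths above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the round-robin placement policy, the 8-byte block size, and the node names `dn1`/`dn2` are hypothetical and chosen only to keep the example small; real HDFS uses much larger blocks and a more involved placement policy.

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int):
    """Cut the payload into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, datanodes):
    """Assign each block to a DataNode in round-robin order (a toy policy)."""
    targets = cycle(datanodes)
    return [(index, next(targets)) for index, _ in enumerate(blocks)]


def read_file(placement, storage):
    """Client-side read: fetch each block from the node the NameNode named."""
    return b"".join(storage[node][index] for index, node in placement)


data = b"hello hadoop cluster"
blocks = split_into_blocks(data, block_size=8)
placement = place_blocks(blocks, ["dn1", "dn2"])

# Simulated DataNode disks: node -> {block index -> bytes}
storage = {"dn1": {}, "dn2": {}}
for (index, node), block in zip(placement, blocks):
    storage[node][index] = block

print(placement)
print(read_file(placement, storage))
```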

    Data Node

    A DataNode may communicate with other DataNodes to replicate its blocks.

    Data Node

    1. Data nodes constantly report to the Name Node about the block information they store.

    2. After synching, the DataNode continually polls the NameNode for any instructions to create, move, or delete blocks.
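The two steps above (report, then poll) can be illustrated with a small Python sketch. The function names, the dictionary layout, and the command tuples are hypothetical; this is not the DataNode protocol, only a picture of the information flow.

```python
def apply_block_report(block_map, datanode, reported_blocks):
    """Refresh the NameNode's view of one DataNode from its block report."""
    reported = set(reported_blocks)
    # Forget blocks this node no longer reports.
    for block_id, nodes in block_map.items():
        if block_id not in reported:
            nodes.discard(datanode)
    # Record every block the node says it holds.
    for block_id in reported:
        block_map.setdefault(block_id, set()).add(datanode)
    return block_map


def pending_commands(command_queue, datanode):
    """What a polling DataNode would receive: its queued block commands."""
    return command_queue.pop(datanode, [])


block_map = {"blk_1": {"dn1"}, "blk_2": {"dn1", "dn2"}}
apply_block_report(block_map, "dn1", ["blk_2", "blk_3"])

queue = {"dn1": [("delete", "blk_1")]}
print(block_map)
print(pending_commands(queue, "dn1"))
```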

    Secondary NameNode

    SNN resides on a machine in the cluster.

    This machine, like that of the NN, doesn't have any DN or TT daemons running.

    SNN, unlike NN, doesn't record any real-time changes.

    Communicates with NN → takes a snapshot of the HDFS cluster and merges the changes.
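The "take a snapshot and merge the changes" step can be sketched as applying a log of changes to a copy of the last snapshot. The slide does not spell out the mechanism, so the data layout below (a dict as the snapshot, tuples as logged changes) is an assumption made for illustration, and the function name `checkpoint` is hypothetical.

```python
def checkpoint(image, edit_log):
    """Return a new namespace image with the logged changes applied."""
    merged = dict(image)  # leave the original snapshot untouched
    for operation, path, value in edit_log:
        if operation == "create":
            merged[path] = value
        elif operation == "delete":
            merged.pop(path, None)
        elif operation == "rename":
            merged[value] = merged.pop(path)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return merged


image = {"/a.txt": ["blk_1"], "/b.txt": ["blk_2"]}
edits = [
    ("create", "/c.txt", ["blk_3"]),
    ("delete", "/a.txt", None),
    ("rename", "/b.txt", "/d.txt"),
]
new_image = checkpoint(image, edits)
print(new_image)
```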

    Secondary NameNode

    Though the NN is a single point of failure, with manual intervention we can minimize the data loss.

    With manual intervention, we can reconfigure the SNN to act as the NN.

    JobTracker

    Oversees and coordinates the parallel processing of data using MapReduce.

    Submit code → determines the execution plan by looking at which files to process (data locality is important).

    If a task fails → automatically relaunches the task, possibly on a different node.
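Data locality and automatic relaunch can be combined in one small scheduling loop. The Python sketch below is only an illustration of the idea; `pick_node`, `run_with_retry`, the retry limit, and the simulated failure are hypothetical and do not reflect the real JobTracker code.

```python
def pick_node(block_locations, free_nodes, exclude=()):
    """Prefer a free node that already stores the block (data locality)."""
    candidates = [n for n in free_nodes if n not in exclude]
    if not candidates:
        raise RuntimeError("no free node available")
    local = [n for n in candidates if n in block_locations]
    return (local or candidates)[0]


def run_with_retry(task, block_locations, free_nodes, max_attempts=3):
    """Run task(node); on failure, relaunch on a node not tried before."""
    tried = []
    for _ in range(max_attempts):
        node = pick_node(block_locations, free_nodes, exclude=tried)
        tried.append(node)
        try:
            return node, task(node)
        except Exception:
            continue  # a real scheduler would also log the failure
    raise RuntimeError(f"task failed on {tried}")


def flaky_task(node):
    """Simulated task that always fails on dn1."""
    if node == "dn1":
        raise IOError("simulated disk failure")
    return f"done on {node}"


print(run_with_retry(flaky_task, {"dn1", "dn3"}, ["dn1", "dn2", "dn3"]))
```

The first attempt goes to `dn1` because it holds the data; after the simulated failure the task is relaunched on `dn3`, the other node that holds the block.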

    JobTracker

    Only one JobTracker for a cluster.

    Typically runs on a server as a master node of the cluster.

    Task Tracker

    Like data storage, data computation also follows the master/slave architecture.

    As DNs are slaves at the storage level → TTs are slaves at the computation level.

    Task Tracker

    JobTracker → overall execution of a job.

    TaskTracker is responsible for completion of a task assigned to the node.

    Word of caution → one TaskTracker per node.

    But a TaskTracker can spawn multiple JVMs to handle many mappers and reducers in parallel.
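The idea of one tracker running several tasks at once can be sketched with a bounded worker pool. This example uses Python threads instead of separate JVMs, so it is an analogy rather than a model of Hadoop; the `slots` parameter and the word-count mapper are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    """A word-count mapper: count the words in one input split."""
    return Counter(split.split())


def run_on_node(splits, slots=2):
    """Run one map task per split, at most `slots` at a time."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        partials = list(pool.map(map_task, splits))
    # Reduce step: merge the per-split counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


counts = run_on_node(["big data big", "data node", "big cluster"])
print(counts["big"], counts["data"])
```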

    Task Tracker

    It is the responsibility of the TaskTracker to send a heartbeat with the task status to the JobTracker.

    If the JobTracker doesn't receive a heartbeat, it assumes that the TaskTracker has crashed and resubmits its tasks to another node.
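Heartbeat-based failure detection can be sketched as a table of "last seen" timestamps plus a timeout. The class and function names, the 10-second timeout, and the tracker names below are hypothetical; the sketch passes time in explicitly so it stays deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time seen from each TaskTracker."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, tracker, now):
        self.last_seen[tracker] = now

    def dead_trackers(self, now):
        """Trackers whose last heartbeat is older than the timeout."""
        return sorted(t for t, seen in self.last_seen.items()
                      if now - seen > self.timeout)


def reassign(assignments, dead, alive):
    """Move tasks from dead trackers to alive ones, round-robin."""
    moved = {}
    for i, (task, tracker) in enumerate(sorted(assignments.items())):
        moved[task] = alive[i % len(alive)] if tracker in dead else tracker
    return moved


monitor = HeartbeatMonitor(timeout_seconds=10)
monitor.heartbeat("tt1", now=0)
monitor.heartbeat("tt2", now=0)
monitor.heartbeat("tt1", now=12)  # tt2 stays silent

dead = monitor.dead_trackers(now=15)
print(dead)
print(reassign({"map_0": "tt1", "map_1": "tt2"}, dead, alive=["tt1"]))
```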

    JobTracker and TaskTracker

    Client

    Not part of the cluster.

    Used to load data into the cluster.

    Submits M/R jobs to the cluster.

    Clients have Hadoop set up but are not part of the cluster.

    Yahoo Cluster

    The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with more than 10,000 cores and produces data that is now used in every Yahoo! Web search query.

    On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application.

    Facebook Cluster

    The Datawarehouse Hadoop cluster at Facebook [...] configured storage capacity! This is larger than the previously known Yahoo!'s cluster of 1[...]

    Here are the cluster statistics from the HDFS cluster at Facebook:

    • 21 PB of storage in a single HDFS cluster

    • 2000 machines

    • 12 TB per machine (a few machines have [...] TB each)

    • 1200 machines with 8 cores each + [...] machines with 16 cores each

    • 32 GB of RAM per machine

    • 15 map[...]

    DFShell

    The HDFS shell can be invoked by: bin/hadoop dfs <args>

    • cat

    • chgrp

    • chmod

    • chown

    • copyFromLocal

    • copyToLocal

    • cp

    • du

    • dus

    • expunge

    • get

    • getmerge

    • ls

    • lsr

    • mkdir

    • moveFromLocal

    • mv

    • touchz

    • put

    • rm

    • rmr

    • setrep

    • stat

    • tail

    • test

    • text
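As a usage illustration for the list above, the following Python sketch builds (but does not run) a `bin/hadoop dfs` command line for one of the listed subcommands. The helper `dfs_command` and the example paths are hypothetical; only the `bin/hadoop dfs -<subcommand>` shape comes from the slide.

```python
import shlex

# Subcommands listed above.
DFS_COMMANDS = {
    "cat", "chgrp", "chmod", "chown", "copyFromLocal", "copyToLocal", "cp",
    "du", "dus", "expunge", "get", "getmerge", "ls", "lsr", "mkdir",
    "moveFromLocal", "mv", "touchz", "put", "rm", "rmr", "setrep", "stat",
    "tail", "test", "text",
}


def dfs_command(subcommand, *args, hadoop="bin/hadoop"):
    """Build the argv list for `bin/hadoop dfs -<subcommand> <args>`."""
    if subcommand not in DFS_COMMANDS:
        raise ValueError(f"unknown dfs subcommand: {subcommand}")
    return [hadoop, "dfs", f"-{subcommand}", *args]


cmd = dfs_command("put", "report.txt", "/user/demo/report.txt")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run` on a machine where Hadoop is installed; the sketch stops short of that so it runs anywhere.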

    Hadoop Cluster Setup

    Single Node

    Hadoop Single Node Setup

    Step 1: Download hadoop from http://hadoop.apache.org/mapreduce/releases.html

    Step 2: Untar the hadoop file:

    tar xvfz hadoop-0.20.2.tar.gz

    Hadoop Single Node Setup

    Step 3: Set the path to the Java installation (JDK) by editing the JAVA_HOME parameter in hadoop/conf/hadoop-env.sh

    Hadoop Single Node Setup

    Step 4: Create an RSA key to be used by hadoop when ssh'ing to localhost:

    ssh-keygen -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Hadoop Single Node Setup

    Step 5: Make the following changes to the configuration files under hadoop/conf

    core-site.xml:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:6000</value>
    </property>

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/Users/ims[...]</value>
    </property>

    Hadoop Single Node Setup - mapred-site.xml

    Hadoop Single Node Setup - hdfs-site.xml

    hdfs-site.xml:

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
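The `<configuration>` / `<property>` layout shown above is regular enough to generate and read back programmatically. The Python sketch below does this with the standard library; the helper names `build_site_xml` and `read_site_xml` are hypothetical, and the output is a bare document without the XML declaration or stylesheet header that shipped config files usually carry.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse a <configuration> document back into a dict of strings."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}


hdfs_site = build_site_xml({"dfs.replication": 1})
print(hdfs_site)
```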

    Hadoop Single Node Setup

    Step 6: Format the hadoop file system. From the hadoop directory run the following:

    bin/hadoop namenode -format

    Using Hadoop

    1) How to start Hadoop

    cd hadoop/bin
    ./start-all.sh

    2) How to stop Hadoop

    cd hadoop/bin
    ./stop-all.sh

    3) How to copy a file from local to HDFS

    cd hadoop
    bin/hadoop dfs -put local_machine_path hdfs_path

    4) How to list files in HDFS

    cd hadoop
    bin/hadoop dfs -ls

    HADOOP components

    Few startup scripts

    start

    What files in the cluster should be in sync

    NameNode and JobTracker on different machines → the slaves files on the NN and JT should be in sync.

    Run hdfs scripts from the NameNode.

    Run MR scripts from the JT.

    Keeping all the config files in sync across the cluster is a good practice (with exceptions).
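One way to check whether config files have drifted apart is to compare checksums against a reference node. The Python sketch below works on in-memory strings so it can run anywhere; the node names, file contents, and the helper `out_of_sync` are hypothetical, and on a real cluster the contents would first have to be collected from each machine.

```python
import hashlib


def fingerprint(text):
    """Stable checksum of one config file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def out_of_sync(configs_by_node, reference_node):
    """Return {node: [files that differ from the reference node]}."""
    reference = {name: fingerprint(body)
                 for name, body in configs_by_node[reference_node].items()}
    drift = {}
    for node, files in configs_by_node.items():
        differing = sorted(name for name in reference
                           if fingerprint(files.get(name, "")) != reference[name])
        if differing:
            drift[node] = differing
    return drift


cluster = {
    "namenode": {"core-site.xml": "A", "hdfs-site.xml": "B"},
    "slave1": {"core-site.xml": "A", "hdfs-site.xml": "B"},
    "slave2": {"core-site.xml": "A", "hdfs-site.xml": "stale"},
}
print(out_of_sync(cluster, "namenode"))
```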

    Parameters in HADOOP CLUSTER

    - Nagarjuna K

    Environmental Parameters hadoop


    Thank you

    Your feedback is highly important to improve our course material and teaching methodologies.

    Please email your suggestions to [email protected]

    Disclaimer

    Excel Online Classes acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in any of the training material, including but not limited to the handouts, written material, videos, PowerPoint presentations, etc. All such training materials are provided to our students for learning purposes only. Students shall not use such materials for their private gain, nor can they sell any such materials to a third party. Some of the examples provided in any such training materials may not be owned by us, and as such we do not claim any proprietary rights for the same. We do not guarantee, nor are we responsible for, such products and projects. We acknowledge that any such information or product that has been lawfully received from any third-party source is free from restriction and without any breach or violation of law whatsoever.