2.hadoop cluster
TRANSCRIPT
-
8/9/2019 2.Hadoop Cluster
http://www.excelonlineclasses.co.nr [email protected]
Hadoop Cluster & Setup
-
Excel Online Classes offers the following services:
Online Training
Development
Testing
Job support
Technical Guidance
Job Consultancy
Any needs of IT Sector
-
HADOOP CLUSTER
- Nagarjuna K
-
Agenda
Building blocks of a Hadoop cluster
Few big clusters across the globe
Cluster set-up: single node
Different components of the cluster
-
Building Blocks
NameNode
DataNode
Secondary NameNode
Job Tracker
Task Tracker
-
Hadoop Server roles
3 different types of machines
1. Masters
2. Slaves
3. Clients
Master/slave architecture for both distributed storage and distributed computation.
-
Name Node
Distributed storage: HDFS
NameNode is the master of HDFS.
Directs DataNodes to perform low-level I/O tasks.
-
Name Node
Keeps track of how files are broken down into blocks and where they are stored.
This is memory and I/O intensive.
To reduce the workload, this node doesn't store any user data or run any MapReduce tasks.
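The bookkeeping described on this slide can be pictured with a small Python sketch. It is only an illustration, not Hadoop's actual data structures: the class `NameNodeMetadata`, its methods, and the block and node names are all hypothetical. The point is that the NameNode holds two in-memory maps (file to blocks, block to DataNodes) and no file contents.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy in-memory namespace (hypothetical): metadata only, never file data."""

    # file path -> ordered list of block ids
    file_blocks: dict[str, list[str]] = field(default_factory=dict)
    # block id -> names of the DataNodes that hold a replica
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_block(self, path: str, block_id: str, datanodes: set[str]) -> None:
        """Record that `path` gained a block stored on the given DataNodes."""
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = set(datanodes)

    def locate(self, path: str) -> list[tuple[str, set[str]]]:
        """Return (block id, replica holders) for each block of the file, in order."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]
```

Because every lookup goes through these maps, losing them means losing the ability to reassemble any file, which is why the NameNode is treated as a single point of failure.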
-
Name Node
Without the NameNode, the file system can't be used.
It is a single point of failure.
If the NameNode is obliterated, all files are lost.
Need to make the NameNode resilient enough to withstand failures.
Constantly engages with DataNodes to know their health.
-
Data Node
Each slave machine of the cluster will be a DataNode.
Client write request: the data is divided into blocks and written to different DataNodes, as directed by the NameNode.
Client read request: the NameNode informs the client which DataNodes hold the data.
The client then communicates directly with the DataNodes.
-
Data Node
A DataNode may communicate with other DataNodes to replicate its blocks.
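A rough Python sketch of the write path described on the Data Node slides: data is cut into fixed-size blocks and each block gets several replica holders. The function names are hypothetical, and the round-robin placement is an assumption made for brevity; real HDFS uses a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into consecutive blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list[str], replication: int) -> list[list[str]]:
    """Assign `replication` distinct DataNodes to each block, round-robin (simplified)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = []
    for block_index in range(num_blocks):
        holders = [
            datanodes[(block_index + r) % len(datanodes)]
            for r in range(replication)
        ]
        placement.append(holders)
    return placement
```

With three DataNodes and a replication factor of 2, every block lands on two different machines, so losing any single DataNode still leaves one copy of each block.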
-
Data Node
1. DataNodes constantly report to the NameNode about the block information they store.
2. After synching, DataNodes continually poll the NameNode for any instructions to create, move or delete blocks.
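The reporting step can be sketched as a set comparison between what the NameNode expects a DataNode to hold and what the DataNode reports. This is a simplified illustration; the function name and the "missing"/"orphaned" labels are assumptions, not Hadoop terminology.

```python
def reconcile_block_report(expected: set[str], reported: set[str]) -> dict[str, set[str]]:
    """Compare the NameNode's view with a DataNode's block report."""
    return {
        # Expected by the NameNode but absent on the DataNode:
        # candidates for re-replication from another replica.
        "missing": expected - reported,
        # Present on the DataNode but unknown to the NameNode:
        # candidates for a delete instruction.
        "orphaned": reported - expected,
    }
```

The result is what the DataNode would then learn about when it polls the NameNode for create, move or delete instructions.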
-
Secondary NameNode
The SNN resides on a machine in the cluster.
This machine, like that of the NN, doesn't have any DN or TT daemons running.
The SNN, unlike the NN, doesn't record any real-time changes.
It communicates with the NN to take snapshots of the HDFS metadata and merges the changes.
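The "snapshot plus merge" idea can be sketched in Python as replaying a log of changes on top of a saved image of the namespace. The `checkpoint` function, the dictionary layout and the two operation names are hypothetical simplifications, not the real on-disk formats.

```python
def checkpoint(
    fsimage: dict[str, list[str]],
    edits: list[tuple[str, str, list[str]]],
) -> dict[str, list[str]]:
    """Return a new namespace image with the logged edits applied in order.

    fsimage maps file path -> block ids; each edit is (operation, path, block ids).
    """
    merged = {path: list(blocks) for path, blocks in fsimage.items()}  # copy, do not mutate
    for operation, path, blocks in edits:
        if operation == "create":
            merged[path] = list(blocks)
        elif operation == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {operation}")
    return merged
```

After a merge like this, the log can start again from empty, which keeps recovery time short if the image ever has to be reloaded.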
-
Secondary NameNode
Though the NN is a single point of failure, with manual intervention we can minimize the data loss.
With manual intervention, we can reconfigure the SNN to act as the NN.
-
JobTracker
Oversees and coordinates the parallel processing of data using MapReduce.
When code is submitted, determines the execution plan by looking at which files to process (data locality is important).
If a task fails, automatically relaunches the task, possibly on a different node.
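A small Python sketch of the two ideas on this slide: prefer a node that already holds the input block, and on failure retry while avoiding the node that failed. The function `pick_node` and its parameters are hypothetical; the real JobTracker scheduler is considerably more involved.

```python
from typing import Optional


def pick_node(
    block_holders: set[str],
    free_nodes: list[str],
    exclude: frozenset[str] = frozenset(),
) -> Optional[str]:
    """Choose a node for a task, preferring one that stores the task's input block."""
    candidates = [node for node in free_nodes if node not in exclude]
    for node in candidates:
        if node in block_holders:
            return node  # data-local: no network transfer needed for the input
    # No data-local node is free: fall back to any free node, or None if there is none.
    return candidates[0] if candidates else None
```

To relaunch a failed task, the same function can be called again with the failed node added to `exclude`.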
-
JobTracker
Only one JobTracker for a cluster.
Typically runs on a server as a master node of the cluster.
-
Task Tracker
Like data storage, data computation also follows a master/slave architecture.
As DNs are slaves at the storage level, TTs are slaves at the computation level.
-
Task Tracker
JobTracker: oversees the overall execution of a job.
TaskTracker: responsible for completion of the tasks assigned to its node.
Word of caution: one TaskTracker per node.
But a TaskTracker can spawn multiple JVMs to handle many mappers and reducers in parallel.
-
Task Tracker
It is the responsibility of the TaskTracker to send a heartbeat with the task status to the JobTracker.
If the JobTracker doesn't receive a heartbeat, it assumes that the TaskTracker has crashed and resubmits the corresponding tasks to another node.
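The heartbeat rule can be sketched as a timeout check in Python. The class `HeartbeatMonitor`, the explicit `now` arguments and the 10-second timeout used in the usage note are assumptions for illustration; they are not Hadoop's actual settings or API.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat time per TaskTracker (hypothetical helper)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, tracker: str, now: float) -> None:
        """Record that `tracker` reported in at time `now` (seconds)."""
        self.last_seen[tracker] = now

    def dead_trackers(self, now: float) -> list[str]:
        """Trackers whose last heartbeat is older than the timeout, sorted by name."""
        return sorted(
            tracker
            for tracker, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

For example, with `HeartbeatMonitor(10.0)`, a tracker last heard from at time 0 is reported dead at time 12, and its tasks would be the ones to resubmit elsewhere.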
-
JobTracker and TaskTracker
-
Client
Not part of the cluster.
Used to load data into the cluster.
Submits M/R jobs to the cluster.
Clients have Hadoop set up but are not part of the cluster.
-
Yahoo Cluster
The Yahoo! Search Webmap is a Hadoop application that runs on a Linux cluster with more than 10,000 cores and produces data that is now used in every Yahoo! Web search query.
On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application.
-
Facebook Cluster
The Datawarehouse Hadoop cluster at Facebook
Here are the cluster statistics from the HDFS cluster at Facebook:
• 21 PB of storage in a single HDFS cluster
• 2000 machines
• 12 TB per machine (a few machines have 24 TB each)
• 1200 machines with 8 cores each + 800 machines with 16 cores each
• 32 GB of RAM per machine
• 15 map-reduce tasks per machine
More than 21 PB of configured storage capacity! This is larger than the previously known Yahoo!'s cluster of 14 PB.
-
DFShell
The HDFS shell can be invoked by: bin/hadoop dfs <args>
• cat
• chgrp
• chmod
• chown
• copyFromLocal
• copyToLocal
• cp
• du
• dus
• expunge
• get
• getmerge
• ls
• lsr
• mkdir
• movefromLocal
• mv
• touchz
• put
• rm
• rmr
• setrep
• stat
• tail
• test
• text
-
Hadoop Cluster Setup
Single Node
-
Hadoop Single Node Setup
Step 1: Download Hadoop from
http://hadoop.apache.org/mapreduce/releases.html
Step 2: Untar the Hadoop file:
tar xvfz hadoop-0.20.2.tar.gz
-
Hadoop Single Node Setup
Step 3: Set the path to the Java installation (JDK) by editing the JAVA_HOME parameter in hadoop/conf/hadoop-env.sh
-
Hadoop Single Node Setup
Step 4:
Create an RSA key to be used by Hadoop when ssh'ing to localhost:
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
-
Hadoop Single Node Setup
Step 5: Make the following changes to the configuration files under hadoop/conf
core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:6000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/Users/ims
-
Hadoop Single Node Setup - mapred
-
Hadoop Single Node Setup - hdfs-site.xml
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
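The configuration files above all share the same shape: a configuration element holding property entries, each with a name and a value. As a hedged illustration of that layout, the Python sketch below generates such a document with the standard library. The helper `build_site_xml` is hypothetical and is not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties: dict[str, str]) -> str:
    """Render name/value pairs in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(root, encoding="unicode")
```

Calling `build_site_xml({"dfs.replication": "1"})` yields the same elements as the hdfs-site.xml fragment on the slide, on a single line without indentation.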
-
Hadoop Single Node Setup
Step 6: Format the Hadoop file system. From the hadoop directory run the following:
bin/hadoop namenode -format
-
Using Hadoop
1) How to start Hadoop
cd hadoop/bin
./start-all.sh
2) How to stop Hadoop
cd hadoop/bin
./stop-all.sh
3) How to copy a file from local to HDFS
cd hadoop
bin/hadoop dfs -put local_machine_path hdfs_path
4) How to list files in HDFS
cd hadoop
bin/hadoop dfs -ls
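When these commands are driven from a script, it can help to build the argument list in one place. The Python helper below is a hypothetical convenience, not a Hadoop API. It only assembles the `bin/hadoop dfs -<subcommand> ...` form shown above and does not run anything itself; the example paths in the usage note are made up.

```python
def dfs_command(subcommand: str, *args: str, hadoop_bin: str = "bin/hadoop") -> list[str]:
    """Build the argument list for a `bin/hadoop dfs` shell command."""
    return [hadoop_bin, "dfs", f"-{subcommand}", *args]
```

For example, `dfs_command("put", "data.txt", "/user/demo/data.txt")` produces a list that could be passed to `subprocess.run(..., check=True)` from the hadoop directory on a machine where Hadoop is installed.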
-
HADOOP components
-
Few startup scripts
start
-
Which files in the cluster should be in sync
NameNode and JobTracker are on different machines.
The slaves lists on the NN and the JT should be in sync.
Run HDFS scripts from the NameNode.
Run MR scripts from the JT.
Keeping all the config files in sync across the cluster is a good practice (with exceptions).
-
Parameters in HADOOP CLUSTER
- Nagarjuna K
-
Environmental Parameters hadoop
-
Thank you
Your feedback is highly important to improve our course material and teaching methodologies.
Please email your suggestions to [email protected]
-
Disclaimer
Excel Online Classes acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in any of the training material including but not limited to the handouts, written material, videos, power point presentations, etc. All such training materials are provided to our students for learning purposes only. Students shall not use such materials for their private gain nor can they sell any such materials to a third party. Some of the examples provided in any such training materials may not be owned by us and as such we do not claim any proprietary rights for the same. We do not guarantee nor are we responsible for such products and projects. We acknowledge that any such information or product that has been lawfully received from any third party source is free from restriction and without any breach or violation of law whatsoever.