Chapter 8
Post on 31-Dec-2015
CHAPTER 8: Hadoop and HBase

Slide 1: Outline
- Installing Hadoop
- Installing HBase
- Hadoop commands
- HBase shell commands

Slide 2: Section divider (outline repeated)
Slide 3: Preparation (1/5)
Hadoop runs on GNU/Linux and on Win32 (the latter for development only); here we use GNU/Linux. Hadoop is written in Java, so every node needs a Java runtime (JRE), plus ssh.

Slide 4: This example uses CentOS 5.5, which ships with OpenJDK. Check the installed Java with java -version:
~# java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.16.b17.el5-i386)
OpenJDK Client VM (build 14.0-b16, mixed mode)
If OpenJDK is not installed, install it with yum:
~# yum -y install java-1.6.0-openjdk

Slide 5: Preparation (2/5)
Rather than OpenJDK, these slides use the Oracle (Sun) Java JDK, downloaded from Oracle (http://www.oracle.com).

Slide 6: Preparation (3/5)
Copy jdk-6u25-linux-i586.bin to /usr, make it executable, and run it:
~# chmod +x jdk-6u25-linux-i586.bin
~# ./jdk-6u25-linux-i586.bin
This unpacks the JDK into /usr (as jdk1.6.0_25). Use alternatives to make the Oracle (Sun) Java JDK take precedence over OpenJDK:
~# alternatives --install /usr/bin/java java /usr/jdk1.6.0_25/bin/java 20000
~# alternatives --install /usr/bin/javac javac /usr/jdk1.6.0_25/bin/javac 20000

Slide 7: Preparation (4/5)
Verify the active Java version:
~# java -version
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) Client VM (build 20.0-b11, mixed mode, sharing)
~# javac -version
javac 1.6.0_25
Install ssh and rsync, then restart sshd:
~# yum -y install openssh rsync
~# /etc/init.d/sshd restart
In this chapter Hadoop is run as root.

Slide 8: Section divider (outline repeated)
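The verification step above can be scripted so that a wrong JVM fails loudly. A minimal sketch (the helper names are ours; the expected version string 1.6.0_25 is the one used on these slides):

```shell
#!/bin/sh
# Extract the quoted version number from `java -version` style output.
# Note: `java -version` prints to stderr, hence 2>&1 when capturing.
java_version_of() {
    printf '%s\n' "$1" | sed -n 's/.*java version "\([^"]*\)".*/\1/p' | head -n 1
}

# Fail if the active JVM is not the one registered with `alternatives`.
check_java() {
    expected="$1"
    banner=$(java -version 2>&1)
    actual=$(java_version_of "$banner")
    [ "$actual" = "$expected" ] || {
        echo "expected Java $expected but found '${actual:-nothing}'" >&2
        return 1
    }
}
# Usage: check_java 1.6.0_25
```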
Slide 9: Hadoop can run in three modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode

Slide 10: Local (Standalone) Mode (1/7)
Download Hadoop from the Apache Hadoop site (http://hadoop.apache.org/). These slides use Hadoop 0.20.2 (Hadoop 0.21.0 was also available at the time). Fetch hadoop-0.20.2.tar.gz with wget and unpack it:
~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
~# tar zxvf hadoop-0.20.2.tar.gz

Slide 11: Local (Standalone) Mode (2/7)
Move hadoop-0.20.2 to /opt and rename it hadoop:
~# mv hadoop-0.20.2 /opt/hadoop
Tell Hadoop where Java lives by editing conf/hadoop-env.sh with vi:
~# cd /opt/hadoop/
/hadoop# vi conf/hadoop-env.sh

Slide 12: Local (Standalone) Mode (3/7)
In hadoop-env.sh set JAVA_HOME (export JAVA_HOME=/usr/jdk1.6.0_25). If the machine has IPv6 enabled, also add export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true to force IPv4:

# Command specific options appended to HADOOP_OPTS when specified.
...
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export JAVA_HOME=/usr/jdk1.6.0_25                     # set JAVA_HOME
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true    # prefer IPv4

Slide 13: Local (Standalone) Mode (4/7)
In Local (Standalone) Mode, Hadoop needs no configuration beyond JAVA_HOME in conf/hadoop-env.sh. Try the hadoop command:
/hadoop# bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
...
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

Slide 14: Local (Standalone) Mode (5/7)
To test Hadoop with the grep example from the bundled hadoop-0.20.2-examples.jar, create an input directory and copy conf/*.xml into it:
/hadoop# mkdir input
/hadoop# cp conf/*.xml input

Slide 15: Local (Standalone) Mode (6/7)
Run the hadoop-0.20.2-examples.jar grep example to count strings matching config[a-z.]+:
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
/hadoop# cat output/*
13 configuration
4 configuration.xsl
1 configure

Slide 16: Local (Standalone) Mode (7/7)
hadoop-0.20.2-examples.jar grep will not run if output already exists, so remove it before re-running:
/hadoop# rm -rf output

Slide 17: Pseudo-Distributed Mode (1/9)
Pseudo-Distributed Mode builds on Local (Standalone) Mode by configuring three files in conf: core-site.xml, hdfs-site.xml, and mapred-site.xml. Start with core-site.xml.
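What the grep example job computes can be reproduced locally with ordinary Unix tools, which makes a handy sanity check in standalone mode. A sketch (the helper name is ours):

```shell
#!/bin/sh
# Count occurrences of each distinct match of a regex across files,
# the same tally the hadoop-0.20.2-examples.jar `grep` job produces
# over the files in input/.
count_matches() {
    pattern="$1"; shift
    # -o: print each match on its own line; -h: no file-name prefix;
    # -E: extended regex, matching the job's 'config[a-z.]+' pattern style
    grep -ohE "$pattern" "$@" 2>/dev/null | sort | uniq -c | sort -rn
}
# Usage: count_matches 'config[a-z.]+' conf/*.xml
```

Run against the same conf/*.xml files, this produces counts of the same shape as the job's output/* (counts like "13 configuration").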
/hadoop# vi conf/core-site.xml

Slide 18: Pseudo-Distributed Mode (2/9)
Set core-site.xml as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Slide 19: Pseudo-Distributed Mode (3/9)
Next edit hdfs-site.xml:
/hadoop# vi conf/hdfs-site.xml
Set hdfs-site.xml as follows:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Slide 20: Pseudo-Distributed Mode (4/9)
Finally edit mapred-site.xml:
/hadoop# vi conf/mapred-site.xml
Set mapred-site.xml as follows:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
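Editing three one-property files by hand is error-prone; they can be generated instead. A sketch assuming Hadoop 0.20's conf/ layout (the helper write_site_xml is ours, not a Hadoop tool):

```shell
#!/bin/sh
# Emit a minimal Hadoop *-site.xml containing one <name>/<value> property.
write_site_xml() {
    file="$1" name="$2" value="$3"
    cat > "$file" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# The pseudo-distributed settings from the slides:
# write_site_xml conf/core-site.xml   fs.default.name    hdfs://localhost:9000
# write_site_xml conf/hdfs-site.xml   dfs.replication    1
# write_site_xml conf/mapred-site.xml mapred.job.tracker localhost:9001
```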
Slide 21: Pseudo-Distributed Mode (5/9)
Hadoop uses ssh to start its daemons, so passwordless ssh to localhost is needed. First try ssh (type yes and press Enter at the prompt):
~# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
root@localhost's password:

Slide 22: Pseudo-Distributed Mode (6/9)
Press Ctrl + C to abort the password prompt, then generate a passphrase-less key pair and authorize it:
~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
Now ssh localhost logs in without a password; exit afterwards:
~# ssh localhost
Last login: Mon May 16 10:04:39 2011 from localhost
~# exit

Slide 23: Pseudo-Distributed Mode (7/9)
Before starting Hadoop for the first time, format HDFS with bin/hadoop namenode -format:
/hadoop# bin/hadoop namenode -format
11/05/16 10:20:27 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
...
11/05/16 10:20:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

Slide 24: Pseudo-Distributed Mode (8/9)
bin/start-all.sh starts all the daemons, including the Jobtracker and Tasktracker:
/hadoop# bin/start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host01.out
localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Host01.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host01.out

Slide 25: Pseudo-Distributed Mode (9/9)
To rerun the hadoop-0.20.2-examples.jar grep test, first upload conf to HDFS as input with bin/hadoop fs -put, then run the job:
/hadoop# bin/hadoop fs -put conf input
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'

Slide 26: Fully-Distributed Mode (1/14)
Fully-Distributed Mode spreads Hadoop across a Master and Slaves; every node needs Java and ssh. The example cluster:

  Host    Role                     IP
  Host01  Namenode + Jobtracker    192.168.1.1
  Host02  Datanode + Tasktracker   192.168.1.2

Slide 27: Fully-Distributed Mode (2/14)
Stop any running Hadoop with stop-all.sh, then clean out the previous Hadoop install, the ssh keys in .ssh, and temporary files:
/hadoop# /opt/hadoop/bin/stop-all.sh
~# rm -rf /opt/hadoop
~# rm -rf ~/.ssh
~# rm -rf /tmp/*
On Host01, download Hadoop 0.20.2 again and install it into /opt/hadoop:
~# wget http://apache.cs.pu.edu.tw//hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
~# tar zxvf hadoop-0.20.2.tar.gz
~# mv hadoop-0.20.2 /opt/hadoop
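Both here and in the multi-host setup that follows, authorized_keys is created by copying id_rsa.pub over it, which clobbers any keys already present. A non-destructive sketch (the helper install_pubkey is ours):

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already
# there, mirroring the ssh-keygen/cp step on the slides but idempotently.
install_pubkey() {
    pub="$1" auth="$2"
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    chmod 600 "$auth"
    # -x: match the whole line; -F: treat the key as a fixed string
    grep -qxF "$pub" "$auth" || printf '%s\n' "$pub" >> "$auth"
}
# Usage: install_pubkey "$(cat ~/.ssh/id_rsa.pub)" ~/.ssh/authorized_keys
```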
Slide 28: Fully-Distributed Mode (3/14)
On Host01, set JAVA_HOME in conf/hadoop-env.sh under /opt/hadoop with vi:
~# cd /opt/hadoop/
/hadoop# vi conf/hadoop-env.sh

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
export JAVA_HOME=/usr/jdk1.6.0_25    # set JAVA_HOME

Slide 29: Fully-Distributed Mode (4/14)
Edit conf/core-site.xml with vi:
/hadoop# vi conf/core-site.xml

Slide 30: Fully-Distributed Mode (5/14)
Set conf/core-site.xml as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Host01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>

Slide 31: Fully-Distributed Mode (6/14)
Edit conf/hdfs-site.xml with vi and set the replication factor to 2:
/hadoop# vi conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Slide 32: Fully-Distributed Mode (7/14)
Edit conf/mapred-site.xml with vi and point the Jobtracker at Host01:
/hadoop# vi conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>Host01:9001</value>
  </property>
</configuration>
Slide 33: Fully-Distributed Mode (8/14)
Edit conf/masters with vi so it names the Master, and conf/slaves so it lists the Slaves; in conf/slaves, replace the default localhost with Host02:
/hadoop# vi conf/masters
/hadoop# vi conf/slaves

Slide 34: Fully-Distributed Mode (9/14)
Set up passwordless ssh between the hosts, copying the keys to Host02 with scp, and test both directions:
~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
~# cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
~# scp -r ~/.ssh Host02:~/
~# ssh Host02        (Host01 to Host02)
~# ssh Host01        (Host02 back to Host01)
~# exit              (back on Host02)
~# exit              (back on Host01)

Slide 35: Fully-Distributed Mode (10/14)
Every node needs the same Hadoop install (a shared NFS mount also works). Copy Hadoop from Host01 to Host02 with scp, then format HDFS:
~# scp -r /opt/hadoop Host02:/opt/
/hadoop# bin/hadoop namenode -format

Slide 36: Fully-Distributed Mode (11/14)
11/05/16 21:52:13 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Host01/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/05/16 21:52:13 INFO namenode.FSNamesystem: fsOwner=root,root,bin,daemon,sys,adm,disk,wheel
...
11/05/16 21:52:13 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Host01/127.0.0.1
************************************************************/

Slide 37: Fully-Distributed Mode (12/14)
Start Hadoop:
/hadoop# bin/start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Host01.out
Host02: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Host02.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Host01.out
Host02: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Host02.out

Slide 38: Fully-Distributed Mode (13/14)
Check the cluster with bin/hadoop dfsadmin -report, which summarizes HDFS capacity and datanode status:
/hadoop# bin/hadoop dfsadmin -report
Configured Capacity: 9231007744 (8.6 GB)
...
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
...
DFS Remaining%: 41.88%
Last contact: Mon May 16 22:15:03 CST 2011

Slide 39: Fully-Distributed Mode (14/14)
Run the hadoop-0.20.2-examples.jar grep example again: create input on HDFS, upload Hadoop's conf files, run the job, and read the result:
/hadoop# bin/hadoop fs -mkdir input
/hadoop# bin/hadoop fs -put conf/* input/
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'config[a-z.]+'
/hadoop# bin/hadoop fs -cat output/part-00000
19 configuration
6 configuration.xsl
1 configure

Slide 40: Section divider (outline repeated)
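The dfsadmin -report check can be automated by pulling the live-datanode count out of the report text. A sketch, with parsing keyed to the 0.20-era wording shown above:

```shell
#!/bin/sh
# Extract N from the "Datanodes available: N (...)" line of
# `bin/hadoop dfsadmin -report` output read on stdin.
datanodes_available() {
    sed -n 's/^Datanodes available: \([0-9]*\).*/\1/p' | head -n 1
}

# Compare the live count to what the cluster should have.
expect_datanodes() {
    want="$1"
    got=$(datanodes_available)
    [ "$got" = "$want" ] || {
        echo "expected $want datanodes, got ${got:-none}" >&2
        return 1
    }
}
# Usage: bin/hadoop dfsadmin -report | expect_datanodes 1
```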
Slide 41: HBase Installation (1/9)
HBase runs on top of Hadoop. Since HBase 0.20 it also relies on ZooKeeper, and the cluster's clocks should be kept in sync with NTP.

Slide 42: HBase Installation (2/9)
Download HBase from the HBase site (http://hbase.apache.org/); these slides use hbase-0.90.2.tar.gz, installed into /opt/hbase:
~# wget http://apache.cs.pu.edu.tw//hbase/hbase-0.90.2/hbase-0.90.2.tar.gz
~# tar zxvf hbase-0.90.2.tar.gz
~# mv hbase-0.90.2 /opt/hbase
~# cd /opt/hbase/
Edit conf/hbase-env.sh with vi:
/hbase# vi conf/hbase-env.sh

Slide 43: HBase Installation (3/9)
In conf/hbase-env.sh set:
export JAVA_HOME=/usr/jdk1.6.0_25/
export HBASE_MANAGES_ZK=true
export HBASE_LOG_DIR=/tmp/hadoop/hbase-logs
export HBASE_PID_DIR=/tmp/hadoop/hbase-pids
Then edit conf/hbase-site.xml, HBase's main configuration file:
/hbase# vi conf/hbase-site.xml

Slide 44: HBase Installation (4/9)
Set conf/hbase-site.xml as follows:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://Host01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2222</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>Host01,Host02</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/hadoop/hbase-data</value>
  </property>

Slide 45: HBase Installation (5/9)
conf/hbase-site.xml, continued:

  <property>
    <name>hbase.tmp.dir</name>
    <value>/var/hadoop/hbase-${user.name}</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>Host01:60000</value>
  </property>
</configuration>
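A frequent misconfiguration is an hbase.rootdir that points at a different namenode than fs.default.name in Hadoop's core-site.xml. A sketch of that sanity check (the helper names are ours):

```shell
#!/bin/sh
# Return the scheme://host:port part of an hdfs:// URL,
# e.g. hdfs://Host01:9000/hbase -> hdfs://Host01:9000
hdfs_authority() {
    printf '%s\n' "$1" | sed 's|^\(hdfs://[^/]*\).*|\1|'
}

# Succeed only if hbase.rootdir lives on the same namenode
# that fs.default.name points at.
rootdir_matches_namenode() {
    [ "$(hdfs_authority "$1")" = "$(hdfs_authority "$2")" ]
}
# Usage: rootdir_matches_namenode hdfs://Host01:9000/hbase hdfs://Host01:9000
```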
Slide 46: HBase Installation (6/9)
Edit conf/regionservers with vi; like Hadoop's conf/slaves, it lists the Slave (region server) hosts, here Host02:
/hbase# vi conf/regionservers
Copy Hadoop's configuration files into HBase's conf/:
/hbase# cp /opt/hadoop/conf/core-site.xml conf/
/hbase# cp /opt/hadoop/conf/mapred-site.xml conf/
/hbase# cp /opt/hadoop/conf/hdfs-site.xml conf/

Slide 47: HBase Installation (7/9)
HBase ships lib/hadoop-core-0.20-append-r1056497.jar, which must match the running Hadoop; replace it with Hadoop's own hadoop-0.20.2-core.jar, then copy HBase to the Slave:
/hbase# rm lib/hadoop-core-0.20-append-r1056497.jar
/hbase# cp /opt/hadoop/hadoop-0.20.2-core.jar ./lib
/hbase# scp -r /opt/hbase Host02:/opt/hbase

Slide 48: HBase Installation (8/9)
Start HBase:
/hbase# bin/start-hbase.sh
Host02: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host02.out
Host01: starting zookeeper, logging to /tmp/hadoop/hbase-logs/hbase-root-zookeeper-Host01.out
starting master, logging to /tmp/hadoop/hbase-logs/hbase-root-master-Host01.out
Host02: starting regionserver, logging to /tmp/hadoop/hbase-logs/hbase-root-regionserver-Host02.out

Slide 49: HBase Installation (9/9)
Open the HBase shell with hbase shell and try list to confirm HBase is up:
/hbase# bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0> list
TABLE
0 row(s) in 0.3950 seconds
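The start-hbase.sh chatter above can be summarized into host/daemon pairs. A sketch whose parsing assumes the 0.90-era "starting X, logging to ..." wording:

```shell
#!/bin/sh
# Read start-hbase.sh (or start-all.sh) output on stdin and print
# "host daemon" pairs. Lines without a "Host:" prefix are local.
started_daemons() {
    while IFS= read -r line; do
        case "$line" in
            *": starting "*)
                host=${line%%:*}
                rest=${line#*: starting }
                ;;
            "starting "*)
                host=localhost
                rest=${line#starting }
                ;;
            *) continue ;;
        esac
        # The daemon name ends at the first comma ("zookeeper, logging to ...")
        printf '%s %s\n' "$host" "${rest%%,*}"
    done
}
# Usage: bin/start-hbase.sh | started_daemons
```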
hbase(main):002:0>

Slide 50: Section divider (outline repeated)
Slide 51: Hadoop Commands (1/7)
- bin/start-all.sh: start Hadoop
- bin/stop-all.sh: stop Hadoop
- bin/hadoop version: show the Hadoop version
- bin/hadoop dfsadmin -report: report HDFS status
- bin/hadoop namenode -format: format HDFS
- bin/hadoop fs -ls /user/root/input: list an HDFS directory
- bin/hadoop fs -mkdir /user/root/tmp: create an HDFS directory
- bin/hadoop fs -put conf/* /user/root/tmp: upload local files to HDFS
- bin/hadoop fs -cat /user/root/tmp/core-site.xml: print an HDFS file
- bin/hadoop fs -get /user/root/tmp/core-site.xml /opt/hadoop/: download an HDFS file
- bin/hadoop fs -rm /user/root/tmp/core-site.xml: delete an HDFS file
- bin/hadoop fs -rmr /user/root/tmp: delete an HDFS directory recursively

Slide 52: Hadoop Commands (2/7)
bin/hadoop fs with no arguments lists all the HDFS shell operations:
/hadoop# bin/hadoop fs
Usage: java FsShell
  [-ls <path>]
  [-lsr <path>]
  [-du <path>]
...
-files specify comma separated files to be copied to the map reduce cluster
-libjars specify comma separated jar files to include in the classpath.
-archives specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Slide 53: Hadoop Commands (3/7)
MapReduce jobs are submitted as jars to Hadoop: bin/hadoop jar [MapReduce job jar] [job name] [job arguments]. Running the Hadoop examples jar with no job name lists the bundled examples (grep, wordcount, pi, ...):
/hadoop# bin/hadoop jar hadoop-0.20.2-examples.jar

Slide 54: Hadoop Commands (4/7)
Jars shipped with Hadoop:
- hadoop-0.20.2-core.jar: hadoop common, hdfs, and mapreduce
- hadoop-0.20.2-test.jar: Hadoop tests
- hadoop-0.20.2-ant.jar: Ant tasks

Slide 55: Hadoop Commands (5/7)
bin/hadoop job manages jobs; -list all shows them:
/hadoop# bin/hadoop job -list all
5 jobs submitted
States are: Running : 1 Succeded : 2 Failed : 3 Prep : 4
JobId                  State  StartTime      UserName  Priority  SchedulingInfo
job_201105162211_0001  2      1305555169692  root      NORMAL    NA
job_201105162211_0002  2      1305555869142  root      NORMAL    NA
job_201105162211_0003  2      1305555912626  root      NORMAL    NA
job_201105162211_0004  2      1305633307809  root      NORMAL    NA
job_201105162211_0005  2      1305633347357  root      NORMAL    NA
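The State column in job -list all is numeric; the legend line the command prints maps 1 through 4 to names. A tiny decoder sketch (keeping the output's own "Succeded" spelling):

```shell
#!/bin/sh
# Map the numeric job state from `bin/hadoop job -list all` to its
# name, per the legend line the command itself prints.
job_state_name() {
    case "$1" in
        1) echo Running ;;
        2) echo Succeded ;;   # sic: spelled this way in the 0.20 output
        3) echo Failed ;;
        4) echo Prep ;;
        *) echo Unknown ;;
    esac
}
# Usage: job_state_name 2
```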
Slide 56: Hadoop Commands (6/7)
bin/hadoop job -status [JobID] shows one job's status:
/hadoop# bin/hadoop job -status job_201105162211_0001
bin/hadoop job -history [outputDir] shows a finished job's history:
/hadoop# bin/hadoop job -history /user/root/output
Hadoop job: job_201105162211_0007
=====================================
Job tracker host name: Host01
job tracker start time: Mon May 16 22:11:01 CST 2011
User: root
JobName: grep-sort

Slide 57: Hadoop Commands (7/7)
bin/hadoop job with no arguments lists all the job options:
/hadoop# bin/hadoop job
Usage: JobClient [-submit <job-file>] [-status <job-id>] [-counter <job-id> <group-name> <counter-name>] [-kill <job-id>] [-set-priority <job-id> <priority>]. Valid values for priorities are: VERY_HIGH HIGH NORMAL LOW VERY_LOW
...
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

Slide 58: Section divider (outline repeated)
Slide 59: HBase Shell (1/10)
We will build a scores table in HBase with this shape:

  name   student ID   course:math   course:history
  John   1            80            85
  Adam   2            75            90

Open the HBase shell with bin/hbase shell:
/hbase# bin/hbase shell
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.90.2, r1085860, Sun Mar 27 13:52:43 PDT 2011

hbase(main):001:0>

Slide 60: HBase Shell (2/10)
Create the scores table with two column families, studentid and course:
> create [table name], [column1], [column2], ...
hbase(main):001:0> create 'scores', 'studentid', 'course'
0 row(s) in 1.8970 seconds
list shows the tables HBase knows about:
hbase(main):002:0> list
TABLE
scores
1 row(s) in 0.0170 seconds

Slide 61: HBase Shell (3/10)
describe shows a table's settings:
> describe [table name]
hbase(main):003:0> describe 'scores'
DESCRIPTION                              ENABLED
... BLOCKCACHE => 'true'}]}
1 row(s) in 0.0260 seconds
Insert John's student ID into the studentid column family of scores with put:
> put [table name], [row], [column], [value]
hbase(main):004:0> put 'scores', 'John', 'studentid:', '1'
0 row(s) in 0.0600 seconds

Slide 62: HBase Shell (4/10)
Put 80 into John's course:math column:
hbase(main):005:0> put 'scores', 'John', 'course:math', '80'
0 row(s) in 0.0100 seconds
Put 85 into John's course:history column:
hbase(main):006:0> put 'scores', 'John', 'course:history', '85'
0 row(s) in 0.0080 seconds

Slide 63: HBase Shell (5/10)
Likewise for Adam: studentid 2, course:math 75, course:history 90.
hbase(main):007:0> put 'scores', 'Adam', 'studentid:', '2'
0 row(s) in 0.0130 seconds
hbase(main):008:0> put 'scores', 'Adam', 'course:math', '75'
0 row(s) in 0.0100 seconds
hbase(main):009:0> put 'scores', 'Adam', 'course:history', '90'
0 row(s) in 0.0080 seconds

Slide 64: HBase Shell (6/10)
scan dumps the whole scores table:
> scan [table name]
hbase(main):011:0> scan 'scores'
ROW   COLUMN+CELL
 Adam column=course:history, timestamp=1305704304053, value=90
 Adam column=course:math, timestamp=1305704282591, value=75
 Adam column=studentid:, timestamp=1305704186916, value=2
 John column=course:history, timestamp=1305704046378, value=85
 John column=course:math, timestamp=1305703949662, value=80
 John column=studentid:, timestamp=1305703742527, value=1
2 row(s) in 0.0420 seconds

Slide 65: HBase Shell (7/10)
get fetches a single row of scores, here John:
> get [table name], [row]
hbase(main):010:0> get 'scores', 'John'
COLUMN          CELL
 course:history timestamp=1305704046378, value=85
 course:math    timestamp=1305703949662, value=80
 studentid:     timestamp=1305703742527, value=1
3 row(s) in 0.0440 seconds

Slide 66: HBase Shell (8/10)
Scan only the courses column family of scores:
> scan [table name], {COLUMNS => [column family]}
hbase(main):011:0> scan 'scores', {COLUMNS => 'course:'}
ROW   COLUMN+CELL
 Adam column=course:history, timestamp=1305704304053, value=90
 Adam column=course:math, timestamp=1305704282591, value=75
 John column=course:history, timestamp=1305704046378, value=85
 John column=course:math, timestamp=1305703949662, value=80
2 row(s) in 0.0250 seconds

Slide 67: HBase Shell (9/10)
Scan several columns of scores at once:
> scan [table name], {COLUMNS => [[column1], [column2], ...]}
hbase(main):012:0> scan 'scores', {COLUMNS => ['studentid','course:']}
ROW   COLUMN+CELL
 Adam column=course:history, timestamp=1305704304053, value=90
 Adam column=course:math, timestamp=1305704282591, value=75
 Adam column=studentid:, timestamp=1305704186916, value=2
 John column=course:history, timestamp=1305704046378, value=85
 John column=course:math, timestamp=1305703949662, value=80
 John column=studentid:, timestamp=1305703742527, value=1
2 row(s) in 0.0290 seconds

Slide 68: HBase Shell (10/10)
A table must be disabled before it can be dropped:
hbase(main):003:0> disable 'scores'
0 row(s) in 2.1510 seconds
hbase(main):004:0> drop 'scores'
0 row(s) in 1.7780 seconds

Slide 69: Section divider (outline repeated)
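Conceptually, the {COLUMNS => 'course:'} scan above filters cells by their column-family prefix. The same selection can be mimicked over a captured plain scan with grep; a sketch (the helper is ours):

```shell
#!/bin/sh
# Keep only the cells of one column family from a captured `scan`
# transcript; cells print as "column=family:qualifier, timestamp=..., value=...".
cells_in_family() {
    family="$1"
    grep "column=${family}:"
}
# Usage: echo "scan 'scores'" | bin/hbase shell | cells_in_family course
```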
Slide 70: Web Interfaces (1/2)
Hadoop has built-in web interfaces for HDFS and MapReduce, viewable in a browser such as Mozilla Firefox:
- http://localhost:50070 : HDFS (Namenode) status
- http://localhost:50030 : Jobtracker status

Slide 71: Web Interfaces (2/2)
HBase also serves status pages:
- http://localhost:60010/ : Master status (on the Master host)
- http://localhost:60030/ : Region Server status (on each Slave)
- http://localhost:60010/zk.jsp : ZooKeeper status (via the Master)

Slide 72: (end)