Hadoop install



Software and Hadoop versions

VM software: VMplayer
Operating system: Ubuntu (ubuntu-12.04.3-desktop-amd64)
Java: JDK 7
Hadoop: Hadoop 1.1.2

VM setup and environment configuration

P.S. Remember to turn Num Lock on.

Install Java

Open a terminal window with Ctrl + Alt + T, then install Java JDK 7:

a. Download the Java JDK (https://www.dropbox.com/s/h6bw3tibft3gs17/jdk-7u21-linux-x64.tar.gz)

b. Unzip the file:

cd Downloads
tar -xvf jdk-7u21-linux-x64.tar.gz

c. Now move the JDK 7 directory to /usr/lib:

sudo mkdir -p /usr/lib/jvm
sudo mv ./jdk1.7.0_21 /usr/lib/jvm/jdk1.7.0

d. Now register the JDK binaries with update-alternatives:

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0/bin/javaws" 1

e. Correct the file ownership and the permissions of the executables:

sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chmod a+x /usr/bin/javaws
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0

f. Check the version of your new JDK 7 installation:

java -version
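If java -version still reports an older runtime, the alternatives system may still be pointing at it. A quick way to inspect and switch, using only the standard update-alternatives tool from step d:

sudo update-alternatives --display java    # show which java currently resolves
sudo update-alternatives --config java     # interactively select /usr/lib/jvm/jdk1.7.0/bin/java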

Install Hadoop

11. Install the SSH server:

sudo apt-get install openssh-client
sudo apt-get install openssh-server

12. Configure SSH as hduser (if the hduser account does not exist yet, see the sketch after step 3 below):

su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost

1. Download Apache Hadoop 1.1.2 (https://www.dropbox.com/s/znonl6ia1259by3/hadoop-1.1.2.tar.gz) and store it in the Downloads folder.

2. Unzip the file (open up the terminal window):

cd Downloads
sudo tar xzf hadoop-1.1.2.tar.gz
cd /usr/local
sudo mv /home/hduser/Downloads/hadoop-1.1.2 hadoop
sudo addgroup hadoop
sudo chown -R hduser:hadoop hadoop

3. Open your .bashrc file in the extended terminal (Alt + F2):

gksudo gedit .bashrc
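Note that step 12 switches to the hduser account with su - hduser, but this transcript never shows that account being created. A minimal sketch of creating it, assuming the hduser/hadoop names used throughout the tutorial:

# Create the dedicated group and user the rest of the tutorial assumes.
# (Step 2 above also runs addgroup hadoop; if the group already exists,
# addgroup simply reports that and makes no changes.)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser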

4. Add the following lines to the bottom of the file:

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
export PIG_HOME=/usr/local/pig
export PIG_CLASSPATH=/usr/local/hadoop/conf

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0/

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$PIG_HOME/bin

5. Save the .bashrc file and close it.

6. Run:

gksudo gedit /usr/local/hadoop/conf/hadoop-env.sh

7. Add the following lines:

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0/

8. Save and close the file.
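The new variables and aliases only take effect in a fresh shell. An optional sanity check, reloading .bashrc in the current terminal:

source ~/.bashrc
echo $JAVA_HOME     # should print /usr/lib/jvm/jdk1.7.0/
echo $HADOOP_HOME   # should print /usr/local/hadoop
hadoop version      # should report Hadoop 1.1.2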

9. In the terminal window, create a directory and set the required ownerships and permissions:

sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

10. Run:

gksudo gedit /usr/local/hadoop/conf/core-site.xml

11. Add the following between the <configuration> and </configuration> tags:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

12. Save and close the file.

13. Run:

gksudo gedit /usr/local/hadoop/conf/mapred-site.xml

14. Add the following between the <configuration> and </configuration> tags:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

15. Save and close the file.

16. Run:

gksudo gedit /usr/local/hadoop/conf/hdfs-site.xml

17. Add the following between the <configuration> and </configuration> tags:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>
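A stray or unclosed tag in any of the three site files will keep the daemons from starting, so it is worth validating the XML before moving on. One way to do this, assuming the libxml2-utils package (which provides xmllint) is installed:

# Each file should parse cleanly; xmllint prints nothing on success.
xmllint --noout /usr/local/hadoop/conf/core-site.xml
xmllint --noout /usr/local/hadoop/conf/mapred-site.xml
xmllint --noout /usr/local/hadoop/conf/hdfs-site.xml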

34. Format the HDFS:

/usr/local/hadoop/bin/hadoop namenode -format

35. Press the Start button and type Startup Applications.

36. Add an application with the following command:

/usr/local/hadoop/bin/start-all.sh

37. Restart Ubuntu and log in.
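To confirm the daemons actually came up after the reboot, jps (which ships with the JDK) lists the running Java processes. On a healthy single-node Hadoop 1.x setup it should show the five Hadoop daemons:

jps
# Expected alongside Jps itself (PIDs will differ):
# NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker

# The counterpart of start-all.sh, if you ever need to stop the cluster:
/usr/local/hadoop/bin/stop-all.sh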

DEMO

1. Get the Gettysburg Address and store it in your Downloads folder
(https://www.dropbox.com/s/w6yvyg1p57sf6sh/gettysburg.txt).

2. Copy the Gettysburg Address from the name node to HDFS:

cd Downloads
hadoop fs -put gettysburg.txt /user/hduser/getty/gettysburg.txt

3. Check that the Gettysburg Address is in HDFS:

hadoop fs -ls /user/hduser/getty/
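Optional: besides fs -ls, you can stream the file straight out of HDFS to check that the upload is intact:

# Print the first lines of the uploaded file from HDFS.
hadoop fs -cat /user/hduser/getty/gettysburg.txt | head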

4. Delete the Gettysburg Address from your name node:

rm gettysburg.txt

5. Download a jar file that contains a WordCount program into the Downloads folder
(https://www.dropbox.com/s/gp6t7616wsypkdo/chiu-wordcount2.jar).

6. Execute the WordCount program on the Gettysburg Address (the following command is one line):

hadoop jar chiu-wordcount2.jar WordCount /user/hduser/getty/gettysburg.txt /user/hduser/getty/out

7. Check the MapReduce results:

hadoop fs -cat /user/hduser/getty/out/part-r-00000
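The part-r-00000 output is a plain tab-separated word/count listing, so ordinary shell tools work on it. For example, a sketch that lists the most frequent words first:

# sort -k2,2nr orders by the count column, numerically, descending;
# head keeps the top 20 entries.
hadoop fs -cat /user/hduser/getty/out/part-r-00000 | sort -k2,2nr | head -20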