Hadoop Cluster Installation

Hadoop Cluster Installation Intern Report

Post on 13-Sep-2014



Page 1: Hadoop Cluster Installation

Hadoop Cluster Installation Intern Report

Page 3

Software Versions:

Ubuntu Linux 12.04.4 LTS
Hadoop 2.2.0

Page 4

If you are using PuTTY to access your Linux box remotely, install openssh-server with the command below; this also makes configuring SSH access easier later in the installation:

sudo apt-get install openssh-server

Page 5

Prerequisites:

Installing Java v1.7.

Adding dedicated Hadoop system user.

Configuring SSH access.

Page 6

1. Installing Java v1.7:

sudo add-apt-repository ppa:webupd8team/java

sudo apt-get update

sudo apt-get install oracle-java7-installer

export JAVA_HOME=/usr/lib/jvm/java-7-oracle
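Once the installer finishes, running java -version should report a 1.7 JVM. As a quick scripted sanity check, here is a sketch; the java_major helper is hypothetical, not part of the original guide:

```shell
# Hypothetical helper: extract the major version from `java -version` output,
# which prints lines like: java version "1.7.0_80"
java_major() {
  echo "$1" | sed -n 's/.*version "1\.\([0-9]*\).*/\1/p'
}

# Typical use:
# java_major "$(java -version 2>&1 | head -n 1)"   # expect 7
```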

Page 8

2. Adding a dedicated Hadoop system user.

a. Adding a group:

sudo addgroup hadoop

b. Creating a user and adding the user to a group:

sudo adduser --ingroup hadoop hduser

Page 10

3. Configuring SSH access:

su - hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

ssh hduser@localhost
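The cat command above appends the new public key to authorized_keys, which is what lets the final ssh succeed without a password. A small sketch for verifying the append worked; check_key_installed is a hypothetical name, not part of the guide:

```shell
# Hypothetical check: the public key should appear verbatim as a line
# of authorized_keys after the append above.
check_key_installed() {
  pub="$1"; auth="$2"
  if grep -qxF "$(cat "$pub")" "$auth"; then
    echo "key installed"
  else
    echo "key missing"
  fi
}

# Typical use:
# check_key_installed "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```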

Page 12

Hadoop Installation

Page 13

i. Run the following command to download Hadoop version 2.2.0:

wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz

ii. Unpack the compressed hadoop file by using this command:

tar -xvzf hadoop-2.2.0.tar.gz

iii. Rename the hadoop-2.2.0 directory to hadoop with the following command:

mv hadoop-2.2.0 hadoop

Page 14

iv. Move the hadoop directory to the install location of your choice, e.g. /usr/local/:

sudo mv hadoop /usr/local/

v. Make sure to change the owner of all the files to the hduser user and hadoop group by using these commands:

cd /usr/local/

sudo chown -R hduser:hadoop hadoop
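After the chown, everything under /usr/local/hadoop should belong to hduser:hadoop. One way to spot-check this (stat -c is the GNU coreutils syntax available on Ubuntu):

```shell
# Print a path's owner and group in user:group form (GNU stat syntax).
owner_of() {
  stat -c '%U:%G' "$1"
}

# Typical use:
# owner_of /usr/local/hadoop   # expect hduser:hadoop
```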

Page 15

Configuring Hadoop

Page 16

The following files need to be edited to configure a single-node Hadoop cluster:

a. yarn-site.xml
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. Update $HOME/.bashrc

These files are located in the Hadoop configuration directory:

cd /usr/local/hadoop/etc/hadoop
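Before editing, it is worth confirming the files are actually there; note that the Hadoop 2.2.0 tarball may ship only mapred-site.xml.template, in which case copy it to mapred-site.xml first. A sketch of such a check (check_conf_files is a hypothetical helper):

```shell
# Hypothetical helper: verify the four XML files to be edited exist
# in the given configuration directory.
check_conf_files() {
  dir="$1"; missing=0
  for f in yarn-site.xml core-site.xml mapred-site.xml hdfs-site.xml; do
    [ -e "$dir/$f" ] || { echo "missing: $f"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all config files present"
}

# Typical use:
# check_conf_files /usr/local/hadoop/etc/hadoop
```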

Page 17

a. yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Page 18

b. core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Page 19

c. mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Page 20

Create local directories for the NameNode and DataNode data:

sudo mkdir -p $HADOOP_HOME/yarn_data/hdfs/namenode

sudo mkdir -p $HADOOP_HOME/yarn_data/hdfs/datanode

Since these are created with sudo, make sure hduser owns them:

sudo chown -R hduser:hadoop $HADOOP_HOME/yarn_data

Page 21

d. hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
  </property>
</configuration>

Page 22

e. Update $HOME/.bashrc

i. Go back to the home directory and edit the .bashrc file:

vi .bashrc

Page 23

e. Update $HOME/.bashrc

# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

# Native path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

# Java path
export JAVA_HOME='/usr/lib/jvm/java-7-oracle'

# Add Hadoop bin/ and sbin/ directories to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
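After saving .bashrc, reload it with source ~/.bashrc (or open a new shell). A sketch of a sanity check that the key variables took effect; check_hadoop_env is a hypothetical name, not part of the guide:

```shell
# Hypothetical check: every variable the .bashrc fragment defines
# should be non-empty in the current shell.
check_hadoop_env() {
  for v in HADOOP_PREFIX HADOOP_HOME HADOOP_CONF_DIR JAVA_HOME; do
    eval "val=\"\${$v}\""
    [ -n "$val" ] || { echo "unset: $v"; return 1; }
  done
  echo "environment ok"
}

# Typical use, after: source ~/.bashrc
# check_hadoop_env
```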

Page 24

Formatting and Starting/Stopping the HDFS filesystem via the NameNode

Page 25

i. The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster. You need to do this only the first time you set up a Hadoop cluster. Do not format a running Hadoop filesystem, as you will lose all the data currently in HDFS.

hadoop namenode -format
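On success, the format command's log output includes a "successfully formatted" line for the storage directory (per Hadoop 2.x NameNode logs). A sketch for checking captured output; format_succeeded is a hypothetical helper:

```shell
# Hypothetical helper: scan captured `hadoop namenode -format` output
# for the success message Hadoop 2.x logs.
format_succeeded() {
  if echo "$1" | grep -qi "successfully formatted"; then
    echo "format ok"
  else
    echo "format failed"
  fi
}

# Typical use:
# out=$(hadoop namenode -format 2>&1)
# format_succeeded "$out"
```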

Page 26

ii. Start Hadoop Daemons by running the following commands:

NameNode:

hadoop-daemon.sh start namenode

DataNode:

hadoop-daemon.sh start datanode

Page 27

ResourceManager:

yarn-daemon.sh start resourcemanager

NodeManager:

yarn-daemon.sh start nodemanager

JobHistoryServer:

mr-jobhistory-daemon.sh start historyserver
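Each daemon runs in its own JVM, so jps (shipped with the JDK) should now list NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer. A sketch that checks a captured jps listing; check_daemons is a hypothetical helper:

```shell
# Hypothetical helper: confirm all five daemon names appear in a
# captured `jps` listing.
check_daemons() {
  listing="$1"; missing=0
  for d in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    echo "$listing" | grep -q "$d" || { echo "not running: $d"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "all daemons running"
}

# Typical use:
# check_daemons "$(jps)"
```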

Page 28

Stop Hadoop by running the following commands:

stop-dfs.sh

stop-yarn.sh

Page 29

Start and stop all Hadoop daemons at once:

start-all.sh 

stop-all.sh

Page 32

Thanks for listening