Hadoop Cluster Installation
Hadoop Cluster Installation (Intern Report)
Primary reference: http://bigdatahandler.com/hadoop-hdfs/installing-single-node-hadoop-2-2-0-on-ubuntu/
Software versions:
Ubuntu Linux 12.04.4 LTS
Hadoop 2.2.0
If you are using PuTTY to access your Linux box remotely, install the OpenSSH server by running the command below; this also makes it easier to configure SSH access in the later part of the installation:
sudo apt-get install openssh-server
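If you want to confirm that the SSH server is running before continuing, a quick sanity check (an addition to the original guide) on Ubuntu 12.04 is:
sudo service ssh status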
Prerequisites:
1. Installing Java v1.7
2. Adding a dedicated Hadoop system user
3. Configuring SSH access
1. Installing Java v1.7:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
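As a sanity check (not part of the original steps), you can verify that Java installed correctly; the first command should report version 1.7:
java -version
which java
Note that an export entered at the shell lasts only for the current session; the .bashrc update in the configuration section below makes JAVA_HOME persistent.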
2. Adding a dedicated Hadoop system user.
a. Adding a group:
sudo addgroup hadoop
b. Creating a user and adding the user to a group:
sudo adduser --ingroup hadoop hduser
3. Configuring SSH access:
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh hduser@localhost
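If the login above still prompts for a password, a common cause is overly broad permissions on the key files; a fix worth trying (an assumption, not from the original guide) is to tighten them:
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys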
Hadoop Installation
i. Run the following command to download Hadoop version 2.2.0:
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
ii. Unpack the compressed Hadoop archive with this command:
tar -xvzf hadoop-2.2.0.tar.gz
iii. Rename the extracted hadoop-2.2.0 directory to hadoop with the following command:
mv hadoop-2.2.0 hadoop
iv. Move the hadoop directory to the installation location of your choice (here /usr/local/):
sudo mv hadoop /usr/local/
v. Make sure to change the owner of all the files to the hduser user and hadoop group by running these commands:
cd /usr/local/
sudo chown -R hduser:hadoop hadoop
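Optionally, verify that the ownership change took effect; the listing should show hduser and hadoop as the owner and group:
ls -ld /usr/local/hadoop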
Configuring Hadoop
The following files are required to configure the single-node Hadoop cluster:
a. yarn-site.xml
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. Update $HOME/.bashrc
We can find these files in the Hadoop configuration directory:
cd /usr/local/hadoop/etc/hadoop
a. yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
b. core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
c. mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
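Note: a stock Hadoop 2.2.0 unpacks with only a template for this file. If mapred-site.xml does not exist in the configuration directory yet, create it from the template first:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml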
sudo mkdir -p $HADOOP_HOME/yarn_data/hdfs/namenode
sudo mkdir -p $HADOOP_HOME/yarn_data/hdfs/datanode
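Because these directories are created with sudo, they end up owned by root, while the NameNode and DataNode will run as hduser. Assuming HADOOP_HOME points at /usr/local/hadoop (as set in the .bashrc step below), change their ownership so the daemons can write to them:
sudo chown -R hduser:hadoop /usr/local/hadoop/yarn_data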
d. hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
  </property>
</configuration>
e. Update $HOME/.bashrc
i. Go back to your home directory and edit the .bashrc file:
vi .bashrc
#Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

#Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

#Java path
export JAVA_HOME='/usr/lib/jvm/java-7-oracle'

#Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
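After saving the file, reload .bashrc in the current shell and confirm that the hadoop command resolves:
source ~/.bashrc
hadoop version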
Formatting and Starting/Stopping the HDFS filesystem via the NameNode
i. The first step in starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster. You only need to do this the first time you set up a Hadoop cluster. Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS).
hadoop namenode -format
ii. Start Hadoop Daemons by running the following commands:
NameNode: hadoop-daemon.sh start namenode
DataNode: hadoop-daemon.sh start datanode
ResourceManager: yarn-daemon.sh start resourcemanager
NodeManager: yarn-daemon.sh start nodemanager
Job History Server: mr-jobhistory-daemon.sh start historyserver
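Once all five daemons are up, jps (shipped with the JDK) should list their Java process names, one per daemon: NameNode, DataNode, ResourceManager, NodeManager, and JobHistoryServer (process IDs will vary):
jps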
Stop Hadoop by running the following commands:
stop-dfs.sh
stop-yarn.sh
Alternatively, start and stop all Hadoop daemons at once:
start-all.sh
stop-all.sh
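As a final check, Hadoop 2.2 also exposes web interfaces on its default ports (assuming an unmodified configuration):
NameNode status: http://localhost:50070
ResourceManager (YARN): http://localhost:8088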
Thanks for listening