This post is my translation of an article from the dataottam.com blog. It walks you through installing Hadoop, a tool for processing big data, using a single script.
What you need
- Ubuntu 14.04
- The installation script (included below)
How to use it
1. Create a script named 3clicks.sh, then copy and paste the code below into the new file
#!/bin/bash
# sed -i -e 's/\r$//' scriptname.sh
# sudo chmod 777 scriptname.sh
# ./scriptname.sh
sudo apt-get update \
&& sudo apt-get -y install openssh-server \
&& sudo apt-get -y install openjdk-7-jdk \
&& sudo wget http://mirror.olnevhost.net/pub/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz \
&& sudo tar -zxvf hadoop-1.2.1-bin.tar.gz \
&& sudo mv hadoop-1.2.1 /home/ubuntu/hadoop \
&& sudo chown -R ubuntu /home/ubuntu/hadoop \
&& sudo echo "export HADOOP_HOME=/home/ubuntu/hadoop" >> /home/ubuntu/.bashrc \
&& sudo echo "export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64/" >> /home/ubuntu/.bashrc \
&& echo "export PATH=\$PATH:\$HADOOP_HOME/bin" >> /home/ubuntu/.bashrc \
&& echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> /home/ubuntu/.bashrc \
&& sudo mkdir /home/ubuntu/hadoop/tmp \
&& sudo chown root /home/ubuntu/hadoop/tmp \
&& sudo chmod 777 /home/ubuntu/hadoop \
&& sudo chmod 777 /home/ubuntu/hadoop/tmp \
&& sudo sed -i 's/# export JAVA_HOME=\/usr\/lib\/j2sdk1.5-sun/export JAVA_HOME=\/usr\/lib\/jvm\/java-1.7.0-openjdk-amd64/' /home/ubuntu/hadoop/conf/hadoop-env.sh \
&& sudo sed -i 's/# export HADOOP_OPTS=-server/export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true/' /home/ubuntu/hadoop/conf/hadoop-env.sh \
&& sudo sed -i "7d" /home/ubuntu/hadoop/conf/core-site.xml \
&& sudo sed -i "7i<property>\n<name>fs.default.name</name>\n<value>hdfs://localhost:9000</value>\n</property>\n<property>\n<name>hadoop.tmp.dir</name>\n<value>/home/ubuntu/hadoop/tmp</value>\n</property>" /home/ubuntu/hadoop/conf/core-site.xml \
&& sudo sed -i "7d" /home/ubuntu/hadoop/conf/mapred-site.xml \
&& sudo sed -i "7i<property>\n<name>mapred.job.tracker</name>\n<value>localhost:9001</value>\n</property>" /home/ubuntu/hadoop/conf/mapred-site.xml \
&& sudo sed -i "7d" /home/ubuntu/hadoop/conf/hdfs-site.xml \
&& sudo sed -i "7i<property>\n<name>dfs.replication</name>\n<value>1</value>\n</property>" /home/ubuntu/hadoop/conf/hdfs-site.xml \
&& ssh-keygen -b 2048 -t rsa -f /home/ubuntu/.ssh/id_rsa -q -N "" \
&& cat /home/ubuntu/.ssh/id_rsa.pub >> /home/ubuntu/.ssh/authorized_keys \
&& ssh-keyscan localhost >> /home/ubuntu/.ssh/known_hosts
2. Run these commands to strip Windows line endings and make the script executable
sed -i -e 's/\r$//' 3clicks.sh
sudo chmod 777 3clicks.sh
3. Run the script with ./3clicks.sh
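The sed command in step 2 removes the carriage return that Windows editors leave at the end of each line; with it in place, the shebang would point at an interpreter named "/bin/bash\r" and the script would not start. Here is a minimal sketch of both preparation steps on a throwaway file, assuming GNU sed as shipped with Ubuntu. The file name demo.sh and the temporary directory are illustrative and not part of the original post.

```shell
#!/bin/bash
# Illustrative only: demo.sh stands in for 3clicks.sh.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
script="$workdir/demo.sh"

# Write a two-line script with CRLF line endings, as a Windows editor would.
printf '#!/bin/bash\r\necho "hello from demo"\r\n' > "$script"

# Strip the trailing carriage return from every line (GNU sed understands \r).
sed -i -e 's/\r$//' "$script"

# Execute permission is all the script needs; chmod 777 works too but grants more.
chmod +x "$script"

# Run it through bash explicitly so the sketch also works on noexec temp directories.
output=$(bash "$script")
echo "$output"
```

For the real script, chmod +x 3clicks.sh would be a narrower alternative to the chmod 777 used in the original.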
To help you understand how the script works, here is the author's explanation of each command.
root@ubuntu$ sudo apt-get update \
- Update the Ubuntu 14.04 package lists before moving on to the next step
&& sudo apt-get -y install openssh-server \
- Install openssh-server so the machine accepts SSH connections on port 22
&& sudo apt-get -y install openjdk-7-jdk \
- Install OpenJDK 7, which Hadoop requires
&& sudo wget http://mirror.olnevhost.net/pub/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1-bin.tar.gz \
- Download the hadoop-1.2.1-bin.tar.gz archive
&& sudo tar -zxvf hadoop-1.2.1-bin.tar.gz \
- Extract the Hadoop archive
&& sudo mv hadoop-1.2.1 /home/ubuntu/hadoop \
- Move the extracted hadoop-1.2.1 directory to /home/ubuntu/hadoop
&& sudo chown -R ubuntu /home/ubuntu/hadoop \
- Give the ubuntu user ownership of the Hadoop directory
&& sudo echo "export HADOOP_HOME=/home/ubuntu/hadoop" >> /home/ubuntu/.bashrc \
- Set HADOOP_HOME in .bashrc
&& sudo echo "export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-amd64/" >> /home/ubuntu/.bashrc \
- Set JAVA_HOME in .bashrc
&& echo "export PATH=\$PATH:\$HADOOP_HOME/bin" >> /home/ubuntu/.bashrc \
- Add Hadoop's bin directory to PATH in .bashrc
&& echo "export PATH=\$PATH:\$JAVA_HOME/bin" >> /home/ubuntu/.bashrc \
- Add Java's bin directory to PATH in .bashrc
&& sudo mkdir /home/ubuntu/hadoop/tmp \
- Create a tmp directory inside hadoop, which serves as the base for Hadoop's other directories
&& sudo chown root /home/ubuntu/hadoop/tmp \
- Make root the owner of tmp
&& sudo chmod 777 /home/ubuntu/hadoop \
- Give everyone read, write and execute permission on the hadoop directory
&& sudo chmod 777 /home/ubuntu/hadoop/tmp \
- Give everyone read, write and execute permission on tmp
&& sudo sed -i 's/# export JAVA_HOME=\/usr\/lib\/j2sdk1.5-sun/export JAVA_HOME=\/usr\/lib\/jvm\/java-1.7.0-openjdk-amd64/' /home/ubuntu/hadoop/conf/hadoop-env.sh \
- Uncomment JAVA_HOME in hadoop-env.sh and point it at OpenJDK 7
&& sudo sed -i 's/# export HADOOP_OPTS=-server/export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true/' /home/ubuntu/hadoop/conf/hadoop-env.sh \
- Set HADOOP_OPTS in hadoop-env.sh so that Java prefers the IPv4 stack
&& sudo sed -i "7d" /home/ubuntu/hadoop/conf/core-site.xml \
- Delete line 7 of core-site.xml
&& sudo sed -i "7i<property>\n<name>fs.default.name</name>\n<value>hdfs://localhost:9000</value>\n</property>\n<property>\n<name>hadoop.tmp.dir</name>\n<value>/home/ubuntu/hadoop/tmp</value>\n</property>" /home/ubuntu/hadoop/conf/core-site.xml \
- Insert the core-site.xml configuration at line 7
&& sudo sed -i "7d" /home/ubuntu/hadoop/conf/mapred-site.xml \
- Delete line 7 of mapred-site.xml
&& sudo sed -i "7i<property>\n<name>mapred.job.tracker</name>\n<value>localhost:9001</value>\n</property>" /home/ubuntu/hadoop/conf/mapred-site.xml \
- Insert the mapred-site.xml configuration at line 7
&& sudo sed -i "7d" /home/ubuntu/hadoop/conf/hdfs-site.xml \
- Delete line 7 of hdfs-site.xml
&& sudo sed -i "7i<property>\n<name>dfs.replication</name>\n<value>1</value>\n</property>" /home/ubuntu/hadoop/conf/hdfs-site.xml \
- Insert the hdfs-site.xml configuration at line 7
&& ssh-keygen -b 2048 -t rsa -f /home/ubuntu/.ssh/id_rsa -q -N "" \
- Generate a 2048-bit RSA key pair with an empty passphrase
&& cat /home/ubuntu/.ssh/id_rsa.pub >> /home/ubuntu/.ssh/authorized_keys \
- Append the public key to authorized_keys
&& ssh-keyscan localhost >> /home/ubuntu/.ssh/known_hosts
- Add localhost's host key to known_hosts
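The three *-site.xml edits all use the same two-step sed pattern: delete line 7, then insert the property block at line 7. The sketch below replays that pattern on a throwaway copy so you can see the result without touching a real installation. It assumes GNU sed and assumes the untouched template has an empty line 7 between the opening and closing configuration tags; the temporary path is illustrative.

```shell
#!/bin/bash
# Illustrative only: works on a throwaway file, not on the real Hadoop conf directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
conf="$workdir/hdfs-site.xml"

# Assumed layout of the untouched template: line 6 opens <configuration>,
# line 7 is empty, line 8 closes it.
printf '%s\n' \
  '<?xml version="1.0"?>' \
  '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>' \
  '' \
  '<!-- Put site-specific property overrides in this file. -->' \
  '' \
  '<configuration>' \
  '' \
  '</configuration>' > "$conf"

# Same two steps as the script: drop line 7, then insert the property block there.
# GNU sed turns each \n in the inserted text into a real line break.
sed -i "7d" "$conf"
sed -i "7i<property>\n<name>dfs.replication</name>\n<value>1</value>\n</property>" "$conf"

cat "$conf"
```

Because both commands address a fixed line number, running them a second time on an already configured file would delete the inserted opening property tag instead of an empty line, so the script is best run once on a fresh extraction.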