Hadoop可以在单节点上以所谓的伪分布式模式运行,此时每一个Hadoop守护进程都作为一个独立的Java进程运行。本文通过自动化脚本配置Hadoop伪分布式模式。测试环境为VMware中的Centos 6.3, Hadoop 1.2.1.其他版本未测试。 伪分布式配置脚本 包括配置core-site.
hadoop可以在单节点上以所谓的伪分布式模式运行,此时每一个hadoop守护进程都作为一个独立的java进程运行。本文通过自动化脚本配置hadoop伪分布式模式。测试环境为vmware中的centos 6.3, hadoop 1.2.1.其他版本未测试。
包括配置core-site.xml,hdfs-site.xml及mapred-site.xml,配置ssh免密码登陆。[1]
#!/bin/bash
# Usage: Hadoop伪分布式配置
# History:
# 20140426 annhe 完成基本功能
# Check if user is root
if [ $(id -u) != "0" ]; then
printf "Error: You must be root to run this script!\n"
exit 1
fi
#同步时钟
rm -rf /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
#yum install -y ntp
ntpdate -u pool.ntp.org &>/dev/null
echo -e "Time: `date` \n"
#默认为单网卡结构,多网卡的暂不考虑
IP=`ifconfig eth0 |grep "inet\ addr" |awk '{print $2}' |cut -d ":" -f2`
#伪分布式
function PseudoDistributed ()
{
cd /etc/hadoop/
#恢复备份
mv core-site.xml.bak core-site.xml
mv hdfs-site.xml.bak hdfs-site.xml
mv mapred-site.xml.bak mapred-site.xml
#备份
mv core-site.xml core-site.xml.bak
mv hdfs-site.xml hdfs-site.xml.bak
mv mapred-site.xml mapred-site.xml.bak
#使用下面的core-site.xml
cat > core-site.xml <<eof
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://www.annhe.net/configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://$IP:9000</value>
</property>
</configuration>
eof
#使用下面的hdfs-site.xml
cat > hdfs-site.xml <<eof
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://www.annhe.net/configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
eof
#使用下面的mapred-site.xml
cat > mapred-site.xml <<eof
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="http://www.annhe.net/configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>$IP:9001</value>
</property>
</configuration>
eof
}
#配置ssh免密码登陆
function PassphraselessSSH ()
{
#不重复生成私钥
[ ! -f ~/.ssh/id_dsa ] && ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/authorized_keys |grep "`cat ~/.ssh/id_dsa.pub`" &>/dev/null && r=0 || r=1
#没有公钥的时候才添加
[ $r -eq 1 ] && cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys
}
#执行
function Execute ()
{
#格式化一个新的分布式文件系统
hadoop namenode -format
#启动Hadoop守护进程
start-all.sh
echo -e "\n========================================================================"
echo "hadoop log dir : $HADOOP_LOG_DIR"
echo "NameNode - http://$IP:50070/"
echo "JobTracker - http://$IP:50030/"
echo -e "\n========================================================================="
}
PseudoDistributed 2>&1 | tee -a pseudo.log
PassphraselessSSH 2>&1 | tee -a pseudo.log
Execute 2>&1 | tee -a pseudo.log[root@hadoop hadoop]# ./pseudo.sh 14/04/26 23:52:30 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = hadoop/216.34.94.184 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 1.2.1 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:27:42 PDT 2013 STARTUP_MSG: java = 1.7.0_51 ************************************************************/ Re-format filesystem in /tmp/hadoop-root/dfs/name ? (Y or N) y Format aborted in /tmp/hadoop-root/dfs/name 14/04/26 23:52:40 INFO namenode.NameNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at hadoop/216.34.94.184 ************************************************************/ starting namenode, logging to /var/log/hadoop/root/hadoop-root-namenode-hadoop.out localhost: starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-hadoop.out localhost: starting secondarynamenode, logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-hadoop.out starting jobtracker, logging to /var/log/hadoop/root/hadoop-root-jobtracker-hadoop.out localhost: starting tasktracker, logging to /var/log/hadoop/root/hadoop-root-tasktracker-hadoop.out ======================================================================== hadoop log dir : /var/log/hadoop/root NameNode - http://192.168.60.128:50070/ JobTracker - http://192.168.60.128:50030/ =========================================================================
通过宿主机上的浏览器访问NameNode和JobTracker的网络接口

浏览器访问namenode的网络接口

浏览器访问jobtracker网络接口
将输入文件拷贝到分布式文件系统:
$ hadoop fs -put input input
通过网络接口查看hdfs

通过NameNode网络接口查看hdfs文件系统
运行示例程序
[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input output
通过JobTracker网络接口查看执行状态

Wordcount执行状态
执行结果
[root@hadoop hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input out2 14/04/27 03:34:56 INFO input.FileInputFormat: Total input paths to process : 2 14/04/27 03:34:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library 14/04/27 03:34:56 WARN snappy.LoadSnappy: Snappy native library not loaded 14/04/27 03:34:57 INFO mapred.JobClient: Running job: job_201404270333_0001 14/04/27 03:34:58 INFO mapred.JobClient: map 0% reduce 0% 14/04/27 03:35:49 INFO mapred.JobClient: map 100% reduce 0% 14/04/27 03:36:16 INFO mapred.JobClient: map 100% reduce 100% 14/04/27 03:36:19 INFO mapred.JobClient: Job complete: job_201404270333_0001 14/04/27 03:36:19 INFO mapred.JobClient: Counters: 29 14/04/27 03:36:19 INFO mapred.JobClient: Job Counters 14/04/27 03:36:19 INFO mapred.JobClient: Launched reduce tasks=1 14/04/27 03:36:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=72895 14/04/27 03:36:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0 14/04/27 03:36:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0 14/04/27 03:36:19 INFO mapred.JobClient: Launched map tasks=2 14/04/27 03:36:19 INFO mapred.JobClient: Data-local map tasks=2 14/04/27 03:36:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=24880 14/04/27 03:36:19 INFO mapred.JobClient: File Output Format Counters 14/04/27 03:36:19 INFO mapred.JobClient: Bytes Written=25 14/04/27 03:36:19 INFO mapred.JobClient: FileSystemCounters 14/04/27 03:36:19 INFO mapred.JobClient: FILE_BYTES_READ=55 14/04/27 03:36:19 INFO mapred.JobClient: HDFS_BYTES_READ=260 14/04/27 03:36:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=164041 14/04/27 03:36:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25 14/04/27 03:36:19 INFO mapred.JobClient: File Input Format Counters 14/04/27 03:36:19 INFO mapred.JobClient: Bytes Read=25 14/04/27 03:36:19 INFO mapred.JobClient: Map-Reduce Framework 14/04/27 03:36:19 INFO mapred.JobClient: Map output materialized bytes=61 14/04/27 03:36:19 INFO mapred.JobClient: Map input records=2 14/04/27 03:36:19 INFO mapred.JobClient: Reduce shuffle bytes=61 14/04/27 03:36:19 INFO mapred.JobClient: Spilled Records=8 14/04/27 03:36:19 INFO mapred.JobClient: Map output bytes=41 14/04/27 03:36:19 INFO mapred.JobClient: Total committed heap usage (bytes)=414441472 14/04/27 03:36:19 INFO mapred.JobClient: CPU time spent (ms)=2910 14/04/27 03:36:19 INFO mapred.JobClient: Combine input records=4 14/04/27 03:36:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=235 14/04/27 03:36:19 INFO mapred.JobClient: Reduce input records=4 14/04/27 03:36:19 INFO mapred.JobClient: Reduce input groups=3 14/04/27 03:36:19 INFO mapred.JobClient: Combine output records=4 14/04/27 03:36:19 INFO mapred.JobClient: Physical memory (bytes) snapshot=353439744 14/04/27 03:36:19 INFO mapred.JobClient: Reduce output records=3 14/04/27 03:36:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2195972096 14/04/27 03:36:19 INFO mapred.JobClient: Map output records=4
查看结果
追梦A系列(11.0版本,以下11.0均简称为A)是针对企业网站定制设计的,模板采用全新AS3.0代码编辑,拥有更快的运行和加载速度,A系列模板主要针对图片展示,拥有简洁大气展示效果,并且可以自由扩展图片分类,同时还拥有三个独立页面介绍栏目,一个新闻栏目,一个服务介绍栏目,一个幻灯片展示和flv视频播放栏目。A系列模板对一些加载效果进行了修改,包括背景的拉伸模式以及标题的展示方式等都进行了调整,同
0
[root@hadoop hadoop]# hadoop fs -cat out2/* hadoop 1 hello 2 world 1
也可以将分布式文件系统上的文件拷贝到本地查看
[root@hadoop hadoop]# hadoop fs -get out2 out4 [root@hadoop hadoop]# cat out4/* cat: out4/_logs: Is a directory hadoop 1 hello 2 world 1
完成全部操作后,停止守护进程:
[root@hadoop hadoop]# stop-all.sh stopping jobtracker localhost: stopping tasktracker stopping namenode localhost: stopping datanode localhost: stopping secondarynamenode
因为开启了iptables,所以需要添加相应端口,当然测试环境也可以直接将iptables关闭。
# Firewall configuration written by system-config-firewall # Manual customization of this file is not recommended. *filter :INPUT ACCEPT [0:0] :FORWARD ACCEPT [0:0] :OUTPUT ACCEPT [0:0] -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT -A INPUT -p icmp -j ACCEPT -A INPUT -i lo -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 50030 -j ACCEPT -A INPUT -m state --state NEW -m tcp -p tcp --dport 50075 -j ACCEPT -A INPUT -j REJECT --reject-with icmp-host-prohibited -A FORWARD -j REJECT --reject-with icmp-host-prohibited COMMIT
NameNode网络接口点击Browse the filesystem,跳转到localhost:50075。[2][3]
修改core-site.xml,将hdfs://localhost:9000改成虚拟机ip地址。(上面的脚本已经改写为自动配置为IP)。
根据几次改动的情况,这里也是可以填写域名的,只是要在访问的机器上能解析这个域名。因此公网环境中有DNS服务器的应该是可以设置域名的。
在/etc/hosts中添加主机名对应的ip地址 [4][5]。(已更新Hadoop安装脚本,会自动配置此项)
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 127.0.0.1 hadoop #添加这一行
[1]. Hadoop官方文档.?http://hadoop.apache.org/docs/r1.2.1/single_node_setup.html [2]. Stackoverflow.?http://stackoverflow.com/questions/15254492/wrong-redirect-from-hadoop-hdfs-namenode-to-localhost50075 [3]. Iteye.?http://yymmiinngg.iteye.com/blog/706909 [4].Stackoverflow.?http://stackoverflow.com/questions/10165549/hadoop-wordcount-example-stuck-at-map-100-reduce-0 [5]. 李俊的博客.?http://www.colorlight.cn/archives/32
每个人都需要一台速度更快、更稳定的 PC。随着时间的推移,垃圾文件、旧注册表数据和不必要的后台进程会占用资源并降低性能。幸运的是,许多工具可以让 Windows 保持平稳运行。
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号