Distributed Computing Lab Report

2023-03-10 Source: 欧得旅游网

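Nanjing University of Finance and Economics Course Paper Examination (Cover Page)

2014-2015 Academic Year, Semester 2

Course: Distributed Computing Comprehensive Lab
Instructor: 钱钢
Student: 郑雅雯
Class: Computer Science 1202
Student ID: 2120122349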

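Paper title: Hadoop Applications in a Linux Environment

Abstract: This paper describes how to deploy and configure Hadoop, a distributed computing platform, in a Linux environment, and how to write an application for it.

Keywords: Linux; Hadoop; deployment; application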

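I. Introduction

Hadoop is an open-source distributed computing platform under the Apache Software Foundation. The nodes of a Hadoop cluster fall into two roles: Master and Slave. The main parts of the platform are: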

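1. The distributed storage system HDFS (Hadoop Distributed File System). It consists of one NameNode and a number of DataNodes. The NameNode is the master server: it manages the file system namespace and client access to the file system. The DataNodes in the cluster manage the stored data. HDFS can be viewed as one very large, highly fault-tolerant disk that provides reliable, scalable, high-throughput data storage.

2. The resource management system YARN (Yet Another Resource Negotiator). Introduced in Hadoop 2.0, it is responsible for unified resource management and scheduling across the cluster, which allows several computing frameworks to run on the same cluster.

3. The distributed computing framework MapReduce, which is easy to program, highly fault-tolerant, and highly scalable. In Hadoop 1.x the framework consists of a single JobTracker running on the master node and a TaskTracker running on each slave node; in Hadoop 2.x these duties are taken over by YARN. The master node schedules all the tasks that make up a job, and these tasks are distributed across the slave nodes. The master node monitors their execution and re-runs tasks that have failed; a slave node only runs the tasks assigned to it. When a job is submitted, the JobTracker receives the job and its configuration, distributes the configuration to the slave nodes, schedules the tasks, and monitors the TaskTrackers.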

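A NoSQL database is a newer kind of database that does not store data in the relational model. It addresses problems that traditional relational databases handle poorly: highly concurrent reads and writes, horizontal scalability, and high availability. NoSQL storage does not require a fixed table schema and usually has no join operations, so for large-scale data access it can offer performance that relational databases struggle to match.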

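II. JDK Configuration

Extract jdk-7u75-linux-i586.tar.gz to a local directory and configure the Java environment variables (verify with java -version).

1. Extract the tar package to the /usr folder

tar -zxvf jdk-7u75-linux-i586.tar.gz -C /usr

2. Rename the extracted directory to jdk1.7

mv xxxx jdk1.7

3. Edit the configuration file

vi /etc/profile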
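Besides java -version, a small Java program can confirm which JDK is actually picked up after the profile is edited. The following is only a sketch: the class name JdkCheck is hypothetical, and it assumes the profile exports a variable named JAVA_HOME, which is the usual convention but is not stated in this report.

```java
// Hypothetical helper, not part of the report's code.
final class JdkCheck {
    /** Returns the major version from strings such as "1.7.0_75" or "17.0.2". */
    static int majorVersion(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the scheme was 1.x, so the second field is the major version.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("java.version = " + version + " (major " + majorVersion(version) + ")");
        System.out.println("java.home    = " + System.getProperty("java.home"));
        // Assumption: the profile exports JAVA_HOME. This prints null if it does not,
        // or if /etc/profile has not been re-read in the current shell.
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
    }
}
```

If JAVA_HOME prints as null, re-reading the profile (for example with source /etc/profile) or opening a new login shell is usually enough.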

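III. Single-Node Hadoop Deployment

1. Extract the Hadoop package to /usr/hadoop2.6 and give ownership of the folder to a regular user

chown -R abc:abc hadoop2.6*

2. Set up passwordless SSH login

(1) Generate a key pair with an empty passphrase, id_rsa and id_rsa.pub

ssh-keygen -t rsa -P ''

(2) Append id_rsa.pub to the authorized keys

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

(3) Check that the setup works

ssh localhost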

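5. Edit four .xml files. Each property below goes inside the <configuration> element of the named file.

(1) core-site.xml

<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/zpc/hadoop/mytmp</value>
</property>

(2) hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/abc/hadoop/mytmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/zpc/hadoop/mytmp/dfs/data</value>
</property>

(3) mapred-site.xml

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

(4) yarn-site.xml

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>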
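A typo in one of these files is easy to miss, so it can help to list the name/value pairs a file really contains before starting the daemons. The sketch below uses only the JDK's built-in DOM parser; the class SiteXmlReader is hypothetical, and this is not how Hadoop itself loads its configuration (Hadoop uses its own org.apache.hadoop.conf.Configuration class).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for checking a Hadoop-style *-site.xml document.
final class SiteXmlReader {
    /** Returns the name/value pairs of every <property> element, in file order. */
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<String, String>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML, or a <property> without <name> or <value>.
            throw new IllegalArgumentException("not a valid site file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "</configuration>";
        System.out.println(readProperties(xml)); // prints {dfs.replication=1}
    }
}
```

To check a real file, its contents could be read into a string first; the inline XML above only keeps the example self-contained.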

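6. Format HDFS

In the hadoop/bin/ directory, run:

./hadoop namenode -format hadoop

7. Start HDFS and YARN

(1) In the sbin directory, run the start-dfs.sh script to start HDFS

./start-dfs.sh

(2) Then run the start-yarn.sh script in sbin to start YARN

./start-yarn.sh

8. Run jps to check that every daemon has started (NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager)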

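IV. Program Design

1. Purpose

Compute each student's average score from the input files. Every line of an input file holds a student's name and the corresponding score, and each subject has its own file. The output is each student's name and average score.

2. Design approach

The program has two parts, a Map part and a Reduce part, which implement the map and reduce functions respectively.

Map processes a text file in which each line gives a student's name and that student's score in one subject. The Map phase splits the data set into smaller pieces; each split is handled by one Mapper, which parses the data stream into <key, value> pairs and passes them on to the Reducers.

When the pairs are merged for the Reducers, all pairs with the same key are sent to the same Reducer. The Reducer performs the calculation and writes the result to a file.

3. Program analysis

(1) The main function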

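public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Input and output directories; the source text is truncated after "score_in"
    String[] ioArgs = new String[] { "score_in", /* truncated in the source */ };
    String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: Score Average <in> <out>");
        System.exit(2);
    }
    Job job = Job.getInstance(conf, "Score Average");
    job.setJarByClass(Score.class);
    // Set the Map, Combine, and Reduce classes.
    // Caution: with an averaging Reduce as the combiner, the result is an average of
    // averages whenever one map task emits several scores for the same student.
    job.setMapperClass(Map.class);
    job.setCombinerClass(Reduce.class);
    job.setReducerClass(Reduce.class);
    // Set the output types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Splits the input data set into small blocks and provides a RecordReader implementation
    job.setInputFormatClass(TextInputFormat.class);
    // Provides a RecordWriter implementation, responsible for writing the output
    job.setOutputFormatClass(TextOutputFormat.class);
    // Set the input and output directories
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

(2) The map function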

public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Implements the map function
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // First split the input into lines
        StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");
        // Process each line in turn
        while (tokenizerArticle.hasMoreElements()) {
            // Split the line on whitespace
            StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());
            String strName = tokenizerLine.nextToken();   // student name
            String strScore = tokenizerLine.nextToken();  // score
            Text name = new Text(strName);
            int scoreInt = Integer.parseInt(strScore);
            // Emit the name and the score
            context.write(name, new IntWritable(scoreInt));
        }
    }
}
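The map function above assumes every line has exactly two whitespace-separated fields and that the second is an integer; a line with a missing field or a non-numeric score would make nextToken() or Integer.parseInt throw and fail the task. A more defensive parse might look like the sketch below. The class ScoreLineParser and the sample name "Alice" are hypothetical and not part of the report's program.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper showing one way to validate a "name score" line.
final class ScoreLineParser {
    /** Returns (name, score) for a line such as "Alice 88", or null if the line is malformed. */
    static Map.Entry<String, Integer> parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 2) {
            return null; // blank line, or wrong number of columns
        }
        try {
            return new SimpleEntry<String, Integer>(fields[0], Integer.parseInt(fields[1]));
        } catch (NumberFormatException e) {
            return null; // the score column is not an integer
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("Alice 88"));     // prints Alice=88
        System.out.println(parse("Alice eighty")); // prints null
    }
}
```

Inside a Mapper, a null result would simply mean skipping the line instead of calling context.write.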

(3) The reduce function

public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Implements the reduce function
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        Iterator<IntWritable> iterator = values.iterator();
        while (iterator.hasNext()) {
            sum += iterator.next().get(); // running total of the scores
            count++;                      // number of subjects
        }
        // Average score; integer division drops the fractional part
        int average = sum / count;
        context.write(key, new IntWritable(average));
    }
}
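The data flow of this section (map each line to a name/score pair, group the pairs by name, then reduce each group to an average) can be tried out without a cluster. The sketch below simulates it in plain Java with in-memory collections. It is not Hadoop code: the class LocalScoreAverage and the sample names and scores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the map, shuffle, and reduce steps.
final class LocalScoreAverage {
    static Map<String, Integer> average(List<String> lines) {
        // Map + shuffle: turn each line into (name, score) and group the scores by name.
        // A TreeMap keeps the keys sorted, as a reducer also receives its keys in sorted order.
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            List<Integer> scores = grouped.get(fields[0]);
            if (scores == null) {
                scores = new ArrayList<Integer>();
                grouped.put(fields[0], scores);
            }
            scores.add(Integer.parseInt(fields[1]));
        }
        // Reduce: one pass per key, integer average as in the Reduce class.
        Map<String, Integer> averages = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int score : entry.getValue()) {
                sum += score;
            }
            averages.put(entry.getKey(), sum / entry.getValue().size());
        }
        return averages;
    }

    public static void main(String[] args) {
        // Made-up sample data: two subjects, two students.
        List<String> lines = Arrays.asList("Alice 88", "Bob 70", "Alice 91", "Bob 75");
        System.out.println(average(lines)); // prints {Alice=89, Bob=72}
    }
}
```

Both averages are truncated by integer division (179 / 2 and 145 / 2), which mirrors the behavior of the reduce function.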

4. Input files

5. Output

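V. Course Summary

Through this course I came to understand the principles of distributed computing with Hadoop, and I learned some basic Linux programming, how to deploy Hadoop on a Linux system, and the principles of HBase. I ran into many problems during the practical work and resolved them one by one through my own exploration and with help from my teacher and classmates. Repeatedly finding and fixing problems also improved my own abilities, and I feel I gained a great deal.

Solving a problem is never entirely smooth, and each attempt tends to uncover further problems. Calming down and thinking carefully trains temperament as well as reasoning. What I gained is not only greater ability but, more importantly, the habit of facing problems in life more calmly, thinking them through, and responding steadily.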

VI. Appendix

package com.hebut.mr;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Score {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Implements the map function
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // First split the input into lines
            StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");
            // Process each line in turn
            while (tokenizerArticle.hasMoreElements()) {
                // Split the line on whitespace
                StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());
                String strName = tokenizerLine.nextToken();   // student name
                String strScore = tokenizerLine.nextToken();  // score
                Text name = new Text(strName);
                int scoreInt = Integer.parseInt(strScore);
                // Emit the name and the score
                context.write(name, new IntWritable(scoreInt));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Implements the reduce function
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            Iterator<IntWritable> iterator = values.iterator();
            while (iterator.hasNext()) {
                sum += iterator.next().get(); // running total of the scores
                count++;                      // number of subjects
            }
            // Average score; integer division drops the fractional part
            int average = sum / count;
            context.write(key, new IntWritable(average));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // conf.set("mapred.job.tracker", ...); the value is truncated in the source
        // Input and output directories; the source text is truncated after "score_in"
        String[] ioArgs = new String[] { "score_in", /* truncated in the source */ };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Score Average <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Score Average");
        job.setJarByClass(Score.class);
        // Set the Map, Combine, and Reduce classes.
        // Caution: with an averaging Reduce as the combiner, the result is an average of
        // averages whenever one map task emits several scores for the same student.
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Splits the input data set into small blocks (splits) and provides a RecordReader implementation
        job.setInputFormatClass(TextInputFormat.class);
        // Provides a RecordWriter implementation, responsible for writing the output
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
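One caveat about the listing above: it registers Reduce as both the combiner and the reducer. Hadoop may run a combiner zero or more times on a map task's output, so the reducer can end up averaging partial averages rather than the raw scores. With one score per student per subject file this often goes unnoticed, but the arithmetic below shows how the two results can differ. The class CombinerCaveat and the scores are made up for illustration.

```java
// Hypothetical demonstration; plain arithmetic, no Hadoop classes involved.
final class CombinerCaveat {
    /** Integer average, computed the same way as in the Reduce class. */
    static int intAverage(int... scores) {
        int sum = 0;
        for (int score : scores) {
            sum += score;
        }
        return sum / scores.length;
    }

    public static void main(String[] args) {
        // All three scores of one student reach the reducer directly.
        int direct = intAverage(90, 80, 40);               // 210 / 3 = 70
        // A combiner first averages the two scores that came from the same map task.
        int combined = intAverage(intAverage(90, 80), 40); // (85 + 40) / 2 = 62
        System.out.println(direct + " vs " + combined);    // prints 70 vs 62
    }
}
```

Two common ways around this, neither taken from the report, are to drop the setCombinerClass call or to have the combiner pass along a (sum, count) pair and divide only in the reducer.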
