How Is File Reading and Writing Implemented in Hadoop? Explained with Code Examples

Posted on June 18, 2023 / June 11, 2023 by IT之美

The main ways to read and write files in Hadoop are:

Use the FileSystem API to operate on HDFS:

Get a FileSystem object:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
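Which file system FileSystem.get(conf) returns depends on the fs.defaultFS setting in the configuration. The following is a minimal sketch of that lookup, assuming the Hadoop client libraries are on the classpath; the class name FileSystemLookup and the helper schemeOf are made up for this illustration. It pins fs.defaultFS to the local file system so it can run without a cluster. On a real cluster the value would normally come from core-site.xml and point at the NameNode.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

class FileSystemLookup {

    // Hypothetical helper: returns the URI scheme of the file system that
    // FileSystem.get resolves when fs.defaultFS is set to the given value.
    static String schemeOf(String defaultFs) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", defaultFs);
        FileSystem fs = FileSystem.get(conf);
        return fs.getUri().getScheme();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the local file system stands in for HDFS here.
        System.out.println(schemeOf("file:///"));

        // Passing an explicit URI selects the file system directly,
        // regardless of what fs.defaultFS says.
        FileSystem local = FileSystem.get(URI.create("file:///"), new Configuration());
        System.out.println(local.getUri());
    }
}
```

Note that FileSystem.get returns a cached, shared instance by default, which is why the sketch does not close it.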

Writing a file:

FSDataOutputStream out = fs.create(new Path("/user/file.txt"));
// writeBytes writes one byte per char, so it is only safe for ASCII text
out.writeBytes("test");
out.close();

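Reading a file:

FSDataInputStream in = fs.open(new Path("/user/file.txt"));
// IOUtils here is org.apache.commons.io.IOUtils;
// Hadoop's own org.apache.hadoop.io.IOUtils has no toString method
String text = IOUtils.toString(in, StandardCharsets.UTF_8);
in.close();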
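The write and read steps above can also be put together with try-with-resources, so the streams are closed even when an exception is thrown. The sketch below is one possible arrangement, not the only one: the class HdfsRoundTrip and its write and read helpers are hypothetical names, and it uses the local file system (FileSystem.getLocal) as a stand-in for HDFS so it can run without a cluster. For reading it uses Hadoop's own IOUtils.copyBytes rather than Commons IO.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

class HdfsRoundTrip {

    // Writes the text as UTF-8, overwriting the file if it already exists.
    static void write(FileSystem fs, Path path, String text) throws IOException {
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Reads the whole file into memory and decodes it as UTF-8.
    static String read(FileSystem fs, Path path) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (FSDataInputStream in = fs.open(path)) {
            // The last argument is false because try-with-resources closes the stream.
            IOUtils.copyBytes(in, buffer, 4096, false);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: local file system instead of a real HDFS cluster.
        FileSystem fs = FileSystem.getLocal(new Configuration());
        File dir = Files.createTempDirectory("hdfs-demo").toFile();
        Path file = new Path(dir.getAbsolutePath(), "file.txt");

        write(fs, file, "test");
        System.out.println(read(fs, file));
    }
}
```

Reading a whole file into memory like this only makes sense for small files; for large HDFS files the stream would normally be processed incrementally.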

The FileContext API can also be used to work with other file systems in a more uniform way:

Get a FileContext object:

FileContext fc = FileContext.getFileContext(conf);

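Writing a file:

// FileContext.create takes a Path plus create flags and returns an FSDataOutputStream
FSDataOutputStream out = fc.create(new Path("file:///user/file.txt"),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE));
out.writeBytes("test");
out.close();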

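Reading a file:

// FileContext.open takes a Path and returns an FSDataInputStream
FSDataInputStream in = fc.open(new Path("file:///user/file.txt"));
// IOUtils here is org.apache.commons.io.IOUtils
String text = IOUtils.toString(in, StandardCharsets.UTF_8);
in.close();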
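A practical detail with FileContext is whether the parent directory of the target file already exists. The following sketch shows one way to handle it, by passing Options.CreateOpts.createParent() to explicitly request creation of missing parent directories. The class FileContextRoundTrip and the method writeThenRead are hypothetical names, and the example assumes the local file system (via FileContext.getLocalFSFileContext) and a temporary directory so it can run without a cluster.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.EnumSet;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

class FileContextRoundTrip {

    // Writes the text to the path (creating missing parent directories),
    // then reads it back and returns what was read.
    static String writeThenRead(FileContext fc, Path path, String text) throws IOException {
        try (FSDataOutputStream out = fc.create(
                path,
                EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
                Options.CreateOpts.createParent())) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        try (FSDataInputStream in = fc.open(path)) {
            return IOUtils.toString(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: local file system context instead of a cluster.
        FileContext fc = FileContext.getLocalFSFileContext();
        File dir = Files.createTempDirectory("fc-demo").toFile();
        // "nested" does not exist yet, so createParent() matters here.
        Path file = new Path(dir.getAbsolutePath(), "nested/file.txt");

        System.out.println(writeThenRead(fc, file, "test"));
    }
}
```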

InputFormat and OutputFormat classes are used to read and write data in MapReduce programs.
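To illustrate where input and output formats plug in, here is a minimal sketch of a map-only job that reads lines with TextInputFormat and writes them back with TextOutputFormat. It assumes the hadoop-mapreduce-client libraries are on the classpath; the class PassThroughJob, the method configure, and the job name are invented for this example, and the input and output paths are taken from the command line.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

class PassThroughJob {

    // Builds a map-only job; no mapper class is set, so Hadoop's default
    // identity Mapper passes each record through unchanged.
    static Job configure(Configuration conf, Path input, Path output) throws IOException {
        Job job = Job.getInstance(conf, "pass-through");
        job.setJarByClass(PassThroughJob.class);

        // TextInputFormat yields (byte offset, line) as (LongWritable, Text).
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, input);

        // Zero reducers: map output goes straight to the output format.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // TextOutputFormat writes each record as key, tab, value on one line.
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, output);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The output directory must not exist when the job starts, otherwise the job fails its output check.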

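File operations mainly serve to:

Read and write HDFS data.
Upload files to the cluster and download files from it.
Provide data input and output for MapReduce.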
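For the upload and download case in the list above, FileSystem offers copyFromLocalFile and copyToLocalFile. The sketch below wraps them in two small helpers; the class LocalTransfer and the method names upload and download are hypothetical, and the three paths are assumed to come from the command line. It relies on whatever file system the default configuration resolves to, so against a cluster it needs the cluster's configuration files on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class LocalTransfer {

    // Upload: copies a file from the local disk to the target file system.
    static void upload(FileSystem fs, String localFile, String remoteFile) throws IOException {
        fs.copyFromLocalFile(new Path(localFile), new Path(remoteFile));
    }

    // Download: copies a file from the target file system to the local disk.
    static void download(FileSystem fs, String remoteFile, String localFile) throws IOException {
        fs.copyToLocalFile(new Path(remoteFile), new Path(localFile));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local source, args[1]: remote path, args[2]: local destination
        FileSystem fs = FileSystem.get(new Configuration());
        upload(fs, args[0], args[1]);
        download(fs, args[1], args[2]);
    }
}
```

These calls do roughly what the hdfs dfs -put and hdfs dfs -get shell commands do.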

Let's look at some simple examples:

Creating and reading a file in HDFS with FileSystem:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

FSDataOutputStream out = fs.create(new Path("/user/file.txt"));
out.writeBytes("test");  
out.close();

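FSDataInputStream in = fs.open(new Path("/user/file.txt"));
// org.apache.commons.io.IOUtils; Hadoop's own IOUtils has no toString method
String text = IOUtils.toString(in, StandardCharsets.UTF_8);
in.close();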

Working with other file systems in a uniform way with FileContext:

FileContext fc = FileContext.getFileContext(conf);

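// FileContext.create takes a Path plus create flags and returns an FSDataOutputStream
FSDataOutputStream out = fc.create(new Path("file:///user/file.txt"),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE));
out.writeBytes("test");
out.close();

// FileContext.open takes a Path and returns an FSDataInputStream
FSDataInputStream in = fc.open(new Path("file:///user/file.txt"));
// org.apache.commons.io.IOUtils
String text = IOUtils.toString(in, StandardCharsets.UTF_8);
in.close();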

So with the FileSystem API, or the more general FileContext API, we can perform file operations on HDFS or other storage systems.