
I have a file that contains Java serialized objects such as Vector, and I have stored it on the Hadoop Distributed File System (HDFS). Now I intend to read this file (using the readObject method) in one of the map tasks. I suppose that

FileInputStream in = new FileInputStream("hdfs/path/to/file");

won't work, since the file is stored on HDFS. So I thought of using the org.apache.hadoop.fs.FileSystem class, but unfortunately it does not have any method that returns a FileInputStream. All it has is a method that returns an FSDataInputStream, but I want an input stream that can read serialized Java objects like Vector from a file, rather than one that only reads primitive data types, as FSDataInputStream does.

Please help!

A:

FileInputStream doesn't give you any facility to read serialized objects directly either. You need to wrap it in an ObjectInputStream. You can do the same with FSDataInputStream: just wrap it in an ObjectInputStream and then read your objects from it.

In other words, if you have a fileSystem of type org.apache.hadoop.fs.FileSystem, just use:

ObjectInputStream in = new ObjectInputStream(fileSystem.open(path));
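The point of the answer is that ObjectInputStream only needs some InputStream to decorate, and FSDataInputStream is one (it extends java.io.DataInputStream). A minimal, self-contained sketch of that idea follows. To keep it runnable without Hadoop on the classpath, it substitutes a ByteArrayInputStream for fileSystem.open(path). The class name SerializedVectorReader, its helper methods, and the assumption that the file holds a single Vector<String> are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Vector;

public class SerializedVectorReader {

    // Reads one serialized Vector from any InputStream. On HDFS, the argument
    // would be the FSDataInputStream returned by fileSystem.open(path).
    // Assumption: the stream holds exactly one Vector<String>; the cast is
    // unchecked, so the element type is not verified at runtime.
    @SuppressWarnings("unchecked")
    public static Vector<String> readVector(InputStream raw)
            throws IOException, ClassNotFoundException {
        // Closing the ObjectInputStream also closes the wrapped stream.
        try (ObjectInputStream in = new ObjectInputStream(raw)) {
            return (Vector<String>) in.readObject();
        }
    }

    // Hypothetical helper: serializes a Vector to bytes, standing in for the
    // file that was originally written with ObjectOutputStream.
    public static byte[] writeVector(Vector<String> v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(v);
        } // closing flushes the serialized data into `bytes`
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Vector<String> original = new Vector<>();
        original.add("alpha");
        original.add("beta");
        byte[] data = writeVector(original);
        // ByteArrayInputStream plays the role of fileSystem.open(path) here.
        Vector<String> copy = readVector(new ByteArrayInputStream(data));
        System.out.println(copy); // prints [alpha, beta]
    }
}
```

Inside a map task, the same helper would presumably be called as readVector(fileSystem.open(path)), with fileSystem obtained from FileSystem.get(conf) and path an org.apache.hadoop.fs.Path. No HDFS-specific stream type is needed beyond what FileSystem.open already returns.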
Peter Štibraný
Great, that worked! Thanks
Akhil