I am writing a program in Java that requires me to compare the data in 2 files. I have to check each line from file 1 against each line of file 2 and if I find a match write them to a third file. After I read to the end of file 2, how do I reset the pointer to the beginning of the file?

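import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class FiFo {

    public static void main(String[] args) {
        FileReader file1 = new FileReader("d:\\testfiles\\FILE1.txt");
        FileReader file2 = new FileReader("d:\\testfiles\\FILE2.txt");
        try {
            String s1, s2;

            while ((s1 = file1.data.readLine()) != null) {
                System.out.println("s1: " + s1);
                // file2 is exhausted after the first pass of the outer loop
                while ((s2 = file2.data.readLine()) != null) {
                    System.out.println("s2: " + s2);
                }
            }
            file1.closeFile();
            file2.closeFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}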


// Note: this helper shares its simple name with java.io.FileReader but is unrelated to it.
class FileReader {
    BufferedReader data;
    DataInputStream in;

    public FileReader(String fileName) {
        try {
            FileInputStream fstream = new FileInputStream(fileName);
            // Wrap the raw stream so it can be read line by line
            in = new DataInputStream(fstream);
            data = new BufferedReader(new InputStreamReader(in));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void closeFile() {
        try {
            in.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
+4  A: 

I think the best thing to do would be to put each line from file 1 into a HashMap; then you could check each line of file 2 for membership in your HashMap rather than reading through the entire file once for each line of file 1.

But to answer your question of how to go back to the beginning of the file, the easiest thing to do is to open another InputStream/Reader.

danben
+1 - it is much more efficient to load file 1 first, unless the files can be very large.
tulskiy
+5  A: 

I believe RandomAccessFile is what you need. It contains: RandomAccessFile#seek and RandomAccessFile#getFilePointer.

rewind() is seek(0)
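
A minimal sketch of the seek(0) idea, for illustration only. The class name RewindDemo and the method countMatchingPairs are made up here and are not from the answer; RandomAccessFile#readLine and RandomAccessFile#seek are the real java.io calls. Keep in mind that RandomAccessFile#readLine is unbuffered and maps each byte to one char, so this is only reasonable for small files in a single-byte encoding.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: compares every line of file 1 with every line of file 2.
class RewindDemo {

    /** Returns how many (line of file 1, line of file 2) pairs are equal. */
    static int countMatchingPairs(String path1, String path2) throws IOException {
        int matches = 0;
        try (RandomAccessFile f1 = new RandomAccessFile(path1, "r");
             RandomAccessFile f2 = new RandomAccessFile(path2, "r")) {
            String s1;
            while ((s1 = f1.readLine()) != null) {
                String s2;
                while ((s2 = f2.readLine()) != null) {
                    if (s1.equals(s2)) {
                        matches++;
                    }
                }
                // Rewind file 2 so the next outer iteration starts at its first line.
                f2.seek(0);
            }
        }
        return matches;
    }
}
```

This keeps the nested-loop structure from the question, so it still reads file 2 once per line of file 1.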

Gennady Shumakher
+1  A: 

Well, Gennady S.'s answer is what I would use to solve your problem.

I am writing a program in Java that requires me to compare the data in 2 files

However, I would rather not code this up again. I would rather use something like http://code.google.com/p/java-diff-utils/

Ryan Fernandes
It's great to know that there is an open-source library that tackles this kind of problem, though the GPL license may become a serious issue in using it.
Gennady Shumakher
@Gennady - only in the land of dinosaurs :-). But seriously, if you are unhappy with the GPL, you are free to develop your own non-GPL libraries.
Stephen C
@Stephen C, it's not me, it's the company's legal department :-) But anyhow, the GPL requires your code to become GPL, which is not appropriate in many cases.
Gennady Shumakher
@Gennady - and my response remains the same. If GPL is not suitable for you/your company, don't complain about it. Just find a non-GPL alternative or develop one in-house.
Stephen C
@Stephen C, that wasn't a complaint. It was information that is important for deciding whether the library is usable for the person asking the question.
Gennady Shumakher
+1  A: 

Obviously you could just close and reopen the file like this:

     while((s1=file1.data.readLine())!=null){
         System.out.println("s1: "+s1);
         FileReader file2=new FileReader("d:\\testfiles\\FILE2.txt");
         while((s2=file2.data.readLine())!=null){
             System.out.println("s2: "+s2);
             //compare s1 and s2;
         }
         file2.closeFile();
     }

But you really don't want to do it that way, since this algorithm's running time is O(n^2). If there were 1,000 lines in file A and 10,000 lines in file B, your inner loop would run 10,000,000 times.

What you should do is read each line and store it in a collection that allows quick checks to see whether an item is already contained (probably a HashSet).

If you only need to check that every line in file 2 is in file 1, then you just add each line in file 1 to a HashSet, and then check that every line in file 2 is in that set.
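
A rough sketch of that HashSet approach, under a few assumptions: the class name LineMatcher, the method writeMatches, and the UTF-8 encoding are illustrative choices rather than anything stated in the question. It loads file 1 into a set, then streams file 2 once and writes every line that also appears in file 1 to a third file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: one pass over each file instead of nested loops.
class LineMatcher {

    /** Writes each line of file2 that also occurs in file1 to out; returns the number written. */
    static int writeMatches(Path file1, Path file2, Path out) throws IOException {
        // Load every line of file 1 into a set for constant-time lookups.
        Set<String> seen = new HashSet<>();
        try (BufferedReader r1 = Files.newBufferedReader(file1, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r1.readLine()) != null) {
                seen.add(line);
            }
        }

        int written = 0;
        try (BufferedReader r2 = Files.newBufferedReader(file2, StandardCharsets.UTF_8);
             BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r2.readLine()) != null) {
                if (seen.contains(line)) {
                    w.write(line);
                    w.newLine();
                    written++;
                }
            }
        }
        return written;
    }
}
```

Only file 1 has to fit in memory here; file 2 is read a line at a time, so there is nothing to rewind.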

If you need to do a cross comparison where you find every string that's in one but not the other, then you'll need two hash sets, one for each file. (Although there's a trick you could do to use just one)
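
For the cross comparison, something along these lines might work. It is the plain two-set version, not the one-set trick mentioned above, and the names CrossCompare, linesOf, and onlyIn are invented for this sketch. UTF-8 input is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds the lines that appear in one file but not the other.
class CrossCompare {

    /** Reads all distinct lines of a file into a set. */
    static Set<String> linesOf(Path file) throws IOException {
        return new HashSet<>(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Returns the elements of a that are missing from b, sorted for stable output. */
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a); // copy, so the inputs are left untouched
        result.removeAll(b);
        return result;
    }
}
```

Calling onlyIn(linesOf(file1), linesOf(file2)) and then again with the arguments swapped gives the two halves of the difference.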

If the files are so large that you don't have enough memory, then your original O(n^2) method would never have worked anyway.

Chad Okere
A: 

As others have suggested, you should consider other approaches to the problem. For the specific question of returning to a previous point in a file, java.io.FileReader inherits mark() and reset() methods that address this goal.

trashgod
A: 

As noted, there are better algorithms; investigate those.

Aside:

FileReader doesn't implement mark and reset (its markSupported() returns false), so trashgod's comment is inaccurate. You'd either have to implement a version of this yourself (using RandomAccessFile or the like) or wrap the reader in a BufferedReader. However, the latter keeps everything read since the mark in memory, so marking the start of the file and reading to the end effectively loads the whole file.
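
To make the BufferedReader option concrete, here is a small sketch. MarkResetDemo, countMatchingPairs, and the readAheadLimit parameter are names chosen for this example. BufferedReader#mark and BufferedReader#reset are the real calls, and the limit passed to mark has to be larger than the number of characters in the second input, otherwise reset() may throw an IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: rewinds the second reader with mark()/reset().
class MarkResetDemo {

    /** Returns how many (line of first, line of second) pairs are equal. */
    static int countMatchingPairs(Reader first, Reader second, int readAheadLimit)
            throws IOException {
        int matches = 0;
        try (BufferedReader r1 = new BufferedReader(first);
             BufferedReader r2 = new BufferedReader(second)) {
            // Remember the start of the second input; the reader must buffer
            // everything read after this point, up to readAheadLimit characters.
            r2.mark(readAheadLimit);
            String s1;
            while ((s1 = r1.readLine()) != null) {
                String s2;
                while ((s2 = r2.readLine()) != null) {
                    if (s1.equals(s2)) {
                        matches++;
                    }
                }
                // Jump back to the marked position for the next outer iteration.
                r2.reset();
            }
        }
        return matches;
    }
}
```

With files, the two arguments could be java.io.FileReader instances; as noted above, the marked file then sits in memory in full.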

MJB
A: 

Just a quick question: can't you keep one object pointed at the start of the file and traverse the file with another object? Then, when you get to the end, just point it back at the object at the beginning of the file (stream). I believe C++ has such mechanisms with file I/O (or is it stream I/O?).

Dark Star1