I'm trying to read a text file line by line and put each line into a Map so that I can delete duplicate words (e.g. "test test") and print out the lines without the duplicate words. I must be doing something wrong, though, because I basically get just one line as my key instead of each line being read one at a time. Any thoughts? Thanks.

public DeleteDup(File f) throws IOException {

    line = new HashMap<String, Integer>();
    try {
        BufferedReader in = new BufferedReader(new FileReader(f));
        Integer lineCount = 0;
        for (String s = null; (s = in.readLine()) != null;) {
            line.put(s, lineCount);
            lineCount++;
            System.out.println("s: " + s);
        }
    }
    catch(IOException e) {
         e.printStackTrace();
    }
    this.deleteDuplicates(line);
}
private Map<String, Integer> line;
+2  A: 

To be honest, your question isn't particularly clear - it's not obvious why you've got the lineCount, or what deleteDuplicates will do, or why you've named the line variable that way when it's not actually a line - it's a map from lines to the last line number on which that line appeared.

Unless you need the line numbers, I'd use a Set<String> instead.

However, all that aside, if you look at the keySet of line afterwards, it should contain every distinct line in the file. That's assuming the text file really is in the default encoding for your system (which is what FileReader uses, unfortunately - I generally use InputStreamReader and specify the encoding explicitly).
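For example, a minimal sketch along those lines (the file name and the UTF-8 encoding are just assumptions; imports from java.io and java.util are omitted, and this has to sit in a method that throws IOException):

// distinct lines, in the order they were first seen
Set<String> uniqueLines = new LinkedHashSet<String>();
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream("input.txt"), "UTF-8"));
try {
    for (String s = null; (s = in.readLine()) != null;) {
        uniqueLines.add(s);   // a Set keeps only the first occurrence of each line
    }
} finally {
    in.close();
}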

If you could give us a short but complete program, the text file you're using as input, the expected output and the actual output, that would be helpful.

Jon Skeet
+1  A: 

Your question is not very clear.

But going by your code snippet, I think you are trying to remove duplicate words from each line.

The following code snippet might be helpful.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StackOverflow {

    public static void main(String[] args) throws IOException {
        // one Set per line; a Set keeps only a single copy of each word
        List<Set<String>> unique = new ArrayList<Set<String>>();

        BufferedReader reader = new BufferedReader(
                new FileReader("C:\\temp\\testfile.txt"));

        String line = null;
        while ((line = reader.readLine()) != null) {
            // split on whitespace and collect the distinct words of this line
            String[] stringArr = line.split("\\s+");
            Set<String> strSet = new HashSet<String>();
            for (String tmpStr : stringArr) {
                strSet.add(tmpStr);   // duplicates are silently dropped
            }
            unique.add(strSet);
        }
        reader.close();
    }
}
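If you also want to print each line with its duplicate words removed, something like this could follow the while loop (note that a HashSet does not preserve the original word order; a LinkedHashSet would):

for (Set<String> words : unique) {
    StringBuilder sb = new StringBuilder();
    for (String word : words) {
        if (sb.length() > 0) {
            sb.append(' ');       // separate words with a single space
        }
        sb.append(word);
    }
    System.out.println(sb);       // the line with duplicates removed
}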
Upul
A: 

The only problem I see with your code is that DeleteDup doesn't have a return type specified. Otherwise the code looks fine and reads the file properly.

Please post the deleteDuplicates method code and the file you used.

YoK
+1  A: 

What I understood from your question is that you want to print the lines which do not have any duplicate words in them.
Maybe you could try the following snippet for it.

public void deleteDup(File f)
{
    try
    {
        BufferedReader in = new BufferedReader(new FileReader(f));
        Integer wordCount = 0;
        boolean isDuplicate = false;
        String[] arr = null;
        for (String line = null; (line = in.readLine()) != null;)
        {
            // reset the per-line state before processing the next line
            isDuplicate = false;
            wordCount = 0;
            wordMap.clear();

            arr = line.split("\\s+");
            for (String word : arr)
            {
                wordCount = wordMap.get(word);
                if (null == wordCount)
                {
                    // first time this word appears on the current line
                    wordCount = 1;
                }
                else
                {
                    // the word was already seen on this line
                    wordCount++;
                    isDuplicate = true;
                    break;
                }
                wordMap.put(word, wordCount);
            }
            if (!isDuplicate)
            {
                lines.add(line);
            }
        }
        in.close();
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

private Map<String, Integer> wordMap = new HashMap<String, Integer>();
private List<String> lines = new ArrayList<String>();

In this snippet, lines will contain the lines that do not have any duplicate words in them. It would have been easier to find your problem if we knew what

this.deleteDuplicates(line);

is supposed to do. Maybe it does not clear the data structures it uses, so words seen on earlier lines are treated as duplicates on later lines even though they do not appear there.
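A hypothetical driver, assuming the method and the two fields above live in a class called DeleteDup (the file path is made up):

public static void main(String[] args)
{
    DeleteDup dd = new DeleteDup();
    dd.deleteDup(new File("C:\\temp\\testfile.txt"));
    for (String s : dd.lines)      // accessible here because main is in the same class
    {
        System.out.println(s);
    }
}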

aNish
A: 
  1. You are printing out every line read, not just the unique lines.
  2. Your deleteDuplicates() method won't do anything, as there can never be duplicate keys in the HashMap; putting the same line again just overwrites the earlier entry (see the sketch below).

So it isn't at all clear what your actual problem is.
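To illustrate the second point, a quick sketch: a HashMap never holds two entries with the same key, so a duplicate line simply replaces the value stored for the earlier one.

Map<String, Integer> map = new HashMap<String, Integer>();
map.put("test test", 0);
map.put("test test", 5);
System.out.println(map.size());           // prints 1
System.out.println(map.get("test test")); // prints 5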

EJP