views:

146

answers:

2

at first i have a 2d array storing many numbers and then i used an array of hushtables to store pairs formed by the numbers of each row of the 2d array. for example in the first row of the 2d array, the numbers are 1 2 3 4 5 then the pairs should be 1,2 1,3 1,4 ,1,5 etc. the code for generating the pairs and store in hashtable is

     Hashtable [] table=new Hashtable [lineCounter];
    int [] pairCounter=new int[lineCounter];
    for(int i=0;i<lineCounter;i++)
  {
     table[i]=new Hashtable();
     for (int j=0;j<lineitem2[i]-1;j++)
     {
         for(int t=j+1;t<lineitem2[i];t++)
         {
             int firstnum=freItem[i][j];
             int secnum=freItem[i][t];
             String value=firstnum+":"+secnum;
             //System.out.println(firstnum+"``"+secnum);
             table[i].put(pairCounter[i],value);
             pairCounter[i]++;
             //System.out.println(i+":::"+table[i].get(firstnum));
         }
     }
 }

after stored each pair of everyline, I want to compare each pair in every to the other lines to find how many times this pair shows up, the code is

Hashtable resulttable=new Hashtable();
   int [] pairresult=new int [lineCounter];

   for(int i=0;i<lineCounter;i++)
   {
           //Set set1 = table[i].entrySet();
        //Iterator it1 = set1.iterator();
        Enumeration keys = table[i].keys();


       //for(int j=i+1;j<lineCounter;j++)
       //{


         while (keys.hasMoreElements())
             {
                int pairs=0;
               //Map.Entry entry1 = (Map.Entry) it1.next();
               int key = (Integer)keys.nextElement();
               String curvalue = (String)table[i].get( key );

               for(int j=i+1;j<lineCounter;j++)
               {

                 //Set set2 = table[j].entrySet();
                 //Iterator it2 = set2.iterator();
                 Enumeration keys2 = table[j].keys();

                 while (keys2.hasMoreElements())
                 {

                   //Map.Entry entry2 = (Map.Entry) it2.next();
                   //System.out.println(entry2.getKey() + " and " + entry2.getValue());
                   int key2 = (Integer)keys2.nextElement();
                   String curvalue2 = (String)table[j].get( key2 );
                   if(curvalue.equals(curvalue2))
                   {
                       pairs++;
                       table[j].remove(key2);
                   }
                 }

                 }
                if(pairs>=0.02*lineCounter)
                {
                  resulttable.put(curvalue,pairs);
                }







            }


   //    }


   }

but after ran it with the input file, I got the error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.lang.StringBuilder.toString(StringBuilder.java:430)

is there anything wrong with my method of comparing the pairs? and why did I get that error please help thanks.

A: 

If your code works on a smaller data set, it might just be that the default 64Mb that the JVM uses is not enough, does it work when you pass -Xmx512m as argument to the Java command line?

rsp
Thanks I tried to pass -Xmx1024m and it worked, but the computing time is so long, it took 8mins to compute a file contains 211 lines of numbers. is there any other data structure or algorithm to do this?
starcaller
@starcaller, you are creating a large dataset and compare derived information. You could also compare the source dataset and construct the difference dataset from this. This will take much less memory.
rsp
+2  A: 

Okay, suppose you have a Pair class as follows:

public class Pair {

    private final int value1;
    private final int value2;

    public Pair(int value1, int value2) {
        this.value1 = value1;
        this.value2 = value2;
    }

    public int value1() {
        return value1;
    }

    public int value2() {
        return value2;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + value1;
        result = prime * result + value2;
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Pair other = (Pair) obj;
        if (value1 != other.value1)
            return false;
        if (value2 != other.value2)
            return false;
        return true;
    }

    @Override
    public String toString() {
        return "(" + value1 + ", " + value2 + ")";
    }

}

Note that it's important to properly implement the equals(Object) and hashCode() methods if you expect instances of the Pair class to behave properly when used in hash-based data structures (for example: Hashtable, HashMap, HashSet, HashMultimap, HashMultiset).

Now, this code will read in a file (requires the Guava libraries):

    File file = ...;

    final Map<Pair, Collection<Integer>> lineNumbersByPair = new HashMap<Pair, Collection<Integer>>();

    /*
     * Step 1: Read in the lines, one by one.
     */
    Reader reader = new FileReader(file);
    try {
        BufferedReader bufferedReader = new BufferedReader(reader);
        try {
            String line;

            int lineNumber = 0;
            while ((line = bufferedReader.readLine()) != null) {
                lineNumber++;

                String[] tokens = line.split("\\s+");
                int[] values = new int[tokens.length];

                for (int i = 0; i < tokens.length; i++) {
                    values[i] = Integer.parseInt(tokens[i]);
                }

                for (int i = 0; i < values.length; i++) {
                    for (int j = i + 1; j < values.length; j++) {
                        Pair pair = new Pair(values[i], values[j]);

                        Collection<Integer> lineNumbers;
                        if (lineNumbersByPair.containsKey(pair)) {
                            lineNumbers = lineNumbersByPair.get(pair);
                        } else {
                            lineNumbers = new HashSet<Integer>();
                            lineNumbersByPair.put(pair, lineNumbers);
                        }
                        lineNumbers.add(lineNumber);
                    }
                }
            }
        } finally {
            bufferedReader.close();
        }
    } finally {
        reader.close();
    }

    /*
     * Step 2: Identify the unique pairs. Sort them according to how many lines they appear on (most number of lines to least number of lines).
     */
    List<Pair> pairs = new ArrayList<Pair>(lineNumbersByPair.keySet());
    Collections.sort(
            pairs,
            new Comparator<Pair>() {
                @Override
                public int compare(Pair pair1, Pair pair2) {
                    Integer count1 = lineNumbersByPair.get(pair1).size();
                    Integer count2 = lineNumbersByPair.get(pair2).size();
                    return count1.compareTo(count2);
                }
            }
        );
    Collections.reverse(pairs);

    /*
     * Step 3: Print the pairs and their line numbers.
     */
    for (Pair pair : pairs) {
        Collection<Integer> lineNumbers = lineNumbersByPair.get(pair);
        if (lineNumbers.size() > 1) {
            System.out.println(pair + " appears on the following lines: " + lineNumbers);
        }
    }

In a test, the code read in a file with 20,000 lines, each line containing 10 numbers ranging anywhere between 0 and 1000.

Adam Paynter
thanks, but when I tried to compile the file, it threw an error on the syntax:`final Multimap<Pair, Integer> lineNumbersByPair = HashMultimap.create();` said can't find symbol
starcaller
That's because the program relies on the Guava libraries (as mentioned in the answer). The libraries can be obtained from here: http://code.google.com/p/guava-libraries/downloads/list Unzip the file and look for a file named `guava-r07.jar`. Include the JAR file when you compile the program.
Adam Paynter
I downloaded it and the apache collection lib, and have already did `import org.apache.commons.collections.BidiMap;import org.apache.commons.collections.Factory;import org.apache.commons.collections.MultiHashMap;import org.apache.commons.collections.MultiMap;import org.apache.commons.collections.bidimap.DualHashBidiMap;import org.apache.commons.collections.map.LazyMap;` what else should I import, Thanks
starcaller
@starcaller: Okay, I have revised my answer to no longer depend on the Guava libraries. It should compile using only the standard Java library now.
Adam Paynter
It is amazing to use the collection to do the job, the time efficiency improved so much. Thanks so much
starcaller
@starcaller: You're welcome. I'm glad to help!
Adam Paynter
@ Adam Paynter: Hi Adam, I got another problem, when I try to run a .dat file which is 14.7mb large, and I set memory for the JVM to 1500mb, but it give me an exception:OutOfMemoryErroe:Java heap space. when I tried to set higher memory for it, like 1600mb, it said it's invalid the specified size exceeds the maximum representable size.
starcaller
@starcaller: I'm not sure how to properly configure the heap size. You may want to check for a question regarding that. If not, perhaps you should ask a separate question.
Adam Paynter
@starcaller: Also, you may want to try reverting the code back to the version that uses the Guava libraries (that is, the version that uses `Multimap` and `HashMultimap`). The classes you need to import are `com.google.common.collect.Multimap` and `com.google.common.collect.HashMultimap`. This may prevent excessive numbers of collections from being created.
Adam Paynter
@Adam: can you post the code using HashMultimap again please? I tried to write it but kind of getting lost.Thanks.
starcaller
@starcaller: Check my answer's [revisions](http://stackoverflow.com/posts/3851674/revisions) (hopefully you have enough reputation to see them). It should be the first revision.
Adam Paynter
@Adam: Thanks, now I have to add the google package to the path
starcaller
@Adam: I wrote the code in JCreator, and it compiled well, but when i tried to run it in command line prompt, it gave out error can't find google package, why did this show up since I have add the path in JCreator
starcaller
@starcaller: Java needs access to third party libraries both at *compile time* and at *run time*. By adding it to the path in JCreator, you have given Java access to the libraries at *compile time*. When you run the program, you must also make sure that Java has access to the same libraries. If you're running it via the command line, you will probably have to add the `-cp` command line argument. For example: `java -cp .;guava.jar your.awesome.Program` (the semicolon separates the paths - use a colon (`:`) if you're not on Windows)
Adam Paynter
@Adam:I tried using the google package, but still, it give me an exception:OutOfMemoryErroe:Java heap space
starcaller
@Adam: is there any better way to store the pairs?
starcaller
@starcaller: I don't know a solution immediately off the top of my head (other than trying a heap space larger than `-Xmx1024m`). You *may* find some help from the [Colt library](http://acs.lbl.gov/software/colt/). It's meant for "High Performance Scientific and Technical Computing in Java".
Adam Paynter
@Adam: if I don't need to track which line the pair appears and just record how many lines it appears, how should I modify the code? Thanks
starcaller
@Adam: I am thinking create the pair collection at first based on the numbers in the file regardless to which line they are since the number is from 1 to 999, and then compare the pairs from each line to this premade collection, and update the int value which record how many time it appears. ` final Multimap<Pair, int> allPair = HashMultimap.create(); for(int i=0;i<999;i++) { for(int j=i+1;j<1000;j++) { Pair pair = new Pair(i, j); if(countItem[i]!=0 } }`
starcaller
@Adam:I made it out by using premaking the map, and now, it cost 110s to read a 15m file. so happy, Thanks
starcaller