views:

97

answers:

2

I have an application which forms all the possible pairs and then compare the pairs, but when I run the applicaion, it gave me exception:OutOfMemoryError:Java heap space when running the code. I tried -Xmx1500m, but the exception just kept coming. the code for generating the pairs are as following

File file = ...;

final Map<Pair, Collection<Integer>> lineNumbersByPair = new HashMap<Pair, Collection<Integer>>();

/*
 * Step 1: Read in the lines, one by one.
 */
Reader reader = new FileReader(file);
try {
    BufferedReader bufferedReader = new BufferedReader(reader);
    try {
        String line;

        int lineNumber = 0;
        while ((line = bufferedReader.readLine()) != null) {
            lineNumber++;

            String[] tokens = line.split("\\s+");
            int[] values = new int[tokens.length];

            for (int i = 0; i < tokens.length; i++) {
                values[i] = Integer.parseInt(tokens[i]);
            }

            for (int i = 0; i < values.length; i++) {
                for (int j = i + 1; j < values.length; j++) {
                    Pair pair = new Pair(values[i], values[j]);

                    Collection<Integer> lineNumbers;
                    if (lineNumbersByPair.containsKey(pair)) {
                        lineNumbers = lineNumbersByPair.get(pair);
                    } else {
                        lineNumbers = new HashSet<Integer>();
                        lineNumbersByPair.put(pair, lineNumbers);
                    }
                    lineNumbers.add(lineNumber);
                }
            }
        }
    } finally {
        bufferedReader.close();
    }
} finally {
    reader.close();
}

/*
 * Step 2: Identify the unique pairs. Sort them according to how many lines they appear on (most number of lines to least number of lines).
 */
List<Pair> pairs = new ArrayList<Pair>(lineNumbersByPair.keySet());
Collections.sort(
        pairs,
        new Comparator<Pair>() {
            @Override
            public int compare(Pair pair1, Pair pair2) {
                Integer count1 = lineNumbersByPair.get(pair1).size();
                Integer count2 = lineNumbersByPair.get(pair2).size();
                return count1.compareTo(count2);
            }
        }
    );
Collections.reverse(pairs);

/*
 * Step 3: Print the pairs and their line numbers.
 */
for (Pair pair : pairs) {
    Collection<Integer> lineNumbers = lineNumbersByPair.get(pair);
    if (lineNumbers.size() > 1) {
        System.out.println(pair + " appears on the following lines: " + lineNumbers);
    }
}

I am reading a file around 15mb and it contains like 20000lines of single numbers, and there are around 40 numbers per line, it forms all possible pairs of each line. anyone has any idea about how to solve this? thanks

A: 
birryree
is there a better strcture for storing the pairs?
starcaller
Even worst, the time just listing the pair ... assume you can get 10k pair each second == 319999600000 / 10000 / 60 / 60 / 24 = more than one year! ____ you should rethink your problem, not the data structure.
J-16 SDiZ
if a Pair has two Integer fields, it should take 40 bytes on 64 bit systems or 24 bytes on 32 bit systems, IIRC. so 595MB or 357MB, plus a bit more for arraylist overhead.
Vuntic
A: 

When data became too big to fit memory, the only way is use extended memory (HDD). Here you can partition and store on disk, load small part each onto memory and search.

Or you should use a algobrithm that use less memory and more processor.
1.Search the file, search all num, and create a relative 2-d matrix or something like below.

  1 2 3 4 ...
1 0 1 0 0
2 0 1 0 0
3 0 0 0 0
...


2.You can sort on this matrix.
3.One by one pair, search the file to get line num contain both numbers in pair.
Sorry because my bad english.

pinichi