views:

284

answers:

2

Hello ,

I am trying to create an adjacency list to store a graph.The implementation works fine while storing 100,000 records. However,when I tried to store around 1million records I ran into OutofMemory Error :

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOfRange(Arrays.java:3209) at java.lang.String.(String.java:215) at java.io.BufferedReader.readLine(BufferedReader.java:331) at java.io.BufferedReader.readLine(BufferedReader.java:362) at liarliar.main(liarliar.java:39)

Following is my implementation

HashMap<String,ArrayList<String>> adj = new HashMap<String,ArrayList<String>>(num);


        while ((str = in.readLine()) != null)
        {

            StringTokenizer Tok = new StringTokenizer(str);
            name = (String) Tok.nextElement();
            cnt = Integer.valueOf(Tok.nextToken());
            ArrayList<String> templist = new ArrayList<String>(cnt);

            while(cnt>0)
             {
                    templist.add(in.readLine());
                    cnt--;
             }
            adj.put(name,templist);

        } //done creating a adjacency list

I am wondering, if there is any better way to implement the adjacency list. Also, I know number of nodes right in the begining and , in the future I flatten the list as I visit nodes. Any suggestions ?

Thanks

+1  A: 

I seem to have an unfair advantage here since I recognize both the name of the problem (liarliar) and the exact nature of the input.

I can tell you that this OutOfMemoryError is by design. You need to find a better algorithm that does not store the entire adjacency information of the graph in memory.

I will refrain from giving too much algorithmic insight, but I can tell you that you need to sit down and think more than you need to program at this stage. Maybe read a good book on algorithms and data structures.


What you're doing right now is that you're actually storing the strings from the input file unnecessarily into your HashMap<String,ArrayList<String>>. This is very space-inefficient given the nature of the problem.

It's much easier and much more efficient if you use a java.util.Scanner instead. new Scanner(new File(inputFilename)) and next(), and nextInt() are all that you need.

Assign a unique int to each name (hint: Map<String, Integer>), and store the much smaller int instead.

polygenelubricants
A: 

Hi polygenelubricants,

Thanks for answering my question. Yes you have guessed it right , what the code is about.

I think yes , using a mapping of string to integers would save me some space req for adjacency list. In that sense the above error can be removed.

However, I used java -jar -Xmx1024M. This gives a way to run the program with more heap memory , and since its allowed to use it in the given prolem , that shld not be the reason my submission is failing.

Performance can be one of the reason for failing the bot though I am not sure.

With regards to your solution,

If I create a Map ,and then store the numbers in the adjacencey list it would save me some space, but it will also add an extra lookup each time I need to access a node during my bfs/dfs traversal. That bothers me . Are you saying I should not create the adjacency list at all ? Did I understand what you are trying to say correctly.

thanks.

p1
A `Map<String, Integer>` lookup is cheap (`O(1)`); you shouldn't be bothered by it unless it's proven to be a problem. Premature optimization is the root of all evils. This is even more true in this case, because there's a _BIG_ optimization that you still haven't discovered. You understood me right: you don't have to create the adjacency list at all. You can solve this problem in `O(V)` space. You can process the input in `O(V+E)` (which is optimal, since that's the size of the input), and after you're done reading the input, you can produce the output rightaway in `O(1)`.
polygenelubricants
Another way to save space without using the `Map<String,Integer>` is to `.intern()` the strings. A lot of names will repeat in the input, and they are redundantly stored in the memory multiple times. If you only store the `.intern()` of those names, the redundant copies will be garbage-collected. And as a bonus, if you consistently `.intern()` the names, you can use `==` and `!=` instead of `equals()`. Read about how string interning works. This is small time optimization, though. The really important one is the one in above comment.
polygenelubricants
I know it has to do with testing bipartiteness/graph colouring .. but I am still thinking in my head to get an answer in O(1) just after creating a Map(name,int) of all the nodes!!The int in Map(name,int) can change as per distance , all even goes in 1 group and odd in other .. Problem with all these approaches, It requires an adjacency list :( to ensure that graph is traced in BFS or DFS format. Else you end up with a new node not knowing which group the node belongs to.. so it becomes difficult to know what nodes the children would go to I am missing something, am I on the right track ?
p1