views:

123

answers:

4

Hi all, I need a simple opinion from all you gurus!

I developed a program to do some matrix calculations. It works fine with small matrices. However, when I start calculating a BIG matrix with thousands of rows and columns, the speed drops drastically.

I was thinking of processing each row, writing the result to a file, freeing the memory, then processing the second row and writing it to the file, and so forth.

Will it help improve speed? I would have to make big changes to implement this, which is why I need your opinion first. What do you think?

Thanks

P.S.: I know about the Colt and Jama matrix packages. I cannot use them due to company rules.


Edited

In my program I store the whole matrix in a two-dimensional array, and if the matrix is small it is fine. However, when it has thousands of columns and rows, keeping it all in memory for the calculation causes performance issues. The matrix contains floating-point values. For processing, I read the whole matrix into memory and then start the calculation. After calculating, I write the result to a file.

+3  A: 

Is memory really your bottleneck? Because if it isn't, then stopping to write things out to a file is always going to be much, much slower than the alternative. It sounds like you are probably experiencing some limitation of your algorithm.

Perhaps you should consider optimising the algorithm first.

And as I always say for any performance issue: asking people is one thing, but there is no substitute for trying it! Opinions don't matter when the real-world performance is measurable.

Gian
You could say processing speed is my bottleneck. I am new at this job and I am not a Java+math expert. I don't know how to figure out what is eating my speed. I used jvisualvm, but it just shows a high-level graph, not the details of which function or class is eating my speed.
+2  A: 

I suggest using profiling tools and timing statements in your code to work out exactly where your performance problem is before you start making changes.

You could spend a long time 'fixing' something that isn't the problem. I suspect that the file IO you suggest would actually slow your code down.

If your code effectively has a loop nested within another loop to process each element, then you will see your speed drop away quickly as you increase the size of the matrix. If so, one area to look at would be processing your data in parallel, allowing your code to take advantage of multiple CPUs/cores.
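
To make that concrete, here is a rough sketch of one way to parallelise the outer loop of a plain triple-loop multiply with an ExecutorService. The class and method names are invented for this example, not taken from your code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelMultiply {

        // Multiply a (rows x n) matrix by an (n x cols) matrix, splitting
        // the outer loop into bands of rows, one per core. Each task owns
        // a distinct range of result rows, so no locking is needed on c.
        public static double[][] multiply(final double[][] a, final double[][] b)
                throws InterruptedException {
            final int rows = a.length, n = b.length, cols = b[0].length;
            final double[][] c = new double[rows][cols];

            int threads = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            final int band = (rows + threads - 1) / threads;

            for (int t = 0; t < threads; t++) {
                final int from = t * band;
                final int to = Math.min(from + band, rows);
                pool.execute(new Runnable() {
                    public void run() {
                        for (int i = from; i < to; i++)
                            for (int k = 0; k < n; k++) {
                                double aik = a[i][k];         // hoist the repeated lookup
                                for (int j = 0; j < cols; j++)
                                    c[i][j] += aik * b[k][j]; // walks b row-wise, cache-friendly
                            }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            return c;
        }
    }

Since the bands are completely independent, on a quad-core machine you would hope for something approaching a 4x speed-up.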

Also consider a more efficient implementation, such as a sparse matrix data structure rather than a multidimensional array (if you are using one now); a sketch follows.
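
For illustration only, a minimal sparse representation might look like this, keyed by packed (row, column) coordinates; the SparseMatrix name and API are made up for this sketch:

    import java.util.HashMap;
    import java.util.Map;

    // Stores only the non-zero entries, keyed by (row, column) packed
    // into a single long. For a mostly-zero matrix this uses far less
    // memory than double[rows][cols] and lets you iterate over only the
    // entries that actually contribute to a calculation.
    public class SparseMatrix {
        private final Map<Long, Double> entries = new HashMap<Long, Double>();
        private final int rows, cols; // kept for shape information

        public SparseMatrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
        }

        private static long key(int row, int col) {
            return ((long) row << 32) | (col & 0xFFFFFFFFL);
        }

        public void set(int row, int col, double value) {
            if (value == 0.0) entries.remove(key(row, col)); // never store zeros
            else entries.put(key(row, col), value);
        }

        public double get(int row, int col) {
            Double v = entries.get(key(row, col));
            return v == null ? 0.0 : v;
        }
    }

A serious implementation would use something like compressed sparse row arrays instead of a boxed HashMap, but the principle is the same: never store, and never loop over, the zeros.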

Brabster
Thanks for your quick answer. I already checked, and it is the matrix calculation that causes the performance problem. In my program I store the whole matrix in a two-dimensional array, and if the matrix is small it is fine. However, when it has thousands of columns and rows, keeping it all in memory for the calculation causes performance issues. Any other ideas?
I don't think you can infer that the performance problem is memory-related. Let's say you have a 5000x5000 matrix of doubles, and doubles use 8 bytes on your JVM. Then storing the whole matrix in memory consumes 200,000,000 bytes, or around 190 MB - not much on modern computers. You can use tools like jconsole to see what memory is being allocated in your JVM and, crucially, when garbage collection is running.
Brabster
You've tagged sparse matrices. That means that many of the elements are zero. If you are storing your matrices as a 2-dimensional array and most of the elements are zero, consider an alternative implementation that avoids storing (and processing) the empty elements. Maybe a list of the non-empty elements, storing the co-ordinates of each, would save loads of memory and let you process much more quickly.
Brabster
Thanks for your answer :). If you multiply one 5000x5000 matrix by another 5000x5000 matrix, how much memory and CPU will it consume? :) I am doing multiplication, I have 6 GB of RAM and a Core 2 Duo, and it takes hours to process :(
Is Java a strict requirement? This sounds like something it may be a good idea to use a dedicated math program for (like Mathematica).
Thorbjørn Ravn Andersen
+1  A: 

Use the profiler in jvisualvm in the JDK to identify where the time is spent.

I would do some simple experiments to identify how your algorithm scales, because it sounds like you might be using one with a higher runtime complexity than you think. If it runs in N^3 (which is common for a naive matrix multiply), then doubling the input size will increase the run time eightfold.
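
As a sketch of the kind of experiment I mean (the triple-loop multiply below is just a stand-in for whatever routine you actually use): time the same operation at N, 2N and 4N, and if each doubling multiplies the run time by roughly 8, you are looking at O(N^3).

    import java.util.Random;

    public class ScalingTest {
        public static void main(String[] args) {
            Random rnd = new Random(42);
            for (int n = 256; n <= 1024; n *= 2) {
                double[][] a = random(n, rnd), b = random(n, rnd);
                long start = System.nanoTime();
                multiply(a, b);
                long millis = (System.nanoTime() - start) / 1000000;
                System.out.println(n + "x" + n + " took " + millis + " ms");
            }
        }

        static double[][] random(int n, Random rnd) {
            double[][] m = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    m[i][j] = rnd.nextDouble();
            return m;
        }

        // the plain triple-loop multiply whose scaling is being measured
        static double[][] multiply(double[][] a, double[][] b) {
            int n = a.length;
            double[][] c = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++)
                        c[i][j] += a[i][k] * b[k][j];
            return c;
        }
    }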

Thorbjørn Ravn Andersen
Thanks for your answer. How can I see which class, function, or statement is eating my speed in jvisualvm?
http://www.baptiste-wicht.com/2010/07/discover-java-visualvm-1-3/
Thorbjørn Ravn Andersen
+2  A: 

You need to remember that multiplying an NxN matrix by an NxN matrix takes 2xN^3 calculations. Even so, it shouldn't take hours. You should get an improvement by transposing the second matrix first (about 30%), but it really shouldn't be taking hours.
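
For illustration, the transpose trick looks something like this (a sketch with invented names, not code from any particular library):

    public class TransposeMultiply {

        // Transpose b once up front so the inner dot product walks both
        // operands left-to-right in memory, instead of striding down a
        // column of b and missing the cache on every step.
        static double[][] multiply(double[][] a, double[][] b) {
            int n = a.length, k = b.length, m = b[0].length;
            double[][] bt = new double[m][k];
            for (int i = 0; i < k; i++)
                for (int j = 0; j < m; j++)
                    bt[j][i] = b[i][j];

            double[][] c = new double[n][m];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < m; j++) {
                    double sum = 0;
                    for (int x = 0; x < k; x++)
                        sum += a[i][x] * bt[j][x]; // sequential in both arrays
                    c[i][j] = sum;
                }
            return c;
        }
    }

The one-off O(N^2) cost of building the transpose is negligible next to the O(N^3) multiply.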

So as you double N, you increase the time by 8x. Worse than that, a matrix which fits into your cache is very fast, but once it is more than a few MB the data has to come from main memory, which slows down your operations by another 2-5x.

Putting the data on disk will really slow down your calculation. I only suggest doing this if your matrix doesn't fit in memory, and even then it will make things 10x-100x slower, so buying a little more memory is a better idea. (In your case, your matrices should be small enough to fit into memory.)

I tried Jama, which is a fairly basic library that uses two-dimensional arrays instead of one-dimensional ones, and on a 4-year-old laptop it took 7 minutes. You should be able to halve this time just by using the latest hardware, and with multiple threads cut it to below one minute.

EDIT: Using a Xeon X5570, Jama multiplied two 5000x5000 matrices in 156 seconds. A parallel implementation I wrote cut this time to 27 seconds.

Peter Lawrey
Hi Peter, thank you so much. Just for my own experiment, I also downloaded Jama and used a 7000x4000 matrix to perform an SVD calculation on my machine. It ran for some minutes and then gave a heap-size error. I am just using their example with my matrix. If you want, I can provide the example, but I think you can also download it with Jama. So where is the problem? I used jvisualvm to see where I am going wrong. It again showed me the heap size going up and down, but not the place or class where I am going wrong. How can I see which class, function, or statement is eating my speed?
Running Matrix.random(7000,4000).svd(), the memory usage started at 812 MB and stayed there. The Jama SVD makes three scratch areas about as big as your original array. I suggest you give the JVM at least 1 GB, i.e. -mx1g on the command line.
Peter Lawrey
Profilers usually have method-level instrumentation, i.e. they will tell you how much time is spent in each method. However, since all the work here is in one method, you will need to add your own instrumentation/timings using System.nanoTime(). This works for significant blocks of work; if you want to know which expression or array access takes the most time, you will have to try removing it and see how that affects the timing, e.g. replace an array access with a local variable.
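
A minimal self-contained example of that pattern (the row-major sweep here is just a stand-in for whichever block of your code you suspect):

    public class NanoTiming {
        public static void main(String[] args) {
            int n = 2000;
            double[][] m = new double[n][n];

            long start = System.nanoTime();   // bracket the suspect block
            double sum = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    sum += m[i][j];
            long elapsed = System.nanoTime() - start;

            System.out.println("sweep took " + elapsed / 1000000
                    + " ms (sum=" + sum + ")");
        }
    }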
Peter Lawrey