
Apologies for the re-post; the last time I posted, I did not include all the details.

My colleague, a C# programmer who has since left the firm, was forced to write Java code that involved large, dense matrix multiplication.

He coded his own DataTable class in Java in order to be able to:

a) create indexes to sort and join with other DataTables

b) do matrix multiplication.
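For reference, the "index to sort" in (a) does not have to be a custom DataTable. The sketch below assumes the data is a row-major `double[][]` and builds an index of row positions ordered by one column, leaving the rows themselves in place. The class name `ColumnIndex` is hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Hypothetical helper (not from the original code): builds a sort index
 * over one column of a row-major double[][] without moving any rows.
 */
public final class ColumnIndex {

    private ColumnIndex() {}

    /** Returns row positions ordered by ascending value in the given column. */
    public static int[] sortIndex(double[][] table, int column) {
        Integer[] order = new Integer[table.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Arrays.sort on objects is stable, so ties keep their original row order.
        Arrays.sort(order, Comparator.comparingDouble(row -> table[row][column]));
        int[] result = new int[order.length];
        for (int i = 0; i < order.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] table = {
            {3.0, 10.0},
            {1.0, 20.0},
            {2.0, 30.0}
        };
        // Prints [1, 2, 0]: row 1 has the smallest value in column 0.
        System.out.println(Arrays.toString(sortIndex(table, 0)));
    }
}
```

Several such indexes can be kept side by side over the same array, one per column you need to sort or join on.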

The code in its current form is NOT maintainable or extensible. I want to clean it up, and I thought using something like R from within Java would let me focus on business logic rather than on sorting, joining, matrix multiplication, etc.

Plus, I'm very new to the concept of a DataTable; I just want to replace it with 2D arrays and let R handle the rest.
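If the data does end up in plain 2D arrays, dense matrix multiplication itself is only a few lines of Java. This is a minimal sketch, assuming non-empty, rectangular `double[][]` inputs; the class name `DenseMultiply` is hypothetical. A tuned numerics library will generally be much faster on large matrices, so treat this as a baseline rather than a replacement for one.

```java
import java.util.Arrays;

/**
 * Hypothetical baseline (not from the original code): dense C = A * B
 * over plain double[][] arrays. Assumes non-empty, rectangular inputs.
 */
public final class DenseMultiply {

    private DenseMultiply() {}

    /** Computes C = A * B for an n-by-m matrix A and an m-by-p matrix B. */
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        int m = b.length;
        int p = b[0].length;
        if (a[0].length != m) {
            throw new IllegalArgumentException(
                "Inner dimensions differ: " + a[0].length + " vs " + m);
        }
        double[][] c = new double[n][p];
        // i-k-j loop order walks rows of B and C sequentially, which is
        // friendlier to the CPU cache than the textbook i-j-k order.
        for (int i = 0; i < n; i++) {
            double[] cRow = c[i];
            for (int k = 0; k < m; k++) {
                double aik = a[i][k];
                double[] bRow = b[k];
                for (int j = 0; j < p; j++) {
                    cRow[j] += aik * bRow[j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // Prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```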

(I currently do not know how to join two large datasets in Java efficiently.)
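One common way to join two large in-memory datasets in Java is a hash join: index one table by its key in a `HashMap` (build phase), then stream the other table and look up matches (probe phase), which takes roughly linear time instead of comparing every pair of rows. The sketch below assumes integer-valued keys stored in `long[][]` rows; the class name `HashJoin` and the example tables are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not from the original code): inner hash join of two
 * row-major long[][] tables on one key column each.
 */
public final class HashJoin {

    private HashJoin() {}

    /** Returns each matching pair as the left row followed by the right row. */
    public static List<long[]> innerJoin(long[][] left, int leftKey,
                                         long[][] right, int rightKey) {
        // Build phase: index the right table by its key column.
        Map<Long, List<long[]>> index = new HashMap<>();
        for (long[] row : right) {
            index.computeIfAbsent(row[rightKey], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: one expected O(1) lookup per left row.
        List<long[]> result = new ArrayList<>();
        for (long[] l : left) {
            List<long[]> matches = index.get(l[leftKey]);
            if (matches == null) {
                continue; // inner join: drop left rows with no match
            }
            for (long[] r : matches) {
                long[] joined = new long[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                result.add(joined);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] orders = {{1, 100}, {2, 200}, {1, 150}};
        long[][] customers = {{1, 42}, {3, 7}};
        // Prints [1, 100, 1, 42] then [1, 150, 1, 42]
        for (long[] row : innerJoin(orders, 0, customers, 0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Building the map on the smaller of the two tables keeps memory use down.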

Please let me know what you think. Also, are there any simple examples that I can take a look at?

A: 

see answer

Justin
Why the downvote? It's a valid answer to the question of dealing with a large matrix, and it gives far more detail than any of the other answers on this post to date. Secondly, the poster explains that he has no idea about matrices, and the linked answer explains them well.
Justin
A: 

Mahout implements matrix and vector operations of this type. It also supports distributed, large-scale matrix operations, though you may want to ask around on the mailing list for guidance on how to use this fairly new code.

Sean Owen
+1  A: 

Don't take this too harshly, but you seem to be preparing to replace one chunk of unmaintainable code with another chunk of unmaintainable code. How do I reach this remarkable conclusion? By your own admission, your Java expertise is not quite up to the task you face, and you propose to replace a pure Java solution with Java+R.

I suggest that you identify your core skills and use whatever toolset you are most comfortable with to refactor the code. If you don't, I foresee a post on SO in a year or so from your replacement complaining about the unmaintainable code you left behind!

High Performance Mark
Well said ... I accept what you're saying to a certain extent, but believe me, there are more than just these issues bogging me down at the moment. I precisely do not want to end up with another set of unmaintainable code; I just want a clean and elegant solution. I have no reason to be writing code to sort or multiply matrices when such solutions already exist. I do, however, have other issues that I'm interested in solving.
Chapax
A: 

Here are some options: Parallel Colt is a numerics library for Java, and Incanter is an R-like system that runs on the JVM.

Jouni K. Seppänen
Thanks ... I'm trying it out.
Chapax