views:

170

answers:

4

Although I have Java in the title, this could be for any OO language. I'd like to know a few new ideas to improve the performance of something I'm trying to do.

I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives.

Example:

List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>();

public void newObject(Object[] obj) {
    for(int i = 0; i < obj.length; i++) {
        column-oriented.get(i).add(obj[i]);
    }
}

Note: For simplicity I've omitted the initialization of objects and stuff.

The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas.

How would you do this knowing it's very performance sensitive?

EDIT:

I've tested a few things and found that:

Instead of using ArrayList (or any other Collection), I wrapped an Object[] array in another object to store individual columns. If this array reaches its capacity, I create another array with twice de size and copy the contents from one to another using System.copyArray. Surprisingly (at least for me) this is faster that using ArrayList to store the inner columns...

A: 

Use a LinkedList for implementing the column lists. It's grows linearly with the data and is O(1). (If you use ArrayList it has to resize the internal array from time to time).

After collecting the values you can convert that linked lists to arrays. If N is the number of rows you will pass from holding 3*N refs for each list (each LInkedList has prevRef/nextRef/itemRef) to only N refs.

It would be nice to have an array for holding the different column lists, but of course, it's not a big improvement and you can do it only if you know the column count in advance.

Hope it helps!

Edit tests and theory indicate that ArrayList is better in amortized cost, it is, the total cost divided by the number of items processed... so don't follow my 'advice' :)

helios
`ArrayList` is also O(1) amortized cost and you can easily specify the initial capacity, if the required size is known.
Joachim Sauer
Actually I've tested the performance and ArrayList is way faster.
halfwarp
mmm... wow... I've never thought about amortized cost... you're right! And of course... if you know the initial capacity... presized ArrayList is better.
helios
A: 

I would try using LinkedList for the inner list, because it should have better performance for insertions. Maybe wrappping Object arra into collection and using addAll might help as well.

Gabriel Ščerbák
"because it should have better performance for insertions" ... and much slower performance for indexed access. ArrayList javadocs says: "The add operation runs in amortized constant time, that is, adding n elements requires O(n) time." Usually ArrayList performs much better than LinkedList since it doesn't have to allocate new node per insert as LinkedList does. It needs to reallocate its array from time to time, but not as often. You can also reserve enough space in advance. (Tbh, I wouldn't worry about it **at all** unless profiler shows problem here)
Peter Štibraný
@Peter Štibraný hi, look at the code, no index-based access, so I guess it could be a fair trade-off... btw. I think the guy who asks worries about it, what other reasons for asking could he have?
Gabriel Ščerbák
Code shows only list creation. I guess OP doesn't want to create new list just to keep it on heap unused. :) Arraylists can be created with less overhead and are faster to access. Linked lists are better when you do many removals, or when you need to implement stack/queues, but that is not the case here.
Peter Štibraný
@Peter Štibraný accessing LinkedList sequentially through iterator might be O(1) as well, so it depends... But I grant you that inserts might not be as fast, I have no actual measurements/experience for the single object allocation cost.
Gabriel Ščerbák
@Gabriel: Agree on both counts... sequential access is O(1) in both cases, and if you need inserts into the middle of list, you should probably reconsider your choice of ArrayList.
Peter Štibraný
+2  A: 

The answer depends on the data and usage profile. How much data do you have in such collections? What is proportions of reads/writes (adding objects array)? This affects what structure for inner list is better and many other possible optimizations.

The fastest way to copy data is avoid copying at all. If you know that obj array is not modified further by the caller code (this is important condition), one of possible tricks is to implement you custom List class to use as inner list. Internally you will store shared List<Object[]>. Each call we just add new array to that list. Custom inner list class will know which column it represents (let it be n) and when it is asked to give item at position m, it will transpose m and n and query internal structure to get internalArray.get(m)[n]. This implementation is unsafe because of limitation on the caller that is easy to forget about but might be faster under some conditions (however, this might be slower under other).

iPhone beginner
I already thought of something like this but haven't tested yet. Considering some recent refactoring I've done this could work for me. I'll test and let you know.
halfwarp
A: 

ArrayList may be slow, due to copying of arrays (It uses a similar approach as your self-written collection).

As an alternate solution you could try to simply store the Rows at first and create columns when neccessary. This way, copying of the internal arrays at the list is reduced to a minimum.

Example:

//Notice: You can use a LinkedList for rows, as no index based access is used.
List<Object[]> rows =... 

List<List<Object>> columns;

public void processColumns() {
  columns = new ArrayList<List<Object>>();
  for(Object[] aRow : rows){

    while (aRow.size() > columns.size()){
      //This ensures that the ArrayList is big enough, so no copying is necessary
      List<Object> newColumn = new ArrayList<Object>(rows.size())
      columns.add(newColumn); 
    }

    for (int i = 0; i < aRow.length; i++){
      columns.get(i).add(aRow[i]);
    }
  }
}

Depending on the number of columns, it's still possible that the outer list is copying arrays internally, but normal tables contains far more rows than columns, so it should be a small array only.

Hardcoded