views:

95

answers:

3

Hi guys. I am working on an application which has a large array containing lines of numbers,

transNum[20000][200]//this is the 2d array containing the numbers and always keep track of the line numbers

I am using a nested loop to look for the most frequent items. which is

for(int i=0/*,lineitems=0*/;i<lineCounter;i++)
  {
      for(int j=0,shows=1;j<lineitem1[i];j++)
      {
          for(int t=i+1;t<lineCounter;t++)
          {
              for(int s=0;s<lineitem1[t];s++)
              {
                  if(transNum[i][j]==transNum[t][s])
                      shows++;
              }
          }

          if(shows/lineCounter>=0.2)
          {

              freItem[i][lineitem2[i]]=transNum[i][j];
              lineitem2[i]++;
          }
      }

  }

when I was doing tests using small input arrays like test[200][200], this loop works fine and the computing time is acceptable, but when I try to process the array contains 12000 lines, the computing time is too long, so I am thinking if there are other ways to compute the frequent items rather than using this loop.I just ran a test on 10688 lines, and the time to get all the frequent item is 825805ms, which is way to expensive.

A: 

Depends on your input. If you are also inserting the data in the same code then you can count frequent items as you insert them.


Here is a pseudo-C solution:

int counts[1000000];

while(each number as n)
{
    counts[n]++;
    // then insert number into array
}

EDIT #2: Make sure, so you don't get unexpected results, to initialize all the items in the array to zero.

Thomas O
I used the buffered reader to read Lines of numbers from a .dat file, and then store the numbers in the 2d array
starcaller
See my edit; you can use an array to store counts.
Thomas O
and what's the point of using this counts array, to get the number of counts I still have to scan the array with the same time as the loop thing
starcaller
Yes, but by doing it at the same time as you add data you don't have to scan the array again, and it will insignificantly increase the amount of time it takes to add the data.
Thomas O
oh, I got the point, I'll try if it works, but I think the time will not reduce too much, thanks :)
starcaller
@Thomas O: In java it will be always 0) You doesn't need initialize all items to zero.
Stas
^As in most languages, such as Python. But unless the language defines it, or you know the array is empty (for example you used calloc) you should empty the array.
Thomas O
@Thomas O : In Java primitive types are automatically initialized to “empty” values. There isn't any calloc(). Welcome to java world!)
Stas
Yeah, I am actually a Java programmer. (Sometimes. I prefer C for its raw speed, but Java is easier and almost as fast.) I didn't realise this question was Java until I looked at the tags because the syntax was very C like.
Thomas O
A: 

Bear in mind this is an O(n^2) algorithm at best and could be worse. That means the number of operations is proportional to the count of the items squared. After a certain number of lines, performance will degrade rapidly and there's nothing you can do about it except to improve the algorithm.

Tony Ennis
yea, i know this loop thing is time consuming, the point is what other algorithm could be used to sort this out
starcaller
How are you acquiring the data? Does it change once acquired?
Tony Ennis
initially, the lines of numbers was reading from a .dat file, and I used a buffered reader to read all of the numbers into a 2d array(in order to keep track of the line numbers).
starcaller
A: 

The Multiset implementation from Google Guava project might be useful in such cases. You could store items there and then retrieve set of values with count of each occurrence.

iYasha
but i have to keep track which line that the certain item appears
starcaller
Then you could use Multimap and call method put(item, line number). Then you'd have ability not only to get count of each item but also values of line items.
iYasha
Also TreeMultimap implementation would just allow you to set Comparator for keys in which case you could sort items by number of associated occurrences.
iYasha