views:

60

answers:

1

Let us presume we have k sequences of fixed length p. Each sequence has double values in range 0 to 1.0. For simplicity let us also assume that the sequences are just arrays; in the real implementation, they will be list.

Now, the algorithm needs to find the smallest index the value of which represents a "major upset" in a given sequence. This upset can be a value of 1.0 or a value that goes over a certain threshold (e.g. 0.2). If, for example, moving from j-1 to j the value increases over the threshold, then the index we seek would be j-1.

The upset of 1.0 takes precedence over the threshold value; for instance, if we find an index matching the threshold, we should still check the sequence for containing 1.0.

Finally, the algorithm should produce the smallest index that resulted in the upset. I have quickly put together some code to test the concept and show you the sort of thing I am after. What I am looking for is a possibly more efficient implementation as this algorithm is going to be executed pretty extensively.

List<double[]> nearCaptures = new ArrayList<double[]>();
double threshold = 0.2;
double majorUpset = 1.0;
int[] indexes = new int[nearCaptures.size()];
for (int i = 0; i < nearCaptures.size(); i++) {
    int index = 0;
    double[] tempArray = nearCaptures.get(i);
    Arrays.sort(tempArray);
    int tempIndex = Arrays.binarySearch(tempArray, majorUpset);
    if (tempIndex > 0) {
        for (int j = 1; j < nearCaptures.get(0).length; j++) {
            if (nearCaptures.get(i)[j] == majorUpset) {
                index = j-1;
                break;
            }
        }
    } else {
        for (int j = 1; j < nearCaptures.get(0).length; j++) {
            if (nearCaptures.get(i)[j] >= nearCaptures.get(i)[j-1] + threshold) {
                index = j-1;
                break;
            }
        }
    }
    indexes[i] = index;
}
Arrays.sort(indexes);
System.out.println(indexes[0]);
+3  A: 

Some hints for improving performance (and correctness):

  • When looking for a majorUpset, you perform a sort and a binary search, resulting in O(n log(n)) runtime, followed by a linear search (the for loop). That linear search would be all you need to find whether and where there is a majorUpset.

  • Since tempArray refers to the original array, you mess up your indices when you sort it. If you needed to sort, sort a copy. But as noted above, you won't need to sort.

  • You access the value nearCaptures.get(i) several times in a loop, where it would be better to store it in a local variable, right at the beginning of the i-loop.

Addition:

You might want to perform the search in parallel, because then you can stop as soon as you have found the smallest upset index in any of the arrays.

int p = nearCaptures.get(0).length;  // p is the common array length
// search for majorUpset
for(int j = 0; j < p; j++){
  for (double[] arr : nearCaptures) {
    if (arr[j]==majorUpset) return j; // first majorUpset
  }
}
// search for threshold
for(int j = 1; j < p; j++){
  for (double[] arr : nearCaptures) {
    if (arr[j]>arr[j-1]+threshold) return j-1; // first threshold
  }
}
Christian Semrau
Yes, point taken about the lookup of majorUpset and allocation - thanks Christian. It was just a quick experimental code to see how to attack the issue. Done in a crude, brute force way.
lennyks
You also should factor out the search code for a single array into a helper method. This increases readability of the code. And maybe replace the last `Arrays.sort(indexes)` with a linear search for the smallest element.
Christian Semrau
My addition suggest a different approach, which would make the other hints obsolete.
Christian Semrau
Ups! My addition is an answer to a different question. It does not return the index you are looking for.
Christian Semrau