views:

137

answers:

5

I'm debugging erroneous search returns from my data structures class project. This current project required us to build an ordered unrolled linked list, and run a search on the contents, then return a sublist of items from an inclusive start point to an exclusive end point. In order to do this, I have to search the internal array to find the index point of the start element. I do this by binary search, but since this only returns the first found match and there may be other matches before it, I have to work back in the array to find the first true match. I do this via

//get first index match, work backwards
int index= binarySearch(node.items, 0, node.numUsed, item, comp);
while (index>0 && comp.compare(node.items[index], item)==0){
    index--;
}

The professor provided testing code which breaks up a String and adds each character as an item in the structure. He also included a nested StringCmp class, which is referenced as comp in the binary search declaration above. His compare method is

public int compare(String s1, String s2) {
  cmpCnt++;
  return s1.compareTo(s2);
}

However, when running a test on the sublist method from i to o, this method is returning a true value when comp.compare(h,i)==0 and this is throwing my start results off from the search class I wrote. I originally compensated by return index++ which sufficed to pass the structure test, but threw off the expected start point by one.

So why is this returning true when it is obviously false?

EDIT added printout of sublist method, expected to run from i to o

Input test string= abcdefghijklmnopqrstuvwxyzaeiou Return from sublist method:
chunk 1 (used 4 of 10): [h][i][i][j]
chunk 2 (used 4 of 10): [k][l][m][n]

The h should not be in the list at all, but the comp.compare(node.items[index], item)==0 is returning true that i == h, which is obviously false.

Edit two The second part of the project requires us to parse a text file, build Song objects from Artist, Title and Lyrics fields, and then run a search on the titles using a prefix. The bug occurring here does not occur in single and multi-letter searches, so I think the problem is with the StringCmp nested class in the test.

+5  A: 

Do you know what compare() is supposed to return? Generally functions like that return a negative value if the first argument is less than the second, 0 if they are equal, and a positive value if the first argument is greater than the second.

Also, what is performing the sort on your array? Could you post your binarySearch() code so we can see if the problem lies there?

Jonathan
thats exactly what its supposed to be. for the test code, the compare test calls the compare in StringCmp, which returns the value of String.compareTo().
Jason
I'm confused, did I answer your question? Or is there still a problem?
Jonathan
no, you didn't, sorry. My comment was to corroborate what you wrote in your answer. I added a printout of the results in the OP
Jason
A: 

Check official description of the compareTo(String) method

From the doc:

Returns: the value 0 if the argument string is equal to this string; a value less than 0 if this string is lexicographically less than the string argument; and a value greater than 0 if this string is lexicographically greater than the string argument.

Dima
A: 

Out of curiosity, wouldn't this be more in-line for what you're trying to do?

//get first index match, work backwards
int index= binarySearch(node.items, 0, node.numUsed, item, comp);

for (int i = index; i >= 0; i--) {
    if (comp.compare(node.items[index], item)==0))
        index = i;
}
R. Bemrose
tested the code, returns the same output as in my edit above
Jason
+1  A: 

Since you implemented your own binary search, you should continue to use this to search for the first matching element in the array, instead of stopping as soon as you have found a match.

Your current approach introduces unnecessary complexity, and potentially changes a O(log N) algorithm into an O(N) one if your input array consists of all identical values.

I assume your current algorithm looks something like

int low = 0; 
int high = a.length-1;
while (low <= high) {
    int mid = (low + high) / 2;
    int cmp = comp.compare(a[mid], key);
    if (cmp < 0)
        low = mid + 1;
    else if (cmp > 0)
        high = mid - 1;
    else
        return mid;
}
return -1;

replacing this with the code below should do the trick

int low = 0;
int high = a.length-1;
while (low < high) {
    int mid = (low + high) / 2;
    int cmp = comp.compare(a[mid], key);
    if (cmp < 0)
        low = mid + 1;
    else
        high = mid;
}
if (comp.compare(a[low], key) == 0)
    return low;
else 
    return -1;
Luke Hutteman
Since the internal arrays are less than 100 elements and average 60 elements per node, there is not much efficiency lost in iterating forwards and back from the sentinel points. But you do have an intriquing point and I think I can modify it to return the end value as well as the start.
Jason
if efficiency isn't important, why even do a binary search to begin with? You could simply do a linear search until you find the element you're looking for instead.
Luke Hutteman
Considering your professor added a "cmpCnt++" to the Comparator, I think he does care about algorithmic complexity, and may verify whether you're performing more comparisons than necessary. You may only be applying your logic to small arrays right now, but ideally your solution would be applicable in a performant manner to arrays of any size.
Luke Hutteman
+2  A: 

your while-loop

while (index>0 && comp.compare(node.items[index], item)==0){
    index--;
}

overshoots the first match by one as you're decrementing index for every match, leaving you at an index that no longer matches. Calling the Comparator on the item at index-1 instead would fix this:

while (index>0 && comp.compare(node.items[index-1], item)==0){
    index--;
}
Luke Hutteman