views:

201

answers:

4

using binarySearch never returns the right index =/

int j = Arrays.binarySearch(keys,key);

where keys is type String[] and key is type String

I read something about needing to sort the Array but how do I even do that if that is the case?

I wish #java was smarter =/ <-- read:simpler

Given all this I really just need to know:

How do you search for a String in an array of Strings (less than 1000) then?

+8  A: 

java.util.Arrays.sort(myArray);

That's how binarySearch is designed to work - it assumes sorting so that it can find faster.

If you just want to find something in a list in O(n) time, don't use BinarySearch, use indexOf. All other implementations of this algorithm posted on this page are wrong because they fail when the array contains nulls, or when the item is not present.

public static int indexOf(final Object[] array, final Object objectToFind, int startIndex) {
    if (array == null) {
        return -1;
    }
    if (startIndex < 0) {
        startIndex = 0;
    }
    if (objectToFind == null) {
        for (int i = startIndex; i < array.length; i++) {
            if (array[i] == null) {
                return i;
            }
        }
    } else {
        for (int i = startIndex; i < array.length; i++) {
            if (objectToFind.equals(array[i])) {
                return i;
            }
        }
    }
    return -1;
}
Mark Byers
thanks for not answering my question
codeninja
It doesn't return the right index because it assumes sorting (for performance) and your array is not sorted.
Mark Byers
he did answer your question! It doesn't find it because the array is not sorted. To do a binary search the array must be sorted. The answer is to first sort the array using Arrays.sort and then call binarySearch on it.
TofuBeer
If you just want a linear search, use indexOf from ArrayUtils.
Mark Byers
No it does not. If you do not provide a comparator it uses the "natural order" of the elements, which means it uses the compareTo method that String implements from the Comparable interface: http://java.sun.com/javase/6/docs/api/java/util/Arrays.html#sort%28java.lang.Object[]%29
TofuBeer
A: 

Hi:

Of all the overloaded versions of binarySearch in Java, there is no such a version which takes an argument of String. However, there are three types of binarySearch that might be helpful to your situation:

static int binarySearch(char[] a, char key); static int binarySearch(Object[] a, Object key); static int binarySearch(T[] a, T key, Comparator c)

Michael Mao
Oh yeah, `char` one in particular would work **exceptionally** well.
ChssPly76
@Michael: it's an array *of strings* not a string.
Martinho Fernandes
And String in java ain't a char[] either.
ChssPly76
out of those 3 only the Object[], Object one is what would be needed for this situation. The char[] won't work (a String in not a char[]) and the Comparator version is not needed unless you want to change the default order (alphabetical) of how Strings are compared to one another.
TofuBeer
And somehow this was upvoted.
Martinho Fernandes
Now if only a String were an Object!
DivineWolfwood
+3  A: 

From Wikipedia:

"In computer science, a binary search is an algorithm for locating the position of an element in a sorted list by checking the middle, eliminating half of the list from consideration, and then performing the search on the remaining half.[1][2] If the middle element is equal to the sought value, then the position has been found; otherwise, the upper half or lower half is chosen for search based on whether the element is greater than or less than the middle element."

So the prerequisite for binary search is that the data is sorted. It has to be sorted because it cuts the array in half and looks at the middle element. If the middle element is what it is looking for it is done. If the middle element is larger it takes the lower half of the array. If the middle element is smaller it the upper half of the array. Then the process is repeated (look in the middle etc...) until the element is found (or not).

If the data isn't sorted the algorithm cannot work.

So you would do something like:

final String[] data;
final int      index;

data = new String[] { /* init the elements here or however you want to do it */ };
Collections.sort(data);
index = Arrays.binarySearch(data, value);

or, if you do not want to sort it do a linear search:

int index = -1; // not found

for(int i = 0; i < data.length; i++)
{
    if(data[i].equals(value))
    {
        index = i;
        break; // stop looking
    }
}

And for completeness here are some variations with the full method:

// strict one - disallow nulls for everything
public <T> static int linearSearch(final T[] data, final T value)
{
    int index;

    if(data == null)
    {
        throw new IllegalArgumentException("data cannot be null");
    }

    if(value == null)
    {
        throw new IllegalArgumentException("value cannot be null");
    }

    index = -1;

    for(int i = 0; i < data.length; i++)
    {
        if(data[i] == null)
        {
            throw new IllegalArgumentException("data[" + i + "] cannot be null");
        }

        if(data[i].equals(value))
        {
            index = i;
            break; // stop looking
        }
    }    

    return (index);
}

// allow null for everything

public static <T> int linearSearch(final T[] data, final T value)
{
    int index;

    index = -1;

    if(data != null)
    {
        for(int i = 0; i < data.length; i++)
        {
            if(value == null)
            {
                if(data[i] == null)
                {
                    index = i;
                    break;
                }
            } 
            else
            {            
                if(value.equals(data[i]))
                {
                    index = i;
                    break; // stop looking
                }
            }
        }    
    }

    return (index);
}

You can fill in the other variations, like not allowing a null data array, or not allowing null in the value, or not allowing null in the array. :-)

Based on the comments this is also the same as the permissive one, and since you are not writing most of the code it would be better than the version above. If you want it to be paranoid and not allow null for anything you are stuck with the paranoid version above (and this version is basically as fast as the other version since the overhead of the method call (asList) probably goes away at runtime).

public static <T> int linearSearch(final T[] data, final T value)
{
    final int     index;

    if(data == null)
    {
        index = -1;
    }
    else
    {
        final List<T> list;

        list  = Arrays.asList(data);
        index = list.indexOf(value);
    }

    return (index);
}
TofuBeer
points <3 for effort
codeninja
+1, though I'm afraid your answer might have fallen on deaf ears.
ChssPly76
+1, this is the right one.
BalusC
It's better than the previous attempt but unfortunately it's still wrong. :( It fails if the array contains nulls.
Mark Byers
It isn't wrong - I wouldn't have an array that contained nulls! :-P (and yes I am serious). However you can fix it by adding a check before the .equals if you were to allow nulls. I would also add "if(value == null) throw new IllegalArgumentExcption("value cannot be null") to the method - and if I wanted to allow nulls add the appropriate checks. It is generally a bad idea in my experience to allow null to things. The code is generally less buggy when you just disallow null to things (not because of null pointers but because the design is better).
TofuBeer
there you go, full checking how I would do it and the more permissive way.
TofuBeer
I don't understand... what's the point of rolling your own when you can already get this exact function from existing libraries?
Mark Byers
What function is there that searches an array linerly?
TofuBeer
ArrayUtils.indexOf in Apache's Lang library: http://commons.apache.org/lang/
Mark Byers
Which is the same as rolling my own - but less of an impact if I don't want the bloat of the apache library...
TofuBeer
So much latent hostility here I hesitate to respond. :) Built in linear search in Java: int index = Arrays.asList(array).indexOf(key); The one extra allocation is a cheap pass-through so don't let it fool you. It's cheap and easy.
PSpeed
No latent hostility :-) I'd profile it along with the others and see what was best give the application... but yeah that is a valid way too.
TofuBeer
It equates to a loop almost identical to yours except without the extra stack var since they early return versus set and break. Which to me seems like it would be lost in the noise, so to speak.
PSpeed
I only do one return per method, keeps the code cleaner. If it were actually a performance issue I'd do multiple returns for that one case. I don't think it would be the same since first it would have to copy the array into the list and then go through the list again to find the item... so 2x through the list in the worst case (depending of course how the asList method is implemented on the particular VM.
TofuBeer
No, Arrays.asList() doesn't copy the array... it just wraps it in a thin wrapper that passes directly through to the array. The only additional overhead is the creation of that wrapper (and one more method indirection).
PSpeed
And it doesn't vary from JVM to JVM as it's documented to act that way.
PSpeed
@TofuBeer, the single return point is a false goal, which very often will just make your code worse. You should drop the notion.
Kevin Bourrillion
@PSpeed ah so it is - I didn't read the API doc. Hmmm.... I don't like asList now... sigh (I really wish java had C++ const...). The method indirection will probably go away by the JIT. I'll update my answer to add that way - thx.
TofuBeer
@Kevin in my experience with dealing with people who like multiple return statements per method and those that do not the code from those that do not tends to be better. The one thing I do for an early exit is throwing an exception. So I will have multiple exit conditions, just not multiple returns.
TofuBeer
@Kevin part of what foes with that is also that a long method for me is generally about 10 lines so I generally only have 1 or 2 places that I can return from anyways, which mean it is pretty hard for that to make the code a mess.
TofuBeer
@TofuBeer, asList() does an important job... as does Collections.unmodifiableList(). I don't miss const too much but maybe that's because of all of the abuse I witnessed. ;)
PSpeed
I'd like to have an "asStaticList" or something like that that made a copy of the array. I am pretty sure I write code to do asList on a private array that I returned from a method... so I gave direct access to the private data - oops. At least I now know it doesn't make a copy of the array so I can only use it when appropriate :-)
TofuBeer
Yeah, it's an adapter... which is extremely useful in some cases. Copying would need to be explicit as with any other collection.
PSpeed
A: 

To respond correctly to you question as you have put it. Use brute force

RocketSurgeon