tags:

views:

97

answers:

5

How do I get the index of a single word (represents in a char array) which can be found in a paragraph (again represents in a char array).

the char represents the word

char word[] = new char[]{'w','o','r','d'};

and here's the paragraph

char para[] = new char[]{'f','g','q','z','y','i','o','p','w','o','r','d'};

I would like to get the index of the first letter in this case 8th. I used binary search by when sorting the words get scrambled.

Thanks.

+1  A: 

The simplest method is just to try all possibilities, by looping through each starting point and testing if all characters match. By the fact you've already mentioned binary search, this is probably simple enough for you to already know, though let me know if that's what you're looking for.

If you're looking for the best method, see http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm .

'Best' was probably not the right word actually; there are others that perform better on some cases. But that's the most commonly used algorithm.
+2  A: 

Binary search won't help you in this case. You have to search linearly. The easiest solution would be to search linearly for the first character and, when found, check if the remaining word follows.

A more elaborate solution would be to use a KMP algorithm.

aioobe
Yes, the precondition of a sorted array for binary searching completely destroys the information. The character sequence has to be extracted beforehand.
James P.
I prefer [Boyer-Moore](http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm) myself. KMP is just... not intuitive.
quantumSoup
+1  A: 

You can convert the character arrays to strings. The result of the search in the string is the same as if you had searched the arrays.

String needle = new String(word);
String haystack = new String(para);
int i = haystack.indexOf(needle);

Result:

8

This can be much faster than a naive O(n*m) search because the string function indexOf is optimized.

If you want to do it without creating the temporary strings you can implement a string searching algorithm for byte arrays. You could for example choose the Boyer-Moore algorithm which has worst case O(n).

Mark Byers
The naive search algorithm is actually O(n*m), but this looks a bit like homework so he might not be able to convert it to a String.
quantumSoup
Just curious, what does the O(n^2) refer to? I never saw this at algorithmics.
James P.
@Aircule: Thanks, sorry that was a mistake. @James P.: http://en.wikipedia.org/wiki/Big_O_notation
Mark Byers
By the way I implemented the Rabin-Karp algorithm in java and its performance was horrendous, way worse than the naive algorithm. It's the hashing I guess. Try Booyer-Moore.
quantumSoup
@Mark I actually did a very extensive project on string matching, and I tested several hash functions for RK, both functions that calculated hashes incrementally and just Java's .hashCode(). Regardless of the hash function used, RK performed considerably worse than the naive algorithm. We tested it with patterns up to 600 chars long and input files up to 35 MB in size.
quantumSoup
@Aircule: You need to use a rolling hash. If you use hashCode then you haven't implemented RK. If you implement RK correctly it will be faster for large strings. PS: If you can post a link to your project source code and you paper that would be very interesting to see.
Mark Byers
@Mark Byers We did. Hang on, I'll upload the paper.
quantumSoup
@Aircule: Maybe you missed a test case. Did you try testing with the string 'aaaaa...millions of a.....aaaaa' and the pattern 'aaaa...aaaaab'?
Mark Byers
@Aircule: Very interesting, thanks!!
Mark Byers
+4  A: 

A bit inefficient theoretically, but rather practical and simple:

int position = new String(paragraph).indexOf(new String(word));

If you want to understand how this works - check the static int indexOf(..) method of java.lang.String

Bozho
If the `char[]` won't contain thousands of characters, I see no problem with this.
Bart Kiers
yes, that's why I said 'theoretically'. In practice it will be sufficient.
Bozho
I bet this will be orders of magnitude faster than anything he can implement. The only problem would be the String being too big for the heap.
quantumSoup
@Aircule - I think it is a bit more complicated than that. For instance, creating a `String` from a `char[]` entails making a copy of the `char[]`. Also, there are fast search algorithms that are significantly better at searching large texts than the naive algorithm that `String.indexOf()` (probably) uses.
Stephen C
@Stephen I highly doubt `String.indexOf()` uses the naive algorithm. I tested it.
quantumSoup
@Aircule - Well I'm looking at the source code: String.java 1.205 dated 2009-02-26 Copyright Sun (from Java 1.6). And it **does** use a naive search algorithm for the `indexOf` methods!!
Stephen C
@Aircule - And there's a reason for this. Clever algorithms like Boyer-Moore entail some non-trivial setup. For short strings ... the most common use-case for `String.indexOf(String)` ... the time taken to do the setup exceeds the time saved by the smart algorithm.
Stephen C
A: 

Quick answer and I suppose others will be more elaborate. Initially, I'd do something like this (pseudocode is better for thinking out algorithms):

boolean nonmatchingchar
integer i, j
for each i of word until endof word
    for each j of para until endof para
      if word i isnotequalto para i set nonmatchingchar true     
    end for
end for


if nonmatchingchar is true print "character sequence not found"

Edit: To make this more efficient in the case where you'd have several words to search for, you could constitute a two-dimensional array with words sorted according to their first letter. From there on, you could go through the second array letter by letter and test a subset of words according to that letter.

James P.