ansaurus

Question

Java | compare char word in a char array

Answer 1

+1 A:

The simplest method is just to try all possibilities, by looping through each starting point and testing if all characters match. By the fact you've already mentioned binary search, this is probably simple enough for you to already know, though let me know if that's what you're looking for.

If you're looking for the best method, see http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm .

2010-07-18 09:10:19

'Best' was probably not the right word actually; there are others that perform better on some cases. But that's the most commonly used algorithm.

2010-07-18 09:12:00

Answer 2

+2 A:

Binary search won't help you in this case. You have to search linearly. The easiest solution would be to search linearly for the first character and, when found, check if the remaining word follows.

A more elaborate solution would be to use a KMP algorithm.

aioobe 2010-07-18 09:12:01

Yes, the precondition of a sorted array for binary searching completely destroys the information. The character sequence has to be extracted beforehand.

James P. 2010-07-18 09:26:42

I prefer [Boyer-Moore](http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm) myself. KMP is just... not intuitive.

quantumSoup 2010-07-18 09:27:02

Answer 3

+1 A:

You can convert the character arrays to strings. The result of the search in the string is the same as if you had searched the arrays.

String needle = new String(word);
String haystack = new String(para);
int i = haystack.indexOf(needle);

Result:

This can be much faster than a naive O(n*m) search because the string function indexOf is optimized.

If you want to do it without creating the temporary strings you can implement a string searching algorithm for byte arrays. You could for example choose the Boyer-Moore algorithm which has worst case O(n).

Mark Byers 2010-07-18 09:12:20

The naive search algorithm is actually O(n*m), but this looks a bit like homework so he might not be able to convert it to a String.

quantumSoup 2010-07-18 09:22:59

Just curious, what does the O(n^2) refer to? I never saw this at algorithmics.

James P. 2010-07-18 09:24:11

@Aircule: Thanks, sorry that was a mistake. @James P.: http://en.wikipedia.org/wiki/Big_O_notation

Mark Byers 2010-07-18 09:25:00

By the way I implemented the Rabin-Karp algorithm in java and its performance was horrendous, way worse than the naive algorithm. It's the hashing I guess. Try Booyer-Moore.

quantumSoup 2010-07-18 09:25:14

@Mark I actually did a very extensive project on string matching, and I tested several hash functions for RK, both functions that calculated hashes incrementally and just Java's .hashCode(). Regardless of the hash function used, RK performed considerably worse than the naive algorithm. We tested it with patterns up to 600 chars long and input files up to 35 MB in size.

quantumSoup 2010-07-18 09:42:11

@Aircule: You need to use a rolling hash. If you use hashCode then you haven't implemented RK. If you implement RK correctly it will be faster for large strings. PS: If you can post a link to your project source code and you paper that would be very interesting to see.

Mark Byers 2010-07-18 09:42:53

@Mark Byers We did. Hang on, I'll upload the paper.

quantumSoup 2010-07-18 09:48:06

@Aircule: Maybe you missed a test case. Did you try testing with the string 'aaaaa...millions of a.....aaaaa' and the pattern 'aaaa...aaaaab'?

Mark Byers 2010-07-18 09:51:25

@Aircule: Very interesting, thanks!!

Mark Byers 2010-07-18 09:59:16

Answer 4

+4 A:

A bit inefficient theoretically, but rather practical and simple:

int position = new String(paragraph).indexOf(new String(word));

If you want to understand how this works - check the static int indexOf(..) method of java.lang.String

Bozho 2010-07-18 09:12:31

If the `char[]` won't contain thousands of characters, I see no problem with this.

Bart Kiers 2010-07-18 09:14:26

yes, that's why I said 'theoretically'. In practice it will be sufficient.

Bozho 2010-07-18 09:15:02

I bet this will be orders of magnitude faster than anything he can implement. The only problem would be the String being too big for the heap.

quantumSoup 2010-07-18 09:29:07

@Aircule - I think it is a bit more complicated than that. For instance, creating a `String` from a `char[]` entails making a copy of the `char[]`. Also, there are fast search algorithms that are significantly better at searching large texts than the naive algorithm that `String.indexOf()` (probably) uses.

Stephen C 2010-07-18 09:32:56

@Stephen I highly doubt `String.indexOf()` uses the naive algorithm. I tested it.

quantumSoup 2010-07-18 09:43:46

@Aircule - Well I'm looking at the source code: String.java 1.205 dated 2009-02-26 Copyright Sun (from Java 1.6). And it **does** use a naive search algorithm for the `indexOf` methods!!

Stephen C 2010-07-18 13:01:14

@Aircule - And there's a reason for this. Clever algorithms like Boyer-Moore entail some non-trivial setup. For short strings ... the most common use-case for `String.indexOf(String)` ... the time taken to do the setup exceeds the time saved by the smart algorithm.

Stephen C 2010-07-18 13:41:13

Answer 5

A:

Quick answer and I suppose others will be more elaborate. Initially, I'd do something like this (pseudocode is better for thinking out algorithms):

boolean nonmatchingchar
integer i, j
for each i of word until endof word
    for each j of para until endof para
      if word i isnotequalto para i set nonmatchingchar true     
    end for
end for


if nonmatchingchar is true print "character sequence not found"

Edit: To make this more efficient in the case where you'd have several words to search for, you could constitute a two-dimensional array with words sorted according to their first letter. From there on, you could go through the second array letter by letter and test a subset of words according to that letter.

James P. 2010-07-18 09:18:57

ansaurus

tags:

views:

answers:

Java | compare char word in a char array

related questions