tags:

views:

186

answers:

7

The Problem is simple Find "ABC" in "ABCDSGDABCSAGAABCCCCAAABAABC" without using String.split("ABC")

Here is the solution I propose, I'm looking for any solutions that might be better than this one.

public static void main(String[] args) {
 String haystack = "ABCDSGDABCSAGAABCCCCAAABAABC";
 String needle = "ABC";
 char [] needl = needle.toCharArray();
 int needleLen = needle.length();
 int found=0;
 char hay[] = haystack.toCharArray();
 int index =0;
 int chMatched =0;

 for (int i=0; i<hay.length; i++){

  if (index >= needleLen || chMatched==0)
   index=0;
  System.out.print("\nchar-->"+hay[i] + ", with->"+needl[index]);

  if(hay[i] == needl[index]){
   chMatched++;
   System.out.println(", matched");
  }else {
   chMatched=0;
   index=0;
   if(hay[i] == needl[index]){
    chMatched++;
    System.out.print("\nchar->"+hay[i] + ", with->"+needl[index]);
    System.out.print(", matched");
   }else
   continue;
  }

  if(chMatched == needleLen){
   found++;
   System.out.println("found. Total ->"+found);
  }
  index++;
 } 
 System.out.println("Result Found-->"+found);
 }

It took me a while creating this one. Can someone suggest a better solution (if any) P.S. Drop the sysouts if they look messy to you.

+3  A: 

How about:

boolean found = haystack.indexOf("ABC") >= 0;

**Edit - The question asks for number of occurences, so here's a modified version of the above:

public static void main(String[] args)
{
    String needle = "ABC";
    String haystack = "ABCDSGDABCSAGAABCCCCAAABAABC";

    int numberOfOccurences = 0;
    int index = haystack.indexOf(needle);
    while (index != -1)
    {
        numberOfOccurences++;
        haystack = haystack.substring(index+needle.length());
        index = haystack.indexOf(needle);
    }

    System.out.println("" + numberOfOccurences);
}
Chris Knight
Should be done iteratively, each time starting at 1 beyond the last found pos (or "ABC".length() beyond index) to count the amount of occurrences (which is the question).
extraneon
nah, that only says yes or no. I asked for count. and yes don't use String.Split("ABC")
Taranfx
OK, that wasn't particularly clear from your question which simply asked to find the string "ABC" in a longer string.
Chris Knight
A: 

What about considering regex, Pattern and Matcher classes? Check this tutorial

Dan
+2  A: 

If you're looking for an algorithm, google for "Boyer-Moore". You can do this in sub-linear time.

edit to clarify and hopefully make all the purists happy: the time bound on Boyer-Moore is, formally speaking, linear. However the effective performance is often such that you do many fewer comparisons than you would with a simpler approach, and in particular you can often skip through the "haystack" string without having to check each character.

Pointy
Boyer-Moore is O(N), i.e. linear.
Håvard S
Well when you do it right, you can skip through the text (the "haystack", in terms of this question) such that you don't have to look at every character. It depends on the search string, of course; if the search string is one character long, then it's always just linear. And yes, formally the worst-case performance is linear (I think it's 3n right?)
Pointy
Nice article on wikipedia about this!
extraneon
@Pointy Of course, but that does not change the fact that it is linear. Complexity analysis does not care about constants, so it's O(N). There is no general guarantee that Boyer-Moore will be faster than a naive approach, it all depends on the string being searched.
Håvard S
@Havard complexity analysis may not care about constants, but people who need the performance do. Even if you don't get a guarantee it's faster you at least get a chance.
extraneon
@Pointy: even with skipping, it is still O(N) and lineair time, the constant is only smaller than with a more naive search method.
Ritsaert Hornstra
Fine. However, if you're ever working for **me**, and you pick an algorithm with a bigger constant performance factor than another, well, you won't be working for me for much longer.
Pointy
@extraneon The question asks for a generally "fast" solution, and it is important to understand that there is no such thing. You need to analyze your data; applying Boyer-Moore because it "may be faster" is not a smart thing to do.
Håvard S
The naive method is theta(N), where as Boyer-Moore is O(N).
NickLarsen
@Pointy Although I agree with you when performance is critical I could just as easily pick a suboptimal but easier/trivial to understand solution in other cases where development time and maintenance complexity are more limited. So I'm afraid I don't get to work at your place then :)
extraneon
@extraneon naah I'm much nicer than that :-) This whole discussion is pretty silly given that the original question is probably from an "Intro to Algorithms" class, and all these fine points of performance analysis are waaay more sophisticated.
Pointy
+1  A: 

You say your challenge is to find ABC within a string. If all you need is to know if ABC exists within the string, a simple indexOf() test will suffice.

If you need to know the number of occurrences, as your posted code tries to find, a simple approach would be to use a regex:

public static int countOccurrences(string haystack, string regexToFind) {
   Pattern p = Pattern.compile(regexToFind);
   Matcher m = p.matcher(haystack); // get a matcher object
   int count = 0;
   while(m.find()) {
       count++;
   }
   return count;
}
Håvard S
Thanks Havard, RegEx is a good option but I'm only interested in learning the "Best algorithm" that consumes least CPU time
Taranfx
Like I've stated in two comments, that all depends on the string you're searching in. There is no general "best algorithm" answer, sadly. Prefer the simple solution, premature optimization is the root of all evil. (Knuth)
Håvard S
A: 

Asked by others, better in what sense? A regexp based solution will be the most concise and readable (:-) ). Boyer-Moore (http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm) will be the most efficient in terms of time (O(N)).

David Soroko
Again, there is no guarantee that Boyer-Moore will be more efficient than a naive search (it depends on the string being searched).
Håvard S
Performance. Memory is not an issue.
Taranfx
Also consider Horspool's variation which is simpler to implement: http://en.wikipedia.org/wiki/Boyer-Moore-Horspool_algorithm
binil
A: 

If you don't mind implementing a new datastructure as replacement for strings, have a look at Tries: http://c2.com/cgi/wiki?StringTrie or http://en.wikipedia.org/wiki/Trie

If you don't look for a regular expression but an exact match they should provide the fastest solution (proportional to length of search string).

Searles