ansaurus

Question

Answer 1

+5 A:

I'm not a C developer so I can't nor will comment on the correctness of the code but with regards to efficiency, see:

http://en.wikipedia.org/wiki/String_searching_algorithm

I believe you have the naive searching version. Look at the Knuth-Morris-Pratt algorithm. You can do a little work on the string b before you are searching in a. And then you can do it in O(|a|+|b|). And |b| is larger than |a| then b can't be in a so it becomes O(|a|).

The essence is that if a is:

abcabe

And b is:

aba

Then you know that if the third char fails then a search will also fail if you shift b one char or two chars. Therefore you don't have to check every possible substring:

a[1 .. 3] == b
a[2 .. 4] == b
...

which is O(|a|*|b|) chars but only a subset which is equal to O(|a|)

lasseespeholt 2010-08-16 21:05:13

Answer 2

A:

If a is not properly null-terminated, the function will die horribly.
If b is not properly null-terminated, the function will probably die horribly.
The indentation is strange.

Thom Smith 2010-08-16 21:06:48

All bets are off if the string isn't null terminated, no?

Noel M 2010-08-16 21:09:42

If a thing is not `\0`-terminated, it is not a string (see ISO C99, 7.1.1p1).

Roland Illig 2010-08-16 21:26:00

Generally speaking, if a function will cause the program to die horribly on invalid input, it should check its inputs. That's why there's a whole set of string functions that take size parameters. Bugs happen. At the very least, a bit of error-checking code will let you die *gracefully and predictably* on bad inputs.

Thom Smith 2010-08-16 22:25:35

What if the size passed is wrong? (This is in the same class of error as the input not being nul-terminated).

caf 2010-08-16 23:02:47

How can a c-string not be null-terminated? If you try to traverse it, sooner or later, somewhere in memory, you will find a null byte. Unless, I suppose, there is no null byte anywhere in memory after the starting address. I think you're asking for an impossible validation. You might as well insist that a function validate that a pointer is actually to the right place and not to a random area in memory. How would you know?

Jay 2010-08-17 03:23:51

@Jay: In cases like this, it's common to pass in a size. @caf: The same thing as would happen if you passed the wrong size to `strncpy`. I know that C isn't a safe language. It's always possible to hang yourself. This is precisely why it's a good idea to double-check things. You can't catch all of the bugs all of the time, but you can catch most of the bugs most of the time.

Thom Smith 2010-08-17 04:26:20

Answer 3

A:

This is going to do the job but I there are better ways to do this. Check this article: http://en.wikipedia.org/wiki/String_searching_algorithm

Klark 2010-08-16 21:07:21

Answer 4

+2 A:

yeah...

t can't be assigned b as its destroying const.
it doesn't match the last char in "b" properly.

Keith Nicholas 2010-08-16 21:12:01

Answer 5

+8 A:

If a points to "cat" and b points to "ab", func will return a pointer to "at" (the wrong value) instead of 0 (the intended value) because the pointer t is incremented even though the comparison (*s++ == *t++) fails.

For completeness' sake and in order to answer the second question, I'd offer one solution (surely among other viable ones): Have the result of the comparison be assigned to another variable, e.g. while( ( flag = ( *s++ == *t++ ) ) && *s && *t ); and then if( flag && *t == 0 ).

William 2010-08-16 21:14:17

+1 for the implicit suggestion that one might step through one's code with small examples before posting here.

Jens Gustedt 2010-08-16 21:19:12

Answer 6

A:

Very picky point, in addition to those raised by others:

If a and b are both 0-length, then this routine returns NULL. If it's supposed to be following the specification of strstr, then it must return a in that case. Which makes sense, since the empty string b is indeed a substring of the empty string a.

Steve Jessop 2010-08-16 21:30:34

Answer 7

A:

I think the line:

while( (*s++ == *t++) && *s && *t );

is undefined because you are accessing the variables after the post-increment they might be before the increment or after the increment.

Unless they changed it, side effects of expressions are undefined by the standard as to when they take effect. The only thing guaranteed is *s++ will access s first and then increment for the next statement. What is undefined is whether the && s and && t see the value before or after the increment...

Cervo 2010-08-16 21:34:41

David Thornley 2010-08-16 21:40:14

Answer 8

+2 A:

Well, it does have the slight problem that it doesn't actually work.

Try running with a="xyz" and b="xw". When you hit the while loop the first time, x=x, you increment both pointers, and loop around again. Then y!=w, so you exit the loop. But you've already incremented the pointers, so t==0, and you report a hit.

In general, you report a hit regardless of whether the last character matches.

If b is a 1-character string, the last character is the only character, so a 1-character string matches anything.

I'd recommend against trying to do the loop with a single statement with side effects. As this example illustrates, this is tricky. Even if you get it right, it's very cryptic for people trying to read your code.

Jay 2010-08-16 21:35:18

Exactly! The provided code is for StartsWith not Substring

Nate Zaugg 2010-08-16 21:55:11

Answer 9

A:

Why do you not use a function for your work? Do you know strstr()?

const char* mystrstr(const char* a,const char* b)
{
  size_t blen=strlen(b);
  while( *a )
  {
    if( !strncmp(a,b,blen) )
      return a;
    ++a;
  }
  return 0;       
}

2010-08-17 14:33:23

Answer 10

A:

*t = b; //killing the const-ness of b....

Also to clarity to code you can do while(*a!= '\0') instead of while(*a) Also the second while statement : while( (*s++ == *t++) && *s && *t ); will fail....Try to take int flag = (*s++ = *t++) ; and do bit of simplification

2010-09-08 12:23:09

ansaurus

tags:

views:

answers:

Substring search interview question

related questions