views:

225

answers:

6

For example, given a str of "Stackoverflow is for every one" and remove of "aeiou", the function should transform str to "Stckvrflw s fr vry n".

I have one char array of string: str[] and one char array of chars to be removed:remove[]

My Solution: Loop str[] looking for each in character in remove[]. Shift str[] one place left every-time. I am sure better hack are possible.

A: 

I would loop str[] and store each character which doesn't exist in remove[] in a new array (let's say new_str[]). Then swap new_str[] with str[].

Gregi
Can't use extra array.
AJ
Why waste the extra space?
batbrat
A: 

If you can afford one more buffer, you can: Loop str[] looking for each in character in remove[], but instead of shift, copy to new array.

wishmesh
There is no need for a second buffer...the new string (and every-substring there of) is guaranteed to be no lnger than the old one, so you can use the front of the current buffer as the receiving buffer. Scary and easy to screw up, but O(n*m) time and O(1) extra space.
dmckee
+4  A: 

Shifting the entire string left one place will make this an O(n^2) algorithm effectively. You can do this in-place, in linear time:

void Remove (char * src, const char * match) {
   char * dest = src;
   for (;;) { 
      char ch = *src++; 
      if (!strchr (match, ch)) *dest++ = ch;  // Copy chars that don't match
      if (!ch) break;                         // Stop when we copy over a null  
   }
}

I'm assuming here that these are null terminated. If this is not the case, then you have to pass in the lengths as well and modify the algorithm appropriately. In particular, you will not be able to use strchr. Just for completeness, here's a version that works with char arrays (not null-terminated).

// Removes from str[] (of length strlen), all chars that are found
// in match[] (of length matchlen). Modifies str in place, and returns
// the updated (shortened) length of str. 
int Remove (char[] str, int srclen, char[] match, int matchlen) {
   int dst = 0, found;
   for (int src = 0; src < srclen; src++) { 
      char ch = str[src];  
      found = 0;           // Search if this char is found in match
      for (int i = 0; i < matchlen && !found; i++) 
         if (match[i] == ch) found = 1;
      if (!found) str[dst++] = ch;
   }
   return dst;
}

And finally, this is as close to O(n) as we are going to get, I guess. I'm assuming 8 bit chars here and building a look-up table so this should run in O(n) + O(m) where m is the length of the match string.

int Remove (char* str, int srclen, char* match, int matchlen) {
   bool found[256];
   for (int i = 0; i < 256; i++) found[i] = 0;
   for (int i = 0; i < matchlen; i++) found[match[i]] = 1; 

   int dst = 0;
   for (int src = 0; src < srclen; src++) { 
      char ch = str[src];  
      if (!found[ch]) str[dst++] = ch;
   }
   return dst;
}
Tarydon
Your version is O(n^2). Lookup `strchr` implementation. This is a common mistake though.
dirkgently
Great solution, but claim of linear time is dependent on the input - if strlen(match) == strlen(src), this is also O(n^2).
JBRWilkinson
@dirkgently, JBRWilkinson. Both of you are right. My algorithm is O(n^2) too. Possibly can't be done in less?
Tarydon
+1: using 2 pointers is a good way to avoid unnecessary shifts. @dirkgently, it's not O(n^2) but O(n*strlen(match)) or O(n*m).
Nick D
@Tarydon: see my comment to OP. HTH.
dirkgently
@Nick D: Yes of course -- as I pointed out in my other comment (to OP). My point was using `strchr` gives us a false impression of linear time complexity.
dirkgently
Isn't bool C++ ?
JBRWilkinson
@JBRWilkinson: No, added in C99.
dirkgently
@JBRWilkinson: Aargh. Not a C programmer myself, so the lines between C and C++ are a bit blurry for me. Can classic C do initialization of loop counters within the loop? I think that won't work either...
Tarydon
or just `char found[256] = {0};`
Nick D
@Tarydon:Localizing variable scoping in `for` loops -- Yes. C99 again I believe.
dirkgently
A: 

Using regular expressions to find and replace is a more compact solution. Use the GNU C library or find another which has support for regex search and replace. Of course, if the characters vary each time, you'll have to create the regular expression at runtime. If you stick with your current approach, split it up into functions.

I also like Tarydon's approach. Its less work!!

batbrat
Throwing as general and powerful a tool as regular expressions at this problem can no doubt be make to work, and you might be able to do it in a short and elegant piece of code, but it won't be efficient in the sense of fast and requiring little additional space.
dmckee
Agreed. My answer was based on the fact that homeWorkBoy just mentioned a better hack. He didn't specify whether he wanted a faster algorithm. In retrospect, providing a better algorithm is obviously a better answer.
batbrat
+2  A: 

I believe that this is one of those 'classic' puzzles.

In essence, you scan the 'match' string and make a lookup bit table of possible matches.

Then you walk through 'src' once, checking each char against your table.

O(n) time.

Algorithm something like this:

   static char bits[32];  // Not thread-safe, but avoids extra stack allocation
   char * dest = src;
   memset(bits, sizeof(bits), 0);  
   for (; *remove; remove++)
   {
      bitfields[*match >> 3] |= *remove & 7;
   }

   for (;*src; src++) 
   {
      if (!((bits[*src >> 3] & (*src & 7)) == (*src & 7)))
      { 
        *dest++ = *src;
      }
   }

I believe ischr(), isdigit(), isspace(), etc, work similarly to this technique, but their lookup tables are constant.

JBRWilkinson
There are less terse way to express this for the sake of learning what is being done, but I am glad to see that *someone* got it.
dmckee
+2  A: 

Here is my version, the if statement is eliminated from the copy-loop:

#include <stdio.h>
#include <string.h>

int main( void ){
  unsigned char str[]    = "Stackoverflow is for every one";
  unsigned char remove[] = "aeiou";

  unsigned char table[256] = { [ 0 ... 255 ] = 1 };
  for( unsigned char *r=remove; *r; r++ ){ table[*r]=0; }

  unsigned char *source=str, *dest=str;
  while( (*dest = *source++) ) dest += table[*dest];

  printf( "str: '%s'\n", str );
}
sambowry
+1: Very elegant
Tarydon
+1. The array initialization uses a GCC extension which you ought to mention.
dirkgently
Whoah - compiler-specific extensions? That's cheating!
JBRWilkinson
Only the while loop is real code, other lines are pseudo code.
sambowry