views:

549

answers:

7

I came across a interview question that asked to remove the repeated char from a given string, in-place. So if the input was "hi there" the output expected was "hi ter". It was also told to consider only alphabetic repititions and all the alphabets were lower case. I came up with the following program. I have comments to make my logic clear. But the program does not work as expectd for some inputs. If the input is "hii" it works, but if its "hi there" it fails. Please help.

#include <stdio.h>
int main() 
{
    char str[] = "programming is really cool"; // original string.
    char hash[26] = {0}; // hash table.
    int i,j; // loop counter.
// iterate through the input string char by char.
for(i=0,j=0;str[i];)
{
    // if the char is not hashed.
    if(!hash[str[i] - 'a'])
    {
        // hash it.
        hash[str[i] - 'a'] = 1;
        // copy the char at index i to index j.
        str[j++] = str[i++];
    }
    else
    {
        // move to next char of the original string.
        // do not increment j, so that later we can over-write the repeated char.
        i++;
    }
}

// add a null char.
str[j] = 0;

// print it.
printf("%s\n",str); // "progamin s ely c" expected.

return 0;

}

+10  A: 

when str[i] is a non-alphabet, say a space and when you do:

hash[str[i] - 'a']

your program can blow.

ASCII value of space is 32 and that of a is 97 so you are effectively accessing array hash with a negative index.

To solve this you can ignore non-alphabets by doing :

if(! isalpha(str[i]) {
    str[j++] = str[i++]; // copy the char.
    continue;  // ignore rest of the loop.
}
codaddict
You might need to actually do an explicit check for lower case only as well?
Douglas Leeder
+2  A: 

This is going to break on any space characters (or anything else outside the range 'a'..'z') because you are accessing beyond the bounds of your hash array.

Paul R
+1  A: 

...

// iterate through the input string char by char.
for(i=0,j=0;str[i];)
{
  if (str[i] == ' ')
  {
    str[j++] = str[i++];
    continue;
  }

    // if the char is not hashed.
    if(!hash[str[i] - 'a'])
    {

...

tur1ng
If there is a non-alphabet other than space, say like a '?', this will fail again.
gameover
@gameover: Good point
tur1ng
+2  A: 

This is code golf, right?

d(s){char*i=s,*o=s;for(;*i;++i)!memchr(s,*i,o-s)?*o++=*i:0;*o=0;}
finnw
You should ignore spaces.
KennyTM
You should post that to IOCCC... ;)
tommieb75
+1  A: 
#include <stdio.h>
#include <string.h>

int hash[26] = {0};

static int in_valid_range (char c);
static int get_hash_code (char c);

static char * 
remove_repeated_char (char *s)
{
  size_t len = strlen (s);
  size_t i, j = 0;
  for (i = 0; i < len; ++i)
    {
      if (in_valid_range (s[i]))
    {
      int h = get_hash_code (s[i]);
      if (!hash[h])
        {
          s[j++] = s[i];
          hash[h] = 1;
        }
    }
      else
    {
      s[j++] = s[i];
    }
    }
  s[j] = 0;
  return s;
}

int
main (int argc, char **argv)
{
  printf ("%s\n", remove_repeated_char (argv[1]));
  return 0;
}

static int 
in_valid_range (char c)
{
  return (c >= 'a' && c <= 'z');
}

static int 
get_hash_code (char c)
{
  return (int) (c - 'a');
}
Vijay Mathew
Two things: in_valid_range() is just islower(), so can be removed for brevity; get_hash_code() (and all treatments of hash codes) should be unsigned, since 'char' might not be, thus (int) (c - 'a') can be negative.
unwind
+1  A: 
char *s;
int i = 0;

for (i = 0; s[i]; i++)
{
    int j;
    int gap = 0;
    for (j = i + 1; s[j]; j++)
    {
        if (gap > 0)
            s[j] = s[j + gap];
        if (!s[j])
            break;
        while (s[i] == s[j])
        {
            s[j] = s[j + gap + 1];
            gap++;
        }
    }
}
Phil Wallach
+1  A: 
void striprepeatedchars(char *str)
{
    int seen[UCHAR_MAX + 1];
    char *c, *n;

    memset(seen, 0, sizeof(seen));

    c = n = str;
    while (*n != '\0') {
        if (!isalpha(*n) || !seen[(unsigned char) *n]) {
            *c = *n;
            seen[(unsigned char) *n]++;
            c++;
        }
        n++;
    }
    *c = '\0';
}
jim