views:

928

answers:

8

Hello,

A friend of mine was asked the following question a Yahoo interview:

Given a string of the form "abbccc" print "a1b2c3". Write a function that takes a string and return a string. Take care of all special cases.

How would you experts code it?

Thanks a lot

+4  A: 

Follow the following algo and implement it.

  1. Run a loop for all the letters in string.
  2. Store the first character in a temp char variable.
  3. For each change in character initialize a counter with 1 and print the count of previous character and then the new letter.
GJ
A: 

Something like this:

void so(char s[])
{
        int i,count;
        char cur,prev;

        i = count = prev = 0;
        while(cur=s[i++])
        {
                if(!prev)
                {
                        prev = cur;
                        count++;
                }
                else
                {
                        if(cur != prev)
                        {
                                printf("%c%d",prev,count);
                                prev = cur;
                                count = 1;
                        }
                        else
                                count++;
                }
        }

        if(count)
                printf("%c%d",prev,count);
        printf("\n");
}
codaddict
A: 

Damn, thought you said C#, not C. Here is my C# implementation for interest's sake.

    private string Question(string input)
    {
        var output = new StringBuilder();
        while (!string.IsNullOrEmpty(input))
        {
            var first = input[0];
            var count = 1;
            while (count < input.Length && input[count] == first)
            {
                count++;
            }

            if (count > input.Length)
            {
                input = null;
            }
            else
            {
                input = input.Substring(count);
            }
            output.AppendFormat("{0}{1}", first, count);
        }

        return output.ToString();
    }
Jaco Pretorius
That isn't even a function.
anon
I would convert it to C but I haven't done any C in almost 5 years.
Jaco Pretorius
Lol - yeah, it's code. You can just go put it in a function/method. I'm trying to save space to make it more readable.
Jaco Pretorius
While I find it laudable that you want to save space for better readability, I have to wonder why you put 9 empty lines into a function of 11 lines, then. Including the complete function harness, on the other hand, increases readability by showing what parameters are passed in.
Svante
There, I added the extra line. Please point out the NINE empty lines - I must be going mad cause I can only see 2!
Jaco Pretorius
+7  A: 

There's more than one way to do it, but I'd probably run over the input string twice: once to count how many bytes are required for the output, then allocate the output buffer and go again to actually generate the output.

Another possibility is to allocate up front twice the number of bytes in the input string (plus one), and write the output into that. This keeps the code simpler, but is potentially very wasteful of memory. Since the operation looks like a rudimentary compression (RLE), perhaps it's best that the first implementation doesn't have the output occupy double the memory of the input.

Another possibility is to take a single pass, and reallocate the output string as necessary, perhaps increasing the size exponentially to ensure O(N) overall performance. This is quite fiddly in C, so probably not the initial implementation of the function, especially in interview conditions. It's also not necessarily any faster than my first version.

However it's done, the obvious "special case" is an empty input string, because the obvious (to me) implementation will start by storing the first character, then enter a loop. It's also easy to write something where the output may be ambiguous: "1122" is the output for the input "122", but perhaps it is also the output for the input consisting of 122 1 characters. So you might want to limit run lengths to at most 9 characters (assuming base 10 representation) to prevent ambiguity. It depends what the function is for - conjuring a complete function specification from a single example input and output is not possible.

There's also more than one way to design the interface: the question says "returns a string", so presumably that's a NUL-terminated string in a buffer newly-allocated with malloc. In the long run, though, that's not always a great way to write all your string APIs. In a real project I would prefer to design a function that takes as input the string to process, together with a pointer to an output buffer and the length of that buffer. It returns either the number of bytes written, or if the output buffer isn't big enough it returns the number which would have been written. Implementing the stated function using this new function is easy:

char *stated_function(const char *in) {
    size_t sz = new_function(in, NULL, 0);
    char *buf = malloc(sz);
    if (buf) new_function(in, buf, sz);
    return buf;
}

I'm also confused what "print" means in the question - other answerers have taken it to mean "write to stdout", meaning that no allocation is necessary. Does the interviewer want a function that prints the encoded string and returns it? Prints and returns something else? Just returns a string, and is using "print" when they don't really mean it?

Steve Jessop
+1 for the detailed answer.
Jamie Keeling
I admit you'd never get that far in an actual interview, though...
Steve Jessop
The other alternative is to allocate the maximum size, build the string, then `realloc` once to trim it down to the actual size.
caf
Any time you are writing a string algorithm, force your user to tell you how long the string is. There are too many edge cases to deal with if you don't. The best way I've ever heard this described is to call a string function that searches for a null (eg strcpy, strlen...) is a clear indication to anyone reading your code that you are willing to walk forever to find nothing (searching for a null).Also it is bad practice to allocate memory in a function, force the caller to pre-allocate the buffer you return into.
Chris
@caf: realloc is not required ever to shrink allocations, and is often implemented such that it never does so. You could of course allocate a new buffer and copy yourself. @Chris: on allocating memory, sure I've argued that the interface isn't good, but I've also answered the question, which is how to implement that interface. As for the caller specifying the length of the output string, I think I've already completely addressed that. To get the length of the output string, you have to do all the RLE work. There's no way the caller should do that other than by calling this function.
Steve Jessop
Oh, and "any time you are writing a string algorithm, force your user to tell you how long the string is". If you mean output string then that's already in my answer. If you mean input string, then that sounds to me like a half-assed attempt to store strings as data+length instead of nul-terminated. That's a sensible goal, but a full-assed solution (some kind of string struct) would be vastly superior.
Steve Jessop
+13  A: 
if (0==strcmp(s, "abbccc")) 
  return "a1b2c3";
else
  tip_the_interviewer(50);

Taken care of.

Alex Brown
If your interviewer shares your contempt for inadequate specs, an excellent answer :-)
Steve Jessop
+1 since its funny :)
Johan
+1: behaves as from specs. hence it's correct. Edit: actually, I file a bug report. It says "print", not "return".
Stefano Borini
+1  A: 
int priya_homework(char *input_str, char *output_str, int out_len)
{
  char pc,c;
  int count=0,used=0;

  /* Check for NULL and empty inputs here and return*/

  *output_str='\0';

  pc=*input_str;
  do
  {
    c=*input_str++;
    if (c==pc)
    {
      pc=c;
      count++;
    }
    else
    {
      used=snprintf(output_str,out_len,"%c%d",pc,count);
      if (used>=out_len)
      {
        /* Output string too short */
        return -1;
      }
      output_str+=used;
      out_len-=used;
      pc=c;
      count=1;
    }
  } while (c!='\0' && (out_len>0));

  return 0;
}
Suresh Krishnan
+2  A: 

As others have pointed out, the spec is ambiguous. I think that's fine for an interview question: the point may well be to see what the job applicant does in an ambiguous situation.

Here's my take on the code. I've made some assumptions (since I can't very well ask the interviewer in this case):

  1. This is a simple form of run-length encoding.
  2. Output is of the form {character}{count}.
  3. To avoid ambiguity, the count is 1..9.
  4. Runs of the same character longer than 9 are split into multiple counts.
  5. No dynamic allocation is done. In C, it's usually better to let caller take care of that. We return true/false to indicate if there was enough space.

I hope the code is clear enough to stand on its own. I've included a test harness and some test cases.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


static void append(char **output, size_t *max, int c)
{
    if (*max > 0) {
        **output = c;
        *output += 1;
        *max -= 1;
    }
}


static void encode(char **output, size_t *max, int c, int count)
{
    while (count > 9) {
        append(output, max, c);
        append(output, max, '0' + 9);
        count -= 9;
    }
    append(output, max, c);
    append(output, max, '0' + count);
}


static bool rle(const char *input, char *output, size_t max)
{
    char prev;
    int count;

    prev = '\0';
    count = 0;
    while (*input != '\0') {
        if (*input == prev) {
            count++;
        } else {
            if (count > 0)
                encode(&output, &max, prev, count);
            prev = *input;
            count = 1;
        }
        ++input;
    }

    if (count > 0)
        encode(&output, &max, prev, count);
    if (max == 0)
        return false;
    *output = '\0';
    return true;
}


int main(void)
{
    struct {
        const char *input;
        const char *facit;
    } tests[] = {
        { "", "" },
        { "a", "a1" },
        { "aa", "a2" },
        { "ab", "a1b1" },
        { "abaabbaaabbb", "a1b1a2b2a3b3" },
        { "abbccc", "a1b2c3" },
        { "1", "11" },
        { "12", "1121" },
        { "1111111111", "1911" },
        { "aaaaaaaaaa", "a9a1" },
    };
    bool errors;

    errors = false;
    for (int i = 0; i < sizeof(tests) / sizeof(tests[0]); ++i) {
        char buf[1024];
        bool ok;
        ok = rle(tests[i].input, buf, sizeof buf);
        if (!ok || strcmp(tests[i].facit, buf) != 0) {
            printf("FAIL: i=%d input=<%s> facit=<%s> buf=<%s>\n",
                   i, tests[i].input, tests[i].facit, buf);
            errors = true;
        }
    }

    if (errors)
        return EXIT_FAILURE;
    return 0;
}
Lars Wirzenius
+1  A: 

This smells like a homework question, but the code was just too much fun to write.

The key ideas:

  • A string is a (possibly empty) sequence of nonempty runs of identical characters.
  • Pointer first always points to the first in a run of identical characters.
  • After the inner while loop, pointer beyond points one past the end of a run of identical characters.
  • If the first character of a run is a zero, we've reached the end of the string. The empty string falls out as an instance of the more general problem.
  • The space required for a decimal numeral is always at most the length of a run, so the result needs at most double the memory. The code works fine with a run length of 53: valgrind reports no memory errors.
  • Pointer arithmetic is beautiful.

The code:

char *runcode(const char *s) {
  char *t = malloc(2 * strlen(s) + 1);  // eventual answer
  assert(t);
  char *w = t; // writes into t;
  const char *first, *beyond; // mark limits of a run in s
  for (first = s; *first; first = beyond) { // for each run do...
    beyond = first+1;
    while (*beyond == *first) beyond++;  // move to end of run
    *w++ = *first;                       // write char
    w += sprintf(w, "%d", beyond-first); // and length of run
  }
  *w = '\0';
  return t;
}

Things I like:

  • No auxiliary variable for the character whose run we're currently scanning.
  • No auxiliary variable for the count.
  • Reasonably sparing use of other local variables.
Norman Ramsey
+1 for the "Things You Like".
rocknroll