views:

178

answers:

3

I'm preparing to go to a computer science contest by completing problems from past contests. Most of them are pretty easy, but this one is bugging me...it seems simple but I'm just not being able to do it.

If you have a string of ones and zeros:

100111010001111100101010

What would be the code to take that as an input and then output this:

1:1 2:0 3:1 1:0 1:1 3:0 5:1 2:0 1:1 1:0 1:1 1:0

Where the digit to the left of each colon is the number of times the digit after the colon appears.

So, another example...inputting:

1100011

Would output:

2:1 3:0 2:1

According to the problem this is similar to the algorithm used to compress fax transmissions.

An answer in java would be best, but all I'm really looking for is pseudocode or even thoughts on how to do it.

Thanks in advance.

+2  A: 

Just as a thought: why would you bother with the number on the right? It will always alternate between 1 and 0 won't it, so just assume it starts with 1 and encode an initial 0 if the actual sequence starts with 0. In other words, you'd end up with:

1 2 3 1 1 3 5 2 1 1 1 1

But basically you need to keep track of "what am I currently looking at?" and "how many of them have I seen"? If it changes, write out what you've been looking at and the count, and then update "what I'm looking at" to the new value and the count to 1, then keep going. Don't forget to write out the last value at the end of the data as well.

(I haven't given pseudocode or Java as I think you'll learn more by taking small hints than being presented with working code. If you need further hints though, just say.)

Jon Skeet
Thanks for your post, Jon. I used your hint and got much further than I did before but I still had some bugs that got cleared up after a look at Ray's code. :D
Salty
+5  A: 

This is called Run-Length-Encoding (RLE) and is used in a number of things (such as the Windows Bitmap file-format) to provide very basic compression (especially if the original includes lots of repeated values (like a bitmap or fax) containing a long run of the same colour).

int[] array = { ........ }; // your values...
for ( int i=0; i < array.Length; i++ )
{
   int count = 1;
   int value = array[i];

   // Consume until different..
   while ( i+1 < array.Length && array[i] == array[i+1] )
   { 
       count++; 
       i++ 
   }

   Console.WriteLine("{0}:{1}", count, value);
}

// OR, as suggested by @jon  [done in my head, so could probably be improved a lot...]
int count = 0;
int oldValue = -1;
for ( int i=0; i<array.Length; i++ )
{
   int newValue = array[i];
   count = ( newValue != oldValue ) ? 1 : count+1;

   if ( i+1 >= array.Length || array[i+1] != newValue)
   {
      Console.WriteLine("{0}:{1}", count, newValue);
   }

   oldValue = newValue;
}
Ray Hayes
I wouldn't nest the loops - that's relatively hard to read, IMO. Just a single loop which knows the last-seen value and the current count is simpler. Just use an "if" to test for changes and output if so. (That also allows you to work on an IEnumerable more easily.)
Jon Skeet
Thank you, Ray. That's exactly what I was looking for. :)
Salty
A: 

all I'm really looking for is pseudocode or even thoughts on how to do it.

Here are some thoughts:

  • How to test whether a bit in a byte is one or zero: use a 'bitwise-AND' operation to mask off the other bits
  • How to test whether a different bit in the byte is one or zero:
    • Either, use a different bitmask
    • Or, shift or rotate the bits in the byte before you mask it

Use the above methods, to process the 8 bits in the first byte. Then repeat this to handle the next 8 bits, in the next byte.

Some pseudo-code may be something like the following:

main()
{
  Encoder encoder = new Encoder;
  foreach (byte b in inputStream)
  {
    encoder.input(b);
  }
  //tell the encoder that the stream is finished
  //that there will be no subsequent bytes
  ///and that the final bits should be flushed now
  encoder.finished();
}

class Encoder
{
  //member data
  bool m_b; //value of the most-recently-processed bit
  int m_n; //number of the most-recently-processed bits

  //public methods
  void finished()
  {
    //TODO format and write the {m_n}:{m_b} value to output
    //and reset m_n to zero
  }

  void input(byte b)
  {
    for int (i = 0; i < 8; ++i)
    {
      //get the bit value
      bool bit = getbit(b, i);
      //see whether we can append it
      bool canAppend =
        (bit == m_b) || //new bit is same as previous bit
        (0 == m_n); //no previous bit
      //flush previous bits if can't append
      if (!canAppend)
        finished();
      //append current bit
      m_b = bit;
      ++m_n;
    }
  }

  //private helper method
  bool getbit(byte b, int i)
  {
    //TODO return the bit value using a mask
  }
}

When you write your code, don't forget to also test it using various input data, including special cases (including for example all the bits in a byte having the same value).

ChrisW