I want to create a mechanism (in C#) where text from a QueryString is displayed on a website.

For example, in C# I might literally do:

public void Page_Load(object sender, EventArgs e)
{
      litSomething.Text = Request.QueryString["msg"];
}

Assume that the message is written in English (allowing UTF-8 would be nice) and is no longer than, say, 1000 characters. I want to compress this text down as much as possible and still be able to place it in a QueryString.

We can predefine as many dictionary terms as we like (well, within reason?). The server-side code will encode and decode the messages.

(Obviously I'll be adding in all the usual XSS protection, HttpUtility.HtmlEncode and that type of stuff. Also, pointers to free dictionary sources would be good!)

Any tips, advice, source code? This isn't my homework, before you ask!

Update
Thanks for the suggestions. I want to make this a GET, so people can IM/email URLs. I'm thinking along the lines of bit.ly, which would also be a cheat in itself. I wanted this to be a generic "short text compression" question, though.

+8  A: 

Well, the immediate problems are:

  • The result of compression is basically going to be binary, so you'll need to base64-encode it, which will make it 1/3 bigger again. (You should use a websafe base64 encoding too.)
  • No compression algorithm can reduce the size of every input, so some messages will come out longer than they went in.

This means that if you can't cope with (say) ~1300 characters in the query string, there's no guarantee that it will always work. (As Marc says, use the body of a POST instead if you possibly can... then you can probably ignore compression in the first place.)
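
To put a rough number on that overhead: padded Base64 emits 4 characters for every 3 input bytes, rounded up to a whole group. A small sketch of the arithmetic (the Base64Size class and its method name are made up for illustration, not part of any library):

```csharp
using System;

static class Base64Size
{
    // Padded Base64 emits 4 output characters per 3 input bytes, rounding up.
    public static int EncodedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }
}
```

Base64Size.EncodedLength(1000) gives 1336, which is where a figure in the region of 1300 characters comes from if a 1000-byte message doesn't compress at all.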

If you're happy with those limitations though, your situation is no different from any other:

  • Encode the string into bytes
  • Compress
  • Convert the compressed bytes back into text using Convert.ToBase64String (and then replace web-nasty characters)

On the other side, apply the same transformation in reverse.
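
One possible way to do the "replace web-nasty characters" step, as a sketch: swap '+' and '/' for '-' and '_' and drop the '=' padding, then undo that before decoding. The UrlSafeBase64 class name and this particular substitution are my assumption here; any reversible mapping onto URL-safe characters would do.

```csharp
using System;

static class UrlSafeBase64
{
    // Standard Base64, then swap the characters that need escaping in a URL.
    public static string Encode(byte[] bytes)
    {
        return Convert.ToBase64String(bytes)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // Reverse the substitution and restore the padding that Encode stripped.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        // Malformed input (for example a length of 1 mod 4) throws FormatException here.
        return Convert.FromBase64String(s);
    }
}
```

Since the text comes from a query string that anyone can edit, the decoding side should expect FormatException and treat it as a bad request.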

Given that the compression API is stream-based, you could use StreamWriter to avoid explicitly converting from text to binary first.
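
A minimal sketch of that StreamWriter approach, assuming DeflateStream from System.IO.Compression (my choice for the example; GZipStream works the same way but adds a header and trailer). The TextDeflate class name is made up for illustration.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class TextDeflate
{
    public static byte[] Compress(string text)
    {
        using (var buffer = new MemoryStream())
        {
            // The writer encodes to UTF-8 as it writes. Disposing it flushes and
            // closes the DeflateStream, which finishes the compressed data.
            // UTF8Encoding(false) avoids emitting a byte order mark.
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress))
            using (var writer = new StreamWriter(deflate, new UTF8Encoding(false)))
            {
                writer.Write(text);
            }
            // ToArray still works after the MemoryStream has been closed.
            return buffer.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (var buffer = new MemoryStream(compressed))
        using (var deflate = new DeflateStream(buffer, CompressionMode.Decompress))
        using (var reader = new StreamReader(deflate, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The byte array that comes out still has to go through the Base64 step before it can be put in a URL.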

Jon Skeet
A: 

Depends where the messages come from. If they're all yours, then you've got a static dictionary and your query string need only be a couple of characters long.

I guess the message could be anything and would be user-generated, in which case a dynamically-learning method would be sweetest: keep track of what users put in there and adjust your compression dictionary as you go along. Use some uncommon but URL-safe character as an escape character to show there's a dictionary key coming up.

You could seed it by grabbing some word list off the internet. A quick google should find you the most common 100 or 1000 English words.
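
A rough sketch of the escape-character idea with a fixed seed list. Everything here is hypothetical: the WordDictionary name, '~' as the escape, the eight-word list and the one-letter keys are only there to show the shape of it. A real list would be much longer and would need a wider key space.

```csharp
using System;
using System.Text;

static class WordDictionary
{
    const char Escape = '~';
    // Hypothetical seed list; word i is written as '~' followed by Keys[i].
    static readonly string[] Words = { "the", "and", "that", "have", "with", "this", "from", "they" };
    const string Keys = "abcdefgh";

    public static string Encode(string message)
    {
        var sb = new StringBuilder();
        string[] parts = message.Split(' ');
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0) sb.Append(' ');
            int index = Array.IndexOf(Words, parts[i]);
            if (index >= 0)
                sb.Append(Escape).Append(Keys[index]);   // dictionary hit
            else
                sb.Append(parts[i].Replace("~", "~~"));  // literal '~' is doubled
        }
        return sb.ToString();
    }

    // No validation: a trailing '~' or an unknown key throws.
    public static string Decode(string encoded)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c != Escape) { sb.Append(c); continue; }
            char next = encoded[++i];
            if (next == Escape) sb.Append(Escape);
            else sb.Append(Words[Keys.IndexOf(next)]);
        }
        return sb.ToString();
    }
}
```

With two-character tokens only words of three letters or more save anything, and the spaces in the result still need URL-encoding.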

teedyay
+2  A: 

You can encode the string as UTF-8 to get a byte array, which you can then compress. The result is also a byte array, so you can use Base64 encoding to turn it back into a string:

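// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
private static string Compress(string data) {
   byte[] bytes = Encoding.UTF8.GetBytes(data);
   using (MemoryStream ms = new MemoryStream()) {
      using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true)) {
         // Use the byte count, not data.Length: non-ASCII characters take more than one byte in UTF-8.
         zip.Write(bytes, 0, bytes.Length);
      }
      return Convert.ToBase64String(ms.ToArray());
   }
}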

Decompressing is just the other way around:

private static string Decompress(string data) {
   using (MemoryStream ms = new MemoryStream(Convert.FromBase64String(data))) {
      using (GZipStream zip = new GZipStream(ms, CompressionMode.Decompress, true)) {
         // Read to the end of the stream rather than capping at a fixed byte count.
         using (StreamReader reader = new StreamReader(zip, Encoding.UTF8)) {
            return reader.ReadToEnd();
         }
      }
   }
}
Guffa