ansaurus

Question

Run length encoding of hexadecimal strings including newlines.

Answer 1

+2 A:

Firstly: are you certain that just compressing the text doesn't give much the same result as compressing the "converted to binary" form?

Assuming you want to go ahead with converting to binary, I can suggest two options:

1. At the start of each line, write a number stating how many bytes are in the line. Then when you decompress, you read and convert that many bytes, then write a newline. If you know that each line is always going to be less than 256 bytes long, you can just represent this as a single byte. Otherwise you might want a larger fixed size, or some variable size encoding (e.g. "while the top bit is set, this is still part of the number") - the latter gets hairy pretty quickly.
2. Alternatively, "escape" a newline by representing it as (say) 0xFF, 0x00. You'd then also need to escape a genuine 0xFF as (say) 0xFF 0xFF. When you read the data, if you read an 0xFF you'd then read the next byte to determine whether it represented a newline or a genuine 0xFF.
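
Both options can be sketched roughly as follows. This is only an illustration under some assumptions: the input is newline-separated lines of hex digit pairs, each line is shorter than 256 bytes for the length-prefixed variant, and the class and method names (`HexLineCodec`, `EncodeLengthPrefixed`, `EncodeEscaped`, and so on) are made up for this example. Decoding always emits uppercase hex, so lowercase input would not round-trip character for character.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not code from the question.
public static class HexLineCodec
{
    // Parses one line of hex digits ("0A1B...") into bytes.
    private static byte[] ParseHexLine(string line)
    {
        string hex = line.Trim(); // drops '\r' and surrounding spaces
        if (hex.Length % 2 != 0)
            throw new FormatException("Odd number of hex digits in line.");
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }

    // Option 1: each line is written as [length][data]; assumes < 256 bytes per line.
    public static byte[] EncodeLengthPrefixed(string hexText)
    {
        var output = new MemoryStream();
        foreach (string line in hexText.Split('\n'))
        {
            byte[] bytes = ParseHexLine(line);
            if (bytes.Length > 255)
                throw new FormatException("Line too long for a one-byte length prefix.");
            output.WriteByte((byte)bytes.Length);
            output.Write(bytes, 0, bytes.Length);
        }
        return output.ToArray();
    }

    // Reverses EncodeLengthPrefixed; malformed input is not handled gracefully here.
    public static string DecodeLengthPrefixed(byte[] data)
    {
        var sb = new StringBuilder();
        int pos = 0;
        while (pos < data.Length)
        {
            if (pos > 0) sb.Append('\n'); // newline between lines, none after the last
            int count = data[pos++];
            for (int i = 0; i < count; i++)
                sb.Append(data[pos++].ToString("X2"));
        }
        return sb.ToString();
    }

    // Option 2: newline -> 0xFF 0x00, genuine 0xFF -> 0xFF 0xFF.
    public static byte[] EncodeEscaped(string hexText)
    {
        var output = new MemoryStream();
        string[] lines = hexText.Split('\n');
        for (int n = 0; n < lines.Length; n++)
        {
            if (n > 0) { output.WriteByte(0xFF); output.WriteByte(0x00); }
            foreach (byte b in ParseHexLine(lines[n]))
            {
                if (b == 0xFF) output.WriteByte(0xFF); // double a genuine 0xFF
                output.WriteByte(b);
            }
        }
        return output.ToArray();
    }

    // Reverses EncodeEscaped by looking at the byte that follows each 0xFF.
    public static string DecodeEscaped(byte[] data)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == 0xFF)
            {
                if (++i >= data.Length)
                    throw new FormatException("Dangling escape byte.");
                if (data[i] == 0x00) { sb.Append('\n'); continue; }
                if (data[i] != 0xFF)
                    throw new FormatException("Unknown escape sequence.");
            }
            sb.Append(data[i].ToString("X2"));
        }
        return sb.ToString();
    }
}
```

The length prefix costs one byte per line regardless of content, while the escape scheme costs two bytes per newline plus one extra byte for every genuine 0xFF, so which one is smaller depends on the data.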

EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of `GZipStream` is arbitrary binary data, not text, and shouldn't be converted to a string using `Encoding`. However, you can turn it into ASCII text very easily by calling `Convert.ToBase64String`. By the way, another trick you've missed is to call `ToArray` on the `MemoryStream`, which will give you the contents as a `byte[]` with no extra messing around.
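
A minimal sketch of that approach, not the code from the question. It assumes the text is turned into bytes as UTF-8 before compression (the thread does not say which encoding was used), and `CompressedText` and `CompressToBase64` are hypothetical names:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper illustrating "compress, then Base64".
public static class CompressedText
{
    public static string CompressToBase64(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text); // assumption: UTF-8 input encoding
        using (var ms = new MemoryStream())
        {
            using (var gzip = new GZipStream(ms, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the remaining compressed data into ms

            // ToArray still works after the MemoryStream has been closed.
            // The result is binary, so Base64 is used to make it printable.
            return Convert.ToBase64String(ms.ToArray());
        }
    }
}
```

The compressed bytes never pass through `Encoding`; only the original text does, and only the Base64 output is treated as a string.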

Jon Skeet 2010-08-11 21:58:09

I think I've added complexity by trying to convert to binary instead of just converting to a byte array, however I have not had luck with the code I will append to my question.

JYelton 2010-08-11 22:09:48

I've updated the question - I think this worked with the binary conversion because it somehow ensured all compressed bytes were ASCII-printable characters. If I simply convert the string to a byte array and then compress it, the resulting bytes fall outside the printable range, which is why I am unable to decode it.

JYelton 2010-08-11 22:22:08

@JYelton: In that case, there's a much simpler answer. Editing...

Jon Skeet 2010-08-11 22:25:46

@Jon: Getting there! Your suggestions are working; however, there remains the last problem of converting the compressed base64 string back to its original form. Also, I am not sure where I can eliminate some code by using `MemoryStream.ToArray()`.

JYelton 2010-08-11 22:34:20

@JYelton: The base64 string isn't compressed... it's a base64 string *of the compressed data*. As for reversing it: use `Convert.FromBase64String` to get back to the compressed binary data. Uncompress that and you've got the original data. And `MemoryStream.ToArray` simplifies this code: `byte[] compressed = new byte[ms.Length]; ms.Read(compressed, 0, compressed.Length);`

Jon Skeet 2010-08-11 23:15:24
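
The reverse direction described in the comment above might look roughly like this. It is a sketch that assumes the Base64 string holds GZip-compressed UTF-8 text; `CompressedTextReader` and `DecompressFromBase64` are hypothetical names, and `Stream.CopyTo` requires .NET 4 or later:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper illustrating "Base64 -> compressed bytes -> original text".
public static class CompressedTextReader
{
    public static string DecompressFromBase64(string base64)
    {
        // Step 1: Base64 text back to the compressed binary data.
        byte[] compressed = Convert.FromBase64String(base64);

        // Step 2: uncompress the binary data into a second MemoryStream.
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);

            // Step 3: ToArray gives the whole buffer without a manual Read call.
            return Encoding.UTF8.GetString(output.ToArray()); // assumption: UTF-8 text
        }
    }
}
```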

@Jon: The new implementation is complete and works great. It involved creating four small helper methods: two that convert between strings and byte arrays, and two that convert between byte arrays and Base64-encoded strings. Thanks for the guidance as always.

JYelton 2010-08-11 23:55:55

Answer 2

A:

If the data you posted is representative of all the data, then you have a newline every 4 bytes, so if you need the newlines when converting back, just insert one after every 4 bytes of data.

BioBuckyBall 2010-08-11 22:00:17

Unfortunately, the sample is greatly simplified; most lines will be about 80 bytes long, or 40 2-character hex strings. The length is variable, though.

JYelton 2010-08-11 22:08:53

@JYelton too bad, variable length obviously makes it harder :(

BioBuckyBall 2010-08-11 22:13:20

It would be cake if that were the case! I would just continue stripping newlines and re-create them on the other side if the lengths were static.

JYelton 2010-08-11 22:14:37
