I'm trying to compress any given string into a shorter, copy-pasteable string that doesn't contain any line breaks.

I tried gzcompress(), but when I copy/paste the result into a different PHP script and call gzuncompress() on it, I get "Warning: gzuncompress(): data error".

Is there any native PHP function that compresses a string and returns a result without any line breaks?

Thanks.

A:

You could try base64_encode / base64_decode. If you're compressing to binary and want to cut and paste the result, I would suggest base64-encoding it first.
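A minimal sketch of this approach, assuming the zlib extension is enabled; `packForPaste()` and `unpackFromPaste()` are hypothetical helper names. The order matters: compress first, then base64-encode, and reverse both steps when reading the pasted text back. PHP's base64_encode() only emits A-Z, a-z, 0-9, '+', '/' and '=', so the result has no line breaks.

```php
<?php
// Compress, then base64-encode so the result is plain, line-break-free ASCII.
function packForPaste(string $text): string
{
    return base64_encode(gzcompress($text));
}

// Reverse the steps: base64-decode first, then decompress.
function unpackFromPaste(string $pasted): string
{
    // trim() drops whitespace or a trailing newline picked up while pasting.
    $binary = base64_decode(trim($pasted), true);
    if ($binary === false) {
        throw new InvalidArgumentException('Input is not valid base64');
    }
    $text = gzuncompress($binary);
    if ($text === false) {
        throw new RuntimeException('Decoded data is not valid zlib data');
    }
    return $text;
}

$original = "Some text\nwith line breaks\n";
$pasted = packForPaste($original);
var_dump(strpbrk($pasted, "\r\n") === false);     // bool(true)
var_dump(unpackFromPaste($pasted) === $original); // bool(true)
```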

Dan Breen
I should've mentioned that my goal is to get a shorter version of the original text.
Andrei Serdeliuc
You can still use the compression algorithm, but if you base64-encode the resulting binary immediately, it will survive cut and paste.
Dan Breen
I tried: echo strlen($string); echo strlen(base64_encode(gzcompress($string))); The results were 185 and 188.
Andrei Serdeliuc
I guess with strings that small, compression doesn't help much, and base64 encoding makes the output about a third longer. With larger data, compression would make a more noticeable difference.
Dan Breen
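To see where the break-even point is: base64 turns every 3 input bytes into 4 output characters (padding the last group with '='), so n compressed bytes become 4 × ceil(n / 3) characters. A small sketch comparing sizes, where `base64Length()`, `sizes()` and both sample inputs are hypothetical; the exact byte counts depend on the input and zlib level.

```php
<?php
// Length of base64_encode() output for $bytes input bytes: 4 * ceil(bytes / 3).
function base64Length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

// Report raw, compressed, and compressed+base64 sizes for one string.
function sizes(string $text): array
{
    $compressed = gzcompress($text);
    return [
        'raw'    => strlen($text),
        'gz'     => strlen($compressed),
        'gz+b64' => strlen(base64_encode($compressed)),
    ];
}

// Short, non-repetitive text: zlib's framing plus base64's 4/3 growth outweigh any savings.
print_r(sizes('A short sentence of about forty bytes.'));

// Long, repetitive text: compression wins by a wide margin even after base64.
print_r(sizes(str_repeat('The same phrase over and over. ', 200)));
```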
Best answer. Compressing and then base64-encoding does make a difference with longer strings.
Andrei Serdeliuc
A: 

You can escape your line-breaks after compressing: run gzcompress() on your string, then replace each line-break in the compressed result with a known two-character pair. To uncompress, replace that two-character pair with line-breaks again, then run gzuncompress().

Actually, you will need to perform two replacements. Since I can't express this well in English (it's not my native tongue), here is an example: use '+n' to escape line-breaks. First you need to escape every '+' on its own, since a '+' followed by an 'n' would otherwise be turned into a line-break by mistake when uncompressing; let's choose '++' for escaping '+'. Then replace line-breaks with '+n'. When uncompressing, go through the string once from left to right, turning every '+n' pair back into a line-break and every '++' pair back into '+' (two separate replace-all passes can mangle an original '+n'). That's it!
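A sketch of this escaping scheme, with hypothetical function names. It decodes in a single pass with strtr(), so an original '+n' (escaped as '++n') is not mistaken for an escaped line-break, which can happen if you run str_replace('+n', ...) and then str_replace('++', ...). As an added assumption beyond the answer, '\r' is escaped as '+r' as well, since a carriage return also breaks lines when pasted. Note that the escaped string still contains non-printable bytes.

```php
<?php
// Escape table: '+' -> '++', "\n" -> '+n', plus (assumption) "\r" -> '+r'.
const ESCAPES = ['+' => '++', "\n" => '+n', "\r" => '+r'];

function escapeLineBreaks(string $binary): string
{
    // strtr() with an array makes one left-to-right pass and never
    // re-replaces text it has already produced.
    return strtr($binary, ESCAPES);
}

function unescapeLineBreaks(string $escaped): string
{
    // Every key here starts with '+', so each escape pair is decoded exactly once.
    return strtr($escaped, array_flip(ESCAPES));
}

$text = "line one\nline two\n";
$escaped = escapeLineBreaks(gzcompress($text));
var_dump(strpbrk($escaped, "\r\n") === false);                    // bool(true)
var_dump(gzuncompress(unescapeLineBreaks($escaped)) === $text);   // bool(true)
```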

Adrien Plisson
echo str_replace(PHP_EOL, '', gzcompress($string)); still returns a multi-line result, so I assume this won't work because the invisible characters that make up the new lines are not actually line breaks?
Andrei Serdeliuc
Wouldn't they be single carriage-return or line-feed characters ('\r' and '\n')? Anyway, I seem to remember that compressed output is binary, so I don't see how you are going to deal with all the non-printable characters...
Adrien Plisson
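A quick way to confirm that gzcompress() output is binary: check for bytes outside printable ASCII (0x20 to 0x7E). This is a minimal sketch; 'hello' is just a sample input.

```php
<?php
$compressed = gzcompress('hello');

// Without the /u flag, preg_match() works on raw bytes; it returns 1 if any
// byte falls outside the printable ASCII range.
var_dump(preg_match('/[^\x20-\x7E]/', $compressed) === 1);                // bool(true)

// After base64 encoding, every byte is printable ASCII.
var_dump(preg_match('/[^\x20-\x7E]/', base64_encode($compressed)) === 1); // bool(false)
```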
A:

It's impossible to design a general lossless compression algorithm which always produces output shorter than the input: there are 2^n possible inputs of n bits but only 2^n - 1 bit strings shorter than n bits, so at least two inputs would have to share an output and could not both be recovered. So, if you always want shorter output than input, you have to start restricting what your algorithm can do. You need to think about which characters are acceptable in the input (long) string, and which characters are acceptable in your output (short) string. Once you have a good idea of these, you can start working out what your options are.
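As an illustration of restricting the alphabets, here is a sketch under a hypothetical assumption: the input contains only lowercase hex digits in an even count (16 symbols, 4 bits each), and the output may use the base64 alphabet (64 symbols, 6 bits each). Packing the input into bytes and base64-encoding them without '=' padding gives an output about two-thirds the input length, never longer, and free of line breaks. `packHex()` and `unpackHex()` are hypothetical names.

```php
<?php
// Hypothetical restricted input: an even number of lowercase hex digits.
function packHex(string $hex): string
{
    if (preg_match('/\A(?:[0-9a-f]{2})*\z/', $hex) !== 1) {
        throw new InvalidArgumentException('Expected an even number of lowercase hex digits');
    }
    // hex2bin() halves the length; base64 then emits 4 characters per 3 bytes.
    // The '=' padding is dropped because it carries no information.
    return rtrim(base64_encode(hex2bin($hex)), '=');
}

function unpackHex(string $packed): string
{
    // Restore the padding removed by packHex() so strict decoding accepts it.
    $padded = str_pad($packed, intdiv(strlen($packed) + 3, 4) * 4, '=');
    $bytes = base64_decode($padded, true);
    if ($bytes === false) {
        throw new InvalidArgumentException('Not a packed hex string');
    }
    return bin2hex($bytes);
}

$hex = 'deadbeef';
$packed = packHex($hex);
echo $packed, "\n";                     // 3q2+7w
var_dump(unpackHex($packed) === $hex);  // bool(true)
```

Inputs outside the agreed alphabet are rejected rather than silently expanded; that rejection is exactly the restriction that makes "always shorter or equal" possible.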

Tim