If someone could suggest an alternative compression algorithm to me I would be equally happy.
There is always good old deflate, a much more common member of the LZ compression family. There are JavaScript implementations of it, and Python's zlib module can handle raw deflate content on the server side.
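If you do go the deflate route, the server side is straightforward: Python's zlib accepts a headerless ("raw") deflate stream if you pass a negative wbits value. A minimal sketch (the payload here is just illustrative):

```python
import zlib

# Many JS deflate implementations emit a raw deflate stream with no
# zlib header or checksum. A negative wbits tells zlib to expect that.
payload = b"form data to submit " * 50

# Compress to a raw deflate stream (stand-in for the client side).
compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = compressor.compress(payload) + compressor.flush()

# Decompress it the same way on the server.
decompressor = zlib.decompressobj(-15)
restored = decompressor.decompress(raw) + decompressor.flush()
```

If the client library emits a zlib-wrapped stream instead, plain `zlib.decompress` with the default wbits works.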
This is a lot of overhead in relatively slow client-side code just to compress submission data, and it's not trivial to submit the raw bytes you will obtain from it.
Do they gzip GET parameters within a request?
GET form submissions in the query string must by nature be fairly short, or you will overrun browser or server URL length limits. There is no point compressing anything so small. If you have a lot of data, it needs to go in a POST form.
Even in a POST form, the default enctype is application/x-www-form-urlencoded, which means the majority of bytes are going to get encoded as %nn sequences. This will balloon your form submission, probably beyond the original uncompressed size. To submit raw bytes you would have to use an enctype="multipart/form-data" form.
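You can see the ballooning directly: compressed output is essentially random bytes, so most of them fall outside the unreserved URL character set and get escaped as %nn, three characters per byte. A quick demonstration (the input text is just an example):

```python
import zlib
from urllib.parse import quote_from_bytes

# Compress some repetitive data, then URL-encode the result the way
# application/x-www-form-urlencoded would.
compressed = zlib.compress(b"some repetitive form data " * 40)
encoded = quote_from_bytes(compressed)

# Most compressed bytes are escaped as %nn, costing three characters
# each instead of one.
bloat = len(encoded) / len(compressed)
```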
Even then, you're going to have encoding problems. JS strings are Unicode, not bytes, and will get encoded using the encoding of the page containing the form. That should normally be UTF-8, but then you can't actually generate an arbitrary sequence of bytes for upload by encoding to it, since many byte sequences are not valid UTF-8. You could smuggle bytes into a Unicode string by mapping each byte to the code unit of the same value, but encoding that string as UTF-8 would bloat your compressed bytes by about 50% (since half the code units, those from 0x80 up, encode to two UTF-8 bytes).
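The 50% figure follows from how UTF-8 encodes code points: each value from 0x80 to 0xFF costs two bytes, and compressed data is spread fairly evenly over the full byte range. A sketch of the effect:

```python
import zlib

# Compressed data is effectively random bytes over 0x00-0xFF. Treat
# each byte as a Unicode code point (what a JS string built from byte
# values is), then encode to UTF-8 as the page encoding would.
compressed = zlib.compress(bytes(range(256)) * 8)
as_text = "".join(chr(b) for b in compressed)
utf8 = as_text.encode("utf-8")

# Code points 0x80-0xFF each encode to two UTF-8 bytes, so the stream
# grows by roughly half.
```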
In theory, if you didn't mind losing proper internationalisation support, you could serve the page as ISO-8859-1 and use the escape/encodeURIComponent idiom to convert between UTF-8 and ISO-8859-1 for output. But that won't work, because browsers lie and actually use Windows code page 1252 for encoding and decoding content marked as ISO-8859-1. You could use another encoding that maps every byte to a character, but that would mean more manual encoding overhead and would further limit the characters you could use in the page.
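The browser lie is easy to demonstrate with any codec library. Byte 0x93 is a C1 control character in real ISO-8859-1, but Windows-1252 maps it to a curly quote, and browsers apply the cp1252 mapping even to pages declared as ISO-8859-1:

```python
# One of the bytes in the 0x80-0x9F range where the two encodings
# disagree: a control character in ISO-8859-1, a printable character
# in Windows-1252.
byte = b"\x93"
iso = byte.decode("latin-1")   # U+0093, a control character
win = byte.decode("cp1252")    # U+201C, left double quotation mark
```

Any byte in 0x80–0x9F round-trips differently depending on which mapping the browser actually used, which is what breaks the scheme.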
You could avoid the encoding problems entirely by using something like base64, but then, again, you've got more manual encoding overhead plus a 33% size bloat.
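The 33% comes straight from the base64 geometry: four output characters for every three input bytes, plus padding. For example:

```python
import base64
import zlib

# Base64 a compressed payload (the input text is just an example).
compressed = zlib.compress(b"form fields " * 60)
b64 = base64.b64encode(compressed)

# Output length is always 4 * ceil(n / 3): a fixed ~33% overhead on
# top of whatever the compression saved.
expected_len = 4 * ((len(compressed) + 2) // 3)
```

So compression only pays off overall if deflate shrinks the data by more than a third to begin with.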
In summary, all approaches are bad; I don't think you're going to get much useful out of this.