views:

255

answers:

4

Hi All,

I've got a javascript application that sends a large amount of numerical data down the wire. This data is then stored in a database. I am having size issues (too much bandwidth, database getting too big). I am now ready to sacrifice some performance for compression.

I was thinking of implementing a base 62 number.toString(62) and parseInt(compressed, 62). This would certainly reduce the size of the data but before I go ahead and do this I thought I would put it to the folks here as I know there must be some outside the box solution I have not considered.

The basic specs are: - Compress large number arrays into strings for JSONP transfer (So I think UTF is out) - Be relatively fast, look I'm not expecting same performance as I have now but I also don't want gzip compression either.

Any ideas would be greatly appreciated.

Thanks

Guido Tapia

+2  A: 

Well, as mentioned in the accepted answer to this question, the jsolait library has an implementation of LZW compression in JavaScript...

Josh
Looks great about 50% compression on my very very early tests. I will report back once I validate that its acceptable performance hit. Thanks heaps.
gatapia
Glad I could help!
Josh
Performance turned out to be acceptable. Great little script, who would have thought eh.
gatapia
Just found an issue with this. Pushing lzw compressed data down the wire encodes it. So what I get in the server is an encoded version of the compressed string which is actually larger than the uncompressed string. :( After all the work + tests, looks like I may have to revert.
gatapia
A: 

Options

  • Use a js library (see answer from Josh) but watch for script timeouts
  • Use some kind of activex or silverlight uploader that also does zip
  • Use a java plug-in
  • Use a flash based compressing uploader
James Westgate
A: 

Another way of doing this might be to encode to binary types such as signed/unsigned ints, and manually decode as at http://snippets.dzone.com/posts/show/685 which would require server side code to create the binary data.

You could then huffman compression or something similar like RLE (see http://rosettacode.org/wiki/Run-length_encoding#JavaScript for an implementation, though it may have some issues in IE without modifying) to compress the data further.

EDIT: Alternatively, you could convert the numbers themselves to a base (radix) in the unencoded URI character range (see http://en.wikipedia.org/wiki/Percent-encoding) which should work well if many of the numbers are larger than 2 digits. I converted the code at http://code.activestate.com/recipes/111286-numeric-base-converter-that-accepts-arbitrary-digi/ from python to do this.

It currently doesn't handle floats, but it could be done fairly easily:

function get_map(s) {
    d = {}
    for (var i=0; i<s.length; i++) {
        d[s.charAt(i)] = i}
    d.length = s.length
    d._s = s
    return d}

var separate_with = '~';
var encodable = get_map('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.'); // - is reserved for negatives obviously :-P
var base10 = get_map('0123456789')

// UNCOMMENT ME for length/speed testing in a wider base!
// You may wish to experiment with the ranges for a happy medium between bandwidth and DB space :-P
/*var encodable = ''
for (var i=1; i<128; i++) {
    encodable += String.fromCharCode(i)
}
encodable = get_map(encodable)*/

function baseconvert(number, fromdigits, todigits) {
    var number = String(number)

    if (number.charAt(0) == '-') {
        number = number.slice(1, number.length)
        neg=1}
    else {
        neg=0}

    // make an integer out of the number
    var x = 0
    for (var i=0; i<number.length; i++) {
        var digit = number.charAt(i)
        x = x*fromdigits.length + fromdigits[digit]
    }

    // create the result in base 'todigits.length'
    res = ""
    while (x>0) {
        remainder = x % todigits.length
        res = todigits._s.charAt(remainder) + res
        x = parseInt(x/todigits.length)
    }

    if (neg) res = "-"+res
    return res
}

function encodeNums(L) {
    var r = []
    for (var i=0; i<L.length; i++) {
         r.push(baseconvert(L[i], base10, encodable))
    }
    return r.join(separate_with)
}

function decodeNums(s) {
    var r = []
    var s = s.split(separate_with)
    for (var i=0; i<s.length; i++) {
         r.push(parseInt(baseconvert(s[i], encodable, base10)))
    }
    return r
}

var test = [5, 654645, 24324, 652124, 65, 65289543, 65278432, 643175874158, 652754327543]
alert(encodeNums(test))
alert(decodeNums(encodeNums(test)))
David Morrissey
This is what I proposed in my initial question (base 62 encoding). Your example does however raise the potential for a larger radix (128) however I doubt that that will JSONP transfer correctly, I think 62 is about as good as it will get. Thanks for the code tho, I'll compare to mine for performance.
gatapia
A: 

I'm now toying with the idea of encoding the length of the number into the number itself. I still have not perfected this algorithm but will post it once done. But roughly this is what I am currently trying to achieve:

Boundaries:

  • Loss of precision is allowed (+- 3)
  • Max number will be 3200
  • Min is 0 (so no negatives)
  • No decimal - floats

So now given my max allowed number I know that the length of the encoded digit in base 62 will have a max length of 2. So any encoded number is either 1 or 2 characters in length. Awesome. So now I'm going to make the number odd or even depending if its 1 or 2 characters (remember I can handle loss of precission). This removes the need for separators.

Now I'm seeing about 70%-80% compression with this, its very buggy at the moment but I'm excited about it, so the post to encourage discussion around this methodology.

gatapia
If binary was allowed, I'd use the first of the eight bits in each char for asking whether the number continues as at http://en.wikipedia.org/wiki/Variable-length_code. But reserving 31 chars for "continue" and 31 for "stop" would probably also work. Then again, you've probably already thought of that by the sound of it :-)
David Morrissey