views:

57

answers:

4

I have an array of point data, the values of points are represented as x co-ordinate and y co-ordinate.

These points could be in the range of 500 upto 2000 points or more.

The data represents a motion path which could range from the simple to very complex and can also have cusps in it.

Can I represent this data as one spline or a collection of splines or some other format with very tight compression.

I have tried representing them as a collection of beziers but at best I am getting a saving of 40 %. For instance if I have an array of 500 points , that gives me 500 x and 500 y values so I have 1000 data pieces. I around 100 quadratic beziers from this. each bezier is represented as controlx, controly, anchorx, anchory. which gives me 100 x 4 = 400 pcs of data. So input = 1000pcs , output = 400pcs.

I would like to further tighen this, any suggestions?

+1  A: 

If for example you use 32-bit integers for point coords and there is range limit,
like x: 0..1000, y:0..400, you can pack (x, y) into a single 32-bit variable.

That way you achieve another 50% compression.

Nick D
Nice Idea! I will have to verify this with my input data, as I don't really know what limits my point data would cross, also my data is signed that is sometimes the co-ordinates may also be negative, so I guess that would set a -16k to +16k range instead of the 32K limit.
Kevin Boyd
+2  A: 

By its nature, spline is an approximation. You can reduce the number of splines you use to reach a higher compression ratio.

You can also achieve lossless compression by using some kind of encoding scheme. I am just making this up as I am typing, using the range example in previous answer (1000 for x and 400 for y),

  1. Each point only needs 19 bits (10 for x, 9 for y). You can use 3 bytes to represent a coordinate.
  2. Use 2 byte to represent displacement up to +/- 63.
  3. Use 1 byte to represent short displacement up to +/- 7 for x, +/- 3 for y.

To decode the sequence properly, you would need some prefix to identify the type of encoding. Let's say we use 110 for full point, 10 for displacement and 0 for short displacement.

The bit layout will look like this,

Coordinates:        110xxxxxxxxxxxyyyyyyyyyy
Dislacement:        10xxxxxxxyyyyyyy
Short Displacement: 0xxxxyyy

Unless your sequence is totally random, you can easily achieve high compression ratio with this scheme.

Let's see how it works using a short example.

3 points: A(500, 400), B(550, 380), C(545, 381)

Let's say you were using 2 byte for each coordinate. It will take 16 bytes to encode this without compression.

To encode the sequence using the compression scheme,

A is first point so full coordinate will be used. 3 bytes. B's displacement from A is (50, -20) and can be encoded as displacement. 2 bytes. C's displacement from B is (-5, 1) and it fits the range of short displacement 1 byte.

So you save 10 bytes out of 16 bytes. Real compression ratio is totally depending on the data pattern. It works best on points forming a moving path. If the points are random, only 25% saving can be achieved.

ZZ Coder
I am finding it a bit difficult to understand this, could you please give an example to explain your view and how much compression could I get?
Kevin Boyd
Assuming you have high frequency Short Displacements,followed by Displacement and Coordinates. If there's no real difference in the frequency distribution, you could use a 2bit prefix.
Indeera
Too hard to go into details in this comment box. See the example I added to the answer.
ZZ Coder
Thanks for the explanation, will calculate the amount of byte savings that I will get using some sample data.
Kevin Boyd
+1  A: 

You could do a frequency analysis of the numbers you are trying to encode and use varying bit lengths to represent them, of course here I am vaguely describing Huffman coding

Indeera
Nice suggestion, could you give an example pertaining to the said application and how could I apply it in script.
Kevin Boyd
+1  A: 

Firstly, only keep enough decimal points in your data that you actually need. Removing these would reduce your accuracy, but its a calculated loss. To do that, try converting your number to a string, locating the dot's position, and cutting of those many characters from the end. That could process faster than math, IMO. Lastly you can convert it back to a number.

 150.234636746  ->  "150.234636746"  ->  "150.23"  ->  150.23


Secondly, try storing your data relative to the last number ("relative values"). Basically subtract the last number from this one. Then later to "decompress" it you can keep an accumulator variable and add them up.

 A    A    A             A    R   R
 150, 200, 250     ->    150, 50, 50
Jenko
Most importantly, remember to (1) Cut off decimals then, (2) Subtract to convert to relatives. Inverse this order and your data won't add up correctly and would therefore get a lot of incremental loss on the decompressing end!
Jenko