ansaurus

Question

compact representation and delivery of point data

Answer 1

+1 A:

If for example you use 32-bit integers for point coords and there is range limit,
like x: 0..1000, y:0..400, you can pack (x, y) into a single 32-bit variable.

That way you achieve another 50% compression.

Nick D 2009-08-11 15:20:39

Nice Idea! I will have to verify this with my input data, as I don't really know what limits my point data would cross, also my data is signed that is sometimes the co-ordinates may also be negative, so I guess that would set a -16k to +16k range instead of the 32K limit.

Kevin Boyd 2009-08-11 15:27:39

Answer 2

+2 A:

By its nature, spline is an approximation. You can reduce the number of splines you use to reach a higher compression ratio.

You can also achieve lossless compression by using some kind of encoding scheme. I am just making this up as I am typing, using the range example in previous answer (1000 for x and 400 for y),

Each point only needs 19 bits (10 for x, 9 for y). You can use 3 bytes to represent a coordinate.
Use 2 byte to represent displacement up to +/- 63.
Use 1 byte to represent short displacement up to +/- 7 for x, +/- 3 for y.

To decode the sequence properly, you would need some prefix to identify the type of encoding. Let's say we use 110 for full point, 10 for displacement and 0 for short displacement.

The bit layout will look like this,

Coordinates:        110xxxxxxxxxxxyyyyyyyyyy
Dislacement:        10xxxxxxxyyyyyyy
Short Displacement: 0xxxxyyy

Unless your sequence is totally random, you can easily achieve high compression ratio with this scheme.

Let's see how it works using a short example.

3 points: A(500, 400), B(550, 380), C(545, 381)

Let's say you were using 2 byte for each coordinate. It will take 16 bytes to encode this without compression.

To encode the sequence using the compression scheme,

A is first point so full coordinate will be used. 3 bytes. B's displacement from A is (50, -20) and can be encoded as displacement. 2 bytes. C's displacement from B is (-5, 1) and it fits the range of short displacement 1 byte.

So you save 10 bytes out of 16 bytes. Real compression ratio is totally depending on the data pattern. It works best on points forming a moving path. If the points are random, only 25% saving can be achieved.

ZZ Coder 2009-08-11 15:44:03

I am finding it a bit difficult to understand this, could you please give an example to explain your view and how much compression could I get?

Kevin Boyd 2009-08-11 15:47:52

Assuming you have high frequency Short Displacements,followed by Displacement and Coordinates. If there's no real difference in the frequency distribution, you could use a 2bit prefix.

Indeera 2009-08-11 16:10:23

Too hard to go into details in this comment box. See the example I added to the answer.

ZZ Coder 2009-08-11 16:17:01

Thanks for the explanation, will calculate the amount of byte savings that I will get using some sample data.

Kevin Boyd 2009-08-11 17:13:37

Answer 3

+1 A:

You could do a frequency analysis of the numbers you are trying to encode and use varying bit lengths to represent them, of course here I am vaguely describing Huffman coding

Indeera 2009-08-11 15:58:19

Nice suggestion, could you give an example pertaining to the said application and how could I apply it in script.

Kevin Boyd 2009-08-11 16:16:22

Answer 4

+1 A:

Firstly, only keep enough decimal points in your data that you actually need. Removing these would reduce your accuracy, but its a calculated loss. To do that, try converting your number to a string, locating the dot's position, and cutting of those many characters from the end. That could process faster than math, IMO. Lastly you can convert it back to a number.

 150.234636746  ->  "150.234636746"  ->  "150.23"  ->  150.23

Secondly, try storing your data relative to the last number ("relative values"). Basically subtract the last number from this one. Then later to "decompress" it you can keep an accumulator variable and add them up.

 A    A    A             A    R   R
 150, 200, 250     ->    150, 50, 50

Jenko 2009-08-12 15:14:15

Most importantly, remember to (1) Cut off decimals then, (2) Subtract to convert to relatives. Inverse this order and your data won't add up correctly and would therefore get a lot of incremental loss on the decompressing end!

Jenko 2009-08-12 15:26:26

ansaurus

tags:

views:

answers:

compact representation and delivery of point data

related questions