views:

40

answers:

4

Hi everyone , lately i've faced a problem which gets me so confused , the problem is : i want to compress a sequence so no information is lost , for example :

a,a,a,b --> a,b

a,b,a,a,c --> a,b,a,a,c (it can't be compressed to a,b,a,c because in this way we lose a,a)

Is there any algorithm to do such a thing ? what is name of this problem ? is it compression ? or anything else ? I would really appreciate any help Thanks in advance

A: 

Unless you have to code some solution yourself, you could use some ZIP compression library for the programming language you're using.

And yes, this is data compression.

mdrg
+1  A: 

RLE

Ignacio Vazquez-Abrams
+1  A: 

Yep, compression. A simple algorithm would be runlength encoding. There also information theory, which is the basis for compression algorithms.

Information theory: More common inputs should be shorter, thus making the sentence length shorter.

So, if you're encoding binary, where the sequence 0101 is very commmon (about 25% of the input), then a simple compression would be:

0101 = 0
anything else = 1[original 4 bits]

So the input: 0101 1100 0101 0101 1010 0101 1111 0101
Would be compressed to: 0 11100 0 0 11010 0 11111 0

Thats a compression of 32 bits -> 20 bits.

An important lesson: The compression algorithm choice is entirely dependent on the input. The wrong algorithm and you will likely make the data longer.

Alexander Rafferty
As i found runlength encoding algorithm as in this example(wikipedia): WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW -> 12W1B12W3B24W1B14W only compress subsequent items , the problem is i want the result to be something like this : WBW
Aryan_wk
WBW? How will you then take that, and get the original information?
Alexander Rafferty
A: 

Every algorithm which is able to transform data in such a way that is takes up less memory is called compression. May it be lossless or lossy.

e.g. (compressed form for "example given" :-) )

The following is imho the simples form, called run length encoding, short RLE:

a,a,a,b,c -> 3a,1b,1c

As you can see all subsequent characters which are identical are compressed into one.

You can also search for subsequent patterns which is much more difficult:

a,b,a,b,a,c --> 2(a,b),1(a),1(c)

There are lots of literature and web sources about compression algorithms, you should use them to get a deeper view.

codymanix
Thanks for reply , i've searched a lot but didn't find anything really useful resolving the problem, are you sure there's any solution existed for this special problem ?
Aryan_wk
in your first example you "compress" a 5 character list into a 6 character list, this isn't compression, that is encoding, and encoding to an expansion at that!
fuzzy lollipop
It shows that not every compression algorithm works best with every input..
codymanix
Every compression algorithm is also an encoding algorithm.
codymanix
Which one is 5 to 6 ? a,a,a,b --> a,b | a,b,a,a,c --> a,b,a,a,c ? it's compression , and nothing i could find searching the web
Aryan_wk
it's better to edit the example to this one : a,a,a,b -> a,a,b
Aryan_wk