views:

161

answers:

2

Hello, I was wondering if there is a compression algorithm, in common use today, that contains a fixed point, i.e., an identity file.

To explain, let's call C : byte[] -> byte[] a function that represents the compression algorithm. I want to know if there exists (and what it is, if it is possible to determine in reasonable time) a file f such that

C(f) = f

That is, a file that, when compressed by a suitable, widely-known compression algorithm in common use nowadays, will produce itself as the result.

Do you know of such a phenomenon?

+4  A: 

Yes! This is a variant on the quine problem.

  1. An example that uses gzip: http://groups.google.com/group/comp.compression/browse%5Fthread/thread/c57c322e15c782aa/350d9fb166fdf11f

  2. An example that uses zip/unzip: http://www.steike.com/code/useless/zip-file-quine/

ire_and_curses
Cool! By the way, do you have a copy of the article from the second link? The guy said he lost it... I'd really appreciate to read it!Thank you for your answer!
Bruno Reis
@Bruno Reis:Sorry, I'm afraid I don't.
ire_and_curses
Neat. It's interesting to note that although decompressing the first result yields itself, compressing it doesn't. Brian's answer elaborates on this, of course.
Nick Johnson
+3  A: 

Warning: Rather pedantic answer.

There are many cases where D(f) = f (D being defined as decompression). However, compression is not as precisely defined. For most compression algorithms, different implementations of the compression algorithm will give different output files (of varying sizes). Consider two programs, 1 and 2. For full interoperability, it is necessary that D1(F) must equal D2(F) for all valid F. Similarly, it is necessary that D2(C2(f)) == D2(C1(F)) == D1(C1(F)) == D1(C2(F)), for all valid F. However it is totally unnecessary that C1(F) == C2(F), and this is in fact rarely the case.

So, you are unlikely to, if you actually compress such quines, to end up with the same file, unless you use the same program to do so that was used to generate it (which is unlikely, since such quines are usually hand-crafted, with C(F) never even being tested).

While it is possible (indeed, trivial!) to produce a program for which C(F) == F for some F, most people tend to instead point out as quines the more well-defined case where D(F) == F (since D1(F)==D2(F) for all valid, compatible decompression of the format of F, assuming F is valid).

So, there are likely cases where C(F) == F, but generally this is the wrong question to ask, and you should instead ask for cases where D(F) == F...which other people who answered the question have provided.

Brian
Brian, you are absolutely right! I should have asked the opposite question. Thanks for you answer!
Bruno Reis