This is a theoretical question, so expect that many details here are not computable in practice or even in theory.
Let's say I have a string s
that I want to compress. The result should be a self-extracting binary (can be x86 asm but can also be some other hypothetical Turing-complete low level language) which outputs s
.
Now, we can easily iterate through all possible such binaries/programs, ordered by size. Let B_s
be the sub-list of these binaries who output s
(of course B_s
is uncomputable).
As every set of positive integers must have a minimum, there must be a smallest program b_min_s
in B_s
From s
, I can also construct a canonical program b_cano_s
which just outputs s
in a trivial way. I.e. the size of b_cano_s
will be in O(#s)
-- if we think of ELF with data segments, we will even have #b_cano_s ~ #s
.
Is there a set A
of possible operations on the binaries which:
1 . Will preserve the output.
2a . Given b_cano_s
, we can arrive somehow by operations from A
at b_min_s
.
(2b . Given b_cano_s
, we can arrive at all programs in B_s
.)
for all possible strings s
.
The conditions 1+2a are weaker than the conditions 1+2b. Maybe, if there is such a set A
, we will automatically have both, though. (Is that so?)
Does such a set A
exists? I was thinking about some obvious operations, like searching for some repeated strings and shorten this. Or some of the other common compression methods. However, that probably is not enough to arrive at all programs B_s
and my intention says also not necessarily at b_min_s
for the same reason.
If it exists, can we express it, i.e. is it computable?