views:

145

answers:

4

Hi, I'm looking for a algorithim that can compute an approximation of the Kolmogorov complexity of given input string. So if K is the Kolmogorov complexity of a string S, and t represents time, then the function would behave something like this.. limit(t->inf)[K_approx(t,S)] = K.

+7  A: 

In theory, a program could converge on the Kolmogorov complexity of its input string as the running time approaches infinity. It could work by running every possible program in parallel that is the length of the input string or shorter. When a program of a given length is found, that length is identified as the minimum length known for now, is printed, and no more programs >= that length are tried. This algorithm will (most likely) run forever, printing shorter and shorter lengths, converging on the exact Kolmogorov complexity given infinite time.

Of course, running an exponential number of programs is highly intractible. A more efficient algorithm is to post a code golf on StackOverflow. A few drawbacks:

  • It can take a few days before good results are found.
  • It uses vast amounts of our most valuable computing resources, costing thousands of dollars in productivity loss.
  • Results are produced with less frequency over time as resources are diverted to other computations.
  • The algorithm terminates prematurely for many inputs, meaning it does not work in general.
Joey Adams
Or you will soon hit a single program that runs forever, and you can't decide whether to stop it or let it run a few more seconds (decades).
rwong
@rwong: Right, that's why you run them in parallel. For the many programs that do seem to run forever, they are allowed to keep running until a shorter solution is found (if ever).
Joey Adams
I suppose it would be resonable to add another parameter to the function that specifies the maximum length of the Turing machine..so we could have a function that has a property like this perhaps ??? limit(t->inf)[limit(T_max->inf)[K_approx(t,S,T_max)]] = K
Tony
Worms come to my mind ... (for those not familiar with code-golf: http://meta.stackoverflow.com/questions/20736/what-is-code-golf-on-stack-overflow)
rwong
@Tony: As in, a space limitation, or a limitation of program length? You only need finite time to find the shortest program of a given running time (because they all terminate, duh). Less obviously, you only need finite time to find the shortest program using limited space. The latter is true because the halting problem is decidable for programs running in finite space (think about it). Thus, you can limit as running time -> infinity or as space -> infinity. You need not do both.
Joey Adams
@Joey this would be a limitation in the maximum "size" of the Turing machines being checked. If I understand you, you are saying that given a finite set of TM's of a set max "size" you can always determine if the program will halt or not.. because the program doing the termination analysis can be "larger" than all of the TM's in question thus eliminating the possiblity of giving the termination analysis program itself as input. I know that one of the proofs for undecidablity involves giving a program itself as input.
Tony
@Joey your proposed solution isn't really practical for real life. I can't run an infinite number of parallel turing machines in real life. Besides, I'm only looking for an approximation.
Tony
@Joey: are you referring to http://en.wikipedia.org/wiki/State_space_enumeration ?
rwong
A: 

I think this might work? If somebody sees an error, please point it out.

function KApprox(S:string,t:integer,TapeSizeMax:integer) : Turing Machine of size k
  begin

    // An abstract data type that represents a turing machine of size k
    var TM(k:integer) : Turing Machine of size k;
    var TMSmallest(k:integer) : Turing Machine of size k;  

    var j : integer;
    var i : integer;

    for (j = t to 0 step -1) // reduce the time counter by 1
      begin
       for (i = TMax to 1 step -1) // go to the next smaller size of TM
         begin
          foreach (TM(i)) // enumerate each TM of size i
             begin 
               if (TM(i).halt(TapeSizeMax) == true) and (TM(i).output() == S) then
                 begin
                   if (sizeof(TM(i)) < sizeof(TMSmallest(i))) then
                      TMSmallest(i): = TM(i);
                 end;
             end;
         end;
      end;      
    return TMSmallest;
 end;
Tony
I think the fatal flaw is that `TM[i].output()` might never return.
Gabe
@Gabe.. good point.. Will need to resovle that.
Tony
I think this will fix the halting issue.
Tony
The problem now is that there is no such thing as the `halts()` function. You can write one that works for some machines, but not all of them.
Gabe
@ok, so I specified that the halts function will work for only thoes machines that are of size TMax. hence .halts(TMax)
Tony
But you can't write a `halts()` function for TMax > 4. See http://mathworld.wolfram.com/HaltingProblem.html
Gabe
Ok, then will have to restrict the tape size of the TM...TM.halts(TapeSizeMax) == true.
Tony
+1  A: 

The wikipedia page for Kolmogorov complexity has a subsection entitled "Incomputability of Kolmogorov complexity", under the "Basic results" section. This is not intended to be a basic measure that you can compute, or even approximate productively.

There are better ways of achieving what you want, without a doubt. If a measure of randomness is what you want, you could try the binary entropy function. Compressibility by one of the standard algorithms might also fit the bill.

Rob Lachlan
The Wiki article doesn't even mention the phrase "approximate productively" anywhere. The question of computing the KC of a string isn't being asked. It's undecidable.. end of story. All I'm looking for is a function that will lead to better and better approximations by giving it more time and space resources.
Tony
@Tony: Your algorithm isn't fully specified. I'm not sure how you plan to test every possible Turing Machine up to some size with every possible input string, but even if you could do this in some meaningful way, the Time cost would be exponential on input. However nice the theory might seem, it's just not something which will work for you in practice.
Rob Lachlan
@Rob, the function only takes 1 string as input "S:String", and will only test turing machines of size TMax. so we're not testing all turing machines, and hence cannot get the exact KC of the input string.
Tony
@Tony - I think it is pretty much taken as read that small randomly generated Turing machines will not be solutions (i.e. valid compressors) for the vast majority of input strings.
Stephen C
@Stephen. The Turning machines are not to be randomly generated. They are to be enumerated. Any notion of optmization must only be considered when an algorithim is decided on.
Tony
@Tony - I think it is pretty much taken as read that **no** small Turing machines will be solutions (i.e. valid compressors) for the vast majority of input strings. Any algorithm for calculating approximate KC that enumerates Turing machines will be exponential on the size of the Turing machines. In short, such an algorithm would be impractical.
Stephen C
@Stephen, i see this as being true. When the first resolution based automated theorem proves were made, they had semi-decidable running times, and still do. However, progress was made to improve the runtime in specific cases. The reason I make this analogy is because ATP's are sound (and for the most part) complete. I would like a machine learning method that isn't going to miss some fundamental inferences because of a lack of completeness in it's search method. I hope this helps understand what I'm looking for.
Tony