Imagine there are two same-sized sets of numbers.

Is it possible, and if so how, to create a function (an algorithm or a subroutine) that exactly maps input items to output items? Like:

Input = 1, 2, 3, 4
Output = 2, 3, 4, 5

and the function would be:

def f(x): return x + 1

And by "function" I mean something slightly more complex than [1]:

def f(x):
    if x == 1: return 2
    if x == 2: return 3
    if x == 3: return 4
    if x == 4: return 5

This would be useful for creating special hash functions or function approximations.


Update:

What I am trying to ask is whether there is a way to compress the trivial mapping example from above [1].

+1  A: 

It seems to me that you want a hashtable. These are based on hash functions, and there are known hash functions that work better than others depending on the expected input and desired output.

If what you want is an algorithmic way of mapping arbitrary input to arbitrary output, this is not feasible in the general case, as it depends entirely on the input and output sets.

For example, in the trivial sample you have there, the function is immediately obvious: f(x) = x + 1. In other cases it may be very hard, or even impossible, to generate an exact function describing the mapping; you would have to approximate, or just use a map directly.
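
For illustration, the "direct map" option in Python is just a dictionary - exact, but it stores every pair explicitly rather than compressing them into a rule:

mapping = {1: 2, 2: 3, 3: 4, 4: 5}  # one entry per input/output pair

def f(x):
    # Exact by construction; raises KeyError outside the stored inputs.
    return mapping[x]

print(f(3))  # -> 4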

Vinko Vrsalovic
The second one - "algorithmic way of mapping". Why is this not feasible in the general case? I have only one input and one output...?
Ian_K
sdcvvc answered that :)
Vinko Vrsalovic
+3  A: 

You could try curve fitting: fit a function (for example, a polynomial) to the input/output pairs and use the fitted function as the mapping.

Jacob
Yes, except that the curve fitting produces a very high-order polynomial for large sets, am I mistaken? But for smaller cases, yes, this could work, thanks.
Ian_K
You can choose the order of polynomial you want to use for your approximation function: linear, quadratic, etc. You could also attempt piecewise linear.
mskfisher
As mskfisher says, curve-fitting can be forced to produce low-degree polynomials (with approximation), and even without approximation, polynomial interpolation can find the lowest-degree polynomial that fits (i.e., it will find "x+1" for the set in your question). You can also try splines, which can sometimes have a shorter description than a single polynomial that fits all points.
ShreevatsaR
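
As the comments above suggest, polynomial interpolation recovers f(x) = x + 1 exactly for the sample sets. A minimal sketch with NumPy (assuming NumPy is acceptable):

import numpy as np

xs = [1, 2, 3, 4]
ys = [2, 3, 4, 5]

# Degree n-1 through n points gives exact interpolation; here the
# cubic and quadratic coefficients vanish (up to floating-point noise).
coeffs = np.polyfit(xs, ys, deg=len(xs) - 1)
print(np.round(coeffs, 6))    # -> [0. 0. 1. 1.], i.e. f(x) = x + 1
print(np.polyval(coeffs, 5))  # -> 6.0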
A: 

In some cases (such as your example), linear regression or similar statistical models could find the relation between your input and output sets.
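
A quick sketch with SciPy's linregress (an assumption - any least-squares routine would do). A correlation of exactly 1 means the fitted line reproduces the mapping without error:

from scipy.stats import linregress

xs = [1, 2, 3, 4]
ys = [2, 3, 4, 5]

fit = linregress(xs, ys)
print(fit.slope, fit.intercept)  # -> 1.0 1.0, i.e. f(x) = 1.0*x + 1.0
print(fit.rvalue)                # -> 1.0: perfect correlation, exact fit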

Dave Costa
I would like to have an exact mapping of inputs to outputs; I don't believe linear regression can do that...?
Ian_K
If there is a perfect linear relationship between the inputs and outputs, as in your example, linear regression would find it. The correlation of the regression would be 1, indicating a perfect fit.
Dave Costa
A: 

Doing this in the general case is arbitrarily difficult. For example, consider a block cipher used in ECB mode: it maps an input integer to an output integer, but - by design - deriving any general mapping from specific examples is infeasible. In fact, for a good cipher, even with the complete set of mappings between input and output blocks, you still couldn't determine how to calculate that mapping in general.

Obviously, a cipher is an extreme example, but it serves to illustrate that there's no (known) general procedure for doing what you ask.
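
To make this concrete, here is a minimal sketch (using the third-party cryptography package - an assumption, any AES implementation would do) that treats AES in ECB mode as an integer-to-integer mapping:

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = bytes(16)  # all-zero 128-bit key, purely for illustration
enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()

# A bijection on 128-bit blocks (integers 0..2**128-1), yet the sample
# pairs below give no usable hint of the underlying rule.
for x in range(4):
    y = int.from_bytes(enc.update(x.to_bytes(16, "big")), "big")
    print(x, "->", y)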

Nick Johnson
+5  A: 

Finding the shortest program that outputs some string (sequence, function, etc.) is equivalent to finding its Kolmogorov complexity, which is uncomputable.

If "impossible" is not a satisfying answer, you have to restrict your problem. In all appropriately restricted cases (polynomials, rational functions, linear recurrences) finding an optimal algorithm will be easy as long as you understand what you're doing. Examples:

In the case of polynomial sequences, it often helps to consider the difference sequence b_n = a_{n+1} - a_n; this reduces a quadratic relation to a linear one, and a linear one to a constant sequence, etc. But there's no silver bullet. You might build some heuristics (e.g. Mathematica has FindSequenceFunction - check that page to get an impression of how complex this can get) using genetic algorithms, random guesses, checking many built-in sequences and their compositions, and so on. No matter what, any such program is - in theory - infinitely distant from perfection due to the uncomputability of Kolmogorov complexity. In practice, you might get satisfactory results, but this requires a lot of man-years.
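
A tiny sketch of that difference trick:

def differences(seq):
    # b[n] = a[n+1] - a[n]
    return [b - a for a, b in zip(seq, seq[1:])]

# A quadratic sequence becomes constant after two difference steps,
# revealing the degree of the generating polynomial.
seq = [n * n + 1 for n in range(6)]   # 1, 2, 5, 10, 17, 26
print(differences(seq))               # -> [1, 3, 5, 7, 9]  (linear)
print(differences(differences(seq)))  # -> [2, 2, 2, 2]     (constant)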

See also another SO question. You might also implement a wrapper around OEIS in your application.
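
For the OEIS idea, a sketch using its JSON search endpoint (the URL format and response fields are my assumption - check the OEIS documentation before relying on them):

import json
import urllib.request

def oeis_lookup(terms):
    # Query OEIS for sequences containing the given terms.
    url = "https://oeis.org/search?q=%s&fmt=json" % ",".join(map(str, terms))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Print the first few matching sequences for the question's output set.
for seq in oeis_lookup([2, 3, 4, 5])["results"][:3]:
    print("A%06d" % seq["number"], seq["name"])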

Fields:

Mostly, the limits of what can be done are described in

  • complexity theory - describing which problems can be solved "fast", like finding the shortest path in a graph, and which cannot, like playing a generalized version of checkers (it's EXPTIME-complete).

  • information theory - describing how much "information" is carried by a random variable. For example, take coin tossing. Normally, it takes 1 bit to encode the result, and n bits to encode n results (as a long 0-1 sequence). Suppose now that you have a biased coin that gives tails 90% of the time. Then it is possible to find another way of describing n results that on average gives a much shorter sequence. The number of bits per toss needed for optimal coding (less than 1 in that case!) is called entropy; the plot in that article shows how much information is carried (1 bit for a 1/2-1/2 coin, less than 1 bit for a biased coin, 0 bits if the coin always lands on the same side). A quick entropy calculation is sketched just after this list.

  • algorithmic information theory - attempts to join complexity theory and information theory. Kolmogorov complexity belongs here. You may consider a string "random" if it has large Kolmogorov complexity: aaaaaaaaaaaa is not a random string, f8a34olx probably is. So a random string is incompressible (Volchan's "What is a random sequence" is a very readable introduction). Chaitin's algorithmic information theory book is available for download. Quote: "[...] we construct an equation involving only whole numbers and addition, multiplication and exponentiation, with the property that if one varies a parameter and asks whether the number of solutions is finite or infinite, the answer to this question is indistinguishable from the result of independent tosses of a fair coin." (In other words, no algorithm can guess that result with probability > 1/2.) I haven't read that book, however, so I can't rate it.
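
The entropy figures from the information theory item above are easy to compute; a small sketch:

from math import log2

def entropy(p):
    # Shannon entropy of a biased coin, in bits per toss.
    return -(p * log2(p) + (1 - p) * log2(1 - p))

print(entropy(0.5))  # -> 1.0    (fair coin: 1 bit per toss)
print(entropy(0.9))  # -> ~0.469 (biased coin: under half a bit per toss)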

Strongly related to information theory is coding theory, which describes error-correcting codes. Example result: it is possible to encode 4 bits into 7 bits such that any single error can be detected and corrected, or any two errors detected (Hamming(7,4)).
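
A minimal, illustrative Hamming(7,4) encoder and single-error corrector (positions follow the classic p1 p2 d1 p3 d2 d3 d4 layout):

def hamming74_encode(d1, d2, d3, d4):
    # Each parity bit covers the codeword positions whose 1-based
    # binary index has the corresponding bit set.
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(r):
    # Each syndrome bit re-checks one parity group; together they spell
    # out the 1-based position of a single flipped bit (0 = no error).
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        r[pos - 1] ^= 1
    return [r[2], r[4], r[5], r[6]]  # recovered data bits

word = hamming74_encode(1, 0, 1, 1)
word[5] ^= 1                         # flip one bit in transit
print(hamming74_correct(word))       # -> [1, 0, 1, 1], error corrected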

On the "positive" side:

  • symbolic algorithms for Lagrange interpolation and Padé approximation are part of computer algebra/symbolic computation; von zur Gathen and Gerhard's "Modern Computer Algebra" is a good reference.

  • data compression - here you'd better ask someone else for references :)

sdcvvc
"If 'impossible' is not a satisfying answer" - haha! Yes, "Kolmogorov complexity" is exactly the right direction. And thanks for all the links.
Ian_K
The links are incredible - please, don't you have more? What is the field dealing with such things called, anyway? Formal languages? Complexity theory?
Ian_K
Good answer. Somehow K-C theory completely slipped my mind.
Nick Johnson
Wow! Now that opens some fields I had never heard of.
Matthieu M.
Here you go :-)
sdcvvc
I can't thank you enough. Totally incredible!
Ian_K
A: 

Discerning an underlying map from input and output data is exactly what Neural Nets are about! You have unknowingly stumbled across a great branch of research in computer science.
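
For a taste, a minimal sketch assuming scikit-learn is available; note that the net approximates the mapping rather than recovering an exact rule like f(x) = x + 1:

from sklearn.neural_network import MLPRegressor

X = [[1], [2], [3], [4]]  # inputs
y = [2, 3, 4, 5]          # outputs

# A tiny network learns an approximation of the input->output map.
net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=10000, random_state=0)
net.fit(X, y)
print(net.predict([[2.5]]))  # close to 3.5, but not exact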

ldog