views:

316

answers:

7

Motivation: I'd like to be able to use toy functional programming in languages without first-order functions, by using natural numbers instead of functions.

A universal function is a function f : N -> (N -> N), equivalently f : N * N -> N that enumerates all possible computable functions. In other words, there's a number k such that f(k) is the squaring function, there's a number j such that f(j) is the n-th prime function etc.

To write such a function, one can take any Turing-complete language (programming language compiler, lambda calculus, Turing machines...) and enumerate all programs. I'd like to allow not only evaluation, but also operations on functions like addition, composition, currying. For example, given indices of two functions f,g I'd like to know what is the index of the function f+g, or f composed with g. This would allow "toy functional programming".

What is a good way to write such code library? I'm not looking for a minimalistic Turing tarpit that will struggle to compute factorial of 10, nor I don't want to write an advanced compiler. It should have some basic functions like addition and possibility to write loop, but not much more.

Solutions in all high-level languages are welcome. Pseudocode, Haskell and Python are preferred. You can assume arbitrary precision arithmetic. Using eval or similar is not allowed.

Clarification: Enumerated functions will consist of all partial recursive (computable) ones - this includes functions that don't halt on some inputs. The universal function will hang in that cases; of course this is unavoidable. See also: m-recursive functions - http://en.wikipedia.org/wiki/Μ-recursive_function.

A: 

not an easy question. I guess you have to start with a function generator which is able generate all functions one by one. This will result in an enumeration.

Since you have to deal with multiple endless dimensions... let's think about it.

Let's reduce the problem to functions with n parameters and the basic operations +, -, *, /.
Let's build all functions with only one operation:

a + a
a + b
a - a
a - b
a * a
a * b
a / a
a / b

I guess it is easy to see that some of these functions make more sense as others and some of them could be equal, but at least, it is a mapping which can be generated through a loop.

Now in the next iteration, one could easily add to each of these functions

  • one of the already existing parameters with all operations
  • a third parameter with all operations

Afterwards you have a huge list of functions for which you can repeat step two.

Since the is a function which estimates all more complex functions like sin and log (taylor series), these should be covered in this functions space too.

Does this help? Feel free to edit this post!

Just re-read your post. If you want to enumerate all programmatic functions and not only numeric once, I guess it will be more complex. I guess then it just would make sense to work with a mapping "function <-> number" by zipping the source of your function and treating the zip file as a large number. The other way around, you can try to unzip any number and see if it creates a useful function :-) But I guess you will have lots of number which are not even zip files.

But it would fullfill your requirement that for every function there is a number which represents it :-)

Ralf
+6  A: 

What you want is called an interpreter.

First, any enumeration with the properties you want is not going to fit the interesting functions you want to manipulate in the first 2^32, or even the first 2^64, integers. So you will need larger integers, allocated somewhere in memory and referenced through a pointer.

Why not use arrays of chars (strings) representing program in any existing syntax then? Consider such a string as an integer if that makes you happy. The number of the function to compute f1()+f2() is the string made of (representation of f1), "+", and (representation of f2). You get the idea...

What this approach does not have is unicity of the representation of a function, that was maybe implied in your question (I am not sure). What I am sure of is that unicity of representation is incompatible with having simple, or even computable, composition operations on function representations -- For instance, there would be an easy solution to the Halting problem if it wasn't the case.

Pascal Cuoq
Agreed - unicity of representation is impossible. Instead of using strings, trees might be used to represent parsed strings; the internal storage isn't that important. The question is, what set of functions should I allow the interpreter to have?
sdcvvc
Right on. There are more than 2^64 functions of the form `(x -> 1501 * x + 67)` alone, and you never know which one will turn out to be useful.
Jason Orendorff
sdcvvc, now you seem to be asking a programming language design question without providing any requirements.
Jason Orendorff
Not only are there more than 2^64 functions of the form (x -> 1501 * x +67), but there are countably infinite functions of the form (x-> n). You really need to place some limitations on what functions you want to describe.
Jacob Schlather
@Jason Orendorff: the only requirements I put are: (1) being Turing-complete (2) as easy implementation as possible, using natural numbers; it will be enough if the level of expressiveness will be comparable to BASIC, or Floop (en.wikipedia.org/wiki/BlooP_and_FlooP). Any design will do. @Liberalkid: all computable ones; up to recursive isomorphism there's only one possible solution, by Myhill isomorphism theorem.
sdcvvc
A: 

You can take any programming language such that you can determine whether something is a program or not, and list all programs in lexicographic order. To avoid at least a little of the combinatoric explosion, you can assign user-defined names (variables, functions, etc.) in a normalized form. Obviously, this is going to result in an immense number of functions, and it's not going to be easy to pick out which ones are actually useful. Any automatic method of trimming will either exclude some functions that you're actually going to want, or fail to trim the combinatoric explosion enough to be useful, or both.

The other disadvantage of this is that it's going to be very difficult to go from the number to the function: it's hard to find a better way of finding function 433,457,175,432,167,463 than to enumerate about four hundred quadrillion functions.

The other way is to encode a function into a number by mapping the symbols to numbers, and effectively concatenating them.

Assume that the symbols are +, -, :=, ==, <, if, then, endif, do, end_do_condition, enddo, and a statement delimiter. That's 11 symbols right there, without variables, for a pretty minimal set that doesn't include anything like a function call, and requires that you multiply and divide yourself. (I'm not actually sure this would work without a logical operator or two.) Add five variable names, and you've got a programming language with 4-bit characters. This means that a maximum of sixteen characters will fit into a 64-bit unsigned integer.

Once you've got this, all the possible relations between functions are going to be representable as an arithmetic relation, but an immensely complicated one that will be far to complex to get right in practice.

In short, while this is theoretically possible, it's going to be far too clumsy in practice. It would probably be easier to write an interpreter for a functional language in your nonfunctional language of choice.

David Thornley
isn't your definition of a 'programming language with 4-bit characters' ..assembly?
lorenzog
A: 

To write such a function, one can take any Turing-complete language (programming language compiler, lambda calculus, Turing machines...) and enumerate all programs

I'm not very sure if that can be done at all... It feels like it goes against the Turing-Church thesis. In order to enumerate all programs, first you need an algorithm that determines which programs are valid and which ones aren't, and that's impossible... Unless you don't care about that and allow incorrect programs in your language.

But maybe the Godelization of a formal system could help you... I'd try with Lisp, having the code as data could help a lot.

fortran
Yes, it's impossible, but all programming languages handle that in the same way (right?) by allowing some "invalid" programs (e.g. programs that don't terminate) while rejecting the ones with obvious errors. The set of programs that fit a given grammar is always countable... right?
Jason Orendorff
It's not impossible to determine which programs are valid and which aren't (well, maybe in C++, but not in a simple Turing machine language). It's impossible to decide which programs halt and which don't. But you don't need to know that in order to run every program that does halt - what you do is run the first step of program 1, then the first step of program 2, then step 2 of 1, then 1 of 3, 2 of 2, 3 of 1, and so on. If a program halts when run on its own, it will halt when run this way. Obviously you need a lot of memory, though...
Steve Jessop
Yes, that's true, overlooking that it seems possible...
fortran
... so an alternative would be to run the first million steps of each of the first million programs in turn, then start all over again and run the first two million steps of the first two million programs in turn, and so on. You don't have to store the state of each program, just for the programs you're interested in, flag which have halted. Of course this requires a lot of time, since you're re-doing work. But that should be expected if you're running every single program that exists :-)
Steve Jessop
Anyway, the point is you can enumerate valid programs, just make sure you pick a language in which its easy to determine whether a program is valid or not. With a suitable sandboxed virtual machine; an instruction set where anything that isn't explicitly defined as an operation is a no-op; and jump-out-of-bounds means "halt", every sequence of bytes is a valid program. Easy :-) It's just not the same thing as evaluating all computable functions, since (a) "all programs" includes the non-halting ones, and (b) if a program is to be used as a function, then whether it halts depends on input.
Steve Jessop
@Jason: No, not unless your language has a size limit. Theoretically at least, you can make a program with any number of characters up to infinity, and thus there are an infinite number of programs. Subtracting one infinity from another is still infinity (in this case the fist infinity is a superset of the second). (Not a math guy so tell me if any of that is wrong)
RCIX
@RCIX countable is "a kind of infinity", exactly those infinities that can be mapped to the natural numbers (its cardinality is Aleph_0). Aleph_0 +-k is still Aleph_0 (e.g., if you subtract "one" from infinity you still get the same infinity, not a subset). But also multiplying Aleph_0 by a constant yields still to Aleph_0 (you can map the integer numbers to the natural). And even more, squaring Aleph_0 is still Aleph_0! (you can still map the rational numbers to the integers)...
fortran
the previous comment was just a mathematical introduction of the concept of countable, let's back to the basics... An easy way to map programs to natural numbers is just interpreting it's source code as a big binary number. My initial objection was about being able to distinguish the valid programs from the invalid ones, to have a mapping without gaps.
fortran
And crucially for what fortran says, the set of finite sequences of a countable alphabet is countable. Programs are finite sequences of a finite alphabet (ASCII, for instance, or Unicode), so even better. If programs could be infinitely long, then there would be uncountably many of them (that is, more than Aleph_0), like the real numbers. But they can't, so there aren't.
Steve Jessop
A: 

Not sure I understand. One thing though - you can't enumerate all possible computable functions. Short answer: because otherwise there will be a universal antivirus. Long answer: because if there was such an enumeration, you would have in that set also the function that computes the enumeration itself. Like Russel's paradox.

A different answer to your question would be - you want to ``list'' all possible computable functions; to do that, you might want to represent them as prime numbers, and use their composition as multiplication. This would guarantee uniqueness. Factorisation will give you the reverse function.

lorenzog
For the usual definitions of "enumerate" and "computable", you can enumerate all computable functions. Fix an universal Turing machine, start with those functions that are computed with an initial ribbon of length one, then move to those computed with an initial ribbon of length two, ... What does it matter that the function that computes the enumeration is somewhere in the enumeration?
Pascal Cuoq
Also, unfortunately function composition is not commutative, and multiplication is.
Pascal Cuoq
It is impossible to enumerate all recursive functions. It is possible to enumerate all partial recursive ones.
sdcvvc
thanks, modified answer accordingly.
lorenzog
-1: Answer still wrong. An enumeration of all partial recursive functions includes all recursive functions, including the (computable) function that enumerates the partial recursive functions. Turing's Entscheidungsproblem tells us that there is no recursive test for which functions in this sequence are computable.
Charles Stewart
+1  A: 

While it is not too hard to enumerate all possible expressions in some language, you won't be able to restrict these to those expressions that denote terminating functions.

But if you are not interested in termination, using combinators (with some arithmetic primitives thrown in for usefulness) might be the best way, since you avoid introducing variable names that way.

mfx
A: 

As Pascal said, what you want is a interpreter, but one can do even better: Use the processor as interpreter directly.

Feed the number N (say, as some big int array) directly to a buffer and execute this buffer as machinecode.

For every possible function you computer is able to execute there exists a N. Unfortunatly. not every N is a valid program (this wasn't requested) or the terminating program (which isn't possible).

On the other hand, this function will produce such gems as World of Warcraft, Microsoft Office 17 including Service Pack 6 and Windows 9.

drhirsch