views:

182

answers:

3

The function:

Given a list lst return all permutations of the list's contents of exactly length k, which defaults to length of list if not provided.

(defun permute (lst &optional (k (length lst)))
  (if (= k 1)
   (mapcar #'list lst)
   (loop for item in lst nconcing
     (mapcar (lambda (x) (cons item x)) 
             (permute (remove-if (lambda (x) (eq x item)) lst) 
                      (1- k))))))

The problem: I'm using SLIME in emacs connected to sbcl, I haven't done too much customization yet. The function works fine on smaller inputs like lst = '(1 2 3 4 5 6 7 8) k = 3 which is what it will mostly be used for in practice. However when I Call it with a large input twice in a row the second call never returns and sbcl does not even show up on top. These are the results at the REPL:

CL-USER> (time (nth (1- 1000000) (permute '(0 1 2 3 4 5 6 7 8 9))))
Evaluation took:
12.263 seconds of real time
12.166150 seconds of total run time (10.705372 user, 1.460778 system)
[ Run times consist of 9.331 seconds GC time, and 2.836 seconds non-GC time. ]
99.21% CPU
27,105,349,193 processor cycles
930,080,016 bytes consed

(2 7 8 3 9 1 5 4 6 0)
CL-USER> (time (nth (1- 1000000) (permute '(0 1 2 3 4 5 6 7 8 9))))

And it never comes back from the second call. I can only guess that for some reason I'm doing something horrible to the garbage collector but I can't see what. Does anyone have any ideas?

+3  A: 

One thing that's wrong in your code is your use of EQ. EQ compares for identity.

EQ is not for comparing numbers. EQ of two numbers can be true or false.

Use EQL if you want to compare by identity, numbers by value or characters. Not EQ.

Actually

(remove-if (lambda (x) (eql x item)) list)

is just

(remove item list)

For your code the EQ bug COULD mean that permute gets called in the recursive call without actually a number removed from the list.

Other than that, I think SBCL is just busy with memory management. SBCL on my Mac acquired lots of memory (more than a GB) and was busy doing something. After some time the result was computed.

Your recursive function generates huge amount of 'garbage'. LispWorks says: 1360950192 bytes

Maybe you can come up with a more efficient implementation?

Update: garbage

Lisp provides some automatic memory management, but that does not free the programmer from thinking about space effects.

Lisp uses both a stack and the heap to allocate memory. The heap maybe structured in certain ways for the GC - for example in generations, half spaces and/or areas. There are precise garbage collectors and 'conservative' garbage collectors (used by SBCL on Intel machines).

When a program runs we can see various effects:

  1. normal recursive procedures allocate space on the stack. Problem: the stack size is usually fixed (even though some Lisps can increase it in an error handler).

  2. a program may allocate huge amount of memory and return a large result. PERMUTE is such a function. It can return very large lists.

  3. a program may allocate memory and use it temporarily and then the garbage collector can recycle it. The rate of creation and destruction can be very high, even though the program does not use a large amount of fixed memory.

There are more problems, though. But for each of the above the Lisp programmer (and every other programmer using a language implementation with garbage collection) has to learn how to deal with that.

  1. Replace recursion with iteration. Replace recursion with tail recursion.

  2. Return only the part of the result that is needed and don't generate the full solution. If you need the n-th permutation, then compute that and not all permutations. Use lazy datastructures that are computed on demand. Use something like SERIES, which allows to use stream-like, but efficient, computation. See SICP, PAIP, and other advanced Lisp books.

  3. Reuse memory with a resource manager. Reuse buffers instead of allocating objects all the time. Use an efficient garbage collector with special support for collecting ephemeral (short-lived) objects. Sometimes it also may help to destructively modify objects, instead of allocating new objects.

Above deals with the space problems of real programs. Ideally our compilers or runtime infrastructure may provide some automatic support to deal with these problems. But in reality this does not really work. Most Lisp systems provide low-level functionality to deal with this and Lisp provides mutable objects - because the experience of real-world Lisp programs has shown that programmers do want to use them to optimize their programs. If you have a large CAD application that computes the form of turbine blades, then theoretical/puristic views about non-mutable memory simply does not apply - the developer wants the faster/smaller code and the smaller runtime footprint.

Rainer Joswig
Am I right in thinking that the recursive implementation generates huge amounts of garbage because each call returns a list that is modified which creates a new list and discards the returned list as garbage? Is there a way around this using destructive operations or does this mean any recursive implementation will be garbage heavy?
Andrew Myers
Check these answers out:http://stackoverflow.com/questions/352203/generating-permutations-lazily
Nathan Sanders
@Rainer Thanks for the extra info! I'm an embedded programmer and primarily use C so learning what I need to worry about when using a GC'ed language is something I really need to work on.@Nathan Thanks for the tip, that looks really interesting.
Andrew Myers
+1  A: 

From the looks of the output, you're looking at the slime-repl, right?

Try changing to the "*inferior-lisp*" buffer, you'll probably see that SBCL has dropped down to the ldb (built-in low-level debugger). Most probably, you've managed to blow the call-stack.

Vatine
The inferior-lisp has indeed dropped into the ldb. Looks like it's time for me to learn something about that.
Andrew Myers
+2  A: 

SBCL on most platforms uses generational garbage collector, which means that allocated memory which survives more than some number of collections will be more rarely considered for collection. Your algorithm for the given test case generates so much garbage that it triggers GC so many times that the actual results, which obviously have to survive the entire function runtime, are tenured, that is, moved to a final generation which is collected either very rarely or not at all. Therefore, the second run will, on standard settings for 32-bit systems, run out of heap (512 MB, can be increased with runtime options).

Tenured data can be garbage collected by manually triggering the collector with (sb-ext:gc :full t). This is obviously implementation dependent.

Ramarren