
How should I monitor the progress of a mapped function in Clojure?

When processing records in an imperative language I often print a message every so often to indicate how far things have gone, e.g. reporting every 1000 records. Essentially this is counting loop repetitions.

I was wondering what approaches I could take to this in Clojure, where I am mapping a function over my sequence of records. In this case printing the message (and even keeping count of the progress) seems to be essentially a side effect.
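For comparison, the imperative pattern described above translates fairly directly to Clojure with loop/recur, keeping the count as an explicit loop variable. This is a minimal sketch: process-record and process-all are hypothetical names standing in for the real work, and the loop is eager rather than lazy.

```clojure
;; Hypothetical per-record work, for illustration only.
(defn process-record [r]
  (* r r))

(defn process-all
  "Eagerly processes records, printing a message every report-every records.
  Returns a vector of the processed results."
  [report-every records]
  (loop [rs (seq records)
         cnt 1
         acc []]
    (if rs
      (let [v (process-record (first rs))]
        ;; The counter is an ordinary loop variable, as in an imperative loop.
        (when (zero? (rem cnt report-every))
          (println "Done" cnt))
        (recur (next rs) (inc cnt) (conj acc v)))
      acc)))
```

Here (process-all 2 (range 5)) prints Done 2 and Done 4 and returns [0 1 4 9 16].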

What I have come up with so far looks like:

(defn report
  [report-every val cnt]
  (if (= 0 (mod cnt report-every))
     (println "Done" cnt))
  val)

(defn report-progress
  [report-every aseq]
  (map (fn [val cnt] 
          (report report-every val cnt)) 
       aseq 
       (iterate inc 1)))

For example:

user> (doall (report-progress 2 (range 10)))
Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)
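Assuming Clojure 1.2 or later (map-indexed is not available in older versions), the counter can come from map-indexed instead of a second (iterate inc 1) sequence. A sketch; report-progress-indexed is a hypothetical name:

```clojure
(defn report-progress-indexed
  "Lazily returns the elements of coll unchanged, printing a message
  each time the report-every'th element (counting from 1) is realized."
  [report-every coll]
  (map-indexed (fn [i v]
                 ;; map-indexed counts from 0, so shift to a 1-based count.
                 (let [cnt (inc i)]
                   (when (zero? (rem cnt report-every))
                     (println "Done" cnt))
                   v))
               coll))
```

(doall (report-progress-indexed 2 (range 10))) prints Done 2 through Done 10 and returns (0 1 2 3 4 5 6 7 8 9). One difference from the two-collection map above: map-indexed processes chunked sequences such as (range 10) a whole chunk (up to 32 elements for range) at a time, so messages can appear ahead of the elements actually being consumed.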

Are there other (better) ways of achieving this effect?

Are there any pitfalls in what I am doing? (I think I am preserving laziness and not holding the head for example.)
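One way to check the laziness point is to build the sequence and then consume only part of it. This sketch repeats the two functions above (with when/zero? in place of if/(= 0 ...), which behave the same here) so it runs standalone:

```clojure
(defn report [report-every val cnt]
  (when (zero? (mod cnt report-every))
    (println "Done" cnt))
  val)

(defn report-progress [report-every aseq]
  (map (fn [val cnt] (report report-every val cnt))
       aseq
       (iterate inc 1)))

(let [s (report-progress 2 (range 10))]
  ;; Nothing is printed yet: map is lazy, so report has not been called.
  (println "seq created")
  ;; Realizing the first four elements calls report four times,
  ;; printing "Done 2" and "Done 4".
  (dorun (take 4 s)))
```

Because reports fire only as elements are realized, they measure consumption rather than creation, which is why doall is needed at the REPL to see them all.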

+12  A: 

The great thing about Clojure is that you can attach the reporting to the data itself instead of to the code that does the computing. This lets you separate these logically distinct parts. Here is a chunk from my misc.clj that I use in just about every project:

(defn seq-counter 
  "calls callback after every n'th entry in sequence is evaluated. 
  Optionally takes another callback to call once the seq is fully evaluated."
  ([sequence n callback]
     (map #(do (if (= (rem %1 n) 0) (callback)) %2) (iterate inc 1) sequence))
  ([sequence n callback finished-callback]
     (drop-last (lazy-cat (seq-counter sequence n callback) 
                  (lazy-seq (cons (finished-callback) ())))))) 

Then wrap the reporter around your data and pass the result to the processing function:

(map process-data (seq-counter input 1000 inc-progress))
Arthur Ulfeldt
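A runnable sketch of this wrap-then-process pattern. Note that seq-counter takes the sequence first, then n, then the callback(s). Here seq-counter is repeated from the answer so the snippet runs standalone, while progress, inc-progress and process-data are hypothetical stand-ins:

```clojure
;; seq-counter as defined in the answer, repeated so this snippet runs standalone.
(defn seq-counter
  ([sequence n callback]
   (map #(do (if (= (rem %1 n) 0) (callback)) %2) (iterate inc 1) sequence))
  ([sequence n callback finished-callback]
   (drop-last (lazy-cat (seq-counter sequence n callback)
                        (lazy-seq (cons (finished-callback) ()))))))

;; Hypothetical progress state and processing step, for illustration only.
(def progress (atom 0))

(defn inc-progress []
  (println "Processed" (swap! progress + 1000)))

(defn process-data [x]
  (inc x))

;; Sequence first, then n, then the per-n callback and the finished callback.
(doall (map process-data
            (seq-counter (range 3000) 1000 inc-progress
                         #(println "All records processed"))))
```

This prints Processed 1000, Processed 2000, Processed 3000 and then All records processed. The finished callback runs when the element after the last real one is realized; drop-last looks one element ahead, pulls it in, and then discards it.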
I think I am doing something crudely similar above, attaching the reporting to a seq with which anything could be done. I was envisioning attaching it to a sequence of results but it could equally well be the sequence of inputs. Your code is much nicer though. I hadn't progressed (pardon the pun) to using a callback for the reporting message (or more general function) and I was calling the reporting function for every value.
Alex Stoddard
Is there anywhere you share misc.clj? I would certainly benefit from seeing other ideas and implementations of useful things like seq-counter.
Alex Stoddard
Yes, it is really the same as your initial example; I was a little quick with the "oh, that's in misc.clj" without properly understanding the question. Here it is: http://code.google.com/p/cryptovide/source/browse/src/com/cryptovide/misc.clj
Arthur Ulfeldt
The function butlast-with-callback frightens me somewhat.
Arthur Ulfeldt
+2  A: 

I don't know of any existing way of doing that; it might be worth browsing the clojure.contrib documentation to see whether something already exists. In the meantime, I've looked at your example and cleaned it up a little.

(defn report [cnt]
  (when (even? cnt)
    (println "Done" cnt)))

(defn report-progress []
  (let [aseq (range 10)]
    (doall (map report (take (count aseq) (iterate inc 1))))
    aseq))

You're heading in the right direction, even though this example is too simple. It gave me an idea for a more general version of your report-progress function, which takes a map-like function, the function to be mapped, a report function and one or more collections (or, for reduce, a seed value and a collection). The report function is called with the mapped function's result followed by the original arguments.

(defn report-progress [m f r & colls]
  (let [result (apply m
                 (fn [& args]
                   (let [v (apply f args)]
                     (apply r v args) v))
                 colls)]
    (if (seq? result)
      (doall result)
      result)))

The seq? check is there only for reduce, which doesn't necessarily return a sequence; sequence results are forced with doall. With this function, we can rewrite your example like this:

user> 
(report-progress
  map
  (fn [_ v] v)
  (fn [result cnt _]
    (when (even? cnt)
      (println "Done" cnt)))
  (iterate inc 1)
  (range 10))

Done 2
Done 4
Done 6
Done 8
Done 10
(0 1 2 3 4 5 6 7 8 9)

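Testing it with filter (here the report function receives the element itself rather than a count, so it reports the even elements):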

user> 
(report-progress
  filter
  odd?
  (fn [result cnt]
    (when (even? cnt)
      (println "Done" cnt)))
  (range 10))

Done 0
Done 2
Done 4
Done 6
Done 8
(1 3 5 7 9)

And even with reduce, where the report function receives the new accumulator, the previous accumulator and the current element:

user> 
(report-progress
  reduce
  +
  (fn [result s v]
    (when (even? s)
      (println "Done" s)))
  2
  (repeat 10 1))

Done 2
Done 4
Done 6
Done 8
Done 10
12
Nicolas Buduroi
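The reduce example reports on the accumulator's value. If the goal is instead to report every n steps regardless of the values, one option (a sketch, not taken from the answer above) is to wrap the reducing function with its own step counter. reporting-rf is a hypothetical name:

```clojure
(defn reporting-rf
  "Returns a two-argument reducing function that behaves like f but calls
  (report cnt) after every n'th step. The step counter lives in an atom
  closed over by the returned function."
  [f n report]
  (let [cnt (atom 0)]
    (fn [acc x]
      (let [c (swap! cnt inc)]
        (when (zero? (rem c n))
          (report c)))
      (f acc x))))

;; Sums ten 1s onto a seed of 2, reporting after steps 2, 4, 6, 8 and 10.
(reduce (reporting-rf + 2 #(println "Done" %)) 2 (repeat 10 1))
```

This returns 12. Because the counter is created once per call to reporting-rf, wrap f afresh for each reduction; reusing one wrapped function keeps counting from where it left off.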
I don't think you understood what I was trying to do with 'doall' (sorry for the lousy and unclear code). I was just using doall to test reporting at the repl to force reporting over processing the whole sequence (otherwise it would be lazily evaluated). doall wasn't part of my attempted reporting function or intended sequence processing.
Alex Stoddard
+4  A: 

I would probably perform the reporting in an agent. Something like this:

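(defn report [s]
  ;; Runs on an agent thread: prints the count so far, returns it incremented.
  (println "Done" s)
  (+ 1 s))

(let [reports (agent 0)]
  ;; map is lazy, so a report is sent only as each element is realized.
  (map #(do (send reports report)
            (process-data %))
       data-to-process))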
Dan
That's an interesting approach. Curiously, the reporting doesn't show up in my REPL when I use SLIME in Emacs, but it does print in a normal REPL.
Alex Stoddard
On further reflection I could just increment things in the function sent to agent. The printing of the progress could be a regular function at the repl that accesses the agent's state.
Alex Stoddard
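The idea in this comment, where the agent only counts and a separate function reads its state on demand, might look like the following sketch. records-done, handle-record, process-all-counted and show-progress are hypothetical names:

```clojure
;; The agent holds only the number of records processed so far.
(def records-done (agent 0))

;; Hypothetical per-record work, for illustration only.
(defn handle-record [x]
  (* 2 x))

(defn process-all-counted [data]
  ;; doall forces the work now; each element sends one increment to the agent.
  (doall (map (fn [x]
                (send records-done inc)
                (handle-record x))
              data)))

;; Call at the REPL (or from a timer) to see how far processing has got.
(defn show-progress []
  (println "Done" @records-done))
```

After (process-all-counted (range 100)), calling (await records-done) waits for the queued increments, and (show-progress) then prints Done 100. Because sends are asynchronous, calling show-progress without await may show a smaller number. In a standalone program, (shutdown-agents) lets the JVM exit once you are finished with agents.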
Good point, actually. In fact, if you are updating a GUI, you probably have to do it on the GUI thread anyway (or defer the update to that thread, e.g. with SwingUtilities.invokeLater in Swing).
Dan