If the logic you're using to process the Cartesian product isn't somehow inherently sequential, then maybe you could just split your inputs into halves (perhaps splitting each input seq in two), calculate 8 separate Cartesian products (first-half x first-half x first-half, first-half x first-half x second-half, ...), process them and then combine the results. I'd expect this to give you quite a boost already. As for tweaking the performance of the Cartesian product building itself, I'm no expert, but I do have some ideas & observations (one needs to calculate a cross product for Project Euler sometimes), so I've tried to summarise them below.
First of all, I find the c.c.combinatorics
function a bit strange in the performance department. The comments say it's taken from Knuth, I believe, so perhaps one of the following obtains: (1) it would be very performant with vectors, but the cost of vectorising the input sequences kills its performance for other sequence types; (2) this style of programming doesn't necessarily perform well in Clojure in general; (3) the cumulative overhead incurred due to some design choice (like having that local function) is large; (4) I'm missing something really important. So, while I wouldn't like to dismiss the possibility that it might be a great function to use for some use cases (determined by the total number of seqs involved, the number of elements in each seq etc.), in all my (unscientific) measurements a simple for
seems to fare better.
Then there are two functions of mine, one of which is comparable to for
(somewhat slower in the more interesting tests, I think, though it seems to be actually somewhat faster in others... can't say I feel prepared to make a fully educated comparison), the other apparently faster with a long initial input sequence, as it's a restricted functionality parallel version of the first one. (Details follow below.) So, timings first (do throw in the occasional (System/gc)
if you care to repeat them):
;; a couple warm-up runs ellided
user> (time (last (doall (pcross (range 100) (range 100) (range 100)))))
"Elapsed time: 1130.751258 msecs"
(99 99 99)
user> (time (last (doall (cross (range 100) (range 100) (range 100)))))
"Elapsed time: 2428.642741 msecs"
(99 99 99)
user> (require '[clojure.contrib.combinatorics :as comb])
nil
user> (time (last (doall (comb/cartesian-product (range 100) (range 100) (range 100)))))
"Elapsed time: 7423.131008 msecs"
(99 99 99)
;; a second time, as no warm-up was performed earlier...
user> (time (last (doall (comb/cartesian-product (range 100) (range 100) (range 100)))))
"Elapsed time: 6596.631127 msecs"
(99 99 99)
;; umm... is syntax-quote that expensive?
user> (time (last (doall (for [x (range 100)
y (range 100)
z (range 100)]
`(~x ~x ~x)))))
"Elapsed time: 11029.038047 msecs"
(99 99 99)
user> (time (last (doall (for [x (range 100)
y (range 100)
z (range 100)]
(list x y z)))))
"Elapsed time: 2597.533138 msecs"
(99 99 99)
;; one more time...
user> (time (last (doall (for [x (range 100)
y (range 100)
z (range 100)]
(list x y z)))))
"Elapsed time: 2179.69127 msecs"
(99 99 99)
And now the function definitions:
(defn cross [& seqs]
(when seqs
(if-let [s (first seqs)]
(if-let [ss (next seqs)]
(for [x s
ys (apply cross ss)]
(cons x ys))
(map list s)))))
(defn pcross [s1 s2 s3]
(when (and (first s1)
(first s2)
(first s3))
(let [l1 (count s1)
[half1 half2] (split-at (quot l1 2) s1)
s2xs3 (cross s2 s3)
f1 (future (for [x half1 yz s2xs3] (cons x yz)))
f2 (future (for [x half2 yz s2xs3] (cons x yz)))]
(concat @f1 @f2))))
I believe that all versions produce the same results. pcross
could be extended to handle more sequences or be more sophisticated in the way it splits its workload, but that's what I came up with as a first approximation... If you do test this out with your programme (perhaps adapting it to your needs, of course), I'd be very curious to know the results.