tags:

views:

538

answers:

3

How can I cast a sequence back to vector after a sequence producing operation (like sort)? Does using (vec..) on a sequence that was a vector is costly?

One (bad?) possibility is creating a new vector out of sequence:

(vec (sort [1 2 3 4 5 6]))

I am asking because I need random access (nth ..) to huge sorted vectors - which are now huge sequences after the sort, with horrible O(n) random access time

+4  A: 

If you need to random access on the result of sort with huge vectors, then the time took by the call to vec should be far outweighed by time savings of doing so.

If you profile and find that it is too slow, you'll probably have to use java arrays.

Brian
That's what I am doing now. calling vec. But I wonder if there is some better way
bugspy.net
+3  A: 

From my own tests (nothing scientific) you may be better with working directly on arrays in cases where you do lots of sorting. But if you sort rarely and have a lots of random access to do though, going with a vector may be a better choice as random access time is more than 40% faster on average, but the sorting performance is horrible due to converting the vector to an array and then back to a vector. Here's my findings:

(def foo (int-array (range 1000)))

(time
  (dotimes [_ 10000]
    (java.util.Arrays/sort foo)))

; Elapsed time: 652.185436 msecs

(time
  (dotimes [_ 10000]
    (nth foo (rand-int 1000))))

; Elapsed time: 7.900073 msecs

(def bar (vec (range 1000)))

(time
  (dotimes [_ 10000]
    (vec (sort bar))))

; Elapsed time: 2810.877103 msecs

(time
  (dotimes [_ 10000]
    (nth bar (rand-int 1000))))

; Elapsed time: 5.500802 msecs

P.S.: Note that the vector version doesn't actually store the sorted vector anywhere, but that shouldn't change the result considerably as you would use simple bindings in a loop for speed.

Nicolas Buduroi
Nice. Now it is easy to see that doing (vec) on sorted vector is 4 times slower than sorting direct arrays! The random access time is so fast in both vector and array that the 40% doesn't matter I think.
bugspy.net
+3  A: 

Meikel Brandmeyer just posted a solution to this on the Clojure group.

(defn sorted-vec
  [coll]
  (let [arr (into-array coll)]
    (java.util.Arrays/sort arr)
    (vec arr)))

Clojure's sort returns a seq across a sorted array; this approach does much the same thing, but returns a vector, not a seq.

If you wish, you can even skip the conversion back into a Clojure persistent data structure:

(defn sorted-arr
  "Returns a *mutable* array!"
  [coll]
  (doto (into-array coll)]
    (java.util.Arrays/sort))

but the resulting Java array (which you can treat as a Clojure collection in most cases) will be mutable. That's fine if you're not handing it off to other code, but be careful.

Rich
Should be java.util.Arrays/sort (he forgot the s).But if it is the same thing, how come it it 4 times faster ? I posted in the Clojure group the timings.
bugspy.net
It's not 4 times faster, at least on a server VM (which you should be using).c.c.sort uses an explicit Comparator, and returns a seq. It also calls to-array, not into-array: the former returns an array of Objects, whilst the latter returns a typed array. Apart from those things, which make sort more general, it's the same code.Those generalities cost.The whole purpose of this exercise is to avoid some of that work; that's why this version is faster.
Rich
here's the threadhttp://groups.google.com/group/clojure/browse_thread/thread/d5b1152c9647d0fb#
Gene T