views:

141

answers:

3

I saw the following example in Rich's video on sequences http://blip.tv/file/734409 about 33-36 minutes into it:

(first "abcd") => \a

Now, he say that this expands to (sort of):

(first "abcd") => (first (seq "abcd")) => (first '(\a \b \c \d))

So, it looks like an O(N) operation because the full copy of the string is being made. First of all, if a String is immutable, then why is it copied? (Edit: based on an answer, it probably is not; just looked that way when printed.) Secondly, suppose first operated on something else in Java that is mutable, say a linked list of integers. Should first act in a lazy manner (e.g. create a persistent sequence first)? Would not it make sense to evaluate it right away and save it? It would be some sort of a hack that would break the nice abstraction, but get the job done fast, I think. When you call (seq "abcd"), you do not know how it will be used. When you call a first on a seq, you know what to do. But, when you call first on "abcd", I think that performing a hacky and fast "grab and save it", approach is better than grab a sequence and then call first.

Am I missing something? Did Rich Hickey skip some steps?

Let me know if I have questions. Thanks!

+5  A: 

It doesn't follow that a full copy of the string is being made. (But this is a good question.)

In the source for clojure.lang.RT you'll notice that the runtime uses charsequence to create the sequence:

static ISeq seqFrom(Object coll){
    if(coll instanceof Seqable)
        return ((Seqable) coll).seq();
    else if(coll == null)
        return null;
    else if(coll instanceof Iterable)
        return IteratorSeq.create(((Iterable) coll).iterator());
    else if(coll.getClass().isArray())
        return ArraySeq.createFromObject(coll);
    else if(coll instanceof CharSequence)
        return StringSeq.create((CharSequence) coll);
          .
          .
          .

So really, this is a java question, not a clojure question. I haven't checked, but I'm fairly certain that CharSequence does the "right thing".

Rob Lachlan
Thanks for the answer, Rob. Suppose `CharSequence` does the right thing, because it can, because `String` is immutable. Now, what if I had a mutable linked list of 999 integers defined in Java code, and I wanted to grab just the first item. Can I do better than copying 999 items first, or is this the penalty that I would have to pay for doing things the "non-clojure" way? E.g. I should have made a class which contains those immutable perhaps? It seems to me like there would be cases when an imperfect Java lib needs to be used from within Clojure. I understand that first element can be large ..
Hamish Grubijan
(continued), so calling `first` can take awhile - say for instance we call first on a mutable linked list of mutable linked lists. I understand why we would want to turn the very first inner mutable linked list into a persistant sequence by means of copying, but why do the same thing to the `rest` of it, to things that will never be looked at in this particular case?
Hamish Grubijan
There's no need to copy 999 items into a seq, since the seq just uses the underlying data structure as needed, and for performance and immutability reasons, caches what it finds. If you call `first` on some collection (e.g., your linked list), then the only element of that collection the seq needs to look at is the first one. The as-yet unobserved elements of the underlying collection can be mutating without notice. Only when the seq actually reads an element will that element be "persistent" within the seq.
Alex Taggart
@Hamish: What Alex said. In cases where the seq depends on iterable, and is constructed on a mutable java collection, I'm pretty sure you're on your own as far as synchronization is concerned. seq will not guarantee the immutability of fundamentally mutable structures. Luckily, clojure's data structures make it so that you rarely need mutability.
Rob Lachlan
+2  A: 

You might look at the seq as if it creates a lazy sequence based on the string.

It doesn't actually create a lazy sequence, it actually short circuits to the java implementation of CharSequence, but that's an implementation detail.

From a complexity point of view, first is O(1) on a seq, and you can create a seq in constant time from anything is iterable.

ithkuil
+2  A: 

The other answers are correct, but I'll use this chance to point out an interesting effect of Clojure's philosophy of immutability.

So, it looks like an O(N) operation because the full copy of the string is being made.

Strings, like other data structures in Clojure, are immutable. (They're a special case, being implemented in the JVM rather than Clojure, but that's immaterial to this point.)

Copies of immutable objects are free. That's right, free. Because you don't need to copy at all. The same allocated object in memory can simply be reused wholesale, because it's guaranteed to always be the same as when it was "copied."

So a function like seq never has to actually copy anything. It just operates on whatever is passed, directly, and returns a (possibly lazy) sequence that provides an abstract interface to whatever you called seq on.

And so, seq is always O(1).

levand
Thanks, any further input on working with mutable Java objects which can be turned into a sequence?
Hamish Grubijan
IIRC, calling seq returns a lazy Clojure sequence backed by a Java iterator over the collection. So as the lazy sequence is realized, it needs to follow all the rules that pertain to iterators (as far as mutating the underlying object goes) Once it's been realized, however, it is an immutable Clojure sequence.
levand