views:

67

answers:

5

I'm planning on building up a potentially large string by iterating over a collection and generating chunks at a time. If I were to simply start with an NSMutableString and repeatedly append chunks to it, does that work reasonably efficiently or is it a Schlemiel the Painter situation? It seems plausible to me that NSMutableString is implemented in a way that avoids this, but I can't find any discussion of this in the official documentation and I'd like to be sure.

(Now that I'm writing this out, I realize that in this case I can build up an NSArray of strings and use -componentsJoinedByString: just as easily, but this would be good to know anyway.)

+1  A: 

I prefer to use NSArray's componentsJoinedByString: method over appendString because it gives you finer control over what the string actually looks like with less effort.

Jacob Relkin
Yes, I realized that, but it would be nice to know whether appendString should ever be used.
zem
`-componentsJoinedByString:` is an `NSArray` method.
Alexsander Akers
@Alexsander, Yes I know that.
Jacob Relkin
A: 

I more often use NSString's -stringByAppendingString: and -stringByAppendingFormat: methods. And sometimes the +stringWithFormat: method.

Alexsander Akers
That doesn't really answer the question, since `stringByAppending`Anything: would either have or not have the problem the same as `append`Whatever: does or doesn't. `stringWithFormat:` would *definitely* have the problem because it would have to copy both prefix and suffix into the resulting string.
Peter Hosey
I assume these would have at best the same performance as NSMutableString's version. It doesn't answer my question, which is whether these work more efficiently than braindead old-school null-terminated c-style string concatenation.
zem
A: 

It's an implementation detail. If it's implemented with a C string, then it would have a Schlemiel the Painter problem, because Schlemiel needs to look for the null terminator to know where to start doing more painting. If it's length-declared in any way, then no such problem exists, because Schlemiel knows how long a pole he needs to pole-vault to the end of the string to do more painting.

Ultimately, you can't know this (without disassembling) and can't rely on the answer staying the same across updates, so you just have to have a certain amount of faith that the Apple framework engineers know what they're doing.

It's better to wait until your something in your app is not as fast as it should be, and then profile your app to find out what it is. In my experience, it's almost never the framework being slow.

Peter Hosey
This is what I was worried about. I feel like this is the kind of implementation detail that SHOULD be disclosed (or at least the asymptotic complexity). For this case I'll just use componentsJoinedByString (which intuitively ought to be optimized for large numbers of strings) and accept this answer unless someone pops in with some secret knowledge.
zem
A: 

Typically I will use appendString: when I'm building up smaller strings, or making relatively few modifications to it. If I am building up a large string, depending on how large of course, perhaps creating the string out of an NSArray filled with string items may make sense.

Generally speaking though, I won't build big strings up by using appendString:.

jer
+3  A: 

The Schlemiel situation per se does not occur, since all internal NS/CFString representations use an explicit length, like all sane string implementations. (A somewhat modified version of the source of the basic CoreFoundation types from OS X 10.6.2 is available here.) The real question is about allocation overhead.

In the posted code, mutable strings’ buffers grow by 50 % at a time unless they’re very large (at least ULONG_MAX / 3UL), giving an O(log n) bound on reallocations for practical scenarios. Using the NSArray approach should result in a single allocation. On the other hand, if you build the string piecewise and release the pieces while you go, you might get less cache/VM thrashing.

So basically, the golden rule of optimization applies: if benchmarking shows a problem (on a large but realistic data set), try both.

Ahruman
Exactly what I was looking for, thank you.
zem