What is it about the way strings are implemented that makes them so expensive to manipulate?
Is it impossible to make a "cheap" string implementation?
or am I completely wrong in my understanding?
Thanks
What is it about the way strings are implemented that makes them so expensive to manipulate?
Is it impossible to make a "cheap" string implementation?
or am I completely wrong in my understanding?
Thanks
Which language?
Strings are typically immutable, meaning that any change to the data results in a new copy of the string being created. This can have a performance impact with large strings.
This is an important feature, however, because it allows for optimizations such as interning. Interning reduces the size of text data by pointing identical strings to the same copy of data.
If you are concerned about performance with strings, use a StringBuilder (available in C# and Java) or another construct that works with mutable text data.
If you are working with a large amount of text data and need a powerful string solution while still saving space, look into using ropes.
If you want a universal string working in every condition, you have to sacrifice efficiency in some cases. This is a classic tradeoff between getting one thing fast and another. So... either you use a "standard" string working properly (but not in an optimal way), or a string implementation which is very fast in some cases and cumbersome in other.
Sometimes you need immutability, sometimes random access, sometimes quick insertions/deletions...
The problem with strings is that they are not primitive types. They are arrays. Therefore, they suffer the same speed and memory problems as arrays(with a few optimizations, perhaps).
Now, "cheap" implementations would require lots of stuff: concatenation, indexOf, etc. There are many ways to do this. You can improve the implementation, but there are some limits. Because strings are not "natural" for computers, they need more memory and are slower to manipulate... ALWAYS. You'll never get a string concatenation algorithm faster than any decent integer sum algorithm.
It depends entirely on what you're trying to do with it. Mostly it's that it usually requires at least 1 new array allocation unless it's replacing a single character in a direct seek. At the simplest level a string is an array of chars. So just about anything you want to do involves iterating, removing, or inserting new things into an array.
Changes and copying of strings tends to involve memory management.
Memory management is not good for performance since it tends to require some kind of global mutex that makes your code scale poorly to multiple cores.
Since it creates new copy of the object every time in java its advisable to use StringBuffer
Syntax
StringBuffer strBuff=new StringBuffer();
strBuff.append("StringBuffer");
strBuff.append("is");
strBuff.append("more");
strBuff.append("economical");
strBuff.append("than");
strBuff.append("String");
String string=strBuff.tostring();
You want to read this Joel Spolsky article:
http://www.joelonsoftware.com/articles/fog0000000319.html
Me, I'm disappointed .NET doesn't have a native type called F***edString
.
Look into mutable strings, immutable strings, and ropes, and think about how you would implement common operations in a low-level language (say, C). Consider:
Coming up with algorithms for these situations will give you a feel for when each type of storage would be appropriate.
Many of the points here are well taken. In isolated cases you may be able to cheat and do thing like using a 64bit int to compare 8 bytes at time in a string, but there are not a lot of generalized cases where you can optimize operations. If you have "pascal style" string with a numeric length field compares can be short circuited logic to only check the rest of the string if the length is not the same. Other operations typically require you to handle the characters a byte at time or completely copy them when you use them. i.e. concatenation => get length of string 1, get length of string 2, allocated memory, copy string 1, copy string 2. It would be possible to do operations like this using a DMA controller in a string libary, but the overhead of setting it up for small strings would outweigh the benefits.
Pete