That is pretty much the whole question. I'm afraid I don't know templates (or C++, really), but I know algorithms and data structures (even some OOP! :). Anyway, to make the question a bit more precise, consider what I would like to be part of the answer (among others I don't know in advance).

  1. Why is it coded as a template?
  2. How does the template work?
  3. How does it do mem allocation?
  4. Why is it (or is it not) better than mere null-terminated char arrays?

Thank you.

A: 

A good free online resource is "Thinking in C++" by Bruce Eckel, whose site is here: http://mindview.net/Books/TICPP/ThinkingInCPP2e.html .

The second volume of his free book is mirrored here: http://www.smart2help.com/e-books/ticpp-2nd-ed-vol-two/#_ftnref14 . Chapter three is all about the string class, why it's a template, and why it's useful.

Zeke
Thanks for the resource. I'm not really asking about how to use it, just how it's implemented itself at the lowest level.
Dervin Thunk
+2  A: 
  1. string is not the template; string is a typedef for the specialization of the basic_string class template for char. It's a template so that, for example, you can typedef wstring, which specializes on wide characters, and use all the same code for the encapsulated value.

  2. See @Gman's comment. Compile-time code reuse, while retaining the ability to selectively special-case, is the basic rationale for templates.

  3. Implementation dependent. Some do single-instance allocation, with copy on write. Some use a builtin buffer for small strings and allocate from heap only after a certain size is reached. I suggest you investigate how it works on your compiler by walking the constructor and follow-on code in <string>, as that will help you understand 2. hands on, which is way more valuable than just reading about it (though a book or other reading is a great idea for intro to templates).

  4. Because const char* and the CRT that supports it is a bug farm for the unwary. Check out all the stuff you get for free with std::string. Plus a whole bunch of Standard C++ algorithms that work with string iterators.
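A minimal sketch of points 1 and 4, assuming a C++11 or later compiler. The helper names sorted_unique_chars and count_char are made up for illustration; the static_asserts only restate that string and wstring are typedefs for basic_string specializations, and the helpers show generic Standard algorithms running on string iterators.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <type_traits>

// std::string and std::wstring are typedefs for specializations of the
// same class template, std::basic_string.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "wstring is basic_string<wchar_t>");

// Hypothetical helper: returns a copy of `text` with its characters sorted
// and duplicates removed, using only generic algorithms on string iterators.
std::string sorted_unique_chars(std::string text)
{
    std::sort(text.begin(), text.end());
    text.erase(std::unique(text.begin(), text.end()), text.end());
    return text;
}

// Hypothetical helper: counts occurrences of `ch` with std::count.
std::size_t count_char(const std::string& text, char ch)
{
    return static_cast<std::size_t>(std::count(text.begin(), text.end(), ch));
}
```

For example, sorted_unique_chars("banana") gives "abn" and count_char("banana", 'a') gives 3, with no manual buffer handling anywhere.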

Steve Townsend
+12  A: 
  1. std::string is actually a typedef for std::basic_string<char>, and therein lies the answer to your #1 above. It's a template in order to make basic_string work with pretty much anything: char, unsigned char, wchar_t, pizza, whatever... string itself is just a programmer convenience that uses char as the datatype, since that's what's often wanted.

  2. Unanswerable as asked. If you're confused about something, please try to narrow it down a bit.

  3. From the application-layer point of view, all basic_string objects use an allocator object to do the actual allocation. Allocation methods may vary from one implementation to the next, and for different template parameters, but in practice, with the default allocator (std::allocator), they will use operator new at the lower levels to allocate & manage the contained resource.

  4. It's better than mere char arrays for a wide variety of reasons.

    • string manages the memory for you. You never have to allocate buffer space when you add data to or remove data from the string. If you add more than will fit in the currently allocated buffer, string will reallocate it for you behind the scenes.

    • In this regard, string can be thought of as a kind of smart pointer. For the same reasons that smart pointers are better than raw pointers, strings are better than raw char arrays.

    • Type safety. This may seem a little convoluted, but string used properly has better type safety than char buffers. Consider a common scenario:

 

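 #include <cstdio>
 #include <sstream>
 #include <string>
 using namespace std;

 int main()
 {
   const char* jamorkee_raw = "jamorkee";

   // Raw buffer: the language does not check the format string
   // against the arguments that follow it.
   char raw_buf[0x1000] = {};
   // BUG (deliberate): %f expects a double, but a const char* is passed.
   // This compiles, yet the behavior is undefined.
   sprintf( raw_buf, "This is my string.  Hello, %f", jamorkee_raw);

   // string + stream: operator<< is selected from the argument's type,
   // so there is no format specifier to get wrong.
   const string jamorkee_str = "jamorkee";
   stringstream ss;
   ss << "This is my string.  Hello " << jamorkee_str;
   string s = ss.str();
 }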

The type-safety bug in the sprintf call above (a const char* passed where %f expects a double) can't even be written when using string along with streams.
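To make the memory-management bullet concrete, here is a small sketch. The helper names repeat_word and count_capacity_changes are hypothetical; the point is that no buffer size is chosen anywhere, and that capacity() changes on its own as the string grows. How often it changes depends on the implementation's growth policy, so treat any specific number as implementation-specific.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a string by appending `word` `times` times.
// No buffer is sized by hand; std::string reallocates when it needs to.
std::string repeat_word(const std::string& word, std::size_t times)
{
    std::string result;
    for (std::size_t i = 0; i < times; ++i) {
        result += word;  // may reallocate behind the scenes
    }
    return result;
}

// Hypothetical helper: appends `count` characters one at a time and
// returns how many times capacity() changed along the way.
std::size_t count_capacity_changes(std::size_t count)
{
    std::string s;
    std::size_t changes = 0;
    std::size_t last_capacity = s.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('x');
        if (s.capacity() != last_capacity) {
            ++changes;
            last_capacity = s.capacity();
        }
    }
    return changes;
}
```

With a raw char array, each of those capacity changes would be a realloc (or a buffer overrun) that you had to get right yourself.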

John Dibling
What is with the formatting?!?! Graaaargh!
John Dibling
+1 for awesome answer. Simple but not *too* simple!
David Titarenco
Something to do with code blocks after lists. :/ Here's my attempt, you don't have to keep it.
GMan
@GMan: Thanks, I'll take it!
John Dibling
+6  A: 

A rather quick (and therefore probably incomplete) shot at answering some of the questions:

  1. Why is it coded as a template?

Templates provide the capability for the class functions to work on arbitrary data types. For example the basic_string<> template class can work on char units (which is what the std::string typedef does) or wchar_t units (std::wstring) or any POD type. Using something other than char or wchar_t is unusual (std::vector<> would more likely be used), but the possibility exists.
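As a rough illustration of "the same code for different character types", here is a function template written against basic_string<CharT> rather than string. The name count_words and its single-separator rule are assumptions made for the example, not anything from the standard library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the words in `text`, where words are runs of
// characters separated by `space`. Because it is written against
// basic_string<CharT>, the compiler generates one version per character type.
template <typename CharT>
std::size_t count_words(const std::basic_string<CharT>& text, CharT space)
{
    std::size_t words = 0;
    bool in_word = false;
    for (CharT c : text) {
        if (c == space) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;  // first character of a new word
            ++words;
        }
    }
    return words;
}
```

Calling count_words(std::string("how does it work"), ' ') and count_words(std::wstring(L"wide  chars too"), L' ') both compile from this one definition; basic_string itself is written in the same spirit.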

  3. How does it do mem allocation?

This isn't specified by the standard. In fact, the basic_string<> template allows an arbitrary allocator to be used for the actual allocation of memory (but doesn't determine at what points allocations might be requested). Some implementations might store short strings in actual class members, and only allocate dynamically when the strings grow beyond a certain size. The size requested might be exactly what's needed to store the string or might be a multiple of the size to allow for growth without a reallocation.
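One way to watch this on your own compiler is to plug in an allocator that counts requests. The sketch below assumes C++11 or later; CountingAllocator, CountedString, g_allocation_count and allocations_for are all names invented for the example. It only shows that basic_string routes its dynamic allocation through the allocator it is given; how many requests a given string triggers is up to the implementation.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Number of allocation requests seen so far (a global, for simplicity).
std::size_t g_allocation_count = 0;

// Hypothetical minimal C++11 allocator that counts calls to allocate().
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n)
    {
        ++g_allocation_count;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

// All CountingAllocator instances are interchangeable.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Same template as std::string, different third template argument.
using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Hypothetical helper: how many allocator requests does constructing a
// string of `length` characters cause on this implementation?
std::size_t allocations_for(std::size_t length)
{
    const std::size_t before = g_allocation_count;
    CountedString s(length, 'x');
    return g_allocation_count - before;
}
```

A 1000-character string has to go through the allocator at least once. Trying allocations_for with a small length is a quick way to see whether your implementation keeps short strings inside the object instead.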

Additional information stolen from another SO answer:

Scott Meyers' book, Effective STL, has a chapter on std::string implementations that's a decent overview of the common variations: "Item 15: Be aware of variations in string implementations".

He talks about 4 variations:

  • several variations on a ref-counted implementation (commonly known as copy on write) - when a string object is copied unchanged, the refcount is incremented but the actual string data is not copied. Both objects point to the same refcounted data until one of the objects modifies it, causing a 'copy on write' of the data. The variations are in where things like the refcount, locks etc. are stored.

  • a "short string optimization" implementation. In this variant, the object contains the usual pointer to data, length, size of the dynamically allocated buffer, etc. But if the string is short enough, it will use that area to hold the string instead of dynamically allocating a buffer.
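If you want to check whether your implementation uses the short string optimization, one rough probe is to ask whether the character data lives inside the string object itself. The helper stored_inline below is a hypothetical name, and the address comparison is a diagnostic trick rather than anything the standard promises; the result for short strings is implementation-specific.

```cpp
#include <cstdint>
#include <string>

// Hypothetical probe: returns true if the characters of `s` are stored
// within the bytes of the string object itself (as a short-string
// optimization would do), false if they live in a separate buffer.
bool stored_inline(const std::string& s)
{
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}
```

A 1000-character string can never be inline, since it is larger than the object; a two-character string will be inline only on implementations that use this variant.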

  4. Why is it (or is it not) better than mere null-terminated char arrays?

One way the string class is better than a mere null-terminated array is that the class manages the memory required, so defects involving allocation errors or overrunning the end of the allocated arrays are reduced. Another (perhaps minor) benefit is that you can store 'null' characters in the string. A drawback is that there's perhaps some overhead, especially since you pretty much have to rely on dynamic memory allocation for the string class. In most scenarios that's probably not a major issue, but on some setups (embedded systems, for example) it can be a problem.
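The point about 'null' characters can be sketched like this. The helper names make_with_embedded_null and c_length are made up for the example; it relies only on the (pointer, length) constructor of std::string and on std::strlen.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a 3-character string whose middle
// character is '\0'. The (pointer, length) constructor copies exactly
// 3 chars instead of stopping at the first null.
std::string make_with_embedded_null()
{
    return std::string("a\0b", 3);
}

// Hypothetical helper: the length as a C-style function sees it, which
// stops at the first '\0'.
std::size_t c_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

Here size() reports 3 because the string stores its length separately, while the C view of the same bytes ends after the first character.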

Michael Burr
Nice complement. Thanks.
Dervin Thunk
Not so quick, not so incomplete. Nice answer. :)
John Dibling
I was going to try to say something about the "How does the template work?" question, but I decided to punt.
Michael Burr
+1  A: 

Why is it coded as a template?

Several people have given the answer that having std::basic_string be a template means that you can have both std::basic_string<char> and std::basic_string<wchar_t>. What nobody has explained is why C and C++ have multiple character types in the first place.

C, especially in its early versions, was minimalistic about data types. Why have bool when the integers 0 and 1 work just fine? And why have distinct types for "byte" and "character" when they're both 8 bits?

The problem is that 8 bits limits you to 256 characters, which is adequate for an alphabetic language like English or Russian, but nowhere near enough for Japanese or Chinese. And now we have Unicode with its 21-bit code points. But char couldn't be expanded to 16 or 32 bits because the assumption that char = byte was so entrenched. So we got a separate type for "wide characters".

But now we have the problem that wchar_t is UTF-32 on Linux but UTF-16 on Windows. And to solve that problem the next version of the C++ standard will add the char16_t and char32_t types (and corresponding string types).
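For what it's worth, on compilers that support C++11 or later these types and their string typedefs (std::u16string, std::u32string) are available, so the difference can be sketched directly. The helper names utf16_units and utf32_units are made up; U+1F600 is used only because it lies outside the 16-bit range.

```cpp
#include <climits>
#include <cstddef>
#include <string>
#include <type_traits>

// The new string types are again just basic_string specializations.
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "u16string is basic_string<char16_t>");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value,
              "u32string is basic_string<char32_t>");

// Unlike wchar_t, these have a guaranteed minimum width.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "char16_t holds 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t holds 32 bits");

// Hypothetical helper: code units needed for U+1F600 in UTF-16
// (a surrogate pair, because the code point does not fit in 16 bits).
std::size_t utf16_units()
{
    return std::u16string(u"\U0001F600").size();
}

// Hypothetical helper: code units needed for the same code point in UTF-32.
std::size_t utf32_units()
{
    return std::u32string(U"\U0001F600").size();
}
```

The same code point takes two units in a u16string and one in a u32string, which is exactly the UTF-16 versus UTF-32 split that makes wchar_t non-portable.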

dan04