I have a map that represents a DB object. I want to get 'well known' values from it

 std::map<std::string, std::string> dbo;
 ...
 std::string val = dbo["foo"];

All fine, but it strikes me that "foo" is converted to a temporary std::string on every call. Surely it would be better to have a constant std::string (of course it's probably a tiny overhead compared to the disk I/O that just fetched the object, but it's still a valid question, I think). So what is the correct idiom for std::string constants?

for example - I can have

 const std::string FOO = "foo";

in a header, but then every translation unit that includes it gets its own copy.
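For readers on newer compilers: C++17, which postdates this question, added inline variables, which allow a single shared definition to live in a header. A minimal sketch, assuming a C++17 compiler; the header name and the `foo_length` helper are hypothetical and only there for illustration:

```cpp
// constants.h (hypothetical header name); assumes C++17
#include <cassert>
#include <cstddef>
#include <string>

// 'inline' allows this definition to appear in every translation unit that
// includes the header while the program still has exactly one FOO object.
inline const std::string FOO = "foo";

// Hypothetical caller: binds a reference, so no temporary string is created.
inline std::size_t foo_length() {
    const std::string& ref = FOO;
    assert(ref == "foo");
    return ref.size();
}
```

The object is still constructed once at program startup, so this addresses the duplicate copies rather than the construction itself.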

EDIT: No answer yet has said how to declare std::string constants. Ignore the whole map, STL, etc. issue. A lot of code is heavily std::string oriented (mine certainly is), and it is natural to want constants for them without paying over and over for the memory allocation.

EDIT2: took out secondary question answered by PDF from Manuel, added example of bad idiom

EDIT3: Summary of answers. Note that I have not included those that suggested creating a new string class. I am disappointed because I hoped there was a simple thing that would work in a header file only (like const char * const). Anyway:

a) from Mark b

 std::map<int, std::string> dict;
 const int FOO_IDX = 1;
 ....
 dict[FOO_IDX] = "foo";
 ....
 std::string &val = dbo[dict[FOO_IDX]];

b) from vlad

 // str.h
 extern const std::string FOO;
 // str.cpp
 const std::string FOO = "foo";

c) from Roger P

 // really you can't do it

(b) seems the closest to what I wanted but has one fatal flaw: I cannot have static module-level code that uses these strings, since they might not have been constructed yet. I thought about (a), and in fact I use a similar trick when serializing the object (sending the index rather than the string), but it seemed like a lot of plumbing for a general-purpose solution. So sadly (c) wins: there is no simple const idiom for std::string.
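One common workaround for the construction-order flaw in (b), not proposed in the answers summarized here, is the construct-on-first-use idiom: wrap the constant in a function with a local static. A minimal sketch; the names `foo_key`, `kFooLength`, and `lookup_foo` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical accessor: the string is built the first time the function is
// called, so it is already constructed whenever another initializer uses it.
inline const std::string& foo_key() {
    static const std::string value("foo");  // constructed exactly once
    return value;
}

// Namespace-scope object whose initializer safely uses the constant.
static const std::size_t kFooLength = foo_key().size();

// Lookup that passes the constant by reference: no temporary key per call.
inline std::string lookup_foo(const std::map<std::string, std::string>& dbo) {
    std::map<std::string, std::string>::const_iterator it = dbo.find(foo_key());
    return it == dbo.end() ? std::string() : it->second;
}
```

The trade-off is call syntax (`foo_key()` instead of `FOO`) and the usual caveat that the string must not be used from destructors that run after it has been destroyed at program exit.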

+8  A: 

The copying and lack of "string literal optimization" is just how std::strings work, and you cannot get exactly what you're asking for. Partially this is because virtual methods and dtor were explicitly avoided. The std::string interface is plenty complicated without those, anyway.

The standard requires a certain interface for both std::string and std::map, and those interfaces happen to disallow the optimization you'd like (as "unintended consequence" of its other requirements, rather than explicitly). At least, they disallow it if you want to actually follow all the gritty details of the standard. And you really do want that, especially when it is so easy to use a different string class for this specific optimization.

However, that separate string class can solve these "problems" (as you said, it's rarely an issue), but unfortunately the world has number_of_programmers + 1 of those already. Even considering that wheel reinvention, I have found it useful to have a StaticString class, which has a subset of std::string's interface: using begin/end, substr, find, etc. It also disallows modification (and fits in with string literals that way), storing only a char pointer and a size. You have to be slightly careful that it's only initialized with string literals or other "static" data, but that is somewhat mitigated by the construction interface:

struct StaticString {
  template<int N>
  explicit StaticString(char const (&data)[N]); // reference to const char array (e.g. a string literal)
  StaticString(StaticString const&); // copy ctor (which is very cheap)

  static StaticString from_c_str(char const* c_str); // static factory function
  // this only requires that c_str not change and outlive any uses of the
  // resulting object(s), and since it must also be called explicitly, those 
  // requirements aren't hard to enforce; this is provided because it's explicit
  // that strlen is used, and it is not embedded-'\0'-safe as the
  // StaticString(char const (&data)[N]) ctor is

  operator char const*() const; // implicit conversion "operator"
  // here the conversion is appropriate, even though I normally dislike these

private:
  StaticString(); // not defined
};

Use:

StaticString s ("abc");
assert(s != "123"); // overload operators for char*
some_func(s); // implicit conversion
some_func(StaticString("abc")); // temporary object initialized from literal

Note that the primary advantage of this class is that it avoids copying string data, so the string literal's storage can be reused. There's a special place in the executable for this data, and it is generally well optimized, as it dates back to the earliest days of C and before. In fact, I feel this class is close to what string literals should have been in C++, if it weren't for the C compatibility requirement.
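The interface above only declares its members. One possible way to fill it in is sketched below; this is an illustrative guess at an implementation, not the answerer's actual class. The private two-argument constructor, the `data_`/`size_` members, and the comparison overloads are assumptions, and the array constructor takes `char const (&)[N]` so that a string literal can bind to it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class StaticString {
public:
    // Binds to arrays such as string literals; N counts the trailing '\0'.
    template <std::size_t N>
    explicit StaticString(char const (&data)[N]) : data_(data), size_(N - 1) {}

    // Explicit factory for other long-lived C strings; relies on strlen, so
    // it stops at the first embedded '\0'.
    static StaticString from_c_str(char const* c_str) {
        return StaticString(c_str, std::strlen(c_str));
    }

    char const* begin() const { return data_; }
    char const* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    operator char const*() const { return data_; }  // implicit conversion

private:
    StaticString(char const* data, std::size_t size) : data_(data), size_(size) {}
    char const* data_;   // points at storage owned by someone else
    std::size_t size_;
};

// Content comparison; without the char const* overloads, s != "123" would
// fall back to comparing pointers through the implicit conversion.
inline bool operator==(StaticString const& a, StaticString const& b) {
    return a.size() == b.size() &&
           std::memcmp(a.begin(), b.begin(), a.size()) == 0;
}
inline bool operator!=(StaticString const& a, StaticString const& b) { return !(a == b); }
inline bool operator==(StaticString const& a, char const* b) {
    return a == StaticString::from_c_str(b);
}
inline bool operator!=(StaticString const& a, char const* b) { return !(a == b); }
```

Copying such an object copies only a pointer and a size, which is why the copy constructor is described as very cheap.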

By extension, you could also write your own map class if this is a really common scenario for you, and that could be easier than changing string types.

Roger Pate
If there's enough demand, I could re-invent this class from scratch to put on SO under SO's open source license (I'm normally very careful about that). Let's say if this comment gets to +10 votes, then I'll do it. :)
Roger Pate
+7  A: 
  1. It's possible to avoid the overhead of creating a std::string when all you want is a constant string, but you'll need a special class for that because there's nothing similar in the STL or in Boost. Rather than writing your own, a better alternative is to use a class like StringPiece from Chromium or StringRef from LLVM. See this related thread for more info.

  2. If you decide to stay with std::string (which you probably will) then another good option is to use the Boost MultiIndex container, which has the following feature (quoting the docs):

    Boost MultiIndex [...] provides lookup operations accepting search keys different from the key_type of the index, which is a specially useful facility when key_type objects are expensive to create.

Maps with Expensive Keys by Andrei Alexandrescu (C/C++ Users Journal, Feb. 2006) is related to your problem and is a very good read.
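For compilers that support C++17 (which postdates this answer), the standard library offers both pieces: std::string_view is a non-owning string class in the same spirit as StringPiece and StringRef, and a std::map declared with the transparent comparator std::less<> accepts lookup keys of a different type in find(). A minimal sketch under that assumption; the names `Dbo` and `get_or_empty` are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// constexpr string_view: points at the literal, no allocation, header-safe.
constexpr std::string_view FOO = "foo";

// std::less<> is a transparent comparator, so find() accepts anything that
// can be compared with std::string, including std::string_view.
using Dbo = std::map<std::string, std::string, std::less<>>;

inline std::string_view get_or_empty(const Dbo& dbo, std::string_view key) {
    auto it = dbo.find(key);  // no temporary std::string is built for the key
    return it == dbo.end() ? std::string_view{} : std::string_view{it->second};
}
```

The map type itself has to change for this to work, since a plain std::map<std::string, std::string> converts the key to std::string before searching.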

Manuel
@Manuel you are right - the pdf is an exact match and an interesting read. He does not discuss the constant question tho
pm100
A: 

The issue is that std::map copies the key and values into its own structures.

You could have a std::map<const char *, const char *>, but you would have to provide a function object (or function) that compares the keys, because this template instantiation works on pointers. By default, the map would compare the pointers themselves and not the data they point to.

The trade off is one-time copy (std::string) versus accessing a comparator (const char *).
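A minimal sketch of the pointer-keyed map with a content comparator; `CStrLess`, `CStrMap`, and `find_or_null` are hypothetical names chosen for illustration:

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Orders C strings by content instead of by pointer value.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// The map stores only pointers, so the pointed-to strings must outlive it
// (string literals satisfy this trivially).
typedef std::map<const char*, const char*, CStrLess> CStrMap;

// Returns the mapped value, or a null pointer if the key is absent.
inline const char* find_or_null(const CStrMap& m, const char* key) {
    CStrMap::const_iterator it = m.find(key);
    return it == m.end() ? nullptr : it->second;
}
```

Every lookup now pays for strcmp calls instead of a std::string construction, which is the trade-off described above.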

Another alternative is to write your own map class.

Thomas Matthews
+2  A: 

The correct idiom is the one you're using. 99.99% of the time there is no need to worry about the overhead of std::string's constructor.

I do wonder if std::string's constructor could be turned into an intrinsic function by a compiler? Theoretically it might be possible, but my comment above would be explanation enough for why it hasn't happened.

Mark Ransom
Better yet, just make the optimizer advanced enough to figure out how to make the use of strlen in string's ctor (the code boils down to the equivalent of that for std::string, or wcslen for wstring) an intrinsic, when used on a string literal. Harder, true, and I don't know if any do this, but it would also benefit more than just std::string. -- Oh, silly me, that's only half the battle, as you must copy the data still. Hmm.
Roger Pate
Yes, the problem is in recognizing that the constructor is being passed a static constant, and setting the internal storage pointer to it without copying. The standard type information doesn't go to that level of detail so it isn't possible to do without special compiler magic.
Mark Ransom
+2  A: 

It appears that you already know at compile time what the string literals will be, so you can set up an internal mapping between enumerated values and an array of strings. Then you would use the enumeration instead of an actual const char* literal in your code.

enum ConstStrings
{
    MAP_STRING,
    FOO_STRING,
    NUM_CONST_STRINGS
};

std::string constStrings[NUM_CONST_STRINGS];

bool InitConstStrings()
{
    constStrings[MAP_STRING] = "map";
    constStrings[FOO_STRING] = "foo";
    return true; // the result only exists to trigger initialization via doInit
}

// Be careful if you need to use these strings prior to main being called.
bool doInit = InitConstStrings();

const std::string& getString(ConstStrings whichString)
{
    // Feel free to do range checking if you think people will lie to you about the parameter type.
    return constStrings[whichString];
}

Then you would say map[getString(MAP_STRING)] or similar.

As an aside, also consider storing the return value by const reference rather than copy if you don't need to modify it:

const std::string& val = map["foo"];
Mark B
Why not just use an array of std::strings? http://codepad.org/6b3JkcJj
Roger Pate
I think because it's harder to set up. I cannot write vec[i]="foo"; and expect the vector to grow; I have to know in advance how big it is. Of course, the run-time cost is higher.
pm100
@pm100: You can use various code generation tricks to "know" the data in advance---similar to how the gettext i18n library can extract strings; however, this is a lot of work and you can simply manually put the strings into a header+implementation as required, as it's not something you should have to do often because we're talking about static string constants (string literals with special syntactic sugar, really).
Roger Pate
@Roger Pate: I actually hadn't thought about using the array template, and I do like that solution too. You aren't changing the configuration often enough to make it painful to update the N parameter.
Mark B
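The array variant discussed in these comments might look roughly like the sketch below. It is not the code behind the codepad link (which is not reproduced here); it assumes C++11's std::array and reuses the enum from the answer, with the array placed inside the accessor so it is built on first use:

```cpp
#include <array>
#include <cassert>
#include <string>

enum ConstStrings { MAP_STRING, FOO_STRING, NUM_CONST_STRINGS };

inline const std::string& getString(ConstStrings which) {
    // Built on the first call; the array size N comes from the enum, so
    // adding a string means adding an enumerator and one initializer.
    static const std::array<std::string, NUM_CONST_STRINGS> strings = {{
        "map",  // MAP_STRING
        "foo",  // FOO_STRING
    }};
    return strings.at(which);  // at() range-checks the index
}
```

Compared with InitConstStrings, there is no separate initialization step that could be forgotten or run too late.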
+4  A: 

It's simple: use

extern const std::string FOO;

in your header, and

const std::string FOO("foo");

in the appropriate .cpp file.

Vlad
What does 'appropriate .cpp file' mean?
pm100
For each of your headers (say, `foo.h`) one usually has a corresponding `.cpp` file (`foo.cpp` in our case). Headers can be included many times, so they usually contain only declarations (with the usual exception of templates). The `.cpp` file contains the actual definitions.
Vlad