tags:

views:

477

answers:

8

I want to hold a bunch of const char pointers into an std::set container [1]. std::set template requires a comparator functor, and the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.

I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them, but is there a cleaner, 'standard' way of doing it?

Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.

1: The pointers are to static strings, so there is no problem with their lifetimes - they won't go away.

A: 

Depending on how big a "bunch" is, I would be inclined to store a corresponding bunch of std::strings in the set. That way you won't have to write any extra glue code.

Greg Hewgill
+3  A: 

The "optimized way"

If we ignore the "premature optimization is the root of all evil", the standard way is to add a comparator, which is easy to write:

struct MyCharComparator
{
   bool operator()(const char * A, const char * B) const
   {
      return (strcmp(A, B) < 0) ;
   }
} ;

To use with a:

std::set<const char *, MyCharComparator>

The standard way

Use a:

std::set<std::string>

It will work even if you put a static const char * inside (because std::string, unlike const char *, is comparable by its contents).

Of course, if you need to extract the data, you'll have to extract the data through std::string.c_str(). In the other hand, , but as it is a set, I guess you only want to know if "AAA" is in the set, not extract the value "AAA" of "AAA".

Note: I did read about "Please do not suggest creating std::strings", but then, you asked the "standard" way...

The "never do it" way

I noted the following comment after my answer:

Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.

This smells of C (use of the deprecated "static" keyword, probable premature optimization used for std::string bashing, and string comparison through their addresses).

Anyway, you don't want to to compare your strings through their address. Because I guess the last thing you want is to have a set containing:

{ "AAA", "AAA", "AAA" }

Of course, if you only use the same global variables to contain the string, this is another story.

In this case, I suggest:

std::set<const char *>

Of course, it won't work if you compare strings with the same contents but different variables/addresses.

And, of course, it won't work with static const char * strings if those strings are defined in a header.

But this is another story.

paercebal
That's right. I just would make the operator const. Hey, didn't I just type that?
xtofl
:-D ... you're right !
paercebal
"of course it won't work with static const char* strings if those strings are defined in a header." Why not? What's magic about headers?
fizzer
If a *static* string (say "A") is defined in a header, if that header is included in three different sources, and if you add in the set "A" once, in each source, then you'll have potentially 3 different "A", with different addresses, which I guess, was not the point of putting the string in the head
paercebal
Potentially, although modern linkers usually optimize identical constant strings (and the most aggressive one even substrings) to share the same storage.
florin
You're right, but this optimization depends on: 1 - You use the optimization, which in debug, usually, you won't (ie the code above won't work on debug). 2 - It is a built-in type, because objects have constructors/destructors. All in all, it is an optimization, not a feature we should rely upon.
paercebal
The conclusion is that we can't rely on potential optimization to have the compiler merge 3 similar strings defined in 3 different compilation units. And using static on a string defined on a header included in three sources is exactly that. This is a reason the "static" is a bad idea in this case.
paercebal
(of course, it was tested on g++ 4.2.4)
paercebal
A: 

Must the set contain const char*?

What immediately springs to mind is storing the strings in a std::string instead, and putting those into the std::set. This will allow comparisons without a problem, and you can always get the raw const char* with a simple function call:

const char* data = theString.c_str();
Matt Dillard
+5  A: 

The standard way of doing it, if you don't want to wrap them in std::strings, is to define a functor class:

struct ConstCharStarComparator
{
  bool operator()(const char *s1, const char *s2) const
  {
    return strcmp(s1, s2) < 0;
  }
}

typedef std::set<onst char *, ConstCharStarComparator> stringset_t;
stringset_t myStringSet;
Adam Rosenfield
That's right. I just would make the operator const.
xtofl
Whoops, thanks, fixed now.
Adam Rosenfield
A: 

Presumably you don't want to use std::string because of performance reasons.

I'm running MSVC and gcc, and they both seem to not mind this:

bool foo = "blah" < "grar";

EDIT: However, the behaviour in this case is unspecified. See comments...

They also don't complain about std::set<const char*>.

If you're using a compiler that does complain, I would probably go ahead with your suggested functor that casts the pointers to ints.

Edit: Hey, I got voted down... Despite being one of the few people here that most directly answered his question. I'm new to Stack Overflow, is there any way to defend yourself if this happens? That being said, I'll try to right here:

The question is not looking for std::string solutions. Every time you enter an std::string in to the set, it will need to copy the entire string (until C++0x is standard, anyway). Also, every time you do a set look-up, it will need to do multiple string compares.

Storing the pointers in the set, however, incurs NO string copy (you're just copying the pointer around) and every comparison is a simple integer comparison on the addresses, not a string compare.

The question stated that storing the pointers to the strings was fine, I see no reason why we should all immediately assume that this statement was an error. If you know what you're doing, then there are considerable performance gains to using a const char* over either std::string or a custom comparison that calls strcmp. Yes, it's less safe, and more prone to error, but these are common trade-offs for performance, and since the question never stated the application, I think we should assume that he's already considered the pros and cons and decided in favor of performance.

Andrew Top
fizzer
FWIW - your '<' example is unspecified behaviour in C++ (undefined in C), but the set is OK. I didn't vote you down though.
fizzer
Thanks for the comments. Bummer, but oh well... SO's evolution should be interesting.How about that unspecified behaviour. I guess I should warn about that. Thanks.
Andrew Top
I didn't give the downvote, but it was well deserved. Your solution fails to sort the input by anything relevant, and doesn't work for two identical strings that are at different addresses. Just because it compiles doesn't mean it works.
Mark Ransom
"The strings are static, so they can be compared for (in)equality based on their address". I understand that this may be a bad idea, but I was answering his question, not giving my opinion about best practices. Also, the concept of a set doesn't exist to sort data, it exists for membership testing.
Andrew Top
A: 

Either use a comparator, or use a wrapper type to be contained in the set. (Note: std::string is a wrapper, too....)

const char* a("a");
const char* b("b");

struct CWrap {
 const char* p;
 bool operator<(const CWrap& other) const{
  return strcmp( p, other.p ) < 0;
 }
 CWrap( const char* p ): p(p){}
};

std::set<CWrap> myset;
myset.insert(a);
myset.insert(b);
xtofl
A: 

Others have already posted plenty of solutions showing how to do lexical comparisons with const char*, so I won't bother.

Please do not suggest creating std::strings - it is a waste of time and space.

If std::string is a waste of time and space, then std::set might be a waste of time and space as well. Each element in a std::set is allocated separately from the free store. Depending on how your program uses sets, this may hurt performance more than std::set's O(log n) lookups help performance. You may get better results using another data structure, such as a sorted std::vector, or a statically allocated array that is sorted at compile time, depending on the intended lifetime of the set.

the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.

The strings are static, so they can be compared for (in)equality based on their address.

That depends on what the pointers point to. If all of the keys are allocated from the same array, then using operator< to compare pointers is not undefined behavior.

Example of an array containing separate static strings:

static const char keys[] = "apple\0banana\0cantaloupe";

If you create a std::set<const char*> and fill it with pointers that point into that array, their ordering will be well-defined.

If, however, the strings are all separate string literals, comparing their addresses will most likely involve undefined behavior. Whether or not it works depends on your compiler/linker implementation, how you use it, and your expectations.

If your compiler/linker supports string pooling and has it enabled, duplicate string literals should have the same address, but are they guaranteed to in all cases? Is it safe to rely on linker optimizations for correct functionality?

If you only use the string literals in one translation unit, the set ordering may be based on the order that the strings are first used, but if you change another translation unit to use one of the same string literals, the set ordering may change.

I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them

Casting the pointers to uintptr_t would seem to have no benefit over using pointer comparisons. The result is the same either way: implementation-specific.

bk1e
+1  A: 

Just go ahead and use the default ordering which is less<>. The Standard guarantees that less will work even for pointers to different objects:

"For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not."

The guarantee is there exactly for things like your set<const char*>.

fizzer
I think this shows why *everybody* should have a copy of the standard (or a good draft) which they can check. First answer in 6 that, IMHO, answers the question fully (+1)
Richard Corden
This is the correct answer (even though it does not solve my problem, since I have to use an older compiler which lacks the required specialization). I tried to downvote the wrong answers, but the wisdom of the crowds beat me...
florin