I needed to do a case-insensitive find and found the following code, which did the trick:

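#include <algorithm>
#include <cctype>
#include <string>
using namespace std;

// Compare two characters, ignoring case.
bool ci_equal(char ch1, char ch2)
{
    return toupper((unsigned char)ch1) == toupper((unsigned char)ch2);
}

// Return the offset of the first case-insensitive match of str2 in str1,
// or string::npos if there is none.
size_t ci_find(const string& str1, const string& str2)
{
    string::const_iterator pos = std::search(str1.begin(), str1.end(),
                                             str2.begin(), str2.end(), ci_equal);
    if (pos == str1.end())
        return string::npos;
    else
        return pos - str1.begin();
}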

That got me wondering what it would take to make this a member function of 'string' so it could be called like this:

string S="abcdefghijklmnopqrstuv";
string F="GHI";

S.ci_find(F);

I realize that there are many problems with case conversions in non-English languages but that's not the question I'm interested in.

Being a neophyte, I quickly got lost among containers and templates.

Is there any way to do this? Could someone point me to an example of something similar?

+1  A: 

std::string is not made to be extended.

You could encapsulate an std::string into a class of yours and set those member functions in that class.

Tom
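The encapsulation suggested in the answer above might look roughly like the following. This is only a minimal sketch: the class name `ci_searchable_string` and its `str()` accessor are made up for illustration, and the search itself is the same `std::search` call used in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper: it holds a std::string instead of deriving from it.
class ci_searchable_string {
public:
    explicit ci_searchable_string(std::string s) : text_(std::move(s)) {}

    // Case-insensitive find; returns std::string::npos when nothing matches.
    std::size_t ci_find(const std::string& needle) const {
        std::string::const_iterator pos = std::search(
            text_.begin(), text_.end(), needle.begin(), needle.end(),
            [](char a, char b) {
                return std::toupper(static_cast<unsigned char>(a)) ==
                       std::toupper(static_cast<unsigned char>(b));
            });
        if (pos == text_.end())
            return std::string::npos;
        return static_cast<std::size_t>(pos - text_.begin());
    }

    // Read-only access to the wrapped string for everything else.
    const std::string& str() const { return text_; }

private:
    std::string text_;
};

// Usage, mirroring the call syntax asked for in the question.
inline void wrapper_usage() {
    ci_searchable_string S("abcdefghijklmnopqrstuv");
    assert(S.ci_find("GHI") == 6);
}
```

The cost of this approach is that every other `std::string` operation has to be forwarded by hand or reached through `str()`.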
+8  A: 

I think most experienced C++ programmers would agree that this is a poor idea. If anything, std::string already has way too many member functions, and adding still more will make a bad situation worse. Worse still, if you were going to do this, you'd probably do it by inheritance -- but std::string isn't designed to be used as a base class, and using it as one will lead to code that's fragile and error-prone.

For another idea of how to do this, you might want to read Guru of the Week #29. Do read the whole article though, to get an idea of both how to do it, and why you probably don't want to. Ultimately, what you have right now is probably the best option -- keep the case insensitive searching separate from std::string itself.

Jerry Coffin
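For readers who don't want to follow the link, the technique from Guru of the Week #29 is to replace the `char_traits` parameter of `std::basic_string`. The following is a sketch in that spirit rather than a copy of the article; the private helper `up` is an assumption of this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Character traits that compare characters without regard to case.
struct ci_char_traits : public std::char_traits<char> {
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }

    // Three-way comparison of the first n characters, ignoring case.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (up(s1[i]) < up(s2[i])) return -1;
            if (up(s1[i]) > up(s2[i])) return 1;
        }
        return 0;
    }

    // Pointer to the first character in [s, s + n) equal to c, ignoring case.
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (up(s[i]) == up(c)) return s + i;
        return nullptr;
    }

private:
    // Hypothetical helper: upper-case a char safely via unsigned char.
    static int up(char c) { return std::toupper(static_cast<unsigned char>(c)); }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Usage: find() and operator== on ci_string now ignore case.
inline void ci_string_usage() {
    ci_string S("abcdefghijklmnopqrstuv");
    assert(S.find("GHI") == 6);
}
```

Keep in mind that `ci_string` is a different type from `std::string`, so the two do not convert to each other implicitly; you have to go through `c_str()` or an iterator range.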
Agreed that `std::basic_string` is not designed to be inherited from because it does not have a virtual destructor and so a polymorphic deletion of an instance of your extended class (call it `stringex`) using a `basic_string` pointer would leak because `stringex`'s destructor doesn't get called. But what if `stringex` doesn't need, and doesn't even define, a destructor, because all that this class contains are native types and additional member functions, with no memory allocation? Is there still a problem in that case?
Praetorian
@Praetorian: it is still undefined behavior, and it is still unnecessary because the idiomatic way to achieve the same (while better preserving encapsulation) is to define non-member functions.
jalf
@jalf: What's undefined about it? (I'm not arguing with you, just trying to understand). In the scenario I described, the only destructor that needs to be called is `basic_string`'s, which should be called when the `stringex` object gets deleted using a `basic_string` pointer regardless of whether the latter has a `virtual` destructor or not. Or is the undefined part the usual use case where it is not defined whether `stringex`'s destructor will call `basic_string`'s destructor?
Praetorian
@Praetorian: It's undefined because the pointer passed to `operator delete` isn't the same as the one returned by `operator new` unless you either have a virtual destructor or delete the most-derived object.
Potatoswatter
Note that replacing the `char_traits` parameter *replaces* case-sensitive behavior with case-insensitive. To perform a case-sensitive `find` on a `ci_string`, a free function would be needed… full circle.
Potatoswatter
@praetorian: Re your last comment. After a bit of study, I more-or-less understand the discussion but how are the pointers different in these two cases? I suppose the base class instance is somehow embedded in the derived class instance and the two are different?
Mike D
@Mike D: I didn't know they were different either. Hopefully Potatoswatter will see this and shed some more light on the matter
Praetorian
@Mike, Praetorian: the undefined behavior arises if you have something like: `std::string *s = new stringex; delete s;` with a non-virtual dtor. You probably wouldn't do that intentionally, but if you use public inheritance, the compiler allows implicit conversion from the derived to the base class -- which makes an accident pretty easy. It's undefined because, with a non-virtual dtor, the compiler does what it usually does and invokes the dtor based on the type of the pointer instead of the type of the object it points at. Trying to sort out when that would be safe wasn't worthwhile, so it's always UB.
Jerry Coffin
@Jerry et al: Sorry, each glimmer of light leads to new questions. (1) So I'd have the same problem with any class/derived class (even my own) if I had the same mismatch? (2) And things would be ok if this mismatch didn't occur? (3) If the base class had a virtual dtor and the derived class had a dtor, would the base or derived dtor be called in your example? (4) Can the derived dtor call the base dtor after doing its thing (or is this done automagically)? Or how can it know what needs to be cleaned up?
Mike D
@Mike: 1. Yes. 2. With most classes it would be safe, but with `string` it's harder to say -- being a heavily used library class, there's a lot of motivation to optimize it, possibly in ways that could cause problems you wouldn't normally encounter. 3. Both get called -- first derived, then base. 4. yes, it happens automatically.
Jerry Coffin
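Points 3 and 4 above can be checked with a small experiment. This is a sketch using made-up classes `Base` and `Derived` (not `std::string`, whose destructor is not virtual) and a hypothetical `dtor_log` helper that records the order in which destructors run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: a log of destructor calls, in the order they happen.
inline std::vector<std::string>& dtor_log() {
    static std::vector<std::string> log;
    return log;
}

struct Base {
    // Virtual, so deleting through a Base* is well-defined.
    virtual ~Base() { dtor_log().push_back("Base"); }
};

struct Derived : Base {
    // Does not call ~Base() itself; the compiler chains to it automatically.
    ~Derived() override { dtor_log().push_back("Derived"); }
};

// Delete a Derived through a Base pointer and report which dtors ran.
inline std::vector<std::string> destroy_through_base() {
    dtor_log().clear();
    Base* p = new Derived;
    delete p;
    return dtor_log();
}
```

Calling `destroy_through_base()` yields the entries "Derived" then "Base", in that order. If `~Base` were not virtual, the `delete p;` line would be the undefined behavior discussed above, so that variant is deliberately not shown as runnable code.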
@Jerry: Thanks very much; I understand a few things better now. Amazing 4 someone who turned 70 last Sunday :-)
Mike D
+3  A: 

Perhaps following the Standard Library Algorithms <algorithm> methodology could be beneficial. It wouldn't surprise users as much. :-)

Algorithm Functions as an example.

JustBoo
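Following the `<algorithm>` style suggested above would mean an iterator-based free function rather than a member of `string`. A minimal sketch, assuming the hypothetical names `ci_equal_to` and `ci_search`:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Predicate object usable with any algorithm that takes a binary predicate.
struct ci_equal_to {
    bool operator()(char a, char b) const {
        return std::toupper(static_cast<unsigned char>(a)) ==
               std::toupper(static_cast<unsigned char>(b));
    }
};

// Like std::search, but ignoring case; returns last when there is no match.
template <typename ForwardIt1, typename ForwardIt2>
ForwardIt1 ci_search(ForwardIt1 first, ForwardIt1 last,
                     ForwardIt2 s_first, ForwardIt2 s_last) {
    return std::search(first, last, s_first, s_last, ci_equal_to());
}

// Usage: convert the returned iterator to an offset with std::distance.
inline void ci_search_usage() {
    std::string S = "abcdefghijklmnopqrstuv";
    std::string F = "GHI";
    assert(std::distance(S.begin(),
                         ci_search(S.begin(), S.end(), F.begin(), F.end())) == 6);
}
```

Because it works on iterator ranges, the same function can be used with `std::vector<char>` or plain character arrays, not only with `std::string`.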
Yeah, although as his implementation of `ci_find` shows, if he provided this as an algorithm it would just be a slight variant of `std::search` using a particular predicate.
Steve Jessop
A: 
rwong
So, I define a class qqq with a definition of the () operator, declare an instance ci_find and then call ci_find(s1,s2). Why is that any better than simply defining ci_find as I did? I guess I could encapsulate the compare function within the class. Or are you saying that it could be made part of string or basic_string?
Mike D
@Mike: I'm sorry for giving a wrong answer; I didn't read the question carefully. Your question is, in fact, an extremely interesting one and warrants a more detailed response. I'm going to abandon my first answer and write up a second one. (Sorry, it'll take a long time.) Thanks.
rwong
@Mike: (1) if your platform or C++ library provides a case-insensitive substring search, use it. It might be named `stristr`, `wcsistr` etc. (2) if you're going to write your own, make sure you understand the Knuth-Morris-Pratt and Boyer-Moore algorithms. (3) Also make sure you understand `char_traits`. As @Potatoswatter pointed out, your case-insensitive `char_traits` shall not be used to specialize `basic_string`, but rather is used to specialize your KMP or BM implementation. It's non-trivial, so try harder looking for an existing implementation, or use your current one.
rwong
@rwong: I'm not really concerned about the specific task of a case insensitive find. Rather I was curious to know if I could somehow add functions to string or basic_string. I can see lots of uses for this and to me it's much more elegant than adding bare functions or classes. I have seen examples of functors used with things like multimap. I'll look forward to anything more you have to say on this issue.
Mike D
@Mike: as @SteveJessop pointed out, C++ programmers try to be as pragmatic as possible and are not addicted to syntactic sugar. They are much more concerned about preventing possible bugs that might be introduced by breaking the encapsulation, a.k.a. extending the `basic_string` class.
rwong
@Mike: thanks for letting me know ... I guess I won't be writing up the answer then ... sorry
rwong