views:

1109

answers:

5

My application is having memory problems, including copying lots of strings about, using the same strings as keys in lots of hashtables, etc. I'm looking for a base class for my strings that makes this very efficient.

I'm hoping for:

  • String interning (multiple strings of the same value use the same memory),
  • copy-on-write (I think this comes for free in nearly all std::string implementations),
  • something with ropes would be a bonus (for O(1)-ish concatenation).

My platform is g++ on Linux (but that is unlikely to matter).

Do you know of such a library?

+2  A: 

If most of your strings are immutable, the Boost Flyweight library might suit your needs.

It will do the string interning, but I don't believe it does copy-on-write.

Ferruccio
If I use it to wrap a std::string, it looks like it wouldn't harm the copy-on-write.
Paul Biggar
+6  A: 

copy-on-write (I think this comes for free in nearly all std::string implementations)

I don't believe this is the case any longer. Copy-on-write causes problems when you modify the strings through iterators: in particular, this either causes unwanted results (i.e. no copy, and both strings get modified) or an unnecessary overhead (since the iterators cannot be implemented purely in terms of pointers: they need to perform additional checks when being dereferenced).

Additionally, all modern C++ compilers perform NRVO and eliminate the need for copying return value strings in most cases. Since this has been one of the most common cases for copy-on-write semantics, it has been removed due to the aforementioned downsides.

Konrad Rudolph
Not to mention multithreading issues...
Nicolas Simonet
Also, some strings have a small string optimization, using not the heap but an embedded small buffer for storing the data. Surprisingly, though, GCC's default string class (libstdc++) does copy-on-write.
Johannes Schaub - litb
A: 

One way is to create an invariant key class which contains a pointer to the string and comparison operators to string. Those string pointers will need to be managed in some kind of global map or reference counted map.

A less laborious but more opaque solution may be to key upon boost::shared_pointer, in this way on the pointers will be copied rather than the strings.

You will need to be able to map std::string -> boost::shared_pointer during lookup, so I would make that a function with a map, in simple terms

boost::shared_pointer<std::string> getsharedstring(std::string &s)
{
    typedef std::map<std::string, boost::shared_pointer<std::string>> ptrmap_t;
    static ptrmap_t m; // just for convenience for the example

    ptrmap_t::iterator it = m.find(s);

    if (it != m.end())
        return it->second;

    boost::shared_pointer<std::string> ptr(new std::string(s));
    m[s] = ptr;
    return ptr;
}

I haven't compiled this so forgive any typos.

polyglot
This seems to be what flyweight provides
Paul Biggar
No argument there, however if these keys were ubiquitous, a class based approach may communicate the design.The tradeoff with all C++ is that templated code does not make for good application binary interfaces, so specifically older style Key class abstraction with Key factory method (prohibiting direct dynamic and static allocation) may be appropriate depending upon the larger requirements of the project.
polyglot
+1  A: 

Take a look at The Better String Library from legendary Paul Hsieh

Indeera
That looks quite nice but it sorely lacks iterators.
Konrad Rudolph
I wouldn't trust any library write who states that "Bstrlib is, by design, impervious to memory size overflow attacks. The reason is it is resiliant to length overflows is that bstring lengths are bounded above by INT_MAX, instead of ~(size_t)0.". Yet another amateur who thinks he's found a silver bullet, that's how it appears.
MSalters
It doesn't seem to offer anything I'm looking for...
Paul Biggar
+2  A: 

Andrei Alexandrescu's 'Policy Based basic_string implementation' may help.

graham.reeds
This doesn't help that much, but it is none-the-less awesome.
Paul Biggar
It's been a while since I read it, but it sprung to mind when I saw your post.
graham.reeds