views:

108

answers:

3

When looking at some of our logging I've noticed in the profiler that we were spending a lot of time in the operator<< formatting ints and such. It looks like there is a shared lock that is used whenever ostream::operator<< is called when formatting an int(and presumably doubles). Upon further investigation I've narrowed it down to this example:

Loop1 that uses ostringstream to do the formatting:

DWORD WINAPI doWork1(void* param)
{
    int nTimes = *static_cast<int*>(param);
    for (int i = 0; i < nTimes; ++i)
    {
        ostringstream out;
        out << "[0";
        for (int j = 1; j < 100; ++j)
            out << ", " << j; 
        out << "]\n";
    }
    return 0;
}

Loop2 that uses the same ostringstream to do everything but the int format, that is done with itoa:

DWORD WINAPI doWork2(void* param)
{
    int nTimes = *static_cast<int*>(param);
    for (int i = 0; i < nTimes; ++i)
    {
        ostringstream out;
        char buffer[13];
        out << "[0";
        for (int j = 1; j < 100; ++j)
        {
            _itoa_s(j, buffer, 10);
            out << ", " << buffer;
        }
        out << "]\n";
    }
    return 0;
}

For my test I ran each loop a number of times with 1, 2, 3 and 4 threads (I have a 4 core machine). The number of trials is constant. Here is the output:

doWork1: all ostringstream
n       Total
1         557
2        8092
3       15916
4       15501

doWork2: use itoa
n       Total
1         200
2         112
3         100
4         105

As you can see, the performance when using ostringstream is abysmal. It gets 30 times worse when adding more threads whereas the itoa gets about 2 times faster.

One idea is to use _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) as recommended by M$ in this article. That doesn't seem to help me. Here's another user who seem to be having a similar issue.

We need to be able to format ints in several threads running in parallel for our application. Given this issue we either need to figure out how to make this work or find another formatting solution. I may code up a simple class with operator<< overloaded for the integral and floating types and then have a templated version that just calls operator<< on the underlying stream. A bit ugly, but I think I can make it work, though maybe not for user defined operator<<(ostream&,T) because it's not an ostream.

I should also make clear that this is being built with Microsoft Visual Studio 2005. And I believe this limitation comes from their implementation of the standard library.

+1  A: 

Doesn't surprise me, MS has put "global" locks on a fair few shared resources - the biggest headache for us was the BSTR memory lock a few years back.

The best thing you can do is copy the code and replace the ostream lock and shared conversion memory with your own class. I have done that where I write the stream using a printf-style logging system (ie I had to use a printf logger, and wrapped it with my stream operators). Once you've compiled that into your app you should be as fast as itoa. When I'm in the office I'll grab some of the code and paste it for you.

EDIT: as promised:

CLogger& operator<<(long l)
{
 if (m_LoggingLevel < m_levelFilter)
  return *this;

 // 33 is the max length of data returned from _ltot
 resize(33);

 _ltot(l, buffer+m_length, m_base);
 m_length += (long)_tcslen(buffer+m_length);

 return *this;
};

static CLogger& hex(CLogger& c)
{
 c.m_base = 16;
 return c;
};

void resize(long extra)
{
 if (extra + m_length > m_size)
 {
  // resize buffer to fit.
  TCHAR* old_buffer = buffer;
  m_size += extra;
  buffer = (TCHAR*)malloc(m_size*sizeof(TCHAR));
  _tcsncpy(buffer, old_buffer, m_length+1);
  free(old_buffer);
 }
}

static CLogger& endl(CLogger& c)
{
 if (c.m_length == 0 && c.m_LoggingLevel < c.m_levelFilter)
  return c;

 c.Write();
 return c;
};

Sorry I can't let you have all of it, but those 3 methods show the basics - I allocate a buffer, resize it if needed (m_size is buffer size, m_length is current text length) and keep it for the duration of the logging object. The buffer contents get written to file (or OutputDebugString, or a listbox) in the endl method. I also have a logging 'level' to restrict output at runtime. So you just replace your calls to ostringstream with this, and the Write() method pumps the buffer to a file and clears the length. Hope this helps.

gbjbaanb
I'm struggling with how to make it as seamless as possible to do the formatting. A good chunk is for logs, though some for passing info back to the user.
Matt Price
+1  A: 

If the Visual Studio 2005's standard library implementation has bugs why not try other implementations? Like:

or even Dinkumware upon which Visual Studio 2005 standard library is based on, maybe the have fixed the problem since 2005.

Edit: The other user you mentioned used Visual Studio 2008 SP1, which means that probably Dinkumware has not fixed this issue.

Cristian Adam
+1  A: 

The problem could be memory allocation. malloc which "new" uses has an internal lock. You can see it if you step into it. Try to use a thread local allocator and see if the bad performance disappears.

leiz
We've run into new causing trouble and in this case it isn't a problem. I've set up the tests in a way that both use new in a similar way.
Matt Price