tags:

views:

542

answers:

8

This loop is slower than I would expect, and I'm not sure where yet. See anything?

I'm reading an Accces DB, using client-side cursors. When I have 127,000 rows with 20 columns, this loop takes about 10 seconds. The 20 columns are string, int, and date types. All the types get converted to ANSI strings before they are put into the ostringstream buffer.

void LoadRecordsetIntoStream(_RecordsetPtr& pRs, ostringstream& ostrm)
{
 ADODB::FieldsPtr pFields = pRs->Fields;
 char buf[80];
 ::SYSTEMTIME sysTime;
 _variant_t var;

 while(!pRs->EndOfFile) // loop through rows
 {
  for (long i = 0L; i < nColumns; i++)  // loop through columns
  {

   var = pFields->GetItem(i)->GetValue();

   if (V_VT(&var) == VT_BSTR)
   {
    ostrm << (const char*) (_bstr_t) var;   
   }
   else if (V_VT(&var) == VT_I4
   || V_VT(&var) == VT_UI1
   || V_VT(&var) == VT_I2
   || V_VT(&var) == VT_BOOL)
   {
    ostrm << itoa(((int)var),buf,10);
   }
   else if (V_VT(&var) == VT_DATE)
   {
    ::VariantTimeToSystemTime(var,&sysTime);
    _stprintf(buf, _T("%4d-%02d-%02d %02d:%02d:%02d"),
    sysTime.wYear, sysTime.wMonth, sysTime.wDay, 
    sysTime.wHour, sysTime.wMinute, sysTime.wSecond);

    ostrm << buf;
   }
  }

  pRs->MoveNext();
 }
}

EDIT: After more experimentation...

I know now that about half the time is used by this line:
var = pFields->GetItem(i)->GetValue();

If I bypass the Microsoft generated COM wrappers, will my code be faster? My guess is no.

The othe half of the time is spent in the statements which convert data and stream it into the ostringstream.

I don't know right now as I write this whether it's the conversions or the streaming that is taking more time.

Would it be faster if I didn't use ostringstream and instead managed my own buffer, with my own logic to grow the buffer (re-alloc, copy, delete)? Would it be faster if my logic made a pessimistic guesstimate and reserved a lot of space for the ostringstream buffer up front? These might be experiments worth trying.

Finally, the conversions themselves. None of the three stand out in my timings as being bad. One answer says that my itoa might be slower than an alternative. Worth checking out.

+2  A: 

I can't tell from looking at your code, someone more familiar with COM/ATL may have a better answer.

By trial n error I would find the slow code by commenting out inner loop operations out until you see perf spike, then you have your culprit and should focus on that.

CVertex
+2  A: 

I assume V_VT is a function - if so, then for each date value, V_VT(&var) is called 6 times. A simple optimisation is to locally store the value of V_VT(&var) to save up to save up to 5 calls to this function each time around the loop.

If you haven't already done so, re-order the if tests for the types to put the most common column types first - this reduces the number of tests required.

Kevin Haines
a macro, not a function, something like #define V_VT(X) X->vt
Corey Trager
+1  A: 

A good part of it is that Access is not a server database - all the file reads/writes, locking, cursor handling, etc. is taking place within the client application (across a network, right?) And needs to be so, if other users have the database open at the same time.

If not, you probably would be able to drop the cursor settings, and open the database read-only.

le dorfier
I might be mistaken, but I think that because it is a client-side cursor, when the loop starts, the data is already in a memory buffer under ADO's control, not Access's control. Jet is out of the picture.Just the loop through the rows takes almost no time at all.
Corey Trager
A: 

Try profiling. If you don't have a profiler a simple way could be to wrap all calls in your loop you think may take some time with something like the following:

#define TIME_CALL(x) \
do { \
  const DWORD t1 = timeGetTime();\
  x;\
  const DWORD t2 = timeGetTime();\
  std::cout << "Call to '" << #x << "' took " << (t2 - t1) << " ms.\n";\
}while(false)

So now you can say:

TIME_CALL(var = pFields->GetItem(i)->GetValue());
TIME_CALL(ostrm << (const char*) (_bstr_t) var);

and so on...

Andreas Magnusson
@Milan: Why did you edit my post? According to "Exceptional C++" by Herb Sutter one should strive to minimize the number of calls top operator<<() since it can throw an exception. Less calls, smaller chance of exceptions. Please stop editing posts to your own preference!
Andreas Magnusson
+1  A: 

Try commenting out the code in the for loop and comparing the time. Once you have a reading, start uncommenting various sections until you hit the bottle-neck.

ilitirit
+1  A: 

As a basic idea you should try to see the speed of the code when you have only VT_BSTR conversion, after that with VT_DATE and at last with the other types, see which is taking the most time.

The only observation I have is that itoa is not standard C. The implementation may be very slow as you can see from this article.

Thanks. After I wrote my question I did exactly that. I know now that "var = pFields->GetItem(i)->GetValue()" takes about half the time, and the conversions/streaming into ostrm takes the other half. None of the conversions indivudally stands out as being particularly bad. Thanks re itoa.
Corey Trager
A: 

You don't need the itoa - you're writing to a stream.

Sorry, my fault. I omitted some lines of code to make the example clearer, but there are delimiters in the stream too. I have to covert everything to ascii because so as not to collide with the delimiters.
Corey Trager
According to my testing, ostrm << itoa() is faster than ostrm << nMyInt
Corey Trager
A: 

To answer your new question I think you should use the fact that you can let the stream format your data instead of formatting it into a string and then pass that string to the stream, e.g.:

_stprintf(buf, _T("%4d-%02d-%02d %02d:%02d:%02d"),
                  sysTime.wYear, sysTime.wMonth, sysTime.wDay, 
                  sysTime.wHour, sysTime.wMinute, sysTime.wSecond);

ostrm << buf;

Turns into:

ostrm.fill('0');
ostrm.width(4);
ostrm << sysTime.wYear << _T("-");
ostrm.width(2);
ostrm << sysTime.wMonth;

And so on...

Andreas Magnusson