views:

448

answers:

5

In an ACM example, I had to build a big table for dynamic programming. I had to store two integers in each cell, so I decided to go for a std::pair<int, int>. However, allocating a huge array of them took 1.5 seconds:

std::pair<int, int> table[1001][1001];

Afterwards, I have changed this code to

struct Cell {
    int first;
    int second;
};

Cell table[1001][1001];

and the allocation took 0 seconds.

What explains this huge difference in time?

+18  A: 

The std::pair<int, int>::pair() constructor initializes its fields to default values (zero in the case of int), while your struct Cell doesn't (it only has an auto-generated default constructor, which does nothing).

Initializing requires writing to each field, which means a whole lot of relatively time-consuming memory accesses. With struct Cell nothing is done at all, and doing nothing is a bit faster.

sharptooth
Does it take 1.5 seconds to set about 8 MB to zero?
Etan
Don't forget cache misses.
sharptooth
No, the problem is the function calls to the constructor. If your array is global, it's initialized to zero anyway. Probably the compiler doesn't optimize out the constructor calls.
Johannes Schaub - litb
You need to call the constructor, find the memory, set it, etc... it's not just setting N bits of contiguous memory to 0 via memset.
Lemurik
@Lemurik: I’m not convinced. All the `pair` constructor does is delegating to the constructors of its (POD) members. A compiler could (and IMHO, *should*) recognize this, elide the constructor call and simply zero out the memory. The analysis to do this surely cannot be hard?
Konrad Rudolph
… alas, it doesn’t (just tested it). :-(
Konrad Rudolph
+1  A: 

My guess is that it's the way std::pair is created. There is more overhead in invoking a pair constructor 1001x1001 times than in just allocating a memory range.

Ashalynd
+1  A: 

This is really a good example of why one should write C++ and use the STL carefully. Any thoughts?

My project is working on a C/C++ code-level benchmark tool in which we'll build lots of code samples to find out what is "good" code and what is a "bad" coding habit. See http://effodevel.googlecode.com to learn more about the C9B.M. planning. If you have experienced lots of such cases, please kindly join the project to help us.

EffoStaff Effo
This isn't an answer to the question.
Brian Neal
Any thoughts? Yes. Your project seems to buy into the general idea that optimization is all about *low-level optimization* (and big-O). I suggest the problem is much bigger than that, namely *galloping generality*, and that you might consider a broader scope. http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773
Mike Dunlavey
No plan to do a bigger one presently. I'd make progress step by step: code level, algorithm level, architecture level, and so on. We're working on understanding languages and compilers now. Thanks for your comments.
EffoStaff Effo
Let me suggest, when you want to be more ambitious, you discard any prior ideas of what the higher levels might be. Rather, scrounge around your organization for real (large) code that has performance problems, and learn from them. That's how I found out that worrying about low-level optimization is just like looking for one's keys under the street light because that's where the light is.
Mike Dunlavey
I have no performance problems with my software (80K LOC), so your problem is not my point. What we are making is a tool that helps developers become aware of, and then understand, what is "good" code and what is a "bad" coding habit, a coding guideline of sorts. That's all.
EffoStaff Effo
+6  A: 

The answers so far don't explain the full magnitude of the problem.

As sharptooth has pointed out, the pair solution initializes the values to zero. As Lemurik pointed out, the pair solution isn't just initializing a contiguous block of memory, instead it is calling the pair constructor for every element in the table. However, even that doesn't account for it taking 1.5 seconds. Something else is happening.

Here's my logic:

Assuming you were on an ancient machine, say running at 1.33 GHz, then 1.5 seconds is about 2e9 clock cycles. You've got about 1e6 pairs to construct (1001 x 1001), so somehow each pair constructor is taking roughly 2000 cycles. It doesn't take 2000 cycles to call a constructor that just sets two integers to zero. I would believe it if the number was less than 100 cycles.

I thought it would be interesting to see where else all these CPU cycles are going. I used the crappiest, oldest C++ compiler I could find to see if I could attain the level of wastage required. That compiler was VC++ v6. In debug mode, it does something I don't understand. It has a big loop that calls the pair constructor for each item in the table - fair enough. That constructor sets the two values to zero - fair enough. But just before doing that, it sets all the bytes in a 68-byte region to 0xCC. That region is just before the start of the big table. It then overwrites the last element of that region with 0x28F61200. Every call of the pair constructor repeats this. Presumably this is some kind of bookkeeping by the compiler so it knows which regions are initialized when checking for pointer errors at run time. I'd love to know exactly what this is for.

Anyway, that would explain where the extra time is going. Obviously another compiler may not be this bad. And certainly an optimized release build wouldn't be.

Andrew Bainbridge
+1. congrats to retagging permissions
Etan
+1  A: 

These are all very good guesses, but as everyone knows, guesses are not reliable.

I would say just randomly pause it within that 1.5 seconds, but you'd have to be pretty quick. If you increased each dimension by a factor of about 3, you could make it take more like 10+ seconds, so it would be easier to pause.

Or, you could get it under a debugger, break it in the pair constructor code, and then single step to see what it is doing.

Either way, you would get a firm answer to the question, not just a guess.

Mike Dunlavey