views:

231

answers:

5

Hi,

Are the vectors defined in the C++ STL re-entrant or thread safe? Can I use two threads and work(in this case sort) on two halfs of a vector without using a mutex? For example:

int size = (Data_List.size())/2;

Thread A()
{

................ // Do we need to lock Data_list with a mutex
 sort(Data_List.begin(),Data_List.begin()+size,cmp);
}

Thread B()
{
....// Do we need to lock Data_list with a mutex
sort(Data_List.begin()+(size+1),Data_List.end(),cmp);
}

My Question is do we need to lock the access of Data_List using a mutex?

Note: the cmp function is a regular int comparision function.

+1  A: 

You shouldnt have any problems if the threads work on disjoint parts of the vector.

Tom
+5  A: 

As long as the threads are working on different regions of memory and your comparison function only works with that memory and local variables, you should be ok. Essentially, you are "locking" each half of the table by dividing the work between the threads and only letting the thread work on its half of the data.

tvanfosson
"only letting the thread work on its half of the data". Unless your architecture does RMW writes, and your partition of the vector straddles a word boundary. Then the two threads are accessing the same memory.
Steve Jessop
@Steve Jessop -- I'm assuming that the compiler would pad any user defined types to word boundaries, but for types that are smaller than the word size of the machine, you certainly do need to be more careful.
tvanfosson
+1  A: 

You can use the parallel mode of GCC (if you use GCC) for threaded versions of various algorithms. http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html

Tronic
+2  A: 

technically the standard says that these classes aren't thread safe, so if you're setting the data from things like the [] operator then technically that's multiple writes on one object. On the other hand, it's safe to operate on separate parts of a c array...so there seems to be a conflict here. If you want to be squeaky clean take the address of the first element and treat it as a c-array. A lot of answers here say it's fine as long as you're on separate parts of the array, and although that is probably true on many implementations

->I think it's important to note that an implementation is free not make this safe.

Chris H
+3  A: 

Nearly, but not quite. This code is in general not thread-safe, even assuming the implementation makes reasonable guarantees about the thread-safety of vector.

Suppose Data_List::value_type is a type for which your architecture does not provide atomic access. For example on x86, byte-writes and aligned word- and dword-writes are atomic, but short-writes are not, and writes of user-defined types aren't unless they happen to be a good size and alignment. If your UDT has size 3, you could have a problem.

To perform a non-atomic short write, an implementation might do two byte writes. Or it might load the word, modify two of the bytes, and write the word back. On architectures where a byte write is expensive, the latter is quite plausible.

Now, suppose your implementation does the latter. Suppose further that your division of the vector happens to leave the first half of a word in one portion, and the second half of the word in the other. Suppose finally that the two different threads both try to modify that word simultaneously. This can go wrong - they might both read the same value, both modify it, but then one write back will occur before the other, so one of the modifications will be overwritten and lost.

Atomic byte-access by default isn't universal, and I doubt any implementation does atomic bit-access by default. So a vector<bool> is a possible problem even if other vector types are safe: your division of the vector could go down the middle of a byte. Not that you'd generally sort bools this way, but there are other operations where you might try to divide vectors, or your code might be generic.

You can usually expect word-access to be atomic (and in C or C++, int access). But it's not guaranteed by the language standard, hence sig_atomic_t. You say your cmp is an int comparison function, which suggests that maybe your vector contains ints. But you can compare shorts perfectly well using an int comparison function, so I don't know.

What you actually need to do is check your implementation/platform very carefully, and see what it says about safely accessing memory from multiple threads. It's probably fine for a vector of int.

Steve Jessop
@Steve: thnx for that .. Really appreciate it!!! ... I decide to copy each offset in separate buckets and sort them individually. I was kinda comparing this problem with the Strtok ( the need for strtok_r).So I was wondering if the iterator will be corrupted by two threads if we dont push in a static variable.. After implementing , it doesn seem to be corrupted ... so i am good .. as for the sorting part... i need to test it !!
tomkaith13