Hi,

I have a piece of code that loops through an array of strings, looking for strings that are the same as, or prefixes of, other strings, and marking each one as unique or not:

loop over the array with I
 ( loop over the array with Y
   (
     if I is a prefix of Y, do one thing; else if I is the same length as Y and is a prefix of it (i.e. they are equal), do something else.
   )
   Here, finalize everything for I and its corresponding (found / not found) matches among the Ys.
 )

I'd like to multithread this to take advantage of a dual-core machine. To my knowledge it isn't possible, but quite probably you have some ideas.

+1  A: 

If the array sizes are considerable you could get some benefit from parallelizing: perhaps split the array in two and process each half in parallel. You should check out the Parallel Framework for doing the multithreading.
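For illustration, here is a minimal sketch of that split (in Python rather than the .NET Parallel Framework; `mark_duplicates` and the sample data are made up for the example):

```python
# Split the outer loop's index range in half and let two worker threads each
# compare their share of the indices against the whole (read-only) array.
from concurrent.futures import ThreadPoolExecutor

def mark_duplicates(words, start, stop):
    """For indices [start, stop), record whether words[i] also occurs elsewhere."""
    flags = {}
    for i in range(start, stop):
        flags[i] = any(j != i and words[j] == words[i] for j in range(len(words)))
    return flags

words = ["ant", "antenna", "bee", "ant", "cat"]
mid = len(words) // 2
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(mark_duplicates, words, 0, mid),
               pool.submit(mark_duplicates, words, mid, len(words))]
    is_duplicate = {}
    for f in futures:
        is_duplicate.update(f.result())
# is_duplicate[0] and is_duplicate[3] end up True ("ant" occurs twice)
```

(Note that in CPython the GIL limits the speedup for CPU-bound work; the point here is only the structure of the split, which carries over directly to the Parallel Framework.)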

Peter Lillevold
So I can parallelize the inner loop without any problem?
A: 

I am not sure whether Parallel Extensions to .NET is your answer. You can check it out from the Download page and the Project's blog.

Canton
+1  A: 

If I understand your question correctly, you would like to know how to parallelize this code.

I think you would get much more speedup from using a better algorithm: e.g. sort the array (you could do this in parallel using merge sort) and then only compare adjacent entries. You could then also do the comparison in parallel by having a separate thread process each half of the array.

If you need more details just let us know ...
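For what it's worth, a sketch of that idea (illustrative Python; `adjacent_duplicates` and the sample data are my own): sort once, then split the comparison indices 1..n-1 between two threads. Each index i compares the entry with its left neighbour, so no pair is missed at the split boundary.

```python
from concurrent.futures import ThreadPoolExecutor

def adjacent_duplicates(sorted_words, start, stop):
    """Collect words in [start, stop) that equal their left neighbour."""
    dups = set()
    for i in range(start, stop):
        if sorted_words[i] == sorted_words[i - 1]:
            dups.add(sorted_words[i])
    return dups

words = sorted(["ant", "bee", "ant", "cat", "bee"])  # one (parallelizable) sort
mid = len(words) // 2
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(adjacent_duplicates, words, 1, mid + 1),
               pool.submit(adjacent_duplicates, words, mid + 1, len(words))]
    duplicates = set()
    for f in futures:
        duplicates |= f.result()
# duplicates == {"ant", "bee"}
```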

MartinStettner
A: 

In general the idea would be to have one thread handle half the data and the other thread handle the other half: for example, thread one does odd indices and thread two does even ones.

Unfortunately there's not really enough information about your problem to give a definitive answer, since we have no idea whether there are any dependencies between the various actions. Say, for instance, that finding a prefix match means you then want to modify the next element in the array to remove its prefix. Clearly, that dependency would break the naive parallel implementation. If your actions on the data are independent, though, this should be reasonably easy to parallelize by simply dividing the work.
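Assuming the actions really are independent, the odd/even split can be sketched like this (illustrative Python; the duplicate check stands in for whatever the real per-element action is):

```python
import threading

words = ["ant", "antenna", "ant", "bee"]
results = [None] * len(words)          # each thread writes disjoint slots

def worker(offset):
    # offset 0 handles even indices, offset 1 handles odd indices
    for i in range(offset, len(words), 2):
        results[i] = any(j != i and words[j] == words[i]
                         for j in range(len(words)))

threads = [threading.Thread(target=worker, args=(k,)) for k in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results == [True, False, True, False]
```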

tvanfosson
+1  A: 

A parallel algorithm might look like this:

  • Sort the list of terms alphabetically using a parallel sort algorithm
  • Divide the sorted list of terms into chunks, such that all interesting terms are in the same chunk. If I'm not mistaken, as long as all the terms in a chunk share the same first character, you'll never have interesting matches across two chunks (matches and prefixes always share the same first character, right?)
  • For each chunk, find terms that are prefixes and/or matches, and take appropriate action. This should be easy, since matching terms will be right next to each other (since the big list is sorted, each chunk will be too).

Some notes:

This requires a parallel sort algorithm. Apparently these exist, but I don't know much else about them, since I've never had to use one directly. Your Mileage May Vary.

The second step (split the workload into chunks) doesn't appear to itself be parallelizable. You could implement it with modified binary searches to find where the first character changes, so hopefully this part is cheap, but it might not be, and you probably won't know for sure until you measure.
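A sketch of those boundary searches (illustrative Python using the standard `bisect` module; the sample data is made up): because the list is sorted, the end of each first-character run can be found in O(log n).

```python
import bisect

words = sorted(["ant", "antenna", "bee", "beet", "cat"])
boundaries = [0]
for ch in sorted({w[0] for w in words}):
    # insertion point just past every word starting with ch
    boundaries.append(bisect.bisect_right(words, ch + "\uffff"))
chunks = [words[a:b] for a, b in zip(boundaries, boundaries[1:])]
# chunks == [["ant", "antenna"], ["bee", "beet"], ["cat"]]
```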

If you end up with many chunks, and one is by far the largest, your performance will suck.

Have you considered keeping the algorithm single-threaded, but changing it so that the first step sorts the list?

Currently the algorithm described in the question is O(n^2), as it loops through the list once per element in the list. If the list is sorted, then duplicates can be found in one pass through the list (duplicates will be right next to each other); including the sort, this is a total cost of O(n log n). For large data sets, this will be much, much faster. Hopefully it will be fast enough that you can avoid multiple threads, which would be a lot of work.
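A sketch of that single-threaded version (illustrative Python). Which word of a strict prefix pair gets flagged depends on what your "do something" is; here only the shorter one is flagged, and equal strings count as a prefix case:

```python
def classify(words):
    """Return the words that are duplicated or are a prefix of another word."""
    srt = sorted(words)                   # O(n log n)
    flagged = set()
    for prev, cur in zip(srt, srt[1:]):   # one O(n) pass over neighbours
        if cur.startswith(prev):          # sorting puts any word right after
            flagged.add(prev)             # its prefixes, so this catches all
    return flagged

# classify(["ant", "antenna", "bee", "ant"]) == {"ant"}
```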

Sean Reilly
A: 

If the middle check is a long-running process, you could run each one as a separate thread and then join all the threads at the end. Since that would create a lot of threads, use a thread pool with a limit of two threads: you shouldn't launch them all at once; run two, wait for one to finish, launch a new one, and so on.

At the end, just join() all the threads; that's it.
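A sketch of that pooling scheme (illustrative Python; the pool handles the "run two, launch the next when one finishes" logic, and collecting the results is the join):

```python
from concurrent.futures import ThreadPoolExecutor

def check(i, words):
    """Stand-in for the long-running middle check for element i."""
    return i, any(j != i and words[j] == words[i] for j in range(len(words)))

words = ["ant", "bee", "ant", "cat"]
with ThreadPoolExecutor(max_workers=2) as pool:   # at most two run at once
    futures = [pool.submit(check, i, words) for i in range(len(words))]
    results = dict(f.result() for f in futures)   # waits for all of them
# results == {0: True, 1: False, 2: True, 3: False}
```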

dr. evil