views:

93

answers:

5

Hi there.

This might sound funny but I got a homework and I can`t make any sense of it. The statement sounds like this: "Find the 10 largest numbers in an array of 100.000 randomly generated integers. You will use threads to compare 2 numbers. A daemon thread will print at regular intervals the progress and the number of unchecked integers left."

I know its not appropriate to ask for help on the forum regarding a homework but I am really REALLY frustrated .... I just cant figure out why, and how, should I use threads to deal with number comparison ..... Especially when it is about 100.000 integers. Even if I go through the list with a simple for using a max variable and printing out all the values it only takes about 150 milliseconds, at most(i tried)!!

Could you at least give me a starting idea on it ???

Sorry for wasting your time!

--CONTINUED--

As I said in a reply, braking up the array into X chunks(no. of threads) would be a good idea if I would have to find only 1 element(the largest) but because I need to find the 10 largest elements, supposing one thread finds its max value in the chunk it is working on, and discards the rest, maybe one of the discarded ones would actually be larger than the rest of the elements in the other chunks. That is why I don`t think this would a good result.

Feel free to argue my point of view!

+1  A: 

The reason you need to do it with threads for this problem is not because you can't solve it without threads, but that it's a good example of a threadable problem (namely, can be parallelized); and a good teaching example since the business logic is so simple so you can concentrate on threading work.

DVK
+1  A: 

No matter how you slice it, finding the max in an unsorted array means a linear search. You could simply partition the data among the number of available threads, then find the max number among the values that the threads came up with.

Steven Sudit
+2  A: 

Each thread can iterate through 100,000 / X numbers (where X is the number of threads) and keep track of the top 10 numbers in that thread. Then, when all threads are done, you can merge the results.

skaz
You mean top X numbers, right?
Steven Sudit
Top 10 sounds right. You have to be prepared for one thread finding the ten largest numbers. The number of threads shouldn't really matter.
westsider
+2  A: 

Break the list of 100k numbers in to batches of some size. Then spawn a thread to do the checking on each of the batches. Then just merge the results.

The bonus part of this, is such a solution will easily scale to huge lists of numbers.

Jon
A: 

Well, you want to put that list of integers in a threadsafe queue. Each thread processes the numbers by pop'ing off the top.

This is the almost the same algorithm you already wrote, but it the key is the threadsafe queue, which lets the threads pull data off of it without clobbering each others data.

When each thread is complete, the main thread should take the results and find the largest numbers between the threads.


EDIT
If each thread gets the 10 largest numbers in its chunk, then it doesn't matter what is in the rest of the array, since the other threads will find the largest in their own chunk. for example:

Array : numbers between 1 and 99
Chunk 1 : 99 98 97 ... 50
Chunk 2 : 49 48 47 ... 1

Thread one result: 99 98 97 96 95 94 93 92 91 90
Thread two result: 49 48 47 46 45 44 43 42 41 40

Merged result: 99 98 97 96 95 94 93 92 91 90 49 48 47 46 45 44 43 42 41 40
Top 10 from merge: 99 98 97 96 95 94 93 92 91 90

See it doesn't matter that chunk 2 has no numbers larger than chunk one.

Byron Whitlock
Looking at the other answers, it is probably more effecient to just break up the list into x equal sized lists where x=numthreads. +1's all around.
Byron Whitlock
Yes, partitioning the data in advance allows the threads to work without communicating. This means they never have to block on a shared resource, so they go full speed.
Steven Sudit
That was my first idea too. But braking up the array into X(no. of threads) is no good because, suppose the first thread finds the 10 largest numbers in its chunk, eliminates the rest ... ok... but what if the eliminated(smaller) ones are larger than the rest of the elements in the array ? Wouldn`t that give me a wrong result ? Braking up the array would be fine if I would have to find only the largest number in the array, but not for 10
Liviu
Look at Byron's example closer. The number 89 is in the first chunk, is bigger than all the numbers in the second chunk, and is still discarded, but that doesn't matter because you still have the 10 largest numbers.
prelic