I would like to implement a multithreaded crawler using the single-threaded crawler code I have now. Basically I read the URLs from a text file, take each one, and crawl and parse it. I know the basics of creating a thread and assigning a procedure to it, but I am not sure how to implement it in the following way:

I need at least 3 threads, and I need to assign a URL to each thread from a list of URLs; each thread then needs to fetch and parse its URL before adding the contents to a database.

Dim gthread, tthread, ithread As Thread

        ' processUrl must take a single Object argument to be started with a url;
        ' each thread should be handed a different url from the list
        gthread = New Thread(AddressOf processUrl)
        gthread.Start(url)

        tthread = New Thread(AddressOf processUrl)
        tthread.Start(url)

        ithread = New Thread(AddressOf processUrl)
        ithread.Start(url)

WaitUntilAllAreOver:

        ' Busy-wait until all three threads have finished
        If gthread.IsAlive OrElse tthread.IsAlive OrElse ithread.IsAlive Then
            Thread.Sleep(5)
            GoTo WaitUntilAllAreOver
        End If

'etc..

Now the code may not make sense, but what I need to do is assign a unique URL to each thread to process.

Any ideas appreciated

+2  A: 

The best way to wait for the Thread instances to finish is to call the .Join method. Take the following example:

Public Sub ParseAll(ByVal ParamArray urls As Uri())
  ' Spawn one thread per url; ProcessUrl must accept a single Object argument
  Dim list As New List(Of Thread)
  For Each url In urls
    Dim thread = New Thread(AddressOf ProcessUrl)
    thread.Start(url)
    list.Add(thread)
  Next
  ' Join blocks the calling thread until each worker has finished
  For Each thread In list
    thread.Join()
  Next
End Sub

Though you may want to consider using the ThreadPool here. The ThreadPool is designed for spawning off lots of small tasks very efficiently.
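For illustration, a minimal sketch of the ThreadPool route under the same assumptions as above (a ProcessUrl routine taking a single Object argument); the CountdownEvent used to wait for completion requires .NET 4:

Imports System.Threading

Public Sub ParseAllPooled(ByVal ParamArray urls As Uri())
  ' Track outstanding work items so we know when every url is done
  Using pending As New CountdownEvent(urls.Length)
    For Each url In urls
      ThreadPool.QueueUserWorkItem(
        Sub(state)
          Try
            ProcessUrl(state)
          Finally
            pending.Signal() ' one fewer item outstanding
          End Try
        End Sub, url)
    Next
    pending.Wait() ' block until every queued url has been processed
  End Using
End Sub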

JaredPar
JaredPar: depending on the size of the list, this could be very inefficient. Besides, the question asked to limit it to 3 threads.
Lucas B
Let me see if I understand -- list is the list of threads; then for each url in the url list, create a new thread, assign it the function processUrl, start the thread, and add it to the thread list. From what I have read, .Join blocks the calling thread until the other is complete. But I want the threads to fetch the url, parse it, and add to the database concurrently to speed up the process?
vbNewbie
@Lucas B, essentially any hand management of threading can be inefficient. Scheduling of many threads is best left up to APIs like the ThreadPool, but the OP asked for straight Thread instances.
JaredPar
Thanks JaredPar, appreciate everyone else's responses as well. This seems to work ok for now.
vbNewbie
Ok, one issue has turned up. I have a logging class with functions where I write to a log file and an error file. There was a clash with an error: "The process cannot access the file because it is being used by another process."
vbNewbie
You're going to need to wrap the log access in a lock.
Steven Sudit
This is my code to write the parsed content to a file and I am getting the collision error:

Try
  SyncLock outfile
    outfile.WriteLine(link)
  End SyncLock
  outfile.Close()
Catch ex As Exception
  execError = ex.Message
End Try
vbNewbie
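For what it's worth, a minimal sketch of the locking pattern Steven describes, assuming all threads share a single writer and a dedicated lock object (the names logLock, sharedWriter, and crawl.log are made up). Locking on a private object, rather than on the writer itself, and keeping the writer open for the whole crawl avoids the file-in-use error:

Imports System.IO
Imports System.Threading

Module Logger
  ' One lock object and one writer shared by every thread
  Private ReadOnly logLock As New Object()
  Private ReadOnly sharedWriter As New StreamWriter("crawl.log", True)

  Public Sub Log(ByVal line As String)
    SyncLock logLock
      sharedWriter.WriteLine(line)
      sharedWriter.Flush() ' make the entry visible right away
    End SyncLock
  End Sub
End Module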
@vbNewbie, to get around logging issues I recommend you use an aspect oriented solution, like log4net.
Lucas B
+2  A: 

You could use a synchronized Queue that you push the URLs to, and every crawler takes the next URL it visits out of this Queue. When they detect new URLs, they push them into the Queue, too.
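A minimal sketch of that approach, assuming three worker threads and a fetch-and-parse routine; every name here (workQueue, queueLock, Worker) is illustrative:

Imports System.Collections.Generic
Imports System.Threading

Module CrawlerQueue
  Private ReadOnly workQueue As New Queue(Of Uri)()
  Private ReadOnly queueLock As New Object()

  Public Sub Crawl(ByVal seeds As IEnumerable(Of Uri))
    For Each seed In seeds
      workQueue.Enqueue(seed)
    Next
    ' Start the three crawler threads from the question
    Dim workers As New List(Of Thread)
    For i As Integer = 1 To 3
      Dim t As New Thread(AddressOf Worker)
      t.Start()
      workers.Add(t)
    Next
    For Each t In workers
      t.Join()
    Next
  End Sub

  Private Sub Worker()
    Do
      Dim url As Uri
      SyncLock queueLock
        If workQueue.Count = 0 Then Exit Do ' queue drained; stop this worker
        url = workQueue.Dequeue()
      End SyncLock
      ' Fetch and parse url here; newly found links are enqueued under the
      ' same lock. Note this simple worker quits as soon as the queue is
      ' momentarily empty; a real crawler would also track in-flight work.
    Loop
  End Sub
End Module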

ZeissS
This is a much better approach than hand-coding the threads.
Steven Sudit
+1  A: 

I recommend using a BackgroundWorker to accomplish this.
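A minimal sketch of the BackgroundWorker route, assuming a WinForms app and a ProcessUrl routine like the one above (the names here are made up); DoWork runs on a thread-pool thread while RunWorkerCompleted fires back on the UI thread:

Imports System.ComponentModel

Public Sub StartCrawl(ByVal url As Uri)
  Dim worker As New BackgroundWorker()
  AddHandler worker.DoWork,
    Sub(sender, e)
      ProcessUrl(e.Argument) ' runs on a thread-pool thread
    End Sub
  AddHandler worker.RunWorkerCompleted,
    Sub(sender, e)
      ' back on the UI thread; safe to update controls or report status here
    End Sub
  worker.RunWorkerAsync(url) ' the url becomes e.Argument in DoWork
End Sub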

Lucas B
+1  A: 

Look into the Concurrency and Coordination Runtime (CCR). I have built a few crawlers based on that framework, and it makes things very easy once you understand how the CCR works.

Should take you a few hours to get up to speed with the CCR.

Bryan Batchelder
I will look into this some other time, thanks. I barely understand threads right now. Excuse my lack of intelligence.
vbNewbie