views: 80
answers: 1
I have implemented a multithreaded crawler, and it fetches a list of URLs concurrently without any issues; I tested each step and had the program write all the HTML it pulled to a text file. The rest of the program is meant to take each HTML string, parse it for the URLs on that page, and then write that list to a database. This is where the errors start: at first I locked the parsing step, because running it unlocked returned empty lists with the error 'property evaluation failed'. Now I am getting lists back, but I cannot write them to the database.

My question is: do I need to lock everything, and why? Can't I let all the threads parse at the same time and have each one write to a shared ArrayList? And will all this locking hurt performance?
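For reference, the shared-list approach asked about here would look roughly like this; this is a minimal sketch with illustrative names, and the point is that every thread must take the same lock before touching the list:

Imports System.Collections.Generic

Public Class LinkStore
    ' One list shared by all threads, plus a dedicated lock object.
    Private Shared ReadOnly allLinks As New List(Of String)()
    Private Shared ReadOnly listLock As New Object()

    Public Shared Sub AddLinks(ByVal links As IEnumerable(Of String))
        ' Every writer takes the same lock; unsynchronized concurrent
        ' Adds can corrupt the list's internal state.
        SyncLock listLock
            allLinks.AddRange(links)
        End SyncLock
    End Sub
End Class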

Here is a sample of my code; first, the call to parse a URL:

If Not String.IsNullOrEmpty(html) Then
    ' Get all links first
    links = parser.GetLinks(fromUrl, html)
End If

then the write to the database:

For Each link As String In links
    recordsAffected = _
        Links_DBObj.insert_feedurls_link(link, feedlink, execError, connObj_Generic, commObj_Generic)
Next
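One likely source of the clashes: the loop above hands the same connObj_Generic and commObj_Generic to every thread, and ADO.NET connection and command objects are not thread-safe. A minimal sketch of one fix, assuming a SQL Server backend and an illustrative table and column layout; each thread opens its own short-lived connection instead of sharing one:

Imports System.Data.SqlClient

Public Class LinkWriter
    ' Sketch only: each call creates its own connection and command, so
    ' threads never share ADO.NET objects. Names here are illustrative.
    Public Shared Function InsertLink(ByVal link As String, _
                                      ByVal feedLink As String, _
                                      ByVal connectionString As String) As Integer
        Using conn As New SqlConnection(connectionString)
            Using comm As New SqlCommand( _
                "INSERT INTO feedurls (link, feedlink) VALUES (@link, @feedlink)", conn)
                comm.Parameters.AddWithValue("@link", link)
                comm.Parameters.AddWithValue("@feedlink", feedLink)
                conn.Open()
                Return comm.ExecuteNonQuery()
            End Using
        End Using
    End Function
End Class

Connection pooling makes the per-call Open cheap, so this usually costs less than serializing every insert behind a single lock.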
Answer (+1):

Instead of using an ArrayList, I would use a Synchronized Queue. Each reading thread can Enqueue while each writing thread can Dequeue.
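A minimal sketch of that pattern, assuming the non-generic System.Collections.Queue wrapped by Queue.Synchronized (on .NET 4 and later, ConcurrentQueue(Of String) is the drop-in equivalent):

Imports System.Collections

Public Class LinkQueue
    ' One queue shared by all threads; Queue.Synchronized returns a
    ' thread-safe wrapper whose Enqueue/Dequeue lock internally.
    Private Shared ReadOnly pending As Queue = Queue.Synchronized(New Queue())

    Public Shared Sub Produce(ByVal link As String)
        pending.Enqueue(link)    ' called by the parsing threads
    End Sub

    Public Shared Function TryConsume(ByRef link As String) As Boolean
        ' Count followed by Dequeue is two operations even on the
        ' synchronized wrapper, so guard the pair as one unit.
        SyncLock pending.SyncRoot
            If pending.Count = 0 Then Return False
            link = CStr(pending.Dequeue())
            Return True
        End SyncLock
    End Function
End Class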

Jacob G
Thanks, I will try that.
vbNewbie
Quick question: is SyncLock effective at all? Even when using it, I found threads still clashing when trying to access the collection. Should I create a new queue in each function where I need file access?
vbNewbie
Everyone should access the same queue, I'd think:
Process 1: Instantiate the queue, spin up crawlers and processors.
Processes 2 to n: Crawl, acquire content, process, Enqueue.
Process n+1: Check the queue, Dequeue, save.
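Sketched with the queue type from the answer above and illustrative names, the dedicated writer (process n+1) is just a loop that drains the queue and saves each item:

Imports System.Threading

Public Class Writer
    ' Sketch of the dedicated writer (process n+1). A real version would
    ' flip keepRunning to False once the crawl threads finish.
    Public Shared keepRunning As Boolean = True

    Public Shared Sub WriterLoop()
        Dim link As String = Nothing
        While keepRunning
            If LinkQueue.TryConsume(link) Then
                SaveLink(link)       ' save one dequeued item
            Else
                Thread.Sleep(50)     ' queue empty; back off briefly
            End If
        End While
    End Sub

    Private Shared Sub SaveLink(ByVal link As String)
        ' Illustrative stub; the real version would do the database insert.
        Console.WriteLine("saved " & link)
    End Sub
End Class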
Jacob G