I wrote logic for testing URLs using threads. It works well for a small number of URLs, but fails when there are more than 400 URLs to check.

class URL extends Thread {
    def valid
    def url

    URL( url ) {
        this.url = url
    }

    void run() {
        try {
            def connection = url.toURL().openConnection()
            connection.setConnectTimeout(10000)
            if (connection.responseCode == 200) {
                valid = Boolean.TRUE
            } else {
                valid = Boolean.FALSE
            }
        } catch ( Exception e ) {
            valid = Boolean.FALSE
        }
    }
}



def threads = []
urls.each { ur ->
    def reader = new URL(ur)
    reader.start()
    threads.add(reader)
}

// Poll until every thread has finished
while (threads.size() > 0) {
    for (int i = 0; i < threads.size(); i++) {
        def tr = threads.get(i)
        if (!tr.isAlive()) {
            if (tr.valid == true) {
                threads.remove(i)
                i--
            } else {
                threads.remove(i)
                i--
            }
        }
    }
}

Could anyone please tell me how to optimize the logic and where I went wrong?

Thanks in advance.

+2  A: 

Have you considered using the java.util.concurrent helpers? They allow multithreaded programming at a higher level of abstraction. There's a simple interface to run parallel tasks in a thread pool, which is easier to manage and tune than just creating n threads for n tasks and hoping for the best.

Your code then ends up looking something like this, where you can tune nThreads until you get the best performance:

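import java.util.concurrent.*

def nThreads = 1000
def pool = Executors.newFixedThreadPool(nThreads)
urls.each { url ->
    // url stands in for the task here; what is submitted must be a Runnable or Callable
    pool.submit(url)
}
// Stop accepting new tasks so that awaitTermination can return once the queue drains
pool.shutdown()
def timeout = 60
pool.awaitTermination(timeout, TimeUnit.SECONDS)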
ataylor
Thank you ataylor, but my bad: how can I call the URL class from the loop instead of pool.submit(url)? Could you please explain with some code in the loop? Thanks.
Srinath
Please fix the formatting on your code so I can see what you're trying to do.
ataylor
+1  A: 

Using ataylor's suggestion, and your code, I got to this:

import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

class MyURL implements Runnable {
  def valid
  def url

  void run() {
    try {
      url.toURL().openConnection().with {
        connectTimeout = 10000
        if( responseCode == 200  ) {
          valid = true
        }
        else {
          valid = false
        }
        disconnect()
      }
    }
    catch( e ) {
      valid = false
    }
  }
}

// A list of URLs to check
def urls = [ 'http://www.google.com',
             'http://stackoverflow.com/questions/2720325/groovy-thread-for-urls',
             'http://www.nonexistanturlfortesting.co.ch/whatever' ]

// How many threads to kick off
def nThreads = 3
def pool = Executors.newFixedThreadPool( nThreads )

// Construct a list of the URL objects we're running, submitted to the pool
def results = urls.inject( [] ) { list, url ->
  def u = new MyURL( url:url )
  pool.submit u
  list << u
}

// Shut down the pool and wait for all submitted tasks to complete
def timeout = 10
pool.shutdown()
pool.awaitTermination( timeout, TimeUnit.SECONDS )

// Print our results
results.each {
  println "$it.url : $it.valid"
}

Which prints out this:

http://www.google.com : true
http://stackoverflow.com/questions/2720325/groovy-thread-for-urls : true
http://www.nonexistanturlfortesting.co.ch/whatever : false

I changed the class name to MyURL rather than URL, as you had it, since that is more likely to avoid problems when you start using the java.net.URL class.

tim_yates
Thanks tim_yates, but I found that most of the URLs show null instead of true or false, even though all the null URLs are valid and working. Could you please tell me why it is breaking? I ran it with 500 URLs.
Srinath
It must be timing out. Set the timeout variable higher.
tim_yates
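The null values reported above are consistent with awaitTermination returning before every task has run: valid is only assigned inside run(), so a task that has not finished still holds null. A minimal sketch of how that could be detected, assuming a hypothetical SlowCheck class that sleeps instead of opening a connection (it is a stand-in for MyURL, not part of the answer):

```groovy
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for MyURL: sleeps instead of opening a connection
class SlowCheck implements Runnable {
    def valid                 // stays null until run() completes

    void run() {
        Thread.sleep(300)
        valid = true
    }
}

def pool = Executors.newFixedThreadPool(1)
def checks = (1..3).collect { new SlowCheck() }
checks.each { pool.submit(it) }

pool.shutdown()
// awaitTermination returns false when the timeout elapses before all tasks finish
def finished = pool.awaitTermination(100, TimeUnit.MILLISECONDS)
def unfinished = checks.count { it.valid == null }
println "finished=$finished, still null: $unfinished"

pool.shutdownNow()            // interrupt whatever is still running
```

Checking the boolean returned by awaitTermination makes it possible to tell "not checked yet" apart from a genuine false result.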
Thanks a lot. Yes, I increased the timeout from 10 to 60 and it works well for between 500 and 700 URLs. My DB has 2700 URLs, and when I tried them all at once it failed and gave false results. Could you please tell me how high the timeout can be set? Also, do we need to increase connectTimeout in the URL class?
Srinath
For the first part, to wait forever, change timeout to Long.MAX_VALUE, and change TimeUnit.SECONDS to TimeUnit.NANOSECONDS. The answer to the second part is probably 'that depends'
tim_yates
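An alternative to picking a very large timeout, sketched here under assumptions rather than taken from the thread, is ExecutorService.invokeAll, which blocks until every submitted Callable has completed. The check closure and the example URLs below are hypothetical stand-ins so that the sketch runs without network access:

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Hypothetical stand-in for the real HTTP check: anything starting with "http" counts as valid
def check = { String url -> url.startsWith('http') }

def urls = ['http://a.example', 'ftp://b.example', 'http://c.example']
def pool = Executors.newFixedThreadPool(2)

// One Callable per URL; each returns a [url, valid] pair
def tasks = urls.collect { url -> ({ [url, check.call(url)] } as Callable) }

// invokeAll blocks until all tasks are done and returns futures in task order
def results = pool.invokeAll(tasks).collect { it.get() }
pool.shutdown()

results.each { println "${it[0]} : ${it[1]}" }
```

Because each result comes back through a Future, a task can never be observed half-finished, so no null values appear in the output.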
Hi tim, can you please suggest another alternative? My browser hung when I used Long.MAX_VALUE for the loop over 2700 URLs.
Srinath
You're running this in a browser? O_o
tim_yates
Sorry, and yes: my requirement changed, so the script is now called from the browser and the URLs are displayed on a page with their status. The only problem I am facing is with testing a large number of URLs. Is anything wrong, tim?
Srinath
I'd do them as a batch. User submits urls, gets a ticket, checks a status page to see if their results are ready. Then just have a service reading a queue of url jobs in turn
tim_yates
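A rough sketch of what the ticket idea might look like follows. Every name in it (UrlBatchService, submit, status, checker) is hypothetical and chosen for illustration; the checker closure stands in for the real URL test so the sketch runs offline:

```groovy
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical sketch of a ticket-based batch service; not code from the thread
class UrlBatchService {
    def checker                                        // closure: url -> boolean
    def worker = Executors.newSingleThreadExecutor()   // processes batches one at a time
    def results = new ConcurrentHashMap()              // ticket -> (url -> valid), set once ready

    // Queue a batch and return immediately with a ticket
    String submit(List urls) {
        def ticket = UUID.randomUUID().toString()
        worker.submit({
            results[ticket] = urls.collectEntries { [(it): checker.call(it)] }
        } as Runnable)
        ticket
    }

    // Returns null while the batch is still running, otherwise the results map
    def status(String ticket) { results[ticket] }
}

def service = new UrlBatchService(checker: { it.startsWith('http') })
def ticket = service.submit(['http://a.example', 'ftp://b.example'])

// A status page would poll status(ticket); here we simply wait for the worker to drain
service.worker.shutdown()
service.worker.awaitTermination(5, TimeUnit.SECONDS)
println service.status(ticket)
```

The point of the design is that submit returns at once, so the browser request never has to wait for thousands of URL checks to finish.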
Okay, so will the code above need more changes in order to use a batch?
Srinath
No, just how you use it.
tim_yates
My bad, I couldn't follow your last comment. I would like to continue with the same code; the only remaining issue is the timeout with a large number of URLs. Are you suggesting another alternative or a workaround?
Srinath
this is now a completely separate question to the one you asked originally
tim_yates
When I inspected the exceptions with println e.message in the catch block, I found messages like: 1. Too many open files 2. Server redirected too many times (20). These showed up when running all 2700 URLs.
Srinath
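"Too many open files" usually means the process has hit the operating system's limit on open file descriptors, and each open socket counts against that limit. The redirect message is raised by HttpURLConnection when a redirect chain exceeds its limit. A hedged sketch of a check that bounds both follows: checkUrl is a hypothetical helper, and treating a 3xx response as valid is an assumption rather than something the thread specifies. Combining it with a much smaller pool than 1000 threads would also reduce the number of sockets open at once.

```groovy
// Hypothetical helper, not from the thread: checks one URL while keeping resource use bounded
boolean checkUrl(String url) {
    HttpURLConnection connection = null
    try {
        connection = (HttpURLConnection) url.toURL().openConnection()
        connection.connectTimeout = 10000
        connection.readTimeout = 10000              // do not wait forever on a stalled server
        connection.instanceFollowRedirects = false  // report the 3xx instead of chasing redirects
        def code = connection.responseCode
        // Assumption: a redirect still means the URL is reachable
        return code >= 200 && code < 400
    } catch (Exception e) {
        return false
    } finally {
        connection?.disconnect()                    // always release the socket
    }
}

// A malformed URL fails before any network access happens
println checkUrl('not a url')
```

The finally block matters most here: without it, a connection that throws partway through may keep its socket open until it is garbage collected.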
Yes, I agree. Is it that the code will not work well when called from the browser? Sorry if I was confusing.
Srinath
The code will work fine, but it will wait until all the urls are scanned. If you submit thousands of urls, it will make the browser wait until they are all scanned. It's not the function that's at fault, it's the workflow. That's why I suggested a different workflow (with the ticket idea above).
tim_yates
Yes. Could you please provide a new solution using the ticket workflow? I think we would use batch processing for inserting records in the DB, right? I have a button on the page which, when clicked, validates all the URLs and displays them. How can we adopt the same approach here? Your valuable comments are helping me grow a lot. Sorry if I'm taking up your time. Thanks in advance.
Srinath
Thanks a lot tim for all the support given to me.
Srinath