I wrote a web crawler that requests web pages in a do-while loop; each iteration takes about 3 seconds.

In total there are 7000 sites. I parse the data and save it in my DB.

Sometimes, because the script runs for a long time, I get a timeout in the browser,

but the crawl continues in the background. I can see that in my database.

Can I prevent this? Right now the only way to stop it is to stop the web server.

Thank you and best regards.

A: 

Your web page is kicking off a server-side process. Killing your browser or closing it is not going to stop this. It sounds to me like a web page to control this is the wrong approach, and you should be looking at a connected form of application like a WinForms/WPF app. There would be ways to get this to work with ASP.NET, but they are not going to be simple. I think you have just chosen the wrong technology.

David M
OK, this would be a pain... I hope there is just one solution :)... But I have to do it with a web form because it has to start when I go to a specific URL.
snarebold
In that case, I think you're going to have to get into something asynchronous on the server, and have web requests start and stop this asynchronous process. I told you it wouldn't be simple...
David M
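A minimal sketch of the "asynchronous process that web requests start and stop" idea, written as a plain console program so it can run outside ASP.NET. Everything named here (`CrawlJob`, `crawlOne`, the `example.invalid` URLs) is hypothetical and not from the original post, and `Thread.Sleep` stands in for the real fetch, parse, and save step. In a real web form, the start page would call `Start` and a second page would call `Stop`; note that an ASP.NET worker process can be recycled, which would end such a task without warning.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: owns at most one background crawl at a time.
public static class CrawlJob
{
    private static readonly object Gate = new object();
    private static CancellationTokenSource _cts;
    private static Task _task;
    private static int _processed;

    // Number of pages handled by the current (or most recent) crawl.
    public static int Processed => Volatile.Read(ref _processed);

    // Called by the "start" request. Returns false if a crawl is already running,
    // so a browser refresh does not launch a second crawl.
    public static bool Start(IReadOnlyList<string> urls, Action<string> crawlOne)
    {
        lock (Gate)
        {
            if (_task != null && !_task.IsCompleted) return false;

            _cts = new CancellationTokenSource();
            CancellationToken token = _cts.Token;
            Interlocked.Exchange(ref _processed, 0);

            _task = Task.Run(() =>
            {
                foreach (string url in urls)
                {
                    // Cooperative cancellation: checked once per page.
                    if (token.IsCancellationRequested) break;
                    crawlOne(url);
                    Interlocked.Increment(ref _processed);
                }
            });
            return true;
        }
    }

    // Called by the "stop" request. The loop ends after the page in progress.
    public static void Stop()
    {
        lock (Gate) { _cts?.Cancel(); }
    }

    // Blocks until the current crawl (if any) has ended.
    public static void Wait()
    {
        Task t;
        lock (Gate) { t = _task; }
        t?.Wait();
    }
}

public static class Program
{
    public static void Main()
    {
        var urls = new List<string>();
        for (int i = 0; i < 7000; i++) urls.Add("http://example.invalid/page" + i);

        CrawlJob.Start(urls, url => Thread.Sleep(10)); // stand-in for fetch + parse + save
        Thread.Sleep(200);                             // pretend the user waits a moment
        CrawlJob.Stop();                               // then requests the "stop" page
        CrawlJob.Wait();
        Console.WriteLine("Stopped after " + CrawlJob.Processed + " pages");
    }
}
```

The point of the design is that the request which starts the crawl returns immediately, so there is no browser timeout, and the crawl stops only when something explicitly asks it to.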
A: 

Starting an intensive, long-running process like this from a web page is almost never a good idea. There are lots of reasons, but the main ones are:

1) If you get a timeout in the browser (this is your scenario) the data you have harvested may not be displayed.

2) What happens if you hit refresh in the browser? Will it attempt to start the whole process again? This is an easy target for an attacker who wants to tie up all your server resources.

3) Is the data you are crawling really likely to change to such an extent that you need "live" crawling? 99% of cases would be served just as well with a background timed job running the crawl, and your front end just displaying the contents of the database.

I would seriously recommend you rethink your crawling strategy to something more controllable and stable.

ZombieSheep
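The "background timed job" suggestion could look roughly like the sketch below, assuming a `System.Threading.Timer` drives the crawl and the front end only reads what the job has stored. `ScheduledCrawl`, `RunOnce`, and `StartSchedule` are made-up names for illustration, and the `Thread.Sleep` call stands in for a full crawl. A Windows service or a scheduled task would be other ways to host the same idea.

```csharp
using System;
using System.Threading;

// Hypothetical scheduler: runs the crawl on a timer and never overlaps two runs.
public static class ScheduledCrawl
{
    private static int _running;       // 0 = idle, 1 = a crawl is in progress
    private static int _completedRuns;

    public static int CompletedRuns => Volatile.Read(ref _completedRuns);

    // Runs one crawl unless the previous one is still going.
    // Returns true if the crawl ran, false if it was skipped.
    public static bool RunOnce(Action crawlAll)
    {
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0) return false;
        try
        {
            crawlAll();
            Interlocked.Increment(ref _completedRuns);
            return true;
        }
        finally
        {
            Volatile.Write(ref _running, 0);
        }
    }

    // Fires immediately, then once per interval. Dispose the timer to stop scheduling.
    public static Timer StartSchedule(Action crawlAll, TimeSpan interval)
    {
        return new Timer(_ => RunOnce(crawlAll), null, TimeSpan.Zero, interval);
    }
}

public static class ScheduleDemo
{
    public static void Main()
    {
        // Stand-in crawl that takes 50 ms, scheduled every 100 ms.
        using (Timer timer = ScheduledCrawl.StartSchedule(
                   () => Thread.Sleep(50), TimeSpan.FromMilliseconds(100)))
        {
            Thread.Sleep(500);
        }
        Console.WriteLine("Completed crawls: " + ScheduledCrawl.CompletedRuns);
    }
}
```

Because a tick is skipped while a crawl is still running, a slow crawl cannot pile up overlapping copies of itself, which also addresses the refresh concern in point 2.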
OK, thank you. It's in a closed area, so security is not very important in this case, but of course I agree. Why does this snippet not work? if (!Response.IsClientConnected) return;
snarebold
HTTP is inherently stateless. The browser sends a request with *all* the data required for the server to understand and process it. The server then sends the response. Beyond that there is no relationship between a browser and the server. It's all smoke and mirrors. :)
ZombieSheep
OK, and why does this property exist?
snarebold