Starting an intensive, long-running process like this from a web page is almost never a good idea. There are lots of reasons, but the main ones are:
1) If you get a timeout in the browser (this is your scenario) the data you have harvested may not be displayed.
2) What happens if you hit refresh in the browser? Will it attempt to start the whole process again? That makes it an easy target for an attacker who wants to tie up all your server resources.
3) Is the data you are crawling really likely to change to such an extent that you need "live" crawling? The vast majority of cases would be served just as well by a scheduled background job running the crawl, with your front end just displaying the contents of the database.
I would seriously recommend you rethink your crawling strategy to something more controllable and stable.
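
To make that concrete, here is a minimal sketch of the split in Python, assuming an SQLite database. The `pages` table and the function names `init_db`, `run_crawl` and `latest_results` are made up for illustration, and the `fetch` callable stands in for whatever HTTP client you actually use. The idea is that only the scheduled job ever crawls, and the web page only ever reads.

```python
import sqlite3
import time
from typing import Callable, Iterable, List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per crawled URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, body TEXT, fetched_at REAL)"
    )
    conn.commit()


def run_crawl(
    conn: sqlite3.Connection,
    urls: Iterable[str],
    fetch: Callable[[str], str],
) -> int:
    """Background job: run this from cron or another scheduler,
    never from a web request. Returns the number of pages stored."""
    stored = 0
    for url in urls:
        try:
            body = fetch(url)
        except Exception:
            # One failing page should not abort the whole crawl.
            continue
        # Re-running the job overwrites the old row instead of duplicating it.
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body, fetched_at) "
            "VALUES (?, ?, ?)",
            (url, body, time.time()),
        )
        stored += 1
    conn.commit()
    return stored


def latest_results(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Front end: a cheap read of whatever the last crawl stored."""
    return conn.execute(
        "SELECT url, body FROM pages ORDER BY url"
    ).fetchall()
```

With this layout, hitting refresh in the browser only repeats the `SELECT`, so it cannot restart the crawl, and a browser timeout no longer loses any harvested data.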