tags:

views:

74

answers:

2

I am currently working on a project that involves crawling certain websites. However, sometimes my Perl program gets "stuck" on a website for reasons I can't figure out, and the program freezes for hours. To get around this I inserted some code to time out the subroutine that crawls the webpage. The problem with this is that, let's say I set the alarm to 60 seconds, most of the time the page will time out correctly, but occasionally the program will not time out and will just sit for hours on end (maybe forever, since I usually kill the program).

On the really bad websites the Perl program will just eat through my memory, taking 2.3 GB of RAM and 13 GB of swap. The CPU usage is also high and my computer becomes sluggish. Luckily, if it times out, all the resources get released quickly.

Is this a problem with my code or a Perl issue? What should I correct, and why is it causing this problem?

Thanks

Here is my code:

eval {

    local $SIG{ALRM} = sub { die("alarm\n") };

    alarm 60;
    &parsePageFunction();
    alarm 0;
};#eval

if($@) {

    if($@ eq "alarm\n") { print("Webpage Timed Out.\n\n"); }#if
    else { die($@."\n"); }#else
}#if
+1  A: 

You may want to elaborate on the crawling process.

I'm guessing it's a recursive crawl, where for each crawled page you crawl all the links on it, and then crawl all the links on those pages too.

If that's the case, you may want to do two things:

  1. Create some sort of depth limit: on each recursion, increment a counter and stop crawling once the limit is reached.

  2. Detect circular linking: if PAGE_A links to PAGE_B and PAGE_B links back to PAGE_A, you'll keep crawling until you run out of memory. A minimal sketch combining both ideas follows this list.
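
Here is a rough sketch of both ideas (illustrative only, not your code; crawl_page() and extract_links() are hypothetical placeholders for whatever fetching and parsing you already do):

use strict;
use warnings;

my %seen;            # URLs already visited (detects circular links)
my $MAX_DEPTH = 3;   # stop recursing past this depth

sub crawl {
    my ($url, $depth) = @_;

    return if $depth > $MAX_DEPTH;   # 1. depth limit reached
    return if $seen{$url}++;         # 2. already crawled this URL

    my $content = crawl_page($url);            # fetch the page (placeholder)
    for my $link (extract_links($content)) {   # parse out links (placeholder)
        crawl($link, $depth + 1);
    }
}

crawl('http://example.com/', 0);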

Other than that, you should look into the standard timeout facility of the module you're using; if that's LWP::UserAgent, you can pass LWP::UserAgent->new(timeout => 60).
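
For example, a sketch of using the built-in timeout (the URL is just a placeholder, and I'm passing the fetched content to your parsing routine purely for illustration):

use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 60);   # give up on the request after 60 seconds
my $response = $ua->get('http://example.com/');

if ($response->is_success) {
    parsePageFunction($response->decoded_content);   # your parsing routine
}
else {
    print "Request failed or timed out: ", $response->status_line, "\n";
}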

miedwar
I use the timeout for UserAgent, but that only applies to fetching the page, not to what happens afterwards. I think the problem is occurring after I get the page.
+4  A: 

Depending on where exactly in the code it is getting stuck, you might be running into an issue with Perl's safe signals. See the perlipc documentation for workarounds (e.g. Perl::Unsafe::Signals).
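
A rough sketch of how that workaround might wrap the code from your question (assuming the long-running operation is inside parsePageFunction):

use Perl::Unsafe::Signals;

eval {
    local $SIG{ALRM} = sub { die("alarm\n") };

    alarm 60;
    UNSAFE_SIGNALS {
        # Signals are delivered immediately inside this block, so the ALRM
        # can interrupt a single long-running opcode such as a regex match.
        &parsePageFunction();
    };
    alarm 0;
};

if($@) {
    if($@ eq "alarm\n") { print("Webpage Timed Out.\n\n"); }
    else { die($@."\n"); }
}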

runrig
I'm sorry, I should have been more clear. Maybe scraping would be a better term than crawling. Basically I extract the applicable content from each page and only follow URLs that lead to more applicable content, so I'm not going into many URLs, if any, and the depth is always 1. Could this actually be a regex issue, where the match never finishes and keeps requesting more memory? That doesn't seem likely to me, but I'm throwing it out there. Is there any way to exit a function based on how much memory the program is using?
@user387049 Yes, this could totally be a regex. Safe signals mean that alarm won't interrupt a discrete Perl operation, such as a regex. See http://rt.perl.org/rt3//Public/Bug/Display.html?id=73464
Schwern
Using Perl::Unsafe::Signals solved the problem. Some regexes were locking up and the alarm couldn't interrupt them. Thanks for the help!