When looping through many web pages and calling something simple like the code below:

require 'nokogiri'

manyhtmlpages.each do |page|
  # Parse each page and print the first <h2> and first <a> directly under <body>
  doc = Nokogiri::HTML(page)
  puts doc.xpath("/html/body/h2[1]", "/html/body/a[1]").to_s
end

I observe that memory consumption climbs continually until the script terminates because it runs out of memory.

When I remove the doc.xpath call, the problem does not occur.

A: 

I think the root of the problem is that the memory is not garbage collected until both page and doc leave scope (correct me if I'm wrong).

A similar problem is described here.
That report concerns libxml-ruby, but as far as I know, Nokogiri is also built on libxml2.

I'm sorry, but I don't know the exact details of this problem. This is just meant to point you in the right direction.

Styggentorsken
Thank you, I think this is the right direction. I am going to try forcing garbage collection with GC.start.
bbbnb
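
For anyone who wants to try the GC.start idea from the comment above, here is a minimal sketch. The helper name each_with_periodic_gc and its every: parameter are hypothetical (they are not part of Nokogiri or Ruby), and nothing in this thread confirms that forcing collection actually cures the growth; it only makes the experiment easy to run.

```ruby
# Hypothetical helper (not part of Nokogiri or Ruby's standard library):
# yields each item and forces a full garbage-collection pass after every
# `every` items. Returns how many times GC.start was called.
def each_with_periodic_gc(items, every: 100)
  raise ArgumentError, "every must be positive" unless every.positive?

  gc_runs = 0
  items.each_with_index do |item, index|
    yield item
    # Force a collection once every `every`-th item has been processed.
    if ((index + 1) % every).zero?
      GC.start
      gc_runs += 1
    end
  end
  gc_runs
end

# Usage with the loop from the question (requires the nokogiri gem):
#   each_with_periodic_gc(manyhtmlpages, every: 50) do |page|
#     doc = Nokogiri::HTML(page)
#     puts doc.xpath("/html/body/h2[1]", "/html/body/a[1]").to_s
#   end
```

Because doc is a block-local variable, each parsed document becomes unreachable once its iteration ends, so a forced collection at least has the chance to reclaim it. The value of every: is a trade-off: smaller values collect more often but slow the loop down.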