views: 85

answers: 3
As I run my Ruby script, which is a very long series of loops, a random HTML file is parsed via Nokogiri on each iteration.

top reveals that the memory consumption % increments by 0.1 every few seconds, along with the CPU usage.

Eventually the Ruby script crashes due to "not enough memory".

UPDATED to the latest version:

require 'nokogiri'

def extract(newdoc, newarray)
  doc = Nokogiri::HTML(newdoc)
  # run each XPath expression against the parsed document
  collection = newarray.map { |s| doc.xpath(s) }
  dd = ""

  (0...collection.first.length).each do |i|
    (0...collection.length).each do |j|
      dd += collection[j][i].to_s
    end
  end

  # drop the references so the objects can be garbage collected
  collection = nil
  newarray = nil
  doc = nil
  puts dd.chop + "\n"
end

(1..100000).each do
  extract("somerandomHTMLfile", ["/html/body/p", "/html/body/h1"])
end
+1  A: 

I don't quite understand how you loop over your collection. I would rewrite this as follows:

collection.each do |coll_of_fields|
  coll_of_fields.each do |field|
    spliceElement(field, dd)
  end
  newrow = dd.chop + "\n"
end

Now, you seem to be assuming that each array has at least as many elements as the first one. Why not loop over all rows first, and then over all elements in a row?

Also, the return newrow is not quite clear to me: do you stop after the first iteration through the outer loop?

And why don't you use /html/body/h1/text() in the original array you pass as a parameter?

Then your spliceElement could just work on the string directly. Or am I missing something?
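
For illustration, a minimal sketch of that idea (the helper, variable names, and sample HTML below are hypothetical, not from the question): appending /text() to each XPath expression makes doc.xpath return the text nodes directly, so no element markup needs to be stripped afterwards.

require 'nokogiri'

# Hypothetical sketch: query for text nodes directly via /text()
def extract_text(html, xpaths)
  doc = Nokogiri::HTML(html)
  xpaths.map { |xp| doc.xpath(xp).map(&:to_s).join(" ") }.join("\n")
end

html = "<html><body><h1>Title</h1><p>First paragraph</p></body></html>"
puts extract_text(html, ["/html/body/p/text()", "/html/body/h1/text()"])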

nathanvda
+1  A: 

Based on your other questions, I'm wondering if you are saving the value of extract, or in some other way holding on to the reference to collection. I presume you want to start over with that each time?

In any case, the code in your other questions still seems to be somewhat edited down, so it may not show everything. You should definitely set anything you don't want to retain to nil between cycles.

If that's not good enough, you may need to do a sort of binary search through your logic, and disable half of your program in a converging set of edit-test runs until you see where the memory loss is happening.
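
One rough way to do that (a sketch of the idea, not part of the original answer): print the process's resident set size every few iterations and watch which enabled half of the script makes it grow. The ps call below assumes a Unix-like system, and the loop reuses extract from the question.

# Sketch: report this process's resident set size (RSS, in KB) via ps.
def rss_kb
  `ps -o rss= -p #{Process.pid}`.to_i
end

100.times do |i|
  extract("somerandomHTMLfile", ["/html/body/p", "/html/body/h1"])
  puts "iteration #{i}: RSS = #{rss_kb} KB" if (i % 10).zero?
end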

DigitalRoss
Do you mean I should set collection = nil before returning, and also set collection = nil at the beginning of the extract() definition?
joeyaa
I updated it with what I have right now. The memory % usage still continually rises even though I have set the stuff I don't want to retain to nil between cycles.
joeyaa
A: 

You could call GC.start after each call to extract, to explicitly run the garbage collector and clean up unused memory.
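
For example, a sketch based on the loop from the question:

(1..100000).each do
  extract("somerandomHTMLfile", ["/html/body/p", "/html/body/h1"])
  GC.start  # explicitly run the garbage collector between iterations
end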

nathanvda