I have this command in a Rails controller

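  # Assign the block's return value; a variable first set inside the block is not visible outside it.
  content = open(source) { |s| s.read }
  rss = RSS::Parser.parse(content, false)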

and it is resulting in temporary files that are filling up the (scarce) disk space.

I have examined the problem to some extent and it turns out somewhere in the stack this happens:

io = Tempfile.new('open-uri')

but it looks like this Tempfile instance never gets explicitly closed. It's got a

def _close  # :nodoc:

method which might fire automatically upon garbage collection?

Any help in knowing what's happening or how to clean up the tempfiles would be helpful indeed.

+1  A: 

It looks like _close closes the file and then waits for garbage collection to unlink (remove) it. Theoretically, you could force unlinking immediately by calling the Tempfile's close! method instead of close, or by calling close(true) (which calls close! internally).
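To make the difference between close and close! concrete, here is a minimal sketch using a standalone Tempfile rather than the one open-uri creates. The method name `tempfile_cleanup_demo` and the `'open-uri-demo'` prefix are made up for illustration.

```ruby
require 'tempfile'

# Hypothetical demo: reports whether the file is still on disk after each call.
def tempfile_cleanup_demo
  kept = Tempfile.new('open-uri-demo')
  kept.write('x' * 20_000)
  kept_path = kept.path
  kept.close                               # closes the handle, file stays on disk
  after_close = File.exist?(kept_path)

  gone = Tempfile.new('open-uri-demo')
  gone_path = gone.path                    # save the path before it is unlinked
  gone.close!                              # closes and unlinks immediately
  after_close_bang = File.exist?(gone_path)

  kept.unlink                              # remove the first file explicitly
  after_unlink = File.exist?(kept_path)

  { after_close: after_close,
    after_close_bang: after_close_bang,
    after_unlink: after_unlink }
end
```

After a plain close, the file only disappears once unlink is called or the object is finalized, which is what the question is running into.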

edit: But the problem is in open-uri, which is out of your hands - and that makes no promises for cleaning up after itself: it just assumes that the garbage collector will finalize all Tempfiles in due time.

In such a case, you are left with no choice but to call the garbage collector yourself using ObjectSpace.garbage_collect. This should cause the removal of all temp files that are no longer referenced.
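A rough sketch of that approach, assuming the leaked Tempfile is no longer referenced anywhere. The method name `leak_then_collect` is hypothetical. Ruby does not guarantee that any particular object is collected on a given run, so the method just reports what it observes.

```ruby
require 'tempfile'

# Hypothetical demo: drop a Tempfile without closing it, then request a GC run.
# Returns true if the file is still on disk afterwards, false if it was removed.
def leak_then_collect
  path = Tempfile.new('open-uri-demo').path  # the Tempfile reference is dropped here
  ObjectSpace.garbage_collect                # ask for a collection; finalizers may run
  File.exist?(path)
end

still_there = leak_then_collect
puts "temp file still on disk after GC: #{still_there}"
```

If the file is still present, the finalizer will normally remove it when the process exits cleanly.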

Guss
so who calls _close? Without monkey patching I have no access to that tempfile, I think.
Yar
I see the problem now. I added some more info to the answer, hopefully it will work for you.
Guss
Thanks for that! I cannot imagine that calling the GC explicitly will resolve the problem. In my testing, the only way to get a tempfile to NOT get cleaned up is to interrupt the program. So I can't figure out what might be happening.
Yar
I think it's a bug in `open-uri` (maybe not a bug per se, but definitely misbehavior) that it does not clean up temp files properly. That being said, it passes the buffers around so much that it's hard to figure out where to close the Tempfile. I do believe that a lot of refactoring could solve the problem in `open-uri`, but that would have to wait for another time :-)
Guss
A: 

Definitely not a bug, but faulty error handling of the IO. Buffer.io is either a StringIO if @size is at most 10240 bytes or a Tempfile if it is over that amount. The ensure clause in OpenURI.open_uri() calls close(), but because the object could be a StringIO, which doesn't have a close!() method, it can't simply call close!().
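The switch from StringIO to Tempfile can be observed directly. This sketch pokes at `OpenURI::Buffer`, which is an internal, undocumented class, so treat its interface (`<<` and `io`) as an assumption that may differ between Ruby versions. The method name `buffer_backing_classes` is made up.

```ruby
require 'open-uri'
require 'stringio'
require 'tempfile'

# Hypothetical demo: feed an OpenURI::Buffer and record the class behind it
# before and after it grows past the 10240-byte limit.
def buffer_backing_classes
  buf = OpenURI::Buffer.new
  buf << 'a' * 100                  # small body: kept in memory
  small = buf.io.class
  buf << 'a' * 20_000               # total now exceeds the limit: spills to disk
  large = buf.io.class
  buf.io.close! if buf.io.respond_to?(:close!)  # remove the temp file we caused
  [small, large]
end
```

This is also why the problem only shows up for responses larger than roughly 10 KB.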

The fix, I think, would be either one of these:

The ensure clause checks for class and calls either StringIO.close or Tempfile.close! as needed.

--or--

The Buffer class needs a finalizer that handles the class check and calls the correct method.

Granted, neither of those fixes it if you don't use a block to handle the IO, but I suppose in that case you can do your own checking, since open() returns the IO object, not the Buffer object.

The lib is a big chunk of messy code, imho, so it could use a work-over to clean it up. I think I might do that, just for fun. ^.^

SynTruth
SynTruth, I get the idea, but in a duck-typing language, you don't check the class. You check whether it responds to a method :). Anyway, I am so far from this problem now that I wouldn't know if you fixed it. But report back if you do, just for curiosity's sake.
Yar
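For what it's worth, the respond_to? variant of the cleanup could look something like the sketch below. `close_and_discard` is a hypothetical helper, not part of open-uri. It assumes the object is either Tempfile-like (has close!) or a plain IO-like object such as StringIO.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: close an IO-like object and, if it is backed by a
# temp file, delete that file right away instead of waiting for the GC.
def close_and_discard(io)
  if io.respond_to?(:close!)
    io.close!                    # Tempfile: close and unlink in one step
  else
    io.close unless io.closed?   # StringIO and similar: nothing on disk to remove
  end
end

# Usage with both kinds of object that open() may hand back.
small = StringIO.new('short body')
large = Tempfile.new('open-uri-demo')
close_and_discard(small)
close_and_discard(large)
```

When calling open() without a block, the same helper could go in an ensure clause around the read.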