In order not to block the reactor I would like to read files asynchronously, but I've found no obvious way of doing it using EventMachine. I've tried a few different approaches, but none of them feels right:

  • Just read the file, it'll block the reactor, but what the hell, it's not that slow (unless it's a big file, and then it definitely is).
  • Open the file for reading and read a chunk on each tick (but how much to read? too much and it'll block the reactor, too little and reading will get slower than necessary).
  • EM.popen('cat some/file', FileReader) feels really weird, but works better than the alternatives above. In combination with the LineAndTextProtocol it reads lines pretty swiftly.
  • EM.attach, but I haven't found any examples of how to use it, and the only thing I've found on the mailing list is that it's deprecated in favour of…
  • EM.watch, which I've found no examples of how to use for reading files.

How do you read files within an EventMachine reactor loop?

+2  A: 

EM.attach/watch cannot be used on files, as select/epoll on a disk-based file descriptor will always return readable.

Ultimately, it depends on what you're trying to do. If it's a small file, just File.read it. If it is larger, you can read small chunks over time. For example, EM::FileStreamer does this to send large files over the network.
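A rough sketch of the "small chunks over time" idea, in case it helps. ChunkedLineReader and read_on_ticks are made-up names for illustration, not part of EventMachine, and the 16 KB chunk size is an arbitrary guess you would want to tune. Each read still blocks for a moment; the point is only that the reactor gets control back between chunks.

```ruby
# Hypothetical helper (not part of EventMachine): reads a file one
# fixed-size chunk per call to #step and yields each complete line.
class ChunkedLineReader
  DEFAULT_CHUNK_SIZE = 16 * 1024 # assumption: tune for your workload

  def initialize(path, chunk_size: DEFAULT_CHUNK_SIZE, &on_line)
    @io = File.open(path, 'rb')
    @chunk_size = chunk_size
    @on_line = on_line
    @buffer = String.new # binary buffer; lines are yielded as raw bytes
  end

  # Reads one chunk. Returns true while data may remain, false at EOF.
  def step
    return false if @io.closed?

    chunk = @io.read(@chunk_size)
    if chunk.nil?
      # EOF: flush a final line that had no trailing newline.
      @on_line.call(@buffer) unless @buffer.empty?
      @buffer = String.new
      @io.close
      return false
    end

    @buffer << chunk
    while (idx = @buffer.index("\n"))
      @on_line.call(@buffer.slice!(0..idx).chomp)
    end
    true
  end
end

# Schedules one #step per tick on anything that responds to next_tick
# (the EM module does), then calls the optional block once at EOF.
def read_on_ticks(reader, scheduler, &done)
  tick = proc do
    if reader.step
      scheduler.next_tick(&tick)
    else
      done.call if done
    end
  end
  scheduler.next_tick(&tick)
end
```

Inside EM.run you would build a reader with a block that handles each line and then call read_on_ticks(reader, EM) { ... }. Several readers can be started this way and their chunks will interleave with each other and with other reactor work.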

Another common use-case is to tail a file and read in new contents when it changes. This can be achieved using EM.watch_file: http://github.com/jordansissel/eventmachine-tail

tmm1
Basically I want to read a few moderately large files (up to 10 MB) in parallel and extract a piece of each line.
Theo
If the operation you need to perform is per-line, then reading a line of the file on each tick seems to make the most sense. You'd get the benefit of all of Ruby's line-based IO methods, your event blocks would most closely reflect your business logic, and doing less in each block simply means the ticks would happen faster.
SFEley
Reading a line on each tick is too slow because I spend time inside the reactor waiting for IO, and that's exactly what I want to avoid. I want to do other things (like process the line) while waiting for IO.
Theo