Hi,

I want to apply operations from the OpenCV computer vision library, in real time, to video captured from my computer display. The idea in this particular case is to detect interesting features during gameplay in a popular game and provide the user with an enhanced experience, but I can think of several other scenarios where one would want live access to this data as well. At any rate, for the development phase it might be acceptable to use canned video, but for the final application, performance and responsiveness are obviously critical.

I am trying to do this on Ubuntu 10.10 as of now, and would prefer to use a UNIX-like system, but any options are of interest. My C skills are very limited, so whenever talking to OpenCV through Python is possible, I try to use that instead. Please note that I am trying to capture NOT from a camera device, but from a live stream of display output, and I'm at a loss as to how to take the input. As far as I can tell, CaptureFromCAM works only for camera devices, and it seems to me that the real-time requirement of the end result makes storing the video to a file and reading it back through CaptureFromFile a bad option.

The most promising route I have found so far is to use ffmpeg with the x11grab option to capture from an X11 display (e.g. the command ffmpeg -f x11grab -sameq -r 25 -s wxga -i :0.0 out.mpg captures 1366x768 of display 0 to 'out.mpg'). I imagine it should be possible to treat ffmpeg's output stream as a file to be read by OpenCV (presumably with the CaptureFromFile function), perhaps by using pipes; but this is all at a much higher level than I have ever dealt with before, and I could really use some directions. Do you think this approach is feasible? More importantly, can you think of a better one? How would you do it?

A: 

I think the main challenge is the real-time requirement. I think you will have to write a piece of software for OpenCV inspired by the video-grabbing code in ffmpeg, but that would certainly involve C-level coding.

My suggestion is to try to get your vision algorithm right first, by using the ffmpeg-captured video.

fabrizioM
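One way to act on the advice of getting the vision algorithm right on ffmpeg-captured video first, while keeping the real-time requirement in view, is to time the per-frame processing against the capture rate: at 25 fps each frame has a budget of 1000/25 = 40 ms. The sketch below is a hypothetical timing harness (not part of the answer above); the frames could come from `cv2.VideoCapture("out.mpg")` during development and `process` stands in for your own detection function.

```python
import time
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available per frame if processing must keep up with capture."""
    return 1000.0 / fps


def time_per_frame_ms(frames: Iterable[Frame],
                      process: Callable[[Frame], object]) -> List[float]:
    """Run `process` on every frame and return each call's duration in ms."""
    times = []
    for frame in frames:
        start = time.perf_counter()
        process(frame)
        times.append((time.perf_counter() - start) * 1000.0)
    return times


def keeps_up(times_ms: List[float], fps: float) -> bool:
    """True if even the slowest measured frame fits within the budget."""
    return bool(times_ms) and max(times_ms) <= frame_budget_ms(fps)
```

Checking the slowest frame is a strict criterion; if the application can tolerate occasionally dropped frames, comparing the mean against the budget may be enough.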