views:

334

answers:

6

I'd like to write a program able to "use" other programs by taking control of the mouse/keyboard and being able to "see" what's on the screen.

I used AutoIt to do something similar, but I had to cheat sometimes because the language is not that powerful, or maybe it's just that I suck and I'm not able to do that much with it :P

So... I need to:

  • Take screenshots, then I will compare them to make the program "understand", but it needs to "see"
  • Use the mouse: move, click and release, it's simple, isn't it?
  • Using the keyboard: pressing some keys, or key combinations, including special keys like Alt,Ctrl ect...

How can I do that in python?
Does it works in both linux and windows? (this could be really really cool, but it is not necessary)

A: 

There are numerous software packages that already do this. I believe HP QuickTest Pro has this functionality. This seems to be in the category of automated regression testing.

Dave Jarvis
+1  A: 

You can use WATSUP under Windows.

swegi
A: 

I've used the Windows (only) Input API to write a VNC-like remote-control application in the past. It lets you fake keyboard and mouse input nicely at a system level (ie not just posting events to a single application).

If you're trying to do any sort of automated testing of whole systems at the GUI level, this excellent USENIX paper describing automated responsiveness testing is a must-read.

timday
+1  A: 

If you are comfortable with pascal, a really powerful keyboard/mouse/screen-reading program is SCAR: http://freddy1990.com/index.php?page=product&name=scar It can do OCR, bitmap finding, color finding, etc. It's often used for automating online games, but it can be used for any situation where you want to simulate a human reading the screen and giving input.

Jeremybub
+1  A: 

I've had some luck with similar tasks using PyWinAuto.

pywinauto is a set of python modules to automate the Microsoft Windows GUI. At it's simplest it allows you to send mouse and keyboard actions to windows dialogs and controls.

It also has some support for capturing images of dialogs and such using the Python Imaging Library PIL.

Jamie
+1  A: 

AutoIt is completely capable of doing everything you mentioned. When I'm wanting to do some automation but use the features of Python, I find it easiest to use AutoItX which is a DLL/COM control.

Taken from this answer of mine:

import win32com.client
oAutoItX = win32com.client.Dispatch( "AutoItX3.Control" )

oAutoItX.Opt("WinTitleMatchMode", 2) #Match text anywhere in a window title

width = oAutoItX.WinGetClientSizeWidth("Firefox")
height = oAutoItX.WinGetClientSizeHeight("Firefox")

print width, height
Therms