Here's a tough one:

I need to be able to find a word's position and size (its frame) on the screen. Its first occurrence is enough; from there I should be able to get the next ones.

For example, I would like to be able to detect word positions in (but not limited to) Word, Excel and PowerPoint for Mac, as well as Safari and others.

The solution should be as fast as possible; I should be able to find at least 5-6 words per second and use as little CPU time as possible.

Here's what I thought of so far:

  • OCR in a window's screenshot / graphics context (any good Open Source framework that works on Mac OS X 10.4 and that can be used in a commercial product?). Evernote is very good at spotting words in images. I don't know if it uses a custom in-house engine or an Open Source / commercial one but that would be the kind of engine I would like to use if this is a "valid" solution. Ideally I would detect the word's frame in the active application's window (how to get the frame of another application?).
  • Getting some kind of "hook" on Quartz drawing of text and intercepting the location of the word when it's drawn (does not seem very feasible at first glance!).
  • AppleScript, but it depends a lot on what API the application offers (I don't think you can get a word's coordinates in a Word document from what I've seen) and it's slow.
  • ... out of ideas ...

My goal is to get the frames of all the words in a paragraph, in the right order, based on a string containing the text of the paragraph.

Thanks in advance for any hints!

+2  A: 

As a starting place, you may want to take a look at QuickCursor's code. It retrieves text from many different applications through the AX Accessibility APIs. Now, it won't grab the pixel placement of the word, but it will at least return the NSString associated with the text in that UI element. Of course this means that the app in question has to support these APIs; I don't know if the MS Office suite would. In addition, it only supports editable elements, so an un-editable webpage in Safari won't work either. But it may give you a starting point for some ideas.

Take a look at QCUIElement.{m,h}, and then at the implementation in QCAppDelegate.m (beginQuickCursorEdit:). Using his abstracted QCUIElement seems to be as simple as:

QCUIElement *focusedElement = [QCUIElement focusedElement];
id value = focusedElement.value;

Edit: Aha! Check out the Accessibility Inspector sample code, UIElementInspector. It can actually get the AXPosition of elements on a page. Now, it's not word-by-word, but we're getting closer. It'll tell you the x,y placement of a text block, as well as the words contained in that text block.
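If a text element also happens to expose per-range bounds (the Accessibility API defines an AXBoundsForRange parameterized attribute, though whether a given app implements it is another matter), the first step would be turning the paragraph string into ordered word ranges. Here is a rough sketch of just that step in plain C. The names WordRange and find_word_ranges are made up for illustration, words are simply runs of non-whitespace bytes, and the offsets are byte offsets, so they only line up with character offsets for ASCII text:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical type: a word's range inside the paragraph string. */
typedef struct {
    size_t location;  /* byte offset of the word's first character */
    size_t length;    /* number of bytes in the word */
} WordRange;

/*
 * Hypothetical helper: scans `text` left to right and stores up to `max`
 * word ranges in `out`, in reading order. A "word" here is any run of
 * non-whitespace bytes (an assumption; punctuation stays attached).
 * Returns the total number of words found, which may exceed `max`.
 */
size_t find_word_ranges(const char *text, WordRange *out, size_t max)
{
    size_t count = 0;
    size_t i = 0;

    while (text[i] != '\0') {
        /* Skip whitespace between words. */
        while (text[i] != '\0' && isspace((unsigned char)text[i])) {
            i++;
        }
        if (text[i] == '\0') {
            break;
        }

        /* Consume one word. */
        size_t start = i;
        while (text[i] != '\0' && !isspace((unsigned char)text[i])) {
            i++;
        }

        /* Store the range only if there is room; always count it. */
        if (count < max) {
            out[count].location = start;
            out[count].length = i - start;
        }
        count++;
    }
    return count;
}
```

Each resulting range is what you would then hand to a range-to-bounds query, one word at a time, which keeps the frames in paragraph order. For non-ASCII text the offsets would need converting to whatever units the target API counts in.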

Matt B.
Thanks! I looked at the Accessibility APIs before but encountered the same limitation as you did. There doesn't seem to be a way to get a word's position within the AXTextArea in many applications. Office 2004 does not seem to be using Cocoa controls, so there's no Accessibility element for the document. :( Unfortunately this is one app suite I absolutely must support.
Form
There doesn't seem to be a way to reliably get a word's position on screen, so I guess this is the most appropriate answer. The Accessibility APIs could at least be used to get the position of a text field on the screen.
Form
A: 

This is possible, but very hard to get working reliably. You can play with Spell Catcher's Direct Connect feature to see an example.

Nicholas Riley