We need to read text from an existing VB6 application, so we use FindWindow, GetWindowText, and EnumChildWindows from user32.dll to enumerate the application's windows and read the text they display.
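For reference, the enumeration approach described here can be sketched in Python with ctypes. This is only an illustration under assumptions: it runs on Windows only, the three functions are the ones exported by user32.dll, and the helper name `read_child_texts` and the window title passed to it are made up for the example. Text that is painted onto the form rather than held by a child window will not appear in the result.

```python
import ctypes


def read_child_texts(window_title):
    """Return the text of each child window under the top-level window
    with the given title. Windows only; returns [] if no window matches."""
    from ctypes import wintypes  # imported lazily; only used on Windows

    user32 = ctypes.windll.user32
    EnumChildProc = ctypes.WINFUNCTYPE(
        wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)

    # Declare signatures so window handles are not truncated on 64-bit Windows.
    user32.FindWindowW.argtypes = [wintypes.LPCWSTR, wintypes.LPCWSTR]
    user32.FindWindowW.restype = wintypes.HWND
    user32.EnumChildWindows.argtypes = [
        wintypes.HWND, EnumChildProc, wintypes.LPARAM]
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextW.argtypes = [
        wintypes.HWND, wintypes.LPWSTR, ctypes.c_int]

    parent = user32.FindWindowW(None, window_title)
    if not parent:
        return []

    texts = []

    def on_child(hwnd, _lparam):
        # Size the buffer from the reported length, plus the terminator.
        length = user32.GetWindowTextLengthW(hwnd)
        buffer = ctypes.create_unicode_buffer(length + 1)
        user32.GetWindowTextW(hwnd, buffer, length + 1)
        texts.append(buffer.value)
        return True  # keep enumerating

    user32.EnumChildWindows(parent, EnumChildProc(on_child), 0)
    return texts
```

On a Windows machine, calling `read_child_texts("Some VB6 App")` (a placeholder title) would return one string per child window, with empty strings for windows that have no caption.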

This lets us read about 90% of the text, but there is one specific control (or box) that we cannot read.

UI spy-type programs cannot target the text we need either, so I assume the application renders it directly to the screen with GDI/GDI+ rather than through a control or window.

Is there a way to determine how they are rendering the text, and possibly read it?

We do not want to grab the hDC of the window and render it onto a bitmap and somehow reverse-CAPTCHA the text... that could be a nightmare.

SOLUTION: We discovered that we only need to look for 2-3 phrases in this box rather than actually OCR the text. So we are going to render the box to a bitmap and compare it pixel by pixel against 2-3 pre-stored bitmaps.

The top answer brought us to this solution.
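A minimal sketch of the whole-bitmap comparison, assuming the captured region has already been converted to rows of (R, G, B) tuples and always has the same size and position. The function names, the phrase labels, and the tiny 2x2 "bitmaps" are invented for illustration; real captures would come from the window's hDC.

```python
def bitmaps_equal(a, b):
    """True when both bitmaps have the same dimensions and identical pixels."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for pixel_a, pixel_b in zip(row_a, row_b):
            if pixel_a != pixel_b:
                return False
    return True


def identify_phrase(captured, references):
    """Return the label of the first stored bitmap that matches, else None."""
    for label, reference in references.items():
        if bitmaps_equal(captured, reference):
            return label
    return None


# Made-up stand-ins for the 2-3 pre-stored phrase bitmaps.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
references = {
    "READY": [[BLACK, WHITE], [WHITE, BLACK]],
    "ERROR": [[WHITE, WHITE], [BLACK, BLACK]],
}
captured = [[WHITE, WHITE], [BLACK, BLACK]]
print(identify_phrase(captured, references))  # prints ERROR
```

An exact comparison like this only works if the text is drawn without anti-aliasing differences between runs; otherwise a per-pixel tolerance would be needed.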

+1  A: 

If they're drawing directly to a surface, there's no way to get the text without some weird OCR stuff.

Update: after thinking about your problem, I think that doing what you describe (grabbing the window's hDC and creating a bitmap from it) would be a relatively easy task (relative to trying to intercept the API calls that were rendering the text in the first place).

It wouldn't be as difficult as doing OCR on handwriting, for example. As long as you can determine the font used by the Visual Basic 6 application to draw the text, and as long as the text you want to scrape is drawn to the same location on the form each time, it would be relatively easy to break the drawn text up into discrete characters (as tiny little bitmaps) and then compare each one to a pre-generated collection of characters that you've drawn with the same font at the same size. The characters would match perfectly on a pixel-by-pixel basis.

There might be a problem if the program runs on different systems and draws the text with different fonts.
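The character-by-character matching described above could look roughly like this. It is a sketch under several assumptions: the captured strip has already been reduced to foreground (`#`) and background (`.`) cells using the known colors, glyphs are separated by at least one fully blank column, and the glyph table was pre-generated from the same font at the same size. The names `split_glyphs`, `read_text`, and `GLYPHS`, and the 3-row toy font, are made up for the example.

```python
def split_glyphs(rows, background="."):
    """Cut a strip of text into glyphs at fully blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # Treat the position past the right edge as a blank column
        # so the final glyph is flushed.
        blank = x == width or all(row[x] == background for row in rows)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            glyphs.append(tuple(row[start:x] for row in rows))
            start = None
    return glyphs


def read_text(rows, glyph_table):
    """Look up each glyph exactly; unknown shapes become '?'."""
    return "".join(glyph_table.get(glyph, "?") for glyph in split_glyphs(rows))


# Toy pre-generated glyph table for a made-up 3-pixel-high font.
GLYPHS = {
    ("#.#", "###", "#.#"): "H",
    ("#", "#", "#"): "I",
}

line = [
    "#.#.#",
    "###.#",
    "#.#.#",
]
print(read_text(line, GLYPHS))  # prints HI
```

This version does not detect spaces between words or handle glyphs that touch or overlap; both would need extra rules based on the actual font.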

MusiGenesis
Is there a way to tell how they're rendering it? I have tried a vb6 decompiler, but it spits out some unreadable assembler.
Jonathan.Peppers
They're probably using Win32 API calls. It *might* be possible to intercept these calls somehow, but that would be some seriously low-level stuff, and I would bow down before you. :)
MusiGenesis
I can see that getting the text from a bitmap is possible, but I would guess it could be time-consuming and possibly a project in itself. If there is no other way, we may have to do that. I will mark you as the answer if no one has a better idea in the next few days.
Jonathan.Peppers
@Jonathan: if the VB6 app is using the Win32 API to draw text onto a surface (via its hDC), then the only possibilities are to a) somehow intercept that call as it's being made and get the text that way, or b) do the OCR thing I described. Once the string is drawn, it doesn't exist anywhere as a string anymore - it's just a bunch of pixels.
MusiGenesis
If you knew the location where the text is drawn on the Bitmap, it would take me less than a day to write a C# method that would take a Bitmap, a Font, and foreground and background colors as input parameters and return the text drawn on it (assuming it's just ordinary ASCII). Probably more like 2 hours with a gun to my head. You guys hire freelancers?
MusiGenesis
We don't hire freelancers. We are weighing whether we need to read the text from this box at all against coming up with a way to do so. If reading the text were simpler, we would be more apt to develop it.
Jonathan.Peppers