In my project we need to use a virtual printer, capture the output file (most of the time it is a bitmap), extract data from it, and transform that data into XML like this:

<document name="file://C:\DOCUME~1\ilanit\LOCALS~1\Temp\p0129600584.htm">

<lineXY x="0" y="0" height="1656" width="2275" />
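For the XML step, a minimal sketch using Python's standard `xml.etree.ElementTree` might look like the block below. It assumes that `<lineXY>` elements are children of `<document>` and that each detected region arrives as an `(x, y, height, width)` tuple. `build_document_xml` and the example path are hypothetical, not part of the original project.

```python
import xml.etree.ElementTree as ET


def build_document_xml(source_name, boxes):
    """Build a <document> element with one <lineXY> child per box.

    `boxes` is a list of (x, y, height, width) tuples. Nesting <lineXY>
    inside <document> is an assumption based on the snippet in the question.
    """
    doc = ET.Element("document", name=source_name)
    for x, y, height, width in boxes:
        # ElementTree attribute values must be strings.
        ET.SubElement(doc, "lineXY", x=str(x), y=str(y),
                      height=str(height), width=str(width))
    return ET.tostring(doc, encoding="unicode")


# Hypothetical source path, used only for illustration.
print(build_document_xml("file://C:/temp/page.htm", [(0, 0, 1656, 2275)]))
```

ElementTree handles attribute escaping for you, which matters if the `name` path ever contains `&` or quotes.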
A: 

Is it something like Redmon you are looking for (used in conjunction with printing to a file and then launching an application)? If so, you can use it, or there are others out there too. Redmon is a little dated, and depending on the OS you might have issues. If you can, add more detail and specifics to your question, as it's a bit confusing.
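Redmon redirects a printer port's output to the standard input of a program you configure. Assuming that setup, a hypothetical capture program only needs to save each job to disk so it can be parsed later. The folder name `captured_jobs`, the `.prn` extension and both function names are illustrative choices, not anything Redmon requires.

```python
import sys
import time
import uuid
from pathlib import Path


def save_print_job(data: bytes, out_dir: Path) -> Path:
    """Write one captured print job to a uniquely named file and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp for sorting, plus a random suffix so back-to-back jobs never collide.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = out_dir / f"job-{stamp}-{uuid.uuid4().hex[:8]}.prn"
    path.write_bytes(data)
    return path


def main() -> None:
    # Assumption: the redirected port pipes the raw job (PostScript, PCL, ...) to stdin.
    data = sys.stdin.buffer.read()
    saved = save_print_job(data, Path("captured_jobs"))  # hypothetical folder
    print(saved, file=sys.stderr)
```

When saving this as the script the port launches, you would add the usual `if __name__ == "__main__": main()` entry point and, in the Redmon port settings, point the program at `python.exe` with the script path as its argument (an assumption about your setup).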

UPDATE (based on comments): If the source is a PDF or some other document (e.g. Word) that has actual text and not just graphical (scan/image) data, you could use a PostScript driver (Type 1 might work best) and then extract the text after you capture the print file. If you are not going to use the print file for actual output and just need the data, you can always try the Generic / Text Only driver in Windows, as it ignores graphics and puts just the text in the output file. As long as the output is consistent, a little regex should be able to pull out what you need.
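To illustrate the regex idea for text-only output, here is a small sketch. The field labels (`Invoice No:`, `Total:`) are hypothetical stand-ins for whatever your documents contain, and reading the file as cp1252 is an assumption about the driver's encoding.

```python
import re

# Hypothetical layout: assumes the printed document contains lines such as
# "Invoice No: 12345" and "Total: 1,234.50". Adjust the patterns to your data.
FIELD_PATTERNS = {
    "invoice": re.compile(r"Invoice\s+No:\s*(\d+)"),
    "total": re.compile(r"Total:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each field, or None when it is absent."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


def read_text_job(path: str) -> str:
    # Text-only output may contain form feeds between pages; treat them as
    # line breaks. The cp1252 encoding is an assumption.
    with open(path, encoding="cp1252", errors="replace") as f:
        return f.read().replace("\f", "\n")


sample = "ACME Corp\r\nInvoice No: 12345\r\n\fTotal: 1,234.50\r\n"
print(extract_fields(sample))  # {'invoice': '12345', 'total': '1,234.50'}
```

Keeping one compiled pattern per field makes it easy to see which field broke when the layout changes.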

If the data is graphical in nature, such as a scanned image that you are printing, you will need to capture the print job, render it into a graphic image (as it will be a print file in PCL, PostScript, etc.) and then run it through an OCR engine to pull out what you need.
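One way to wire up the capture, render and OCR pipeline is sketched below. It assumes Ghostscript and Tesseract are installed and on the PATH. On Windows the Ghostscript console binary is usually `gswin64c` rather than `gs`, and PCL jobs need GhostPCL (`gpcl6`) instead, which is not shown. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ghostscript_cmd(job: Path, out_pattern: Path, dpi: int = 300) -> list:
    """Command that rasterizes a PostScript (or PDF) job into one PNG per page."""
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pnggray",             # grayscale PNG is usually enough for OCR
        f"-r{dpi}",                     # resolution in dots per inch
        f"-sOutputFile={out_pattern}",  # e.g. page-%03d.png, one file per page
        str(job),
    ]


def tesseract_cmd(image: Path) -> list:
    """Command that OCRs one image and writes the recognized text to stdout."""
    return ["tesseract", str(image), "stdout"]


def ocr_print_job(job: Path, work_dir: Path) -> str:
    """Render the captured job to images, then OCR each page in order."""
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(ghostscript_cmd(job, work_dir / "page-%03d.png"), check=True)
    texts = []
    for image in sorted(work_dir.glob("page-*.png")):
        result = subprocess.run(tesseract_cmd(image), check=True,
                                capture_output=True, text=True)
        texts.append(result.stdout)
    return "\n".join(texts)
```

Building the command lists in separate functions keeps the external tools swappable; the OCR text can then go through the same kind of regex extraction as plain-text output.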

Douglas Anderson
Well, let's say we have an image of some kind (PDF, JPEG, bitmap) and we need to extract data from this image (some number). Our first thought was to capture the data sent to the printer (which can be virtual); now we are thinking of parsing the image file instead. Your thoughts, please.
guy
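If you parse the files directly, a first step could be telling the formats apart by their leading "magic" bytes, so PDFs that contain real text can be routed to text extraction and bitmaps or JPEGs to OCR. This is a minimal sketch; the routing labels and function names are hypothetical, and the two-byte `BM` check can false-positive on arbitrary files that happen to start with those letters.

```python
from pathlib import Path

# Leading bytes for the formats mentioned in the comment.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
]


def sniff_format(header: bytes) -> str:
    """Guess the file format from its first bytes; 'unknown' if nothing matches."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def choose_route(path: Path) -> str:
    """Hypothetical routing: PDFs may carry real text, images need OCR."""
    with path.open("rb") as f:
        kind = sniff_format(f.read(8))
    if kind == "pdf":
        return "text-then-ocr"  # scanned PDFs still fall back to OCR
    if kind in ("jpeg", "bmp"):
        return "ocr"
    return "unsupported"
```

Sniffing content rather than trusting the file extension helps when files arrive with misleading or missing extensions.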
@guy: That's a little different from what you described at first (IMO). Are you looking for an OCR algorithm?
Bobby