ocr

Any OCR techniques for Java?

Hi, I have an MCA final-year project to extract data from images (JPG, GIF, etc.). I want to recognize data from an image. I have used Java OCR, but it is not working. Are there any open source libraries that can help me ...

Extracting code from photograph of T-shirt via OCR

I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code. Next, I tried to extract the code from the image via OCR, so I installed Tesseract OCR and the Python bindings for it, pytesser. Pytesser only works on TIFF images, so I converted the image in GIMP and entered the...

Tesseract.NET in C#

Hi, do you know of a step-by-step guide on how to use the bins and DLLs in http://www.pixel-technology.com/freeware/tessnet2/? I spent 2 days trying to use this, but when compiling I am asked for a DLL that does not exist in the zip file I downloaded from the site. Any help will be greatly appreciated. ...

How to get text from the screen in KDE development

I want to get text from the screen (like Babylon). Is there anything helpful for this in KDE development? Any API or library? Thanks in advance ...

OCR: How to improve accuracy - existing libraries for removing non-text 'furniture', shapes, etc to avoid confusing OCR?

I want to remove rectangles, etc., that enclose text in a screenshot image, so that I can perform optical character recognition to get accurate text from the screenshot. Background: I am doing this to extract data from a legacy application for use with other applications. This is the only way to get at this data as associated files are in a ...

OCR with Neural network: data extraction

I'm using the AForge library framework and its neural network. At the moment, when I train my network, I create lots of images (one image per letter per font) at a big size (30 pt), cut out the actual letter, scale this down to a smaller size (10x10 px) and then save it to my hard disk. I can then go and read all those images, creating my ...
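The preprocessing described in this excerpt (cut out the letter, scale it to 10x10 px) can be sketched as follows. This is only an illustration in plain Python, not the asker's AForge/C# code: the helper names `crop_to_ink`, `resize_nearest` and `to_feature_vector` are hypothetical, the image is assumed to be a list of rows with 1 for ink and 0 for background, and nearest-neighbour sampling stands in for whatever scaling the original uses.

```python
from typing import List

Image = List[List[int]]  # assumed format: 1 = ink, 0 = background


def crop_to_ink(img: Image) -> Image:
    """Cut the image down to the bounding box of its ink pixels."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return []  # blank image: nothing to crop
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def resize_nearest(img: Image, size: int = 10) -> Image:
    """Scale to size x size using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def to_feature_vector(img: Image, size: int = 10) -> List[float]:
    """Crop, scale, and flatten a glyph into size*size network inputs."""
    glyph = crop_to_ink(img)
    if not glyph:
        return [0.0] * (size * size)
    return [float(v) for row in resize_nearest(glyph, size) for v in row]
```

Building the vectors in memory like this would avoid the round trip through image files on disk, at the cost of regenerating them on each run.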

Using MODI in C# to read an image: numbers with a length of 1 are missing

I am building a C# application in which I am trying to read text from a GIF image (OCR). I am using MODI, and the images are a bit like a lotto coupon (random numbers in rows and columns). I now have the following code, which reads all numbers except single-digit numbers (1, 2, 3...): MODI.Document objModi = new MODI.Document(); objModi.C...

Form Scanning into SQL

I have been looking for a solution to scan (and OCR) a paper survey type form into a SQL Server database. I have looked at TeleForm and ReadSoft which are very high end and expensive solutions. I have tried to research other solutions but have come up dry. Is anyone doing this with a light-weight package? Note: I am not looking for a pl...

Java OCR package

What is the best OCR package you've used with Java? I need to parse mainly numbers in a single font and am looking for a well-architected, lightweight library to use. Thanks. ...

OCR library for photos, not scanned images

Does anyone know of an OCR library that can handle colored photos (as opposed to scanned pages)? It seems to me that most libraries out there work on B&W images and expect them to come from a scanner. I need something that can take a colored photo of, say, a billboard, and extract text from it. I'm currently considering converting the...

Most accurate open-source OCR for Japanese?

From your experience, what is the most accurate open-source Optical Character Recognition (OCR) library/software to read Japanese text? I just tried nhocr; its error rate is over 2% even on an extremely clean high-definition document. Keywords: kanji, hiragana, katakana, scan, recognize, 光学式文字読取り装置, 光学的文字認識 ...

Most accurate open-source OCR for handwritten numbers?

My software needs to read a fixed-length handwritten number. While I could use a general-purpose library like Tesseract, I am sure there is something smarter. Tesseract will probably misinterpret some of the 1s or 7s as I or l, whereas software that expects only numbers would not. Knowing that there are only numbers (American-English w...

Using a Custom Dictionary with Microsoft's MODI

I am currently using Microsoft's MODI (Microsoft Office Document Imaging) to read text in an image in C#. Everything is working fine, except some of the words I want to read are not real English words. Is there any way to use a custom dictionary when using MODI or add words to the regular English dictionary that it uses? ...

Text detection of image

I have grayscale images taken with a cheap camera, and I need to write an OCR program. The main problem is noise, or objects that are not text but are present in the binary image. Now I am thinking about extracting the text from the image. I need a good algorithm for that. Can you suggest a really good one? For example if image contains black color text and so...

Image improvement methods for OCR Engine

Hello everyone, we are working on software that uses the OPENOCR engine to do some OCR on given images. Given that we are using the .NET framework, I was wondering if anyone knows of any good filters or sharpening methods that can be applied to the image prior to sending it to the OCR engine. I have found for example a grayscaled imag...

Character matching in grayscale image

I made patterns: images of the letter "A" at different sizes (from 12 to 72: 12, 14, .., 72). I tested the pattern-matching method and it gave good results. One way to select text regions from an image is to run that algorithm for all lowercase and uppercase letters and digits at different sizes. And fonts! I don't like it. Instead of it I ...

Cannot recognize scanned PDF page with Greek words using PB, EZTwain and TOCR 3.0

Hi, I am using PB 10.5.2 and EZTwain 3.30.0.28, XDefs 1.36b1 by Dosadi for scanning. I am also using TOCR 3.0 for OCR management. In a function we use the following, among other things: ... Long ll_acquire (as_path_filename is a function argument) ... ... TWAIN_SetAutoOCR(1) ll_acquire = TWAIN_AcquireMultipageFile(0, as_path_fil...

Adobe Acrobat 8 command line switches to recognize OCR text

I want to use the command line to run a licensed copy of Adobe Acrobat 8 to recognize the text (OCR) in an already scanned PDF document and make it a fully searchable PDF. Do you know what the command line switch or parameter is? Thanks in advance! ...

How to segment text images using MATLAB?

It's part of the OCR process: how do I segment sentences into words, and then into characters? What are the candidate algorithms for this task? ...
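One commonly used candidate for this kind of segmentation is projection-profile analysis: sum the ink pixels per row to find text lines, then per column within a line to find characters, and treat wider gaps as word boundaries. The sketch below is in plain Python rather than MATLAB, and is only a minimal illustration: it assumes a clean, deskewed binary image (1 = ink, 0 = background), and the function names and the `min_gap` threshold are hypothetical choices, not part of any library.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: 1 = ink, 0 = background
Span = Tuple[int, int]   # half-open interval [start, end)


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Find non-zero runs in a profile, merging runs whose separating
    gap is narrower than min_gap."""
    spans: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Span]:
    """Rows that contain ink, grouped into text lines."""
    return runs([sum(row) for row in img])


def segment_columns(img: Image, top: int, bottom: int,
                    min_gap: int = 1) -> List[Span]:
    """Column spans inside rows [top, bottom). With min_gap=1 these are
    character candidates; a larger min_gap yields word candidates."""
    width = len(img[0])
    profile = [sum(img[y][x] for y in range(top, bottom))
               for x in range(width)]
    return runs(profile, min_gap)
```

Touching or overlapping characters, italics and skewed scans break the assumption that blank columns separate glyphs, so connected-component analysis is often used alongside or instead of this approach.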

The best method for Google indexing text content in images?

Hi everybody, I have a webpage where I post one image once in a while, just like xkcd.com. I would like to know how to let Google know the text in my images. My approach is to put the text in the alt HTML attribute, like this: <img src="http://myapokalips.com/public/cartoons/021_Robot_Tattoo.png" alt="RETARD - aw, that's a sick tatto...