
Question

Answer 1

A:

First, I want to mention that I know nothing about image processing in general, and about OCR in particular.

Still, a very simple heuristic comes to mind:

Separate the pixels in the image into connected components.
For each connected component, decide whether it is a line using one or more of the following heuristics:
1. Is it longer than the average letter length?
2. Does it appear near other letters? (This filters out ink blots and other artifacts.)
3. Are its X gradient and Y gradient both large enough? This would indicate that the connected component contains more than just a horizontal line.
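
The steps above could be sketched roughly as follows in plain Python. This is only an illustration under several assumptions, not a tested OCR pipeline: the image is already binarized into a list of rows with 1 for ink and 0 for background, the median component width stands in for the "average letter length", and a bounding-box height check stands in for the gradient test in heuristic 3. Heuristic 2 is not implemented. All function names and the max_thickness parameter are made up for this example.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected components of ink pixels as lists of (y, x)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def size(pixels):
    """Return (width, height) of the component's bounding box."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def looks_like_guide_line(pixels, letter_width, max_thickness=2):
    width, height = size(pixels)
    # Heuristic 1: clearly longer than a typical letter.
    # Stand-in for heuristic 3: almost no vertical extent,
    # so the component is nothing more than a horizontal stroke.
    return width > 2 * letter_width and height <= max_thickness


def remove_guide_lines(img, max_thickness=2):
    """Return a copy of img with line-like components erased."""
    out = [row[:] for row in img]
    comps = connected_components(img)
    if not comps:
        return out
    # Assumption: the (lower) median component width approximates letter width.
    widths = sorted(size(c)[0] for c in comps)
    letter_width = widths[(len(widths) - 1) // 2]
    for comp in comps:
        if looks_like_guide_line(comp, letter_width, max_thickness):
            for y, x in comp:
                out[y][x] = 0
    return out
```

Calling remove_guide_lines on a binarized page returns a new image in which every component judged to be a guide line is blanked out. The factor of 2 and the thickness limit are arbitrary starting points and would need tuning for a real scan.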

The only problem I can see is when somebody writes letters directly on a horizontal line, like so:

   /\     ___
  /  \   /   \
  |__|   |___/
 -|--|---|---|------------------
  |  |    \__/

In that case the letters and the line form a single connected component, so the line would remain. But you have to handle this case anyhow.

As I mentioned, I'm by no means an image processing expert, but sometimes very simple tricks work.

Elazar Leibovich 2010-06-29 13:50:34

How to detect and remove guide lines from a scanned image/document efficiently?