I am trying to teach my camera to be a scanner: I take pictures of printed text and then convert them to bitmaps (and then to djvu and OCR'ed). I need to compute a threshold for which pixels should be white and which black, but I'm stymied by uneven illumination. For example if the pixels in the center are dark enough, I'm likely to wind up with a bunch of black pixels in the corners.

What I would like to do, under relatively simple assumptions, is compensate for uneven illumination before thresholding. More precisely:

  • Assume one or two light sources: maybe one with a gradual change in intensity across the surface (ambient light) and another with inverse-square falloff (direct light).

  • Assume that the white parts of the paper all have the same reflectivity/albedo/whatever.

  • Find some algorithm to estimate degree of illumination at each pixel, and from that recover the reflectivity of each pixel.

  • From a pixel's reflectivity, classify it as white or black.

I have no idea how to write an algorithm to do this. I don't want to fall back on least-squares fitting since I'd somehow like to ignore the dark pixels when estimating illumination. I also don't know if the algorithm will work.
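The closest I can come is something like the MATLAB sketch below, which alternates between fitting a smooth illumination surface to the pixels currently believed to be paper and re-classifying pixels by estimated reflectivity. It is still least squares underneath, but only over the bright pixels, which is the part I care about; the quadratic surface, the file name, and the 0.8 cutoff are pure guesses, and I don't know whether it converges:

img = double(imread('page.png'));         % hypothetical grayscale input
[h, w] = size(img);
[x, y] = meshgrid((1:w)/w, (1:h)/h);      % normalized pixel coordinates
paper = img > median(img(:));             % first guess: the brighter half is paper
for iter = 1:5
    % fit a quadratic illumination surface to the pixels believed to be paper
    A = [ones(nnz(paper),1) x(paper) y(paper) ...
         x(paper).^2 x(paper).*y(paper) y(paper).^2];
    c = A \ img(paper);                   % least squares, dark pixels excluded
    illum = reshape([ones(h*w,1) x(:) y(:) x(:).^2 x(:).*y(:) y(:).^2] * c, h, w);
    refl = img ./ max(illum, 1);          % estimated reflectivity of each pixel
    paper = refl > 0.8;                   % re-classify; the cutoff is a guess
end
bw = paper;                               % white where estimated reflectivity is high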

All helpful advice will be upvoted!


EDIT: I've definitely considered chopping the image into pieces that are large enough so they still look like "text on a white background" but small enough so that illumination of a single piece is more or less even. I think if I then interpolate the thresholds so that there's no discontinuity across sub-image boundaries, I will probably get something halfway decent. This is a good suggestion, and I will have to give it a try, but it still leaves me with the problem of where to draw the line between white and black. More thoughts?
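Here is a MATLAB sketch of what I mean (the 4x4 grid is arbitrary, and Otsu's method via graythresh is just one way to pick each tile's threshold):

img = imread('page.png');                 % hypothetical grayscale page image
[h, w] = size(img);
nt = 4;                                   % 4x4 grid of tiles: an arbitrary choice
th = zeros(nt, nt);                       % per-tile thresholds
for i = 1:nt
    for j = 1:nt
        rows = floor((i-1)*h/nt)+1 : floor(i*h/nt);
        cols = floor((j-1)*w/nt)+1 : floor(j*w/nt);
        th(i,j) = graythresh(img(rows, cols));   % Otsu threshold, in [0,1]
    end
end
% interpolate tile thresholds into a smooth per-pixel threshold surface
[tc, tr] = meshgrid(((1:nt)-0.5)*w/nt, ((1:nt)-0.5)*h/nt);  % tile centers
[qc, qr] = meshgrid(1:w, 1:h);
qc = min(max(qc, tc(1,1)), tc(1,end));    % clamp queries so the borders
qr = min(max(qr, tr(1,1)), tr(end,1));    % extend the outermost tile values
surface = interp2(tc, tr, th, qc, qr);    % bilinear, no discontinuities
bw = double(img)/255 > surface;           % white wherever above the local threshold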


EDIT: Here are some screen dumps from GIMP showing different histograms and the "best" threshold value (chosen by hand) for each histogram. In two of the three a single threshold for the whole image is good enough. In the third, however, the upper left corner really needs a different threshold.

+1  A: 

Well. Usually the image processing I do is highly time-sensitive, so a complex algorithm like the one you're seeking wouldn't work. But... have you considered chopping the image up into smaller pieces, and re-scaling each sub-image? That should make the 'dark' pixels stand out fairly well even in an image of variable lighting conditions. (I am assuming here that you are talking about a standard mostly-white page with dark text.)

It's a cheat, but a lot easier than the 'right' way you're suggesting.
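For instance, something like this MATLAB fragment (the 64x64 tile size is a guess; mat2gray just stretches each tile to its own min-max range):

fun = @(block) mat2gray(block.data);      % stretch each tile to [0,1] by its own range
rescaled = blockproc(img, [64 64], fun);  % 64x64-pixel tiles: a guess, worth tuning
bw = rescaled > 0.5;                      % then a single global threshold may do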

Arkenian
Thanks for suggestion; I am talking about white with dark text (sometimes also red handwritten text, but that's a detail!). There is often junk around the edges but as long as it's dark it's not a problem: http://tinyurl.com/yh3pczg (URL should be valid approximately October-December of most years).
Norman Ramsey
The trick to your white vs. black issue is to seek the edges, and determine the gradient on that edge. How easy this will be will depend a lot on the print quality of the original. Laser on high-quality paper, you should be able to do it pretty easily. If the lighting isn't too bad, to find an edge start by finding something blacker than 50% of the pixels, and then look for something whiter than 50%. You might also consider a histogram projection, although if you've got black/white hopefully the histogram will be a "two hump" sort of affair.
Arkenian
Arkenian, I hope to be able to follow up on your idea next week. Meanwhile I've posted some histograms with tiny thumbnails. (Full images contain copyrighted text, so I'm reluctant to post them.)
Norman Ramsey
Looking at the histograms you have, you want to actually go to a much lower percentage. I grant freely that I'm basically suggesting a very cheap edge detection: set an arbitrary threshold for what constitutes black, and then use what you find in doing this to refine your threshold values. In general, if you take a section you 'know' is black and watch how the pixel values vary as you move to a section you 'know' is white, setting the threshold for that area is usually pretty easy in the OCR situations you're dealing with, unless you've got a bad xerox from the old days.
Arkenian
+1  A: 

This might be horrendously slow, but what I'd recommend is to break the scanned surface into quarters/16ths and re-color them so that the average grayscale level is similar across the page. (Might break if you have pages with large margins though)
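As a sketch in MATLAB (a 4x4 grid of tiles for the 16ths; tile sizes are rounded up at the edges):

imgd = double(img);                       % work in double so tiles can go negative
target = mean(imgd(:));                   % page-wide average gray level
tile = ceil(size(imgd) / 4);              % a 4x4 grid, i.e. 16ths of the page
fun = @(block) block.data - mean(block.data(:)) + target;  % match each tile's mean
evened = blockproc(imgd, tile, fun);      % re-colored page with similar averages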

Simon Righarts
+1 although I think average grayscale is not going to work at any of the edges---lots of junk around the edges.
Norman Ramsey
+1  A: 

I would recommend calibrating the camera, assuming that your lighting setup is fixed (that is, the lights do not move between pictures) and your camera is grayscale (not color).

Take a picture of a white sheet of paper which covers the whole workable area of your "scanner". Store this picture: it tells you what white paper looks like at each pixel. Now, when you take a picture of a document to scan, you can reload your "white reference picture" and even out the illumination before performing a threshold.

Let's call the white reference REF, the picture DOC, the even-illumination picture EVEN, and the maximum value of a pixel MAX (for 8-bit images, 255). For each pixel:

EVEN = DOC * (MAX/REF)

notes:

  • Beware of the parentheses: most image-processing libraries use the image's pixel type for computations on pixel values, and a plain multiplication will overflow your pixels. If necessary, write the loop yourself and use a 32-bit integer (or a double, as in the sketch below) for intermediate computations.
  • The white reference image can be smoothed before being used in the process. Any smoothing or blurring filter will do; don't hesitate to apply it aggressively.
  • The MAX value in the formula above represents the target pixel value in the resulting image. Using the maximum pixel value targets a bright white, but you can adjust this value to target a light gray instead.
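Putting the formula and the notes together, a minimal MATLAB sketch (the smoothing radius of 25 is arbitrary; any blur will do, per the second note):

ref = double(imgaussfilt(REF, 25));     % smoothed white reference (second note)
ref = max(ref, 1);                      % guard against division by zero
even = double(DOC) .* (255 ./ ref);     % intermediate math in double (first note)
EVEN = uint8(min(even, 255));           % clamp back into the 8-bit range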
Adrien Plisson
Sorry, but the whole point of my project is to be able to recover text from crappy images taken under uncontrolled conditions. If it can be done on an iPhone (http://tinyurl.com/clldjk) then it ought to be possible to code something from a higher-quality image.
Norman Ramsey
That's why I made some assumptions in the first paragraph. Indeed, this method does not work at all under uncontrolled conditions.
Adrien Plisson
A: 

You could try using an edge detection filter, then a floodfill algorithm, to distinguish the background from the foreground. Interpolate the floodfilled region to determine the local illumination; you may also be able to modify the floodfill algorithm to use the local background value to jump across lines and fill boxes and so forth.
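One possible MATLAB reading of this, using interpolation rather than a literal flood fill (the dilation radius and the 0.85 cutoff are guesses):

mask = imdilate(edge(img, 'canny'), strel('disk', 5));  % cover the text strokes
illum = regionfill(double(img), mask);    % interpolate the background under the mask
bw = double(img) ./ max(illum, 1) > 0.85; % classify by estimated reflectivity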

comingstorm
Actually distinguishing background from foreground is really hard. I'm hoping I might be able to repurpose djvu. The rest of your answer is a little too hard for me to follow.
Norman Ramsey
+1  A: 

I assume that you are taking images of (relatively) small black letters on a white background.

One approach could be to "remove" the small black objects, while keeping the illumination variations of the background. This gives an estimate of how the image is illuminated, which can be used for normalizing the original image. It is often enough to subtract the original image from the illumination estimate (a morphological bottom-hat, which leaves the text as bright features on an even background) and then do a threshold-based segmentation. This approach is based on grayscale morphological filters, and could be implemented in MATLAB like below:

img = imread('filename.png');
illumination = imclose(img, strel('disk', 10));  % closing removes the dark text
imgCorrected = illumination - img;               % bottom-hat: text becomes bright
thresholdValue = graythresh(imgCorrected);       % Otsu threshold, in [0, 1]
bw = im2bw(imgCorrected, thresholdValue);        % true (white) where there is text

For an example with real images, take a look at this guide from MathWorks. For further reading about the use of morphological image analysis, this book by Pierre Soille can be recommended.

midtiby
I think if I could identify and remove the small black objects and just get the background, my problem would be solved. Your mathworks example is interesting but when I get to the part about 'Morphological Operations' it might as well say 'black magic'. +1 for the book.
Norman Ramsey
Well, morphological image analysis is some kind of magic ;-) All morphological operations are based on a structuring element (SE), which is a group of nearby pixels (could be a 3x3 pixel box). To determine the pixel values of the new image, the structuring element is overlaid at each pixel position, and the resulting pixel value is the maximum pixel value of the original image inside the structuring element. This operation is a dilation; if the maximum is exchanged for a minimum, the operation is known as erosion. Morphological closing is a combination of first a dilation and then an erosion.
midtiby
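The max/min description above translates almost literally into MATLAB (a sketch; imdilate and imerode are the library versions):

se = ones(3);                         % 3x3 box structuring element
dilated = ordfilt2(img, 9, se);       % max of the 9 pixels under the SE
eroded  = ordfilt2(img, 1, se);       % min of the 9 pixels under the SE
closed  = ordfilt2(ordfilt2(img, 9, se), 1, se);   % dilation, then erosion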
A: 

You could also try threshold hysteresis with a rate-of-change control. In normal threshold hysteresis, set the first threshold to a typical white value and the second threshold to less than the lowest white value in the corners.

The difference here is that you check the difference between neighboring pixels for all values between the first and second thresholds. Ideally, if the difference is positive, act normally; but if it is negative, only threshold if the difference is small.

This will be able to compensate for lighting variations, but will ignore the large changes between the background and the text.
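The plain hysteresis part (without the rate-of-change control) can be sketched in MATLAB with morphological reconstruction; the two threshold values here are placeholders:

strong = img >= 200;                  % seed pixels that are surely paper
weak   = img >= 120;                  % anything that might still be paper
paper  = imreconstruct(strong, weak); % grow seeds through connected weak pixels
bw     = ~paper;                      % whatever is not paper is ink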

Ed_S
This method looks promising except for that "lowest white value in the corners". I think that part is the problem I'm trying to solve :-)
Norman Ramsey
The lowest white value can be less than the highest black value, and this method will still work. You may need to do some global equalization on the image first. This would make the brightest value 255 and the darkest value 0, scaling the intermediate values accordingly. Then set the first threshold to 255 and the second threshold to 1 or even 0. You are really dependent on the white part being connected and on the illumination changing gradually. If that is true, then the size of the step when going from text to paper will be larger than illumination changes, so the text can be ignored.
Ed_S
If you could send me the images, I have this algorithm available to me. I could process the images and send you the results.
Ed_S
A: 

Two algorithms come to my mind:

  • High-pass to alleviate the low-frequency illumination gradient
  • Local threshold with an appropriate radius
Cecil Has a Name
The examples at http://local.wasp.uwa.edu.au/~pbourke/miscellaneous/imagefilter/ suggest that a high-pass filter will keep the edges but will eliminate the distinction between black and white. Other pages on high-pass filtering also suggest this is a way to bring out edges and detail. Maybe the cutoff is just wrong---the idea sounds promising---but unless I find an FFT off the shelf I'm not going to mess with it.
Norman Ramsey
Oh ah. No FFT necessary to produce a high-pass effect. You can achieve high-pass in GIMP, or in any other image-processing program that supports layers, by computing the difference between the original image and a blurred version of it: in GIMP, duplicate the layer, blur and invert the top layer, and set its opacity to 50%. Remember: original signal - low-pass result = high-pass result. The blur can be computed using Gaussian, Butterworth, box, or median algorithms.
Cecil Has a Name
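In MATLAB the same layer trick is a couple of lines (the Gaussian sigma and the -10 cutoff are guesses):

blurred  = imgaussfilt(img, 15);           % low-pass: heavy Gaussian blur
highpass = double(img) - double(blurred);  % original minus low-pass = high-pass
bw = highpass > -10;                       % white unless far below the local surround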
A: 

Why don't you use simple opening and closing operations? Try these and just look at the results (src = source image):

  • src - open(src)
  • close(src) - src

Look at the results with different window sizes: closing removes the dark text, so close(src) gives you the background of the image. I think this helps.
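For example (the disk radius is the "window size" to vary):

se = strel('disk', 10);                             % window size: vary the radius
openDiff  = double(img) - double(imopen(img, se));  % src - open(src)
background = imclose(img, se);                      % close(src): dark text removed
closeDiff = double(background) - double(img);       % close(src) - src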

andrew