So I have what is essentially a spreadsheet in TIFF format. There is some uniformity to it; for example, all the column widths are the same. I want to split the sheet along those known column widths, create one small image file per cell, run OCR on each, and store the results in a database. The problem is that the rows are not all the same height, so I need some graphics library call to check whether every pixel across a row is the same color (i.e. black). If so, I know I've reached the horizontal border of a cell. How would I go about doing that? (I'm using RMagick.)
Use Image#get_pixels: http://www.simplesystems.org/RMagick/doc/image2.html#get_pixels
Warning: Those docs are old, so the API may have changed in newer versions. Check the rdocs installed on your own machine with $ gem server, assuming the gem ships with rdocs.
Image#rows gives you the height of the image, so you can do something like this (untested):
# True only if every pixel in the array is pure black.
def black_line?(pixels)
  pixels.each do |pixel|
    unless pixel.red == 0 && pixel.green == 0 && pixel.blue == 0
      return false
    end
  end
  true
end

black_line_heights = []
height = image.rows
width = image.columns
height.times do |y|
  # Fetch one full-width row of pixels, 1 pixel tall.
  pixels = image.get_pixels(0, y, width, 1)
  black_line_heights << y if black_line?(pixels)
end
Please keep in mind that I'm not sure about the API: I'm looking at older docs and can't test this right now. But it looks like the general approach you would take. BTW, it assumes the row borders are 1 pixel thick. If not, change the 1 to the actual thickness, and that might be enough to make it work as you expect.
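If the borders turn out to be more than 1 pixel thick, the loop above will simply record several consecutive y values for each border. One way to handle that, as a rough sketch, is to group the consecutive values into bands and treat the gaps between bands as the cell rows. The helper names border_bands and cell_row_ranges are made up for this example and are not part of RMagick; the sample row numbers are invented too.

```ruby
# Group row indices into runs of consecutive values.
# Each run is one horizontal border, however many pixels thick.
def border_bands(line_rows)
  line_rows.sort
           .slice_when { |a, b| b != a + 1 }
           .map { |run| (run.first..run.last) }
end

# Row ranges that lie between neighbouring borders (the cell interiors).
def cell_row_ranges(bands)
  bands.each_cons(2).map { |upper, lower| (upper.last + 1)..(lower.first - 1) }
end

# Invented sample: borders at rows 0-1, 40-42 and 95.
bands = border_bands([0, 1, 40, 41, 42, 95])
cells = cell_row_ranges(bands)
# bands => [0..1, 40..42, 95..95]
# cells => [2..39, 43..94]
```

This only needs the array of y values, so it should work the same whichever line test you end up using.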
Ehsanul's approach is basically right. The call to use is get_pixels, which takes x, y, width, and height as arguments and returns an array of the pixels in that region. If the region is 1 pixel thick, you get a nice one-dimensional array.
Since the black in a document can vary, I altered Ehsanul's method a little bit to check whether consecutive pixels are roughly the same dark color. After 100 or so such pixels, it's probably a line:
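# True if each pixel is dark and close in color to the pixel before it.
# Channel values are in RMagick quantum units, so tune black_val and
# :threshold to your build's quantum depth.
def solid_line?(pixels, opt = {}, black_val = 10)
  last_pixel = nil
  # Plain Ruby default, so this does not depend on Rails' blank?
  thresh = opt[:threshold] || 4
  pixels.each do |pix|
    pixel = [pix.red, pix.green, pix.blue]
    if last_pixel
      # Compare channel by channel; Array#index would pick the wrong
      # channel whenever two channels hold the same value.
      similar = pixel.each_with_index.all? do |value, i|
        (value - last_pixel[i]).abs < thresh && value < black_val
      end
      return false unless similar
    end
    last_pixel = pixel
  end
  true
end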
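Once you know which rows hold the borders, the remaining step from the question is cutting out the individual cells. Here is a minimal sketch of that arithmetic, assuming you already have the row range of each cell and a single known column width. The helper name cell_rects and the sample numbers are made up for illustration, and the sketch ignores the width of any vertical borders.

```ruby
# Hypothetical helper: one [x, y, width, height] rectangle per cell.
# row_ranges   - array of Ranges, the y span of each spreadsheet row
# image_width  - total image width in pixels
# column_width - the known, uniform column width in pixels
def cell_rects(row_ranges, image_width, column_width)
  # Integer division, so a partial column at the right edge is dropped.
  column_count = image_width / column_width
  row_ranges.flat_map do |rows|
    Array.new(column_count) do |col|
      [col * column_width, rows.first, column_width, rows.size]
    end
  end
end

# Invented sample: two rows of cells on a 300 px wide sheet, 100 px columns.
rects = cell_rects([2..39, 43..94], 300, 100)
# rects.first => [0, 2, 100, 38]
```

Each rectangle is in the same x, y, width, height order that Image#crop takes, so something like image.crop(*rect).write("cell.tif") should produce the per-cell file for OCR, though I haven't run that part.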