
Good night :)

I am currently playing with the DevIL library, which lets me load an image and read the RGB value of each pixel. Just as a personal learning project, I'm trying to write a very basic OCR system for a couple of images I made myself in Photoshop.

I have managed to remove all the distortions from the image, so I'm left with only text and numbers. I'm not looking for an advanced neural network that learns from input yet. I want to start out relatively simple, so I've set out to identify the individual characters and count the pixels in each of them.

I have two problems:

  • Identifying the individual characters.
  • Most importantly: I need an algorithm that counts connected pixels (of the same color) without counting any pixel I've already counted. I have no mathematical background, so this is the biggest issue for me.
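Both problems can be handled together by connected-component labeling, which is a flood fill started from every ink pixel that hasn't been visited yet. Below is a minimal sketch, assuming the image has already been reduced to a row-major `std::vector<bool>` where `true` marks a text pixel. `Component` and `findComponents` are hypothetical names. Each blob found is a candidate character, its bounding box says where it is, and the separate `visited` mask is what guarantees no pixel is counted twice. It uses 4-connectivity (up/down/left/right); 8-connectivity would also join diagonal neighbours.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// One connected blob of ink pixels: how many pixels it has and where it sits.
struct Component {
    std::size_t pixelCount;
    int minX, minY, maxX, maxY;  // inclusive bounding box
};

// Finds 4-connected regions of ink pixels in a binary image stored row-major
// (index = y * width + x). The `visited` mask ensures every pixel is counted
// at most once, no matter how many neighbours point back to it.
std::vector<Component> findComponents(const std::vector<bool>& ink, int width, int height) {
    assert(ink.size() == static_cast<std::size_t>(width) * height);
    std::vector<bool> visited(ink.size(), false);
    std::vector<Component> components;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t start = static_cast<std::size_t>(y) * width + x;
            if (!ink[start] || visited[start]) continue;  // background, or already in a blob

            // Breadth-first flood fill from this seed pixel.
            Component c{0, x, y, x, y};
            std::queue<std::pair<int, int>> pending;
            pending.push(std::make_pair(x, y));
            visited[start] = true;
            while (!pending.empty()) {
                int cx = pending.front().first;
                int cy = pending.front().second;
                pending.pop();
                ++c.pixelCount;
                c.minX = std::min(c.minX, cx);
                c.minY = std::min(c.minY, cy);
                c.maxX = std::max(c.maxX, cx);
                c.maxY = std::max(c.maxY, cy);
                for (int i = 0; i < 4; ++i) {
                    int nx = cx + dx[i];
                    int ny = cy + dy[i];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    std::size_t n = static_cast<std::size_t>(ny) * width + nx;
                    if (ink[n] && !visited[n]) {
                        visited[n] = true;  // mark on enqueue so it is never queued twice
                        pending.push(std::make_pair(nx, ny));
                    }
                }
            }
            components.push_back(c);
        }
    }
    return components;
}
```

Characters made of more than one piece, such as `i` or `j`, will come out as two components; one option is to merge components whose horizontal ranges overlap.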

Any help in the matter is appreciated, thanks.

edit:

I have tagged this question as C++ because that is what I am currently using. However, pseudo-code or easily readable code from another language is also fine.

+1  A: 

Not sure if this helps, but there is a GPL OCR library called gocr.

Gian Paolo
+2  A: 

The flood fill algorithm will work for counting the included pixels, as long as you have the images filtered down to simple black & white bitmaps.
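A minimal sketch of that filtering step, assuming the pixels have already been copied out of DevIL as packed 8-bit RGB (for example via `ilConvertImage(IL_RGB, IL_UNSIGNED_BYTE)` and then `ilGetData()`), dark text on a light background, and a hypothetical fixed threshold of 128. `binarize` is an illustrative name, not a DevIL function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts packed 8-bit RGB pixels (R,G,B,R,G,B,...) to a black & white mask.
// A pixel counts as ink (true) when its approximate luminance is below `threshold`.
std::vector<bool> binarize(const std::vector<unsigned char>& rgb, int threshold = 128) {
    assert(rgb.size() % 3 == 0);
    std::vector<bool> ink(rgb.size() / 3);
    for (std::size_t i = 0; i < ink.size(); ++i) {
        // Integer approximation of Rec. 601 luma: 0.299 R + 0.587 G + 0.114 B
        int luma = (299 * rgb[3 * i] + 587 * rgb[3 * i + 1] + 114 * rgb[3 * i + 2]) / 1000;
        ink[i] = luma < threshold;
    }
    return ink;
}
```

A fixed threshold is fine for clean images made in Photoshop; images with uneven lighting would need something adaptive.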

Having said that, you can perform character recognition by comparing each character to a set of standard images of each character in your set, measuring the similarity, and then choosing the character with the highest score.
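A rough sketch of that comparison, assuming each character has already been cropped to its own binary `Glyph` (a hypothetical struct) and that you have one reference glyph per character. The candidate is scaled to the template's size with nearest-neighbour sampling and scored by the fraction of pixels that agree. This pixel-agreement score is just one simple choice of similarity measure.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// A binary character image, row-major (index = y * width + x).
struct Glyph {
    int width, height;
    std::vector<bool> ink;
};

// Scales a glyph to w x h with nearest-neighbour sampling so that characters
// of different sizes can be compared pixel by pixel.
Glyph resizeNearest(const Glyph& g, int w, int h) {
    Glyph out{w, h, std::vector<bool>(static_cast<std::size_t>(w) * h)};
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int sx = x * g.width / w;
            int sy = y * g.height / h;
            out.ink[static_cast<std::size_t>(y) * w + x] =
                g.ink[static_cast<std::size_t>(sy) * g.width + sx];
        }
    return out;
}

// Fraction of pixels (0.0 to 1.0) on which two same-sized glyphs agree.
double similarity(const Glyph& a, const Glyph& b) {
    assert(a.width == b.width && a.height == b.height && !a.ink.empty());
    std::size_t same = 0;
    for (std::size_t i = 0; i < a.ink.size(); ++i)
        if (a.ink[i] == b.ink[i]) ++same;
    return static_cast<double>(same) / a.ink.size();
}

// Returns the label of the template with the highest similarity score.
char bestMatch(const Glyph& glyph, const std::map<char, Glyph>& templates) {
    char best = '?';
    double bestScore = -1.0;
    for (const auto& entry : templates) {
        const Glyph& t = entry.second;
        double score = similarity(resizeNearest(glyph, t.width, t.height), t);
        if (score > bestScore) {
            bestScore = score;
            best = entry.first;
        }
    }
    return best;
}
```

Since the images are your own, the templates could simply be each character cropped once from one of them.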

Take a look at this question for more information.

e.James
This looks interesting, I'll have a look. Thanks!
Daniel
No problem. Good luck!
e.James
+1  A: 

Apologies if this is too far off-topic, but IMHO Vigra (not the other one!) is a much better image processing library for C++ than DevIL.

Gian Paolo
I was almost going to mark this answer as spam ;-)
LeopardSkinPillBoxHat
Yeah, thought it best to clarify ... you don't know how many wtf moments I've had seeing this link in my bookmarks :)
Gian Paolo
Why would somebody name their product like that?
Naveen