Which approach would you suggest for automatically classifying type found in images? The samples are likely large, with black text on a white background.

The categories are defined here, with some examples of each (Google Books link): http://bit.ly/9Mnu7P This is an extended version of the VOX-ATypI classification system.

My initial thoughts on this were to train the system with lots of single character samples from each category, but I'm wondering if there's a better way that would eliminate the need to do the comparison one letter at a time.

+2  A: 

First, you need to extract features for classification. Typefaces are generally distinguished by the thickness of their lines, the presence of serifs, and the "circularity" of character parts. Possible features therefore include:

  • The fraction of black pixels within a fixed area.
  • The same fraction recomputed after applying morphological erosion a few times (and/or with different masks).
  • The mean compactness of a character: perimeter^2 / area.
  • The number of connected components in a character after erosion.
  • The elongation, the orientation, and other image moments.
  • etc.
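As a rough illustration of the features listed above, here is a minimal pure-Python sketch. It assumes each character is already segmented and binarized into a list of rows with 1 for black and 0 for white. The function names, the 3x3 erosion mask, and the edge-counting perimeter are my own choices for illustration, not a fixed recipe. A library such as OpenCV would do the same work much faster on real images.

```python
import math
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def black_fraction(img):
    """Fraction of black (1) pixels in a binary image given as a list of rows."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))

def erode(img):
    """One pass of binary erosion with a 3x3 square mask.
    Pixels outside the image are treated as white, so the border is cleared."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def count_components(img):
    """Number of 4-connected black components, found by flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def compactness(img):
    """perimeter^2 / area, counting the perimeter as exposed pixel edges."""
    h, w = len(img), len(img[0])
    area = perimeter = 0
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                area += 1
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w and img[ny][nx]):
                        perimeter += 1
    return perimeter ** 2 / area if area else 0.0

def orientation_and_elongation(img):
    """Principal-axis angle (radians) and axis-length ratio from the
    second-order central moments. Assumes at least one black pixel."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    mu20 = sum((x - mx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - my) ** 2 for _, y in pts) / n
    mu11 = sum((x - mx) * (y - my) for x, y in pts) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    root = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam_max = (mu20 + mu02 + root) / 2
    lam_min = (mu20 + mu02 - root) / 2
    elongation = math.sqrt(lam_max / lam_min) if lam_min > 0 else float("inf")
    return theta, elongation

# Toy "glyph": a 3x7 horizontal bar inside a 7x9 image.
bar = [[1 if 2 <= y <= 4 and 1 <= x <= 7 else 0 for x in range(9)]
       for y in range(7)]
features = [black_fraction(bar), black_fraction(erode(bar)),
            compactness(bar), count_components(erode(bar)),
            *orientation_and_elongation(bar)]
```

Each character then yields one feature vector like `features`, which can be fed to whatever classifier you choose.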

I see two options here: either compute the mean of each feature over all characters, or classify the letters first and then classify the font based on some specific letters (that is, train a separate classifier for each letter). It's hard to say which one is better in your case.
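The two options might look roughly like this. This is only a sketch: `mean_features`, `classify_by_letters`, and the majority vote used to combine per-letter decisions are hypothetical choices of mine, and the second option assumes some earlier step has already recognized which letter each character is.

```python
from collections import Counter

def mean_features(char_features):
    """Option 1: average the per-character feature vectors into a single
    vector describing the whole sample."""
    n = len(char_features)
    return [sum(column) / n for column in zip(*char_features)]

def classify_by_letters(chars, letter_classifiers):
    """Option 2: every recognized letter votes through its own classifier.

    chars: list of (letter, feature_vector) pairs.
    letter_classifiers: dict mapping a letter to a callable that returns
    a category label for a feature vector.
    Letters without a dedicated classifier are skipped.
    """
    votes = Counter(
        letter_classifiers[letter](vector)
        for letter, vector in chars
        if letter in letter_classifiers
    )
    return votes.most_common(1)[0][0] if votes else None

# Toy usage with made-up thresholds standing in for trained classifiers.
toy_classifiers = {
    "a": lambda v: "serif" if v[0] > 0.5 else "sans",
    "g": lambda v: "serif" if v[1] > 30 else "sans",
}
sample = [("a", [0.6, 35.0]), ("g", [0.4, 33.0]), ("a", [0.3, 20.0]), ("x", [0.9, 50.0])]
sample_mean = mean_features([vector for _, vector in sample])
sample_label = classify_by_letters(sample, toy_classifiers)
```

The first option needs no letter recognition but blurs letter-specific cues. The second keeps those cues at the cost of training one classifier per letter.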

As for a specific learning algorithm, Random Forest seems to be a good place to start. There's an implementation in the OpenCV library.
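To give an idea of the training step, here is a small sketch. Note that it swaps in scikit-learn's `RandomForestClassifier` instead of the OpenCV implementation, simply because its interface is compact. The two feature columns and the synthetic clusters are invented for illustration and are not measurements from real typefaces.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic feature vectors [black fraction, compactness] for two
# made-up categories; real vectors would come from feature extraction.
category_0 = rng.normal([0.30, 40.0], [0.02, 2.0], size=(50, 2))
category_1 = rng.normal([0.45, 25.0], [0.02, 2.0], size=(50, 2))

X = np.vstack([category_0, category_1])
y = np.array([0] * 50 + [1] * 50)

# Fit the forest; random_state makes the run repeatable.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Classify two unseen vectors, one near each cluster.
pred = clf.predict([[0.31, 41.0], [0.44, 24.0]])
```

With real data you would hold out part of the samples to estimate accuracy before trusting the result.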

overrider