I have a typical yearbook with photos and a name beneath each photo. Is there a programmatic way to scan all of the photos and save them with the name beneath the photo?

+1  A: 

Yes - but unless your 'typical' school has > 1000 students in the year it's going to be easier to type the names in manually.

Finding the name box in the scan, isolating the text, OCR'ing it, and then hooking up all the software to crop and save the photos is going to take you a lot longer than the 2-3 seconds it takes to type a name.

edit - I don't know of any scanning software that does this - there might be something for newspapers.
If the layout of the yearbooks is consistent (at least within the same book), you could scan a page and have either the batch mode in your favourite image app or some command-line tool split it into separate images based on the pixel coordinates. You could then extract just the name box into a separate image and run OCR on that. If the books are relatively modern and were laid out in a DTP package with clean fonts, this should work well - older books with typewriter captions and paste-up layouts might be harder.
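As a rough illustration of the split-by-coordinates idea, here is a minimal Python sketch. It assumes every page is a regular grid of equally sized cells with the caption occupying the bottom strip of each cell, which real yearbooks may not satisfy. The names `grid_boxes`, `safe_name` and `split_page` and all of their parameters are made up for this example. `split_page` additionally assumes the third-party Pillow and pytesseract packages (and the tesseract program itself) are installed.

```python
from typing import List, Tuple

# (left, upper, right, lower) in pixels, the order Pillow's Image.crop() expects
Box = Tuple[int, int, int, int]


def grid_boxes(page_w: int, page_h: int, rows: int, cols: int,
               margin: int = 0, caption_h: int = 0) -> List[Tuple[Box, Box]]:
    """Return (photo_box, caption_box) pairs for a regular rows x cols grid.

    Assumption: the caption is the bottom `caption_h` pixels of each cell.
    """
    cell_w = (page_w - 2 * margin) // cols
    cell_h = (page_h - 2 * margin) // rows
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left = margin + c * cell_w
            top = margin + r * cell_h
            right = left + cell_w
            bottom = top + cell_h
            photo = (left, top, right, bottom - caption_h)
            caption = (left, bottom - caption_h, right, bottom)
            boxes.append((photo, caption))
    return boxes


def safe_name(text: str, fallback: str) -> str:
    """Reduce raw OCR output to letters, digits, '_' and '-'."""
    cleaned = "_".join(text.split())
    cleaned = "".join(ch for ch in cleaned if ch.isalnum() or ch in "_-")
    return cleaned or fallback


def split_page(path: str, out_dir: str, rows: int, cols: int,
               margin: int = 0, caption_h: int = 0) -> None:
    """Crop each photo from a scanned page and name it after its OCR'd caption."""
    # Third-party imports are kept local so the helpers above work without them.
    import os
    from PIL import Image
    import pytesseract

    page = Image.open(path)
    boxes = grid_boxes(page.width, page.height, rows, cols, margin, caption_h)
    for i, (photo, caption) in enumerate(boxes):
        text = pytesseract.image_to_string(page.crop(caption))
        name = safe_name(text, "unknown")
        # The index prefix keeps two students with the same name from colliding.
        page.crop(photo).save(os.path.join(out_dir, f"{i:03d}_{name}.png"))
```

You would have to measure `rows`, `cols`, `margin` and `caption_h` once per book from a sample scan. OCR mistakes in the names will still need a manual pass.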

Another alternative - depending on privacy issues - would be to crowdsource the problem, since presumably you aren't just doing this for your own amusement and want people from the school to be interested.
- Create a Facebook/MySpace/Flickr page (or whatever the cool kids are using this hour) for your school.
- Post each picture (or class shot) and ask people to enter the name - either from recognising the person or by reading the caption.
- Another approach is to post the pictures on your site as PDFs and have Google index them and do the OCR for you.

Martin Beckett
Thanks mgb. I want to do this with many yearbooks and my school has over 2,000 students. I do not want to scan each image individually or crop each image out of the page image. I was hoping there could be some sophisticated scanning software that can detect the images and names and pull them for me.
Bryan
edit - suggestion too long = added to answer
Martin Beckett
Google will index and OCR individual photos with a name below? Amazing suggestions.
Bryan
See http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/ - you can also download the Tesseract library that Google uses, but then you would have to write some code.
Martin Beckett