One challenging topic in computer vision is processing document scans. Typically this involves a number of steps, like noise removal, color analysis, binarization, text block identification, OCR, and then maybe some context analysis and correction.
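For reference, a naive baseline for the binarization and text block identification steps might look like the sketch below. It is only an illustration under stated assumptions, not how Google or any particular OCR engine does it: the function names, the fixed threshold of 128, and the toy page are all made up for this example, and real scans would need adaptive thresholding and proper layout analysis.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 for ink, 0 for background.

    The fixed threshold is an assumption chosen for this toy example.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def row_bands(binary, min_ink=1):
    """Group consecutive rows containing ink into (start, end) bands, end exclusive."""
    bands, start = [], None
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(binary)))
    return bands


def block_boxes(binary):
    """Return one (top, left, bottom, right) box per band; bottom/right exclusive."""
    boxes = []
    for top, bottom in row_bands(binary):
        cols = [x for y in range(top, bottom)
                for x, v in enumerate(binary[y]) if v]
        boxes.append((top, min(cols), bottom, max(cols) + 1))
    return boxes


# Toy 6x8 "scan" (hypothetical data): two dark strokes separated by a blank row.
page = [
    [255] * 8,
    [255, 30, 40, 35, 255, 255, 255, 255],
    [255, 25, 255, 45, 255, 255, 255, 255],
    [255] * 8,
    [255, 255, 255, 20, 30, 40, 50, 255],
    [255] * 8,
]
boxes = block_boxes(binarize(page))
```

A horizontal projection profile like this only separates blocks stacked vertically; it breaks down on multi-column layouts and skewed scans, which is part of why the block identification step is hard.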
Does anyone know, or can anyone point me to literature on, how Google identifies text blocks prior to the OCR stage? Any insights would be appreciated.