I am looking for a way to determine the most "different" or "recognizable" N ASCII characters. For example, if N = 10, which 10 characters from the ASCII range 0x21 to 0x7E would be the most mutually distinct? Obviously the character "X" is very different from "O" (the letter), but "O" (the letter) is very similar to "0" (zero).

Assume a restricted OCR character subset, so that zero and the letter O are treated as a single character and one doesn't have to worry about which of the two it actually is. Under that assumption, what would be the most different N characters that typical OCR engines (for example Tesseract) recognize reliably from a poor-quality input image? Similar assumptions can be made for other confusable pairs, such as "+" and "t": each input character, whether it is "+" or "t", would map to only one of the two.
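To make "most different" concrete, here is a rough sketch of the kind of selection I have in mind, not a finished solution: render each candidate glyph to a small bitmap with Pillow, score pairs by how many pixels disagree, and greedily keep the N glyphs that are most mutually dissimilar. The font, bitmap size, pixel-disagreement distance, and greedy max-min selection are all assumptions on my part; a real OCR engine's confusion behaviour may look quite different.

```python
from PIL import Image, ImageDraw, ImageFont
import numpy as np

# Printable ASCII candidates, 0x21 through 0x7E.
CANDIDATES = [chr(c) for c in range(0x21, 0x7F)]

def render(ch, size=(16, 16)):
    """Render a single character to a binarized numpy bitmap."""
    img = Image.new("L", size, 0)
    draw = ImageDraw.Draw(img)
    draw.text((2, 2), ch, fill=255, font=ImageFont.load_default())
    return (np.asarray(img) > 127).astype(np.uint8)

def distance(a, b):
    """Fraction of pixels where two glyph bitmaps disagree (assumed metric)."""
    return np.count_nonzero(a != b) / a.size

def most_different(n):
    bitmaps = {ch: render(ch) for ch in CANDIDATES}
    # Greedy max-min (farthest-point) selection: start from an arbitrary seed,
    # then repeatedly add the character whose minimum distance to the
    # already-chosen set is largest.
    chosen = [CANDIDATES[0]]
    while len(chosen) < n:
        best = max(
            (ch for ch in CANDIDATES if ch not in chosen),
            key=lambda ch: min(distance(bitmaps[ch], bitmaps[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen

print(most_different(10))
```

A pairwise confusion matrix measured from an actual OCR run over degraded samples could replace the naive pixel distance here, but the selection step would stay the same.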
Thanks, Ben