ansaurus

Question

Extracting code from photograph of T-shirt via OCR

Answer 1

+4 A:

Hmm, perhaps you need to preprocess the image first, i.e. put it through some filters such as edge detection, emboss/engrave, or a noise filter...

mouters 2010-03-10 16:50:56

... or better yet, get a REAL OCR engine that does that for you.

Otávio Décio 2010-03-10 16:55:50

It was just a suggestion as to how you might get better results from Tesseract... I guess in an ideal world a REAL OCR engine that worked perfectly every time would do the trick... @Otavio, if you're going to make a comment like that, at least suggest a REAL OCR engine... :)

mouters 2010-03-10 16:57:23

Alright then, two good ones: ABBYY and Océ.

Otávio Décio 2010-03-10 17:05:06

Answer 2

+2 A:

Good OCR engines lean heavily on the redundancy of natural language to narrow down what the next character might be. Perl code gives the OCR no such help. Type it in by hand.

msw 2010-03-10 16:56:02

I'm guessing that it does some kind of transcription, e.g. http://www.techcuriosity.com/resources/bioinformatics/dna2rna.php

msw 2010-03-10 16:59:04

Answer 3

+6 A:

If I were you I'd start by cleaning up the image as much as possible, using a picture-manipulation program (GIMP, for example) so that the input for the OCR would be more easily understandable.

If possible, aim for a purely black-and-white image.
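As a rough illustration of what "black-and-white only" means in practice, here is a minimal sketch of a fixed-threshold conversion. It assumes the photo has already been loaded as rows of 0-255 grayscale values (loading the file with an imaging library is left out), and the function name and the threshold of 128 are arbitrary choices made for this example, not something from the answer.

```python
def to_black_and_white(gray, threshold=128):
    """Map each 0-255 grayscale pixel to pure black (0) or pure white (255)."""
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


# Tiny hypothetical 2x3 "image" standing in for the real photograph.
sample = [[12, 200, 90],
          [130, 40, 255]]
print(to_black_and_white(sample))
```

On a real photo of fabric, a single global threshold may well be too crude because of folds and uneven lighting, so the threshold would likely need tuning by eye.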

egarcia 2010-03-10 16:58:53

Answer 4

+18 A:

You can probably type faster than you can clean up images and install OCR engines:

#!/usr/bin/perl
(my$d=q[AA                GTCAGTTCCT
  CGCTATGTA                 ACACACACCA
    TTTGTGAGT                ATGTAACATA
      CTCGCTGGC              TATGTCAGAC
        AGATTGATC          GATCGATAGA
          ATGATAGATC     GAACGAGTGA
            TAGATAGAGT GATAGATAGA
              GAGAGA GATAGAACGA
                TC GATAGAGAGA
                 TAGATAGACA G
               ATCGAGAGAC AGATA
             GAACGACAGA TAGATAGAT
           TGAGTGATAG    ACTGAGAGAT
         AGATAGATTG        ATAGATAGAT
       AGATAGATAG           ACTGATAGAT
     AGAGTGATAG             ATAGAATGAG
   AGATAGACAG               ACAGACAGAT
  AGATAGACAG               AGAGACAGAT
  TGATAGATAG             ATAGATAGAT
  TGATAGATAG           AATGATAGAT
   AGATTGAGTG        ACAGATCGAT
     AGAACCTTTCT   CAGTAACAGT
       CTTTCTCGC TGGCTTGCTT
         TCTAA CAACCTTACT
           G ACTGCCTTTC
           TGAGATAGAT CGA
         TAGATAGATA GACAGAC
       AGATAGATAG  ATAGAATGAC
     AGACAGAGAG      ACAGAATGAT
   CGAGAGACAG          ATAGATAGAT
  AGAATGATAG             ACAGATAGAC
  AGATAGATAG               ACAGACAGAT
  AGACAGACTG                 ATAGATAGAT
   AGATAGATAG                 AATGACAGAT
     CGATTGAATG               ACAGATAGAT
       CGACAGATAG             ATAGACAGAT
         AGAGTGATAG          ATTGATCGAC
           TGATTGATAG      ACTGATTGAT
             AGACAGATAG  AGTGACAGAT
               CGACAGA TAGATAGATA
                 GATA GATAGATAG
                    ATAGACAGA G
                  AGATAGATAG ACA
                GTCGCAAGTTC GCTCACA
])=~s/\s+//g;%a=map{chr $_=>$i++}65,84,67,
71;$p=join$;,keys%a;while($d=~/([$p]{4})/g
){next if$j++%96>=16;$c=0;for$d(0..3){$c+=
$a{substr($1,$d,1)}*(4**$d)}$perl.=chr $c}
             eval $perl;

Edit: typo.

RegDwight 2010-03-10 17:00:10

+1 but you've got the A and the G the wrong way round lol

mouters 2010-03-10 17:02:18

Thanks for retyping it, but I am quite sure it is executable Perl code, and when I run it I get several errors: http://pastebin.com/2QNYVJ26

BioGeek 2010-03-10 17:08:03

+1 for the effort! :D

egarcia 2010-03-10 17:10:24

Now it works. Sweet :)

BioGeek 2010-03-10 17:11:17

So what does it do?

Paul McGuire 2010-03-10 17:20:25

@Paul, The output is: Just another genome hacker.

Nadia Alramli 2010-03-10 17:24:46

Answer 5

+8 A:

Pre-processing will definitely yield a more workable image.

For example, here is the result of applying GIMP's "Levels", "Difference-of-Gaussians", and "Levels" filters (in that order) to the image.
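For anyone curious what those two filters do, the following is a minimal pure-Python sketch of a levels stretch and a difference of Gaussians. It is not GIMP's implementation: the function names, the sigma values, the black/white points, and the tiny synthetic image are all assumptions made for illustration.

```python
import math


def levels(gray, black, white):
    """Stretch values so `black` maps to 0 and `white` to 255, clipping the rest."""
    span = float(white - black)
    return [[min(255.0, max(0.0, (p - black) * 255.0 / span)) for p in row]
            for row in gray]


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights covering about three sigmas on each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(gray, kernel):
    """Convolve every row with the kernel, repeating edge pixels at the borders."""
    radius = len(kernel) // 2
    last = len(gray[0]) - 1
    return [[sum(k * row[min(last, max(0, x + i - radius))]
                 for i, k in enumerate(kernel))
             for x in range(last + 1)]
            for row in gray]


def transpose(gray):
    return [list(column) for column in zip(*gray)]


def gaussian_blur(gray, sigma):
    """Separable blur: rows first, then columns via a transpose."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(gray, kernel)), kernel))


def difference_of_gaussians(gray, sigma_small=1.0, sigma_large=2.0):
    """Subtract a wide blur from a narrow one; flat areas go to roughly zero."""
    narrow = gaussian_blur(gray, sigma_small)
    wide = gaussian_blur(gray, sigma_large)
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(narrow, wide)]


# Tiny synthetic "image": dark left half, bright right half.
image = [[40] * 4 + [180] * 4 for _ in range(6)]
stretched = levels(image, black=40, white=180)      # first "Levels" pass
edges = difference_of_gaussians(stretched)          # keeps edges, drops flat areas
cleaned = levels(edges, black=-20, white=20)        # second "Levels" pass
```

The idea is that the first levels pass boosts contrast, the difference of Gaussians suppresses slowly varying shading such as folds in the fabric, and the second levels pass pushes what remains toward black and white.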

[Image: the photograph after applying the filters above]

Joe Koberg 2010-03-10 17:01:29

The link to your image doesn't work.

BioGeek 2010-03-10 17:09:57

Hopefully corrected.

Joe Koberg 2010-03-10 17:26:30

Now it is visible. Thanks.

BioGeek 2010-03-10 17:38:41

Answer 6

+5 A:

Just a few small typos in RegDwight's code. Corrected version:

#!/usr/bin/perl
(my $d=q[AA                GTCAGTTCCT
  CGCTATGTA                 ACACACACCA
    TTTGTGAGT                ATGTAACATA
      CTCGCTGGC              TATGTCAGAC
        AGATTGATC          GATCGATAGA
          ATGATAGATC     GAACGAGTGA
            TAGATAGAGT GATAGATAGA
              GAGAGA GATAGAACGA
                TC GATAGAGAGA
                 TAGATAGACA G
               ATCGAGAGAC AGATA
             GAACGACAGA TAGATAGAT
           TGAGTGATAG    ACTGAGAGAT
         AGATAGATTG        ATAGATAGAT
       AGATAGATAG           ACTGATAGAT
     AGAGTGATAG             ATAGAATGAG
   AGATAGACAG               ACAGACAGAT
  AGATAGACAG               AGAGACAGAT
  TGATAGATAG             ATAGATAGAT
  TGATAGATAG           AATGATAGAT
   AGATTGAGTG        ACAGATCGAT
     AGAACCTTTCT   CAGTAACAGT
       CTTTCTCGC TGGCTTGCTT
         TCTAA CAACCTTACT
           G ACTGCCTTTC
           TGAGATAGAT CGA
         TAGATAGATA GACAGAC
       AGATAGATAG  ATAGAATGAC
     AGACAGAGAG      ACAGAATGAT
   CGAGAGACAG          ATAGATAGAT
  AGAATGATAG             ACAGATAGAC
  AGATAGATAG               ACAGACAGAT
  AGACAGACTG                 ATAGATAGAT
   AGATAGATAG                 AATGACAGAT
     CGATTGAATG               ACAGATAGAT
       CGACAGATAG             ATAGACAGAT
         AGAGTGATAG          ATTGATCGAC
           TGATTGATAG      ACTGATTGAT
             AGACAGATAG  AGTGACAGAT
               CGACAGA TAGATAGATA
                 GATA GATAGATAG
                    ATAGACAGA G
                  AGATAGATAG ACA
                GTCGCAAGTTC GCTCACA
])=~s/\s+//g;%a=map{chr $_=>$i++}65,84,67,
71;$p=join$;,keys%a;while($d=~/([$p]{4})/g
){next if$j++%96>=16;$c=0;for$d(0..3){$c+=
$a{substr($1,$d,1)}*(4**$d)}$perl.=chr $c}
             eval $perl;

which, when executed, produces:

Just another genome hacker.
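For readers wondering how the DNA turns into something Perl can eval: the last five lines of the script appear to read the sequence as base-4 digits (A=0, T=1, C=2, G=3, from the order 65, 84, 67, 71), four letters per character with the first letter least significant, and to use only the first 16 groups out of every 96. Below is a minimal Python sketch of that reading. The function and parameter names are made up for this example, and it only reconstructs the hidden source text; it does not run it.

```python
# Digit values follow the order 65, 84, 67, 71 (A, T, C, G) in the Perl map.
DIGITS = {"A": 0, "T": 1, "C": 2, "G": 3}


def decode_dna(dna, group_size=4, keep=16, period=96):
    """Turn groups of four bases into characters, little-endian base 4.

    Only the first `keep` groups of every `period` groups are used,
    mirroring `next if $j++ % 96 >= 16` in the Perl script.
    """
    letters = "".join(dna.split())  # same effect as s/\s+//g
    chars = []
    for index in range(len(letters) // group_size):
        if index % period >= keep:
            continue  # this group is padding for the double-helix artwork
        group = letters[index * group_size:(index + 1) * group_size]
        value = sum(DIGITS[base] * 4 ** pos for pos, base in enumerate(group))
        chars.append(chr(value))
    return "".join(chars)


# The first 20 bases of the shirt: "AA GTCAGTTCCT CGCTATGT..."
print(decode_dna("AAGT CAGT TCCT CGCT ATGT"))  # prints "print"
```

So the opening bases spell out `print`, which fits the output reported above; the rest of the hidden program can be recovered the same way by passing in the full sequence.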

dtmilano 2010-03-10 17:17:13

Answer 7

+2 A:

Try emailing the image to [email protected] and you will get the OCR results back by email (a free API for this is also available: http://bit.ly/ocr_api).

Eugene Osovetsky 2010-03-11 01:53:31
