Intro

I am building a kind of OCR application that should recognize characters by comparing them against pre-saved .bmp pictures of each character.

Given a region of a screenshot where I know there will be a character, I want to pass that picture to a CharacterFactory, which will return a Character object:

class CharacterFactory : ICharacterFactory {
    // HashSet<T> is the generic hash-based collection in .NET (there is no HashTable<T>).
    private ICollection<Bitmap> aPictures = new HashSet<Bitmap>();
    private ICollection<Bitmap> bPictures = new HashSet<Bitmap>();
    private ICollection<Bitmap> cPictures = new HashSet<Bitmap>();
    ...

    public CharacterFactory() {
        LoadAllPictures();
    }

    ...    

    public Character GetCharacter(Bitmap characterToRecognize) {
        if (aPictures.Contains(characterToRecognize)) return new ACharacter();
        if (bPictures.Contains(characterToRecognize)) return new BCharacter();
        if (cPictures.Contains(characterToRecognize)) return new CCharacter();
        ...
    }
}

My question

How do I unit test this class? The only way I can see is to save a couple of Bitmaps, pass them in as the characterToRecognize argument, and compare them to the list of pre-saved pictures my program has. The problem, of course, is that it takes time to load the pictures and more time to run the GetCharacter() algorithm.

I could of course wrap each of my CharacterFactory's xPictures Collection in a new class, but I'd just be pushing the problem to that new class.

How should I deal with this kind of situation?

+1  A: 

A test that depends on any external data is not a unit test. It is an integration test.

Inject the data dependency you require, so that the class under test is decoupled from where the data comes from, and then pass, inject, or fake this data in your unit test.

Mitch Wheat
+1  A: 

Don't call LoadAllPictures() in the constructor - it's a violation of the Single Responsibility Principle (SRP). You don't want a single class to be responsible for both loading data and runtime logic.

If you need to dynamically load stuff at runtime, hide it behind another interface, and then stub it out for tests.
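A minimal sketch of what that could look like, assuming a hypothetical IPictureLoader interface and a StubPictureLoader for tests (neither name is from the question). To keep it short, this version returns a char rather than the question's Character objects and drops the ICharacterFactory interface:

```csharp
using System.Collections.Generic;
using System.Drawing;

// Hypothetical seam: anything that can supply the reference pictures per character.
public interface IPictureLoader
{
    IDictionary<char, ICollection<Bitmap>> LoadAllPictures();
}

// Stub used only in tests: hands back in-memory bitmaps and never touches the disk.
public class StubPictureLoader : IPictureLoader
{
    private readonly IDictionary<char, ICollection<Bitmap>> pictures;

    public StubPictureLoader(IDictionary<char, ICollection<Bitmap>> pictures)
    {
        this.pictures = pictures;
    }

    public IDictionary<char, ICollection<Bitmap>> LoadAllPictures()
    {
        return pictures;
    }
}

public class CharacterFactory
{
    private readonly IDictionary<char, ICollection<Bitmap>> pictures;

    // The loader is injected, so the factory no longer knows where pictures come from.
    public CharacterFactory(IPictureLoader loader)
    {
        pictures = loader.LoadAllPictures();
    }

    // Returns the matched symbol, or null when no collection contains the picture.
    public char? GetCharacter(Bitmap characterToRecognize)
    {
        foreach (var entry in pictures)
        {
            if (entry.Value.Contains(characterToRecognize)) return entry.Key;
        }
        return null;
    }
}
```

In a test you would build a StubPictureLoader from a handful of small in-memory Bitmaps and assert on what GetCharacter() returns. The production loader that reads the .bmp files can then be covered separately by an integration test.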

kyoryu
+1  A: 

The problem, of course, is that Bitmap (like many .NET classes) is sealed, which makes it hard to create a mock. What I typically do in these cases is write a very thin wrapper: a class that takes a Bitmap in its constructor, mimics the methods you need, and delegates to the Bitmap. Then you can either mock this class or extract an interface from it. Make the class you want to test take your new "IBitMap", mock it, and inject your data, etc., just as others have said.
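A rough sketch of such a wrapper, assuming the recognizer only needs the image size and per-pixel access. The member list and the BitMapWrapper and SolidColorBitMap names are illustrative assumptions, not an existing API:

```csharp
using System.Drawing;

// Hypothetical interface exposing only the members the recognizer needs.
public interface IBitMap
{
    int Width { get; }
    int Height { get; }
    Color GetPixel(int x, int y);
}

// Thin wrapper: holds the real Bitmap and delegates every call to it.
public class BitMapWrapper : IBitMap
{
    private readonly Bitmap inner;

    public BitMapWrapper(Bitmap inner)
    {
        this.inner = inner;
    }

    public int Width { get { return inner.Width; } }
    public int Height { get { return inner.Height; } }

    public Color GetPixel(int x, int y)
    {
        return inner.GetPixel(x, y);
    }
}

// Hand-written fake for tests: a solid-colour image that needs no file on disk.
public class SolidColorBitMap : IBitMap
{
    private readonly Color color;

    public SolidColorBitMap(int width, int height, Color color)
    {
        Width = width;
        Height = height;
        this.color = color;
    }

    public int Width { get; private set; }
    public int Height { get; private set; }

    public Color GetPixel(int x, int y)
    {
        return color;
    }
}
```

Production code would pass new BitMapWrapper(bitmap), while tests pass a fake like SolidColorBitMap or a mock generated by a mocking library.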

My projects often end up with a small collection of these "Microsoft Wrappers". It would be so much easier if they would just include interfaces for these classes, or at least not seal them.

ryber
A: 

This isn't directed at the original question, but this approach to OCR will have some problems. If you change the approach, it eliminates the need for breaking up this specific class for testing, so I thought it was worth posting as an answer.

  1. Bitmap does not override Equals to compare image data, so Contains() will be based on reference equality, which is probably not what you're after. You would need to provide your own implementation if you want to compare image contents.

  2. A direct comparison will likely not work due to anti-aliasing. You'll also need to decide how to locate the characters you wish to match.
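For the first point, one possible sketch is an IEqualityComparer<Bitmap> that compares pixel by pixel. The BitmapContentComparer name is an assumption for illustration. It addresses only exact equality, so the anti-aliasing concern in the second point still applies:

```csharp
using System.Collections.Generic;
using System.Drawing;

// Hypothetical comparer: two bitmaps are equal when size and every pixel match.
public class BitmapContentComparer : IEqualityComparer<Bitmap>
{
    public bool Equals(Bitmap x, Bitmap y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        if (x.Width != y.Width || x.Height != y.Height) return false;

        // GetPixel is slow; LockBits would be faster for large images.
        for (int row = 0; row < x.Height; row++)
        {
            for (int col = 0; col < x.Width; col++)
            {
                if (x.GetPixel(col, row).ToArgb() != y.GetPixel(col, row).ToArgb())
                    return false;
            }
        }
        return true;
    }

    public int GetHashCode(Bitmap bitmap)
    {
        if (bitmap == null) return 0;
        // Cheap hash from the dimensions only: equal images always share it,
        // and Equals settles any collisions between same-sized images.
        unchecked { return bitmap.Width * 397 ^ bitmap.Height; }
    }
}
```

Passing the comparer to the collection, for example new HashSet<Bitmap>(new BitmapContentComparer()), makes Contains() compare image contents instead of references.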

If you're interested, I recommend learning more about template-based pattern recognition. You can find more information here. The technical term is cross-correlation; the process has its theoretical roots in signal analysis and is related to locating a particular audio 'waveform' in the midst of other noise signals. This technique accomplishes what you're after (comparing a particular image against a template character), but does it in a way that's more robust against noise in the image. It's still a very challenging problem, and there are a number of things to learn along the way.
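To give a feel for the idea, here is a small sketch of one common variant, zero-mean normalized cross-correlation, scoring a candidate patch against a template of the same size. It assumes both images have already been converted to grayscale intensity arrays; the TemplateMatching class name is illustrative, and sliding the template across a larger image is left out:

```csharp
using System;

public static class TemplateMatching
{
    // Returns a similarity score in [-1, 1]: 1 means the patch matches the template
    // up to brightness and contrast changes, -1 means it is the inverted template.
    public static double NormalizedCrossCorrelation(double[,] image, double[,] template)
    {
        int rows = template.GetLength(0);
        int cols = template.GetLength(1);
        if (image.GetLength(0) != rows || image.GetLength(1) != cols)
            throw new ArgumentException("Images must have the same size.");

        double meanImage = Mean(image);
        double meanTemplate = Mean(template);
        double numerator = 0, varImage = 0, varTemplate = 0;

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                double di = image[r, c] - meanImage;
                double dt = template[r, c] - meanTemplate;
                numerator += di * dt;
                varImage += di * di;
                varTemplate += dt * dt;
            }
        }

        double denominator = Math.Sqrt(varImage * varTemplate);
        // A completely flat image has no structure to correlate; report 0.
        return denominator == 0 ? 0 : numerator / denominator;
    }

    private static double Mean(double[,] values)
    {
        double sum = 0;
        foreach (double v in values) sum += v;
        return sum / values.Length;
    }
}
```

Instead of demanding an exact match, you would accept a character when its score exceeds some threshold, which is what makes the approach more tolerant of anti-aliasing and noise.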

If you're just after an implementation and not looking to learn more about how the template matching works, AForge.NET includes a simple exhaustive template matching algorithm. There are numerous improvements that could be made for a faster search, but it may be sufficient for your needs.

Dan Bryant