I've written an OCR wrapper library around the Microsoft Office Document Imaging COM API, and in a Console App running locally, it works flawlessly, with every test.

Sadly, things start going badly when we attempt to integrate it with a WCF service running as an ASP.NET Web Application under IIS6. We had issues trying to free up the MODI COM objects, and there were plenty of examples on the web that helped us.

However, problems still remain. If I restart IIS, and do a fresh deployment of the web app, the first few OCR attempts work great. If I leave it for 30 minutes or so, and then do another request, I get server failure errors like this:

The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT)): at MODI.DocumentClass.Create(String FileOpen)

From this point on, every request will fail to do the OCR, until I reset IIS, and the cycle begins again.

We run this application in its own App Pool, and it runs under an identity with Local Admin rights.

UPDATE: This issue can be solved by doing the OCR stuff out of process. It appears as though the MODI library doesn't play well with managed code, when it comes to cleaning up after itself, so spawning new processes for each OCR request worked well in my situation.

Here is the function that performs the OCR:

public class ImageReader : IDisposable
{
    private MODI.Document _document;
    private MODI.Images _images;
    private MODI.Image _image;
    private MODI.Layout _layout;
    private ManualResetEvent _completedOCR = new ManualResetEvent(false);

    // SNIP - Code removed for clarity

    private string PerformMODI(string fileName)
    {
        _document = new MODI.Document();
        _document.OnOCRProgress += new MODI._IDocumentEvents_OnOCRProgressEventHandler(_document_OnOCRProgress);
        _document.Create(fileName);

        _document.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
        _completedOCR.WaitOne(5000);
        _document.Save();
        _images = _document.Images;
        _image = (MODI.Image)_images[0];
        _layout = _image.Layout;
        string text = _layout.Text;
        _document.Close(false);
        return text;
    }

    void _document_OnOCRProgress(int Progress, ref bool Cancel)
    {
        if (Progress == 100)
        {
            _completedOCR.Set();
        }
    }

    private static void SetComObjectToNull(params object[] objects)
    {
        for (int i = 0; i < objects.Length; i++)
        {
            object o = objects[i];
            if (o != null)
            {
                Marshal.FinalReleaseComObject(o);
                o = null;
            }
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public void Dispose()
    {
        SetComObjectToNull(_layout, _image, _images, _document);
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}

I then create an ImageReader inside a using block (which calls IDisposable.Dispose on exit).

Calling Marshal.FinalReleaseComObject should instruct the CLR to release the COM objects, and so I'm at a loss to figure out what would be causing the symptoms we have.
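One detail in SetComObjectToNull is worth noting: `o = null` only clears the local variable inside the loop, so the fields on ImageReader still hold their (now released) wrappers after Dispose. Below is a minimal sketch of a release helper that also clears the caller's field. The names `ComCleanup` and `Release` are hypothetical, and this is not claimed to cure the RPC_E_SERVERFAULT error; it only makes the cleanup do what its name suggests.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ComCleanup
{
    // Hypothetical helper: releases one COM wrapper and clears the caller's
    // variable. Passing by ref matters here; assigning null to a local copy
    // leaves the original field pointing at the released wrapper.
    public static void Release<T>(ref T comObject) where T : class
    {
        T local = comObject;
        comObject = null;

        // FinalReleaseComObject only accepts runtime callable wrappers,
        // so anything else is simply dropped.
        if (local != null && Marshal.IsComObject(local))
        {
            Marshal.FinalReleaseComObject(local);
        }
    }
}
```

In Dispose this would be called once per field, innermost object first, for example `ComCleanup.Release(ref _layout);` and so on up to `ComCleanup.Release(ref _document);`.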

For what it's worth, when I run this code outside of IIS, in, say, a Console App, everything seems bulletproof. It works every time.

Any tips that help me diagnose and solve this issue would be an immense help and I'll upvote like crazy! ;-)

Thanks!

+1  A: 

MODI is incredibly wonky when it comes to getting rid of itself, especially running in IIS. In my experience, I've found that although it slows everything down, the only way to get rid of these errors is to add a GC.WaitForPendingFinalizers() after your GC.Collect() call. If you're interested, I wrote an article about this.

AJ
Excellent article, thanks for referencing it here. I'll look at implementing this suggestion soon, and advise the outcome.
Scott Ferguson
Thanks very much. Hopefully it helps you out with your current situation.
AJ
Sadly my problem still remains. I'll update my original post with the new source code, and see what the community thinks. Thanks for trying anyway!
Scott Ferguson
MODI gave me many headaches. First it was versioning issues, then it was file corruption issues when annotating faxes on network drives. "Wonky" is a good term for it! ;)
TrueWill
+1  A: 

Can you replicate the problem in a small console application? Perhaps by leaving it to sleep for 30 mins and coming back to it?

Best way to solve things like this is to isolate it down totally. I'd be interested to see how that works.
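A rough sketch of such a test harness, assuming the OCR call can be wrapped in a delegate. `IdleRepro` and its parameters are hypothetical names, not part of MODI or of the code in the question.

```csharp
using System;
using System.Threading;

public static class IdleRepro
{
    // Runs `work` up to `runs` times, sleeping for `idle` between runs.
    // Returns the 1-based index of the first run that throws,
    // or 0 if every run succeeds.
    public static int Run(Action work, int runs, TimeSpan idle)
    {
        for (int i = 1; i <= runs; i++)
        {
            try
            {
                work();
                Console.WriteLine("{0:T} run {1} ok", DateTime.Now, i);
            }
            catch (Exception ex)
            {
                Console.WriteLine("{0:T} run {1} failed: {2}", DateTime.Now, i, ex.Message);
                return i;
            }

            // Idle between runs, but not after the last one.
            if (i < runs)
            {
                Thread.Sleep(idle);
            }
        }
        return 0;
    }
}
```

From a console app's Main, something like `IdleRepro.Run(() => { /* create an ImageReader in a using block and OCR one file */ }, 5, TimeSpan.FromMinutes(30));` would show whether the idle period alone triggers the failure.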

Noon Silk
That's a really good idea, thank you Silky. My console test app never fails, but I never have it sitting idle either. It just runs the tests, and exits, presumably correctly throwing away references to the COM objects. I'll modify the console app, and let you know how things turn out. Cheers
Scott Ferguson
My only real guess is that somehow you are running another version of some incompatible exe (for example, running two different .NET versions in the same app pool) and that is somehow corrupting the dll. This would explain why it works after a restart.
Noon Silk
While I haven't quite solved this yet, your comment was incredibly helpful. I managed to replicate the problem in a long running Console App, so I can now eliminate IIS as being the cause of the problem. Thanks again for the hint.
Scott Ferguson
I wonder if there is some way to add some tracing to watch what the garbage collector is doing (like another poster hinted at). Perhaps it's expiring objects that the COM library needs. Just wildly guessing now, but if there is an option to enable verbose GC output, maybe try enabling it. I take it you've tried to replicate on other machines with the same result?
Noon Silk
Hang on. I just noticed you have a dispose that tries to delete some things. Have you always had that? Or is it new? I think it's new, but I'm not sure it's a good idea ...
Noon Silk
Hi Silky, tracing to watch when and how the GC is doing its magic is a good idea. Might have to look into that. Yes, I'm attempting to use the IDisposable interface as a way of tidying up the COM references. Before implementing that, we ran into trouble with the MODI library not letting go of the TIF images it was reading. For now, I'm trying Sam Saffron's suggestion of doing the OCR stuff out of process, because if it fails, I can always spin up another process, which presumably won't be broken. It's a workaround at best, but it might do the trick. Thanks for all your help though.
Scott Ferguson
If it failed in the console app it'll certainly fail in a new process, every 10 mins. I'm truly interested in why it's crashing after that time. I'd also be interested to know: what if you make consistent calls every 5 minutes to process an image? Does it still fail after 10?
Noon Silk
+2  A: 

Have you thought of hosting the OCR portion of your app out-of-process?

Having a service can give you tons of flexibility:

  1. You can define a simple end point for your web application, and access it via remoting or WCF.
  2. If stuff is pear shape and the library is all dodge, you can have the service launch a separate process every time you need to perform OCR. This gives you extreme safety, but involves a small extra expense. I would assume that OCR is MUCH more expensive than spinning up a process.
  3. You can keep an instance of the COM object around; if memory starts leaking, you can restart yourself without impacting the web site (if you are careful).
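
Option 2 might look roughly like the sketch below. It assumes a separate worker executable (hypothetical; nothing like it is shown in this thread) that takes its input on the command line, prints the recognized text to standard output and exits with code 0 on success. `OcrProcessRunner` is likewise an illustrative name.

```csharp
using System;
using System.Diagnostics;

public static class OcrProcessRunner
{
    // Starts a fresh worker process, waits up to timeoutMs for it to exit,
    // and returns whatever it wrote to standard output.
    // Callers must quote any path inside `arguments` that contains spaces.
    public static string Run(string exePath, string arguments, int timeoutMs)
    {
        var info = new ProcessStartInfo(exePath, arguments)
        {
            UseShellExecute = false,        // required for output redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process worker = Process.Start(info))
        {
            // Read asynchronously so a hung worker cannot block us
            // past the timeout.
            var output = worker.StandardOutput.ReadToEndAsync();

            if (!worker.WaitForExit(timeoutMs))
            {
                worker.Kill();
                throw new TimeoutException("OCR worker did not finish in time.");
            }
            if (worker.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "OCR worker failed with exit code " + worker.ExitCode);
            }
            return output.Result;
        }
    }
}
```

Because every request gets its own process, whatever COM state the library leaves behind disappears when the worker exits, and a failed or timed-out run does not affect the next one.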

Personally, I have found in the past that COM interop + IIS = grief.

Sam Saffron
Hi Sam, yes, this is something I tried just this week. I put the OCR stuff into a separately hosted Windows Service using WCF (and NetTcpBinding). I still had very similar symptoms to when I was running it under IIS. Based on a hint from silky, I tried a long running console app (as opposed to the short running version I had written), and managed to replicate the issue within 10 minutes. However, regardless of what the problem turns out to be, I will +1 your answer, because it makes much more sense to do this stuff out of process, for exactly the reasons you've outlined. Thank you.
Scott Ferguson
Oh, in addition, I didn't think of spinning up a new process, that's an excellent idea too. In fact, that might just be a good work-around, because I can catch the COM interop exception, and spin up a new process... cool, I'm excited now. I'll try it out, and report back. Thanks again.
Scott Ferguson
This worked a treat. I wrote a Windows service, hosting a WCF service that spins up a new process wrapping the OCR stuff for every request. Yep, that sounds expensive, but as you pointed out, it's insignificant next to the cost of doing the actual OCR itself. Closing the process correctly cleans up the 'wonky' MODI interop stuff, and everything works just like it should. Thanks Sam and everyone involved. Much appreciated.
Scott Ferguson