I've written an OCR wrapper library around the Microsoft Office Document Imaging (MODI) COM API, and when run locally in a console app it works flawlessly in every test.
Unfortunately, things go wrong when we integrate it into a WCF service hosted as an ASP.NET web application under IIS 6. We initially had trouble freeing the MODI COM objects, and plenty of examples on the web helped us with that.
However, problems remain. If I restart IIS and do a fresh deployment of the web app, the first few OCR attempts work fine. If I leave it idle for 30 minutes or so and then send another request, I get server fault errors like this:
The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT)): at MODI.DocumentClass.Create(String FileOpen)
From then on, every OCR request fails until I reset IIS, and the cycle begins again.
We run this application in its own app pool, under an identity with local admin rights.
UPDATE: This issue can be solved by performing the OCR out of process. It appears that the MODI library doesn't play well with managed code when it comes to cleaning up after itself, so spawning a new process for each OCR request worked well in my situation.
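A minimal sketch of the calling side of that out-of-process approach follows. It assumes a hypothetical worker console app (called `OcrWorker.exe` here) that takes the image path as its only argument, runs the OCR, writes the recognized text to standard output and returns a non-zero exit code on failure. The worker name, that argument/exit-code convention and the timeout value are illustrative assumptions, not part of the original solution.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class OutOfProcessOcr
{
    // Runs one OCR request in a fresh child process, so whatever COM state MODI
    // leaves behind is torn down when that process exits, not in the IIS worker.
    // workerPath points at a hypothetical console app (e.g. "OcrWorker.exe") that
    // OCRs the file named by its first argument and prints the text to stdout.
    public static string Recognize(string workerPath, string imagePath, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = workerPath,
            Arguments = "\"" + imagePath + "\"", // quoted so paths with spaces survive
            UseShellExecute = false,             // required to redirect stdout
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        var output = new StringBuilder();
        using (var process = new Process { StartInfo = startInfo })
        {
            // Collect stdout asynchronously so a hung worker cannot block us
            // past the timeout; e.Data is null once the stream closes.
            process.OutputDataReceived += (sender, e) =>
            {
                if (e.Data != null)
                {
                    lock (output) { output.AppendLine(e.Data); }
                }
            };

            process.Start();
            process.BeginOutputReadLine();

            if (!process.WaitForExit(timeoutMs))
            {
                try { process.Kill(); }
                catch (InvalidOperationException) { /* exited between timeout and Kill */ }
                throw new TimeoutException("OCR worker did not finish within " + timeoutMs + " ms.");
            }

            // The parameterless overload waits until the async output
            // handlers have drained the redirected stream.
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("OCR worker exited with code " + process.ExitCode + ".");
            }
        }

        return output.ToString();
    }
}
```

In the WCF service, each request would then call something like `OutOfProcessOcr.Recognize(workerPath, fileName, 60000)`, while the worker itself wraps the OCR class in a using block and writes the text with `Console.Out`. Starting a process per request costs some start-up time, but it keeps MODI's cleanup problems out of the long-lived IIS worker process.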
Here is the class that performs the OCR:
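    using System;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;
    using System.Threading;

    public class ImageReader : IDisposable
    {
        private MODI.Document _document;
        private MODI.Images _images;
        private MODI.Image _image;
        private MODI.Layout _layout;
        // Signalled by the OnOCRProgress handler when OCR reports 100%.
        private ManualResetEvent _completedOCR = new ManualResetEvent(false);

        // SNIP - Code removed for clarity

        private string PerformMODI(string fileName)
        {
            _document = new MODI.Document();
            _document.OnOCRProgress += new MODI._IDocumentEvents_OnOCRProgressEventHandler(_document_OnOCRProgress);
            _document.Create(fileName);
            _document.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            // Wait up to 5 seconds for the progress handler to report completion.
            _completedOCR.WaitOne(5000);
            _document.Save();

            // Pull the recognized text from the first page's layout.
            _images = _document.Images;
            _image = (MODI.Image)_images[0];
            _layout = _image.Layout;
            string text = _layout.Text;

            _document.Close(false);
            return text;
        }

        void _document_OnOCRProgress(int Progress, ref bool Cancel)
        {
            if (Progress == 100)
            {
                _completedOCR.Set();
            }
        }

        private static void SetComObjectToNull(params object[] objects)
        {
            for (int i = 0; i < objects.Length; i++)
            {
                object o = objects[i];
                if (o != null)
                {
                    // Sets the runtime callable wrapper's reference count to zero.
                    Marshal.FinalReleaseComObject(o);
                    // Assigning null to 'o' here would only clear this local copy,
                    // not the caller's field, so Dispose clears the fields itself.
                }
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public void Dispose()
        {
            // Release the innermost objects first, then the document.
            SetComObjectToNull(_layout, _image, _images, _document);
            _layout = null;
            _image = null;
            _images = null;
            _document = null;
            _completedOCR.Close();
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }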
I then create an ImageReader inside a using block, which calls IDisposable.Dispose on exit.
Calling Marshal.FinalReleaseComObject should make the CLR release the COM objects, so I'm at a loss as to what is causing these symptoms.
For what it's worth, when this code runs outside IIS, in a console app for example, it seems bulletproof: it works every time.
Any tips to help me diagnose and solve this issue would be an immense help, and I'll upvote like crazy! ;-)
Thanks!