Intro:

I have a bottleneck in my C# application where I need to load a page as a bitmap from a PDF or TIFF file and process this bitmap while in memory. TIFF files load fairly fast, as do first-party PDFs (we can read our own). The bottleneck appears when the PDF file is third-party and we need to parse the PDF page and turn it into a bitmap. This is costly: about 500 times slower than for first-party PDFs, to give an idea. Some of these PDF files get very large, so we avoid loading the whole document into memory first.

Hypothesis:

The work being done on the page is done in a separate process (magically) while my application waits for it to finish. Because of this, I believe that loading a small buffer (say 5 pages at a time) asynchronously will speed up the processing of these third-party PDF files.

Pseudo (C#-ish):

IntPtr[] dibbuffer = new IntPtr[5];
dibbuffer[0] = LoadPage(0); //pre-emptive first page
BeginAsyncFillBuffer(dibbuffer);

for (int i = 0; i < NUM_PAGES; ++i)
{
    IntenseProcessing(dibbuffer[current_page_index_in_buffer]);
}

EndAsyncFillBuffer();

Problems:

  • Will this really speed up the application? (Some of the machines it will be running on are single core.)
  • Is this worth the hassle of trying to synchronize and sort the buffer on the processing thread?
  • Any tips for synchronizing the process are welcome. I am using C#, so any .NET conventions or data structures can be used.
  • Addendum: I would like it to be as lazy as possible (only load the next page when there is room free in the buffer).
A: 

This is what I ended up with. Instead of polling every X milliseconds, I wish it were more "lazy" and only filled the buffer on the separate thread when needed. If anyone can refine this, please do.

class MyGhettoBuffer
{
    Target _target = null; //contains info on the file @ hand
    Queue _q = null;
    Queue _synchQ = null;
    Thread _loop = null;
    ManualResetEvent _throttle = new ManualResetEvent(false);
    int _curpage = 0;

    private MyGhettoBuffer() { }
    public MyGhettoBuffer(Target target)
    {
        _target = target;
        _q = new Queue();
        _synchQ = Queue.Synchronized(_q);
        _loop = new Thread(MainLoop);
        _loop.Start();
    }

    public bool HasPagesLeft //determine when to stop processing queue
    {
        get
        {
            return _curpage < _target.NumPages || _synchQ.Count > 0;
        }
    }
    //if the buffer hasn't caught up, load the page on the processing thread
    public IntPtr GetNextPage()
    {
        lock (this)
        {
            if (_synchQ.Count == 0) 
            {
                IntPtr dib =
                    LoadDib(_target.FullPath, _curpage);
                _curpage++;
                return dib;
            }
            else
            {
                object o = _synchQ.Dequeue();
                if (o is IntPtr)
                {
                    return (IntPtr)o;
                }
                else
                {
                    throw new InvalidCastException("Object in page queue is not an IntPtr");
                }
            }
        }
    }

    private void MainLoop()
    {
        while (true)
        {
            if (_curpage < _target.NumPages)
            {
                if (_synchQ.Count < 5)
                {
                    lock (this)
                    {
                        IntPtr dib =
                            LoadDib(_target.FullPath, _curpage);
                        _synchQ.Enqueue(dib);
                        _curpage++;
                    }
                }
            }
            else
            {
                return;
            }
            _throttle.WaitOne(100, false); //never signaled: acts as a 100 ms sleep so the polling loop does not burn cpu cycles
        }
    }
}

Then, in my processing thread, I do something like this:

MyGhettoBuffer buffer = new MyGhettoBuffer(target);
while (buffer.HasPagesLeft)
{
    IntPtr dib = buffer.GetNextPage();
    //Process the dib here
    FreeDib(dib);
}
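
One possible refinement of the polling loop, offered as a sketch rather than a tested drop-in: a bounded BlockingCollection<T> (System.Collections.Concurrent, .NET 4 and later) makes the loader thread block in Add while the buffer is full and wake only when the consumer takes a page, which is the "lazy" behavior wished for above. The class name PagePrefetcher and the loadPage delegate are hypothetical; the delegate stands in for a call such as LoadDib(_target.FullPath, page), and the demo uses plain ints in place of DIB handles.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: prefetches up to `capacity` pages ahead of the consumer.
sealed class PagePrefetcher<T> : IDisposable
{
    private readonly BlockingCollection<T> _pages;
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly Task _producer;

    public PagePrefetcher(int pageCount, int capacity, Func<int, T> loadPage)
    {
        // Bounded capacity: Add blocks once `capacity` pages are waiting.
        _pages = new BlockingCollection<T>(capacity);
        _producer = Task.Factory.StartNew(() =>
        {
            try
            {
                for (int page = 0; page < pageCount; page++)
                    _pages.Add(loadPage(page), _cts.Token); // waits for free room, no polling
            }
            catch (OperationCanceledException)
            {
                // The consumer disposed the buffer before reading every page.
            }
            finally
            {
                _pages.CompleteAdding(); // lets the consuming enumeration finish
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Yields pages in load order; blocks while the buffer is empty and
    // ends once all pages have been loaded and consumed.
    public IEnumerable<T> Pages
    {
        get { return _pages.GetConsumingEnumerable(); }
    }

    public void Dispose()
    {
        _cts.Cancel(); // unblock the producer if it is waiting in Add
        try
        {
            _producer.Wait(); // rethrows (as AggregateException) if loadPage failed
        }
        finally
        {
            _pages.Dispose();
            _cts.Dispose();
        }
    }
}

static class Demo
{
    static void Main()
    {
        // Stand-in loader: returns page * 10 instead of a real DIB handle.
        using (var buffer = new PagePrefetcher<int>(8, 5, page => page * 10))
        {
            foreach (int dib in buffer.Pages)
                Console.WriteLine(dib); // process the page here, then free it
        }
    }
}
```

With real DIBs the loop body would process the handle and then call FreeDib on it, as in the loop above. If the consumer stops early, any pages still sitting in the buffer (and one that was mid-load) would also need to be freed; this sketch does not handle that.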
Tom Fobear